Unix Technical Forum

Escaping of RegEx?

#1
04-19-2008, 06:01 PM
Torsten Zühlsdorff
Escaping of RegEx?

Hello,

I have a list of URLs from the HTTP Referer header. I select all URLs that
contain "google". Now I want to extract the search string. For example:
"http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta="
should return "porenbeton+planbauplatten+abmessungen"

Therefore I use this regex:
(?:\?|&|as_)q=(.*?)(?:&|\s)

In SQL it looks like this:
SELECT
substring('http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='
from '(?:\?|&|as_)q=(.*?)(?:&|\s)');

But I get this error message:
quantifier operand invalid

(complete error message, translated from German:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...porenbeton+planbauplatten+abmessungen&meta=' from '(?:\?|&|a...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid regular expression: quantifier operand invalid)

How do I need to escape the regex?

Thanks for your help & greetings from Germany,
Torsten
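
A note on the error above: the WARNING suggests that the server treats backslashes in an ordinary string literal as escape characters, so `\?` would reach the regex engine as a bare `?` and `\s` as a plain `s`. The pattern then begins with `(?:?`, a quantifier with nothing in front of it, which fits the "quantifier operand invalid" message. The sketch below reproduces that effect. It uses Python's `re` module only as a stand-in for the PostgreSQL regex engine, and `strip_backslashes` is a hypothetical, simplified model of the literal parsing, not PostgreSQL code.

```python
import re

# The regex as the author intended it to reach the regex engine.
intended = r"(?:\?|&|as_)q=(.*?)(?:&|\s)"

def strip_backslashes(literal: str) -> str:
    """Hypothetical, simplified model of a string literal parser that
    consumes each backslash and keeps the character after it."""
    return re.sub(r"\\(.)", r"\1", literal)

received = strip_backslashes(intended)
print(received)  # (?:?|&|as_)q=(.*?)(?:&|s)

# The stripped pattern opens its group with a bare '?', which has no operand.
try:
    re.compile(received)
    stripped_compiles = True
except re.error:
    stripped_compiles = False
print(stripped_compiles)  # False

# With the backslashes intact, the pattern compiles and extracts the query.
url = "http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta="
match = re.search(intended, url)
query = match.group(1) if match else None
print(query)  # porenbeton+planbauplatten+abmessungen
```

On the PostgreSQL side, the HINT in the error message points at the escape string syntax, where each backslash is doubled so that one survives literal parsing: E'(?:\\?|&|as_)q=(.*?)(?:&|\\s)'. The thread does not show this variant being run, so treat it as a suggestion to try rather than a confirmed result.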
#2
04-19-2008, 06:01 PM
shakahshakah
Re: Escaping of RegEx?

On Feb 5, 5:11 am, Torsten Zühlsdorff <f...@meisterderspiele.de>
wrote:
> [..]


Does '^.*q=([^&=]*).*$' work for you?

reporting=# SELECT substring('http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='
FROM '^.*q=([^&=]*).*$');
substring
---------------------------------------
porenbeton+planbauplatten+abmessungen
(1 row)
#3
04-19-2008, 06:01 PM
Torsten Zühlsdorff
Re: Escaping of RegEx?

shakahshakah wrote:
> On Feb 5, 5:11 am, Torsten Zühlsdorff <f...@meisterderspiele.de>
> wrote:
>> [..]
> Does '^.*q=([^&=]*).*$' work for you?


It works in most cases, but not on strings like these:
http://www.google.de/search?q=Versag...nt=firefox-a
http://www.google.de/search?q=Lampen...nt=firefox-a

The result is always "t":
crawler=# SELECT
SUBSTRING('http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a'
FROM '^.*q=([^&=]*).*$');
substring
-----------
t
(1 row)

I cannot figure out why it doesn't work, because I do not understand the
regex completely.
But every user who uses "firefox-google"
(http://de.start2.mozilla.com/firefox...a:de:official)
creates a referer that cannot be parsed by your regex :/

Greetings,
Torsten
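
On why the result is "t": the leading `^.*` is greedy, so it first swallows as much of the URL as it can and then backs off only until `q=` matches. The last `q=` in these referers is the tail of `aq=t`, so the capture group picks up `t`. A minimal sketch of this follows, using Python's `re` module as a stand-in for PostgreSQL's `substring(... FROM ...)`. The URL is a shortened version of the one in the post, with the `rls` parameter left out.

```python
import re

# Shortened version of the referer from the post above (rls parameter omitted).
url = "http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t&client=firefox-a"

pattern = r"^.*q=([^&=]*).*$"

match = re.search(pattern, url)
captured = match.group(1)
print(captured)  # t

# Everything the greedy ^.* consumed before the "q=" that finally matched.
consumed = url[: match.start(1) - len("q=")]
print(consumed.endswith("&a"))  # True: the match sits inside "aq=t"
```

The earlier `?q=Lampen+anbringen` is never used, because a greedy prefix prefers the rightmost position at which the rest of the pattern can still succeed.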
#4
04-19-2008, 06:01 PM
shakahshakah
Re: Escaping of RegEx?

On Feb 6, 2:54 am, Torsten Zühlsdorff <f...@meisterderspiele.de>
wrote:
> shakahshakah wrote:
> > [..]
> > Does '^.*q=([^&=]*).*$' work for you?

> [..]


Looks like it gets tripped up by query params like "aq" (in addition
to "q") -- how about:
'^.*[?&]q=([^&=]*).*$'

reporting=> SELECT SUBSTRING('http://www.google.de/search?q=Versagen+der+Teilungsgenehmigung&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de:official&client=firefox-a'
FROM '^.*[?&]q=([^&=]*).*$');
substring
----------------------------------
Versagen+der+Teilungsgenehmigung
(1 row)
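
The `[?&]` in front of `q=` is what makes the difference: the parameter name must now directly follow a `?` or `&`, so `aq=` no longer qualifies. The sketch below illustrates this with Python's `re` module as a stand-in for PostgreSQL. The helper `extract_query`, the `WITH_AS_Q` variant, and the `as_q` example URL are assumptions added for illustration; they do not appear in the thread. The variant is included because the original regex in this thread tried to cover `as_q=` through its `as_` alternative, which the suggested pattern does not.

```python
import re
from typing import Optional

# Pattern suggested in the post above.
FIXED = r"^.*[?&]q=([^&=]*).*$"
# Hypothetical variant that also accepts "as_q=".
WITH_AS_Q = r"^.*[?&](?:as_)?q=([^&=]*).*$"

def extract_query(url: str, pattern: str = FIXED) -> Optional[str]:
    """Return the captured search string, or None if nothing matches."""
    match = re.search(pattern, url)
    return match.group(1) if match else None

# Shortened version of the referer from the post (rls parameter omitted).
firefox_url = (
    "http://www.google.de/search?q=Versagen+der+Teilungsgenehmigung"
    "&ie=utf-8&oe=utf-8&aq=t&client=firefox-a"
)
print(extract_query(firefox_url))  # Versagen+der+Teilungsgenehmigung

# Made-up URL for illustration; it does not appear in the thread.
advanced_url = "http://www.google.de/search?as_q=porenbeton&hl=de"
print(extract_query(advanced_url))             # None
print(extract_query(advanced_url, WITH_AS_Q))  # porenbeton
```

Whether real referers in the log actually carry `as_q` is not something the thread confirms, so the second pattern is only worth adopting if such URLs turn up.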
#5
04-19-2008, 06:01 PM
Torsten Zühlsdorff
Re: Escaping of RegEx?

shakahshakah wrote:

>>>> [..]
>>> Does '^.*q=([^&=]*).*$' work for you?

>> [..]
>
> Looks like it gets tripped up by query params like "aq" (in addition
> to "q") -- how about:
> '^.*[?&]q=([^&=]*).*$'
> [..]


This seems to work really well. Thank you very much!

Greetings,
Torsten