This is a discussion on "Escaping of RegEx?" within the pgsql Sql forums, part of the PostgreSQL category.
Hello,

I have a list of URLs from the HTTP Referer header. I select all URLs that contain "google". Now I want to extract the search string. For example:
"http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta="
should return "porenbeton+planbauplatten+abmessungen".

For this I use the following regex:
(?:\?|&|as_)q=(.*?)(?:&|\s)

In SQL it looks like this:
SELECT substring('http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta=' from '(?:\?|&|as_)q=(.*?)(?:&|\s)');

But I get this error message:
quantifier operand invalid

(Complete error message, translated from German:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...porenbeton+planbauplatten+abmessungen&meta=' from '(?:\?|&|a...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid regular expression: quantifier operand invalid)

How do I need to escape the regex?

Thanks for your help, and greetings from Germany,
Torsten
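The error comes from the string literal rather than from the regex itself. With standard_conforming_strings off, which was the default at the time, PostgreSQL treats backslashes in an ordinary '...' literal as escapes, so "\?" reaches the regex compiler as a bare "?" (and "\s" as a plain "s"). The pattern then starts with "(?:?", and that lone "?" has nothing to quantify. Below is a minimal sketch of two ways around it; the E'...' form assumes a server that accepts escape-string syntax, which the HINT in the error message suggests this one does.

```sql
-- Option 1: escape-string syntax. Every backslash the regex needs is doubled,
-- because the string-literal parser strips one level of backslashes.
SELECT substring(
         'http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='
         FROM E'(?:\\?|&|as_)q=(.*?)(?:&|\\s)') AS search_terms;

-- Option 2: no backslashes at all. Inside a bracket expression "?" is an
-- ordinary character, and [[:space:]] is the POSIX class for whitespace.
SELECT substring(
         'http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='
         FROM '(?:[?]|&|as_)q=(.*?)(?:&|[[:space:]])') AS search_terms;
```

On PostgreSQL 9.1 and later, standard_conforming_strings is on by default, so the pattern from the question works unchanged in a plain '...' literal. Note also that PostgreSQL decides greediness for a regex as a whole rather than per quantifier (see "Regular Expression Matching Rules" in its manual), so ".*?" does not always behave as it would in Perl; in this URL only one "&" follows "q=", so the result is the same either way.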
On Feb 5, 5:11 am, Torsten Zühlsdorff <f...@meisterderspiele.de> wrote:
> [..]
> How do I need to escape the RegEx?

Does '^.*q=([^&=]*).*$' work for you?

reporting=# SELECT substring('http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta=' FROM '^.*q=([^&=]*).*$') ;
               substring
---------------------------------------
 porenbeton+planbauplatten+abmessungen
(1 row)
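A reading of the suggested pattern, piece by piece. The query below is the same as in the reply, except that the regex is rewritten in PostgreSQL's expanded syntax (the embedded "(?x)" option), where whitespace is ignored and "#" starts a comment, so each part can be annotated. This is an illustrative rewrite, not how the pattern was posted.

```sql
SELECT substring(
         'http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='
         FROM '(?x)      # expanded syntax: whitespace ignored, # starts a comment
               ^ .*      # greedy: skip as much of the URL as possible...
               q=        # ...as long as a "q=" still follows
               ([^&=]*)  # capture the characters up to the next & or =
               .* $      # swallow the rest of the URL')
       AS search_terms;
```

Because the first quantifier in the pattern, the leading ".*", is greedy, the leading ".*" grabs as much of the URL as it can and gives back only enough for "q=" to match. In other words, it lands on the last "q=" in the string.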
shakahshakah wrote:
> On Feb 5, 5:11 am, Torsten Zühlsdorff <f...@meisterderspiele.de>
> wrote:
>> [..]
> Does '^.*q=([^&=]*).*$' work for you?

It works in most cases, but not on strings like these:
http://www.google.de/search?q=Versag...nt=firefox-a
http://www.google.de/search?q=Lampen...nt=firefox-a

The result is always "t":
crawler=# SELECT SUBSTRING('http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de...' FROM '^.*q=([^&=]*).*$');
 substring
-----------
 t
(1 row)

I can't figure out why it doesn't work, because I don't fully understand the regex. But every user who uses the "firefox-google" start page (http://de.start2.mozilla.com/firefox...a:de:official) creates a referer that your regex can't parse :/

Greetings,
Torsten
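The "t" comes from that greediness. The URL contains two spots where "q=" appears, "?q=Lampen..." and "&aq=t", and the leading ".*" settles on the last one, so the captured value is whatever follows "aq=". A small sketch that makes this visible, using the referer from the post cut off after "aq=t" (which is enough to reproduce the effect). It assumes regexp_matches, available from PostgreSQL 8.3.

```sql
-- Every "q=" the pattern can latch onto, in order of appearance.
-- With the 'g' flag, regexp_matches returns one row (a text array) per match.
SELECT arr[1] AS value_after_q
FROM   regexp_matches('http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t',
                      'q=([^&=]*)', 'g') AS m(arr);

-- What the leading greedy .* consumed before the "q=" it settled on.
SELECT substring('http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t'
                 FROM '^(.*)q=[^&=]*.*$') AS swallowed_prefix;
```

The first query lists both candidates, Lampen+anbringen and t. The second returns a prefix ending in "...&oe=utf-8&a", so the ".*" stopped right before the "q=" inside "aq=t".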
On Feb 6, 2:54 am, Torsten Zühlsdorff <f...@meisterderspiele.de> wrote:
> [..]
> It works in most cases. But not on strings like this: [..]
> The result is always "t": [..]
> I can not figure out, why it don't work, because i do not understand the
> RegEx completly
> [..]

Looks like it gets tripped up by query parameters like "aq" (in addition to "q"). How about:
'^.*[?&]q=([^&=]*).*$'

reporting=> SELECT SUBSTRING('http://www.google.de/search?q=Versagen+der+Teilungsgenehmigung&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:de a' FROM '^.*[?&]q=([^&=]*).*$');
            substring
----------------------------------
 Versagen+der+Teilungsgenehmigung
(1 row)
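To run the fixed pattern over the whole list of collected referers, something like the following should work. The table name "referers" and its "url" column are made up for illustration; substitute wherever the log data actually lives. Note that "[?&]" needs no backslash, because inside a bracket expression both characters are literal, so the string-literal escaping problem from the original question does not come up.

```sql
-- Hypothetical table standing in for the collected HTTP-Referer values.
CREATE TEMP TABLE referers (url text);

INSERT INTO referers (url) VALUES
  ('http://www.google.de/search?hl=de&q=porenbeton+planbauplatten+abmessungen&meta='),
  ('http://www.google.de/search?q=Lampen+anbringen&ie=utf-8&oe=utf-8&aq=t'),
  ('http://www.example.com/some/page');

-- Keep only Google referers and pull out the value of the q parameter.
-- [?&] requires "q=" to start a parameter, so "aq=t" no longer matches.
SELECT url,
       substring(url FROM '^.*[?&]q=([^&=]*).*$') AS search_terms
FROM   referers
WHERE  url LIKE '%google%';
```

A Google referer without a q parameter would come back with NULL in search_terms, since substring returns NULL when the pattern does not match.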
shakahshakah wrote:
> Looks like it gets tripped up by query params like "aq" (in addition
> to "q"). How about:
> '^.*[?&]q=([^&=]*).*$'
> [..]

This seems to work really well. Thank you very much!

Greetings,
Torsten