Unix Technical Forum

A use for REGEXP, I think.

This is a discussion on A use for REGEXP, I think. within the MySQL General forum forums, part of the MySQL category; --> I have a field in a DB that contains something like "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" I need to select all records where ...


Go Back   Unix Technical Forum > Database Server Software > MySQL > MySQL General forum

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 07:02 AM
David Scott
 
Posts: n/a
Default A use for REGEXP, I think.

I have a field in a DB that contains something like
"lvlscore4:4493,lvlscore5:3232,lvlscore6:6128"
I need to select all records where the "lvlscore5:" is greater than 3000.
I think this can be done using REGEXP but I have read the docs and am non
the wiser for it. Can anyone point me in the right direction?
--
Thanks
David Scott

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 07:02 AM
Rob Wultsch
 
Posts: n/a
Default Re: A use for REGEXP, I think.

On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com> wrote:
> I have a field in a DB that contains something like
> "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128"
> I need to select all records where the "lvlscore5:" is greater than 3000.
> I think this can be done using REGEXP but I have read the docs and am non
> the wiser for it. Can anyone point me in the right direction?
> --
> Thanks
> David Scott
>

Insert lecture about correctly normalizing data.

I think what you suggested is the wrong path for a successful outcome.
As far as I can tell the regex engine in mysql only returns a
booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone
correct me if I am wrong.

This would work:

SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128',
SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128',
LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1,
4

) as 'lvlscore5'

4 is a magic number that will not work if the score is not always of
length 4. If that is the case replacing the last argument of substring
with a bit more logic will sort this problem.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 07:02 AM
Michael Dykman
 
Posts: n/a
Default Re: A use for REGEXP, I think.

regex discussion are a little outside of the scope of this list, but I
will offer an opinion: You can't do math, no matter how simple, in
regular expressions. I have seen many try and fail.

While I am sure there are circumstances around how that data got that
way, what you really need to do is use the database relationally and
decompose that field into data structures instead of strings, then you
can do pretty much anything you want.

- michael dykman

On Dec 9, 2007 12:53 PM, David Scott <critters@desktopcreatures.com> wrote:
> I have a field in a DB that contains something like
> "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128"
> I need to select all records where the "lvlscore5:" is greater than 3000.
> I think this can be done using REGEXP but I have read the docs and am non
> the wiser for it. Can anyone point me in the right direction?
> --
> Thanks
> David Scott
>




--
- michael dykman
- mdykman@gmail.com

- All models are wrong. Some models are useful.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 07:02 AM
Rob Wultsch
 
Posts: n/a
Default Re: A use for REGEXP, I think.

On Dec 9, 2007 9:17 PM, Rob Wultsch <wultsch@gmail.com> wrote:
>
> On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com> wrote:
> > I have a field in a DB that contains something like
> > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128"
> > I need to select all records where the "lvlscore5:" is greater than 3000.
> > I think this can be done using REGEXP but I have read the docs and am non
> > the wiser for it. Can anyone point me in the right direction?
> > --
> > Thanks
> > David Scott
> >

> Insert lecture about correctly normalizing data.
>
> I think what you suggested is the wrong path for a successful outcome.
> As far as I can tell the regex engine in mysql only returns a
> booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone
> correct me if I am wrong.
>
> This would work:
>
> SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128',
> SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128',
> LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1,
> 4
>
> ) as 'lvlscore5'
>
> 4 is a magic number that will not work if the score is not always of
> length 4. If that is the case replacing the last argument of substring
> with a bit more logic will sort this problem.
>


To be clear the regex should work, but it will probably take more work
than what I just suggested. Using what I suggested in the where clause
rather than as shown in the select clause will probably be the easiest
solution.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 07:02 AM
David Scott
 
Posts: n/a
Default Re: A use for REGEXP, I think.

I came up with a clunky solution: REGEXP
'lvlscore5:[6-9][0-9][0-9][0-9][0-9]';
This gives me all scores over 6000. I then add on another [0-9] and change
the [6-9] to [1-9] to do 10k+
As this was a one off and only returned 6 out of thoustands of records it
was good enough for a quick and dirty manual edit.

I would of course have gone the seperate table route to hold the name/value
pairs but the main reason I went the string route becasue the process that
read and writes the string does all the work, there should never have been a
need for MySQL to know what it contains or to process it in any way. Also as
there can be 20+ bits of data in this string and the table is going to be
pretty huge it would have made for an even larger 2nd table. (by "huge" I
mean 1million+ records)

With this in mind would it still be best to have gone the 2nd table route?
--
Dave

On 10/12/2007, Rob Wultsch <wultsch@gmail.com> wrote:
>
> On Dec 9, 2007 9:17 PM, Rob Wultsch <wultsch@gmail.com> wrote:
> >
> > On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com>

> wrote:
> > > I have a field in a DB that contains something like
> > > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128"
> > > I need to select all records where the "lvlscore5:" is greater than

> 3000.
> > > I think this can be done using REGEXP but I have read the docs and am

> non
> > > the wiser for it. Can anyone point me in the right direction?
> > > --
> > > Thanks
> > > David Scott
> > >

> > Insert lecture about correctly normalizing data.
> >
> > I think what you suggested is the wrong path for a successful outcome.
> > As far as I can tell the regex engine in mysql only returns a
> > booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone
> > correct me if I am wrong.
> >
> > This would work:
> >
> > SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128',
> > SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128',
> >

> LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1,
> > 4
> >
> > ) as 'lvlscore5'
> >
> > 4 is a magic number that will not work if the score is not always of
> > length 4. If that is the case replacing the last argument of substring
> > with a bit more logic will sort this problem.
> >

>
> To be clear the regex should work, but it will probably take more work
> than what I just suggested. Using what I suggested in the where clause
> rather than as shown in the select clause will probably be the easiest
> solution.
>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-28-2008, 07:02 AM
Rob Wultsch
 
Posts: n/a
Default Re: A use for REGEXP, I think.

On Dec 10, 2007 1:26 AM, David Scott <critters@desktopcreatures.com> wrote:
> I came up with a clunky solution: REGEXP
> 'lvlscore5:[6-9][0-9][0-9][0-9][0-9]';
> This gives me all scores over 6000. I then add on another [0-9] and change
> the [6-9] to [1-9] to do 10k+
> As this was a one off and only returned 6 out of thoustands of records it
> was good enough for a quick and dirty manual edit.
>
> I would of course have gone the seperate table route to hold the name/value
> pairs but the main reason I went the string route becasue the process that
> read and writes the string does all the work, there should never have been a
> need for MySQL to know what it contains or to process it in any way. Also as
> there can be 20+ bits of data in this string and the table is going to be
> pretty huge it would have made for an even larger 2nd table. (by "huge" I
> mean 1million+ records)
>
> With this in mind would it still be best to have gone the 2nd table route?


Your needs may vary, but I am paranoid and assume that my code is
buggy, and think it is easier to check sanity by viewing the actual
values with an intermediate value showing what had been pulled out,
and a boolean that describes a potential where clause action. Whether
or not a second table is created for sanity testing, and/or future
work I think is a matter of preference. Of course if you needed to
anything more than a single action it would almost certainly be more
efficent to create the second table. With any regex or
SUBSTRING+LOCATE solution every row would need to be examined, while
with a second table a index could be added which would make all
actions after a insert select much faster....

FYI:
single regex syntax would be REGEXP
'lvlscore5[6-9][0-9]{3}|[0-9]{5,})' which translates to 'lvlscore5:'
followed by EITHER a number between 6 and 9 then followed by exactly 3
more numbers of any value, OR any 5 or more digit number.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:27 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com