This is a discussion on A use for REGEXP, I think. within the MySQL General forum forums, part of the MySQL category; --> I have a field in a DB that contains something like "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" I need to select all records where ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a field in a DB that contains something like "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" I need to select all records where the "lvlscore5:" is greater than 3000. I think this can be done using REGEXP but I have read the docs and am non the wiser for it. Can anyone point me in the right direction? -- Thanks David Scott |
| |||
| On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com> wrote: > I have a field in a DB that contains something like > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" > I need to select all records where the "lvlscore5:" is greater than 3000. > I think this can be done using REGEXP but I have read the docs and am non > the wiser for it. Can anyone point me in the right direction? > -- > Thanks > David Scott > Insert lecture about correctly normalizing data. I think what you suggested is the wrong path for a successful outcome. As far as I can tell the regex engine in mysql only returns a booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone correct me if I am wrong. This would work: SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128', SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128', LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1, 4 ) as 'lvlscore5' 4 is a magic number that will not work if the score is not always of length 4. If that is the case replacing the last argument of substring with a bit more logic will sort this problem. |
| |||
| regex discussion are a little outside of the scope of this list, but I will offer an opinion: You can't do math, no matter how simple, in regular expressions. I have seen many try and fail. While I am sure there are circumstances around how that data got that way, what you really need to do is use the database relationally and decompose that field into data structures instead of strings, then you can do pretty much anything you want. - michael dykman On Dec 9, 2007 12:53 PM, David Scott <critters@desktopcreatures.com> wrote: > I have a field in a DB that contains something like > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" > I need to select all records where the "lvlscore5:" is greater than 3000. > I think this can be done using REGEXP but I have read the docs and am non > the wiser for it. Can anyone point me in the right direction? > -- > Thanks > David Scott > -- - michael dykman - mdykman@gmail.com - All models are wrong. Some models are useful. |
| |||
| On Dec 9, 2007 9:17 PM, Rob Wultsch <wultsch@gmail.com> wrote: > > On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com> wrote: > > I have a field in a DB that contains something like > > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" > > I need to select all records where the "lvlscore5:" is greater than 3000. > > I think this can be done using REGEXP but I have read the docs and am non > > the wiser for it. Can anyone point me in the right direction? > > -- > > Thanks > > David Scott > > > Insert lecture about correctly normalizing data. > > I think what you suggested is the wrong path for a successful outcome. > As far as I can tell the regex engine in mysql only returns a > booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone > correct me if I am wrong. > > This would work: > > SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128', > SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128', > LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1, > 4 > > ) as 'lvlscore5' > > 4 is a magic number that will not work if the score is not always of > length 4. If that is the case replacing the last argument of substring > with a bit more logic will sort this problem. > To be clear the regex should work, but it will probably take more work than what I just suggested. Using what I suggested in the where clause rather than as shown in the select clause will probably be the easiest solution. |
| |||
| I came up with a clunky solution: REGEXP 'lvlscore5:[6-9][0-9][0-9][0-9][0-9]'; This gives me all scores over 6000. I then add on another [0-9] and change the [6-9] to [1-9] to do 10k+ As this was a one off and only returned 6 out of thoustands of records it was good enough for a quick and dirty manual edit. I would of course have gone the seperate table route to hold the name/value pairs but the main reason I went the string route becasue the process that read and writes the string does all the work, there should never have been a need for MySQL to know what it contains or to process it in any way. Also as there can be 20+ bits of data in this string and the table is going to be pretty huge it would have made for an even larger 2nd table. (by "huge" I mean 1million+ records) With this in mind would it still be best to have gone the 2nd table route? -- Dave On 10/12/2007, Rob Wultsch <wultsch@gmail.com> wrote: > > On Dec 9, 2007 9:17 PM, Rob Wultsch <wultsch@gmail.com> wrote: > > > > On Dec 9, 2007 10:53 AM, David Scott <critters@desktopcreatures.com> > wrote: > > > I have a field in a DB that contains something like > > > "lvlscore4:4493,lvlscore5:3232,lvlscore6:6128" > > > I need to select all records where the "lvlscore5:" is greater than > 3000. > > > I think this can be done using REGEXP but I have read the docs and am > non > > > the wiser for it. Can anyone point me in the right direction? > > > -- > > > Thanks > > > David Scott > > > > > Insert lecture about correctly normalizing data. > > > > I think what you suggested is the wrong path for a successful outcome. > > As far as I can tell the regex engine in mysql only returns a > > booleans. http://dev.mysql.com/doc/refman/5.0/en/regexp.html . Someone > > correct me if I am wrong. > > > > This would work: > > > > SELECT 'lvlscore4:4493,lvlscore5:3232,lvlscore6:6128', > > SUBSTRING('lvlscore4:4493,lvlscore5:3232,lvlscore6 :6128', > > > LOCATE('lvlscore5','lvlscore4:4493,lvlscore5:3232, lvlscore6:6128')+length('lvlscore5')+1, > > 4 > > > > ) as 'lvlscore5' > > > > 4 is a magic number that will not work if the score is not always of > > length 4. If that is the case replacing the last argument of substring > > with a bit more logic will sort this problem. > > > > To be clear the regex should work, but it will probably take more work > than what I just suggested. Using what I suggested in the where clause > rather than as shown in the select clause will probably be the easiest > solution. > > |
| ||||
| On Dec 10, 2007 1:26 AM, David Scott <critters@desktopcreatures.com> wrote: > I came up with a clunky solution: REGEXP > 'lvlscore5:[6-9][0-9][0-9][0-9][0-9]'; > This gives me all scores over 6000. I then add on another [0-9] and change > the [6-9] to [1-9] to do 10k+ > As this was a one off and only returned 6 out of thoustands of records it > was good enough for a quick and dirty manual edit. > > I would of course have gone the seperate table route to hold the name/value > pairs but the main reason I went the string route becasue the process that > read and writes the string does all the work, there should never have been a > need for MySQL to know what it contains or to process it in any way. Also as > there can be 20+ bits of data in this string and the table is going to be > pretty huge it would have made for an even larger 2nd table. (by "huge" I > mean 1million+ records) > > With this in mind would it still be best to have gone the 2nd table route? Your needs may vary, but I am paranoid and assume that my code is buggy, and think it is easier to check sanity by viewing the actual values with an intermediate value showing what had been pulled out, and a boolean that describes a potential where clause action. Whether or not a second table is created for sanity testing, and/or future work I think is a matter of preference. Of course if you needed to anything more than a single action it would almost certainly be more efficent to create the second table. With any regex or SUBSTRING+LOCATE solution every row would need to be examined, while with a second table a index could be added which would make all actions after a insert select much faster.... FYI: single regex syntax would be REGEXP 'lvlscore5 followed by EITHER a number between 6 and 9 then followed by exactly 3 more numbers of any value, OR any 5 or more digit number. |