This is a discussion on Case Insensitivity within the Informix forums, part of the Database Server Software category; --> Hi Folks, just checking what the lastest status is on giving us case insensitive searches etc in Informix databases. ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi Folks, just checking what the lastest status is on giving us case insensitive searches etc in Informix databases. The bloody subject won't go away. Stalled in the planning committee still? Even Orrible 10 is allegedly out now with CI searches. Madison, Jonathan, I'm looking in your direction... And hello to you all, by the way. |
| |||
| Andrew, On 9.4, you can do this: CREATE FUNCTION upshift (value VARCHAR(225)) RETURNING VARCHAR(225) AS value WITH (NOT VARIANT); RETURN UPPER(value); END FUNCTION; CREATE INDEX index_name ON table_name (upshift(column_name)); Note that CREATE INDEX index_name ON table_name (UPPER(column_name)); results in SQL Error (-9844): Invalid function (upper) used in a functional key. hence the need for the SPL function wrapper. To test that it works: SET EXPLAIN ON; SELECT * FROM table_name WHERE (upshift(column_name)) LIKE 'VALUE%'; The "sqexplain.out" file will show the index being used. I know this is only a work-around, but I hope it helps. -- Regards, Doug Lawry www.douglawry.webhop.org "Andrew Hamm" <ahamm@mail.com> wrote in message news:50jb1dF1gaai2U1@mid.individual.net... > Hi Folks, > > just checking what the lastest status is on giving us case insensitive > searches etc in Informix databases. The bloody subject won't go away. > Stalled in the planning committee still? Even Orrible 10 is allegedly out > now with CI searches. > > Madison, Jonathan, I'm looking in your direction... > > And hello to you all, by the way. |
| |||
| Is there any way to do this with character sets? A long time ago when I was doing Sybase development they had a feature that allowed you to do case insensitive searches by using a character set that ignored cases on comparison but still saved the data the same way. Andrew Hamm wrote: > Hi Folks, > > just checking what the lastest status is on giving us case insensitive > searches etc in Informix databases. The bloody subject won't go away. > Stalled in the planning committee still? Even Orrible 10 is allegedly out > now with CI searches. > > Madison, Jonathan, I'm looking in your direction... > > And hello to you all, by the way. |
| |||
| Check this out: http://dev.mysql.com/doc/refman/5.0/...nsitivity.html Andrew Hamm wrote: > Hi Folks, > > just checking what the lastest status is on giving us case insensitive > searches etc in Informix databases. The bloody subject won't go away. > Stalled in the planning committee still? Even Orrible 10 is allegedly out > now with CI searches. > > Madison, Jonathan, I'm looking in your direction... > > And hello to you all, by the way. > |
| |||
| I wonder if Informix character sets work in a similar way. I looked into it but I didn't get anywhere in the past. It looks like you could set up a custom character set to do this but then you could have all kinds of issues. There has to be at least 1 expert on this on this forum. On Jan 10, 9:24 am, Lucky <l...@lucky-you.com> wrote: > Check this out:http://dev.mysql.com/doc/refman/5.0/...nsitivity.html > > Andrew Hamm wrote: > > Hi Folks, > > > just checking what the lastest status is on giving us case insensitive > > searches etc in Informix databases. The bloody subject won't go away. > > Stalled in the planning committee still? Even Orrible 10 is allegedly out > > now with CI searches. > > > Madison, Jonathan, I'm looking in your direction... > > > And hello to you all, by the way. |
| |||
| Lucky: > Check this out: > http://dev.mysql.com/doc/refman/5.0/...nsitivity.html Yeah, something of that order, but actually the opposite problem! Customers want to search by name, using "john" to find John john and JOHN. Personally I would prefer that if a column is declared case insensitive (CI), then also the strings all compare equal for unique indexing etc. I'm not just chasing case insensitive comparisons, but storage too. Bozon: > I wonder if Informix character sets work in a similar way. I looked > into it but I didn't get anywhere in the past. It looks like you could > set up a custom character set to do this but then you could have all > kinds of issues. There has to be at least 1 expert on this on this forum. With NCHAR types? I've been wondering myself if I can create a case-insensitive English character set and then change our chosen columns to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the more my brain starts to hurt. Doug: > On 9.4, you can do this: Yes, that idea is floating around here, but then all our SQL's will need to be modified to use the upshift function in all WHERE clauses. That's a frightening thought when you have millions of lines of code to deal with. Even if we did this task, the end-users with Crystal Reports etc will probably complain that there is no automagic support for them too. My understanding is that CHAR types use plain old binary comparisons in the traditional manner, hence are very fast. If you use NCHAR, then all comparisons and matching have to be fed through locale-sensitive comparison routines, hence they lose a bit of performance. If there was a CASE INSENSITIVE clause on the column definitions, then I expect a CHAR would have to redirect off at least to a case insensitive comparator function, while an NCHAR CASE INSENSITIVE would probably get away with a barely noticable degredation due to the other work going on for that datatype. I expect CHAR CI would suffer some minor performance hits because of this effect, but it couldn't be major. The work involved adding this attribute in the engine source is not something I'm willing to ponder; perhaps it would be more cost-effective for Informix to simply release CI versions of all their locales rather than dicking with the engine too much. |
| |||
| My brain hurt too looking at the unicode/locale stuff. I don't think that the performance is impacted much because the comparison value is stored in a lookup array. When I looked at the local source files ($INFORMIXDIR/gls/lc11/os/en_US.ISO8859-1.lc for example You get definitions like # # Alphabetic Lower-case to Upper-case Conversion # toupper (\x61,\x41);(\x62,\x42);(\x63,\x43);(\x64,\x44);(\ x65,\x45);\ (\x66,\x46);(\x67,\x47);(\x68,\x48);(\x69,\x49);(\ x6a,\x4a);\ (\x6b,\x4b);(\x6c,\x4c);(\x6d,\x4d);(\x6e,\x4e);(\ x6f,\x4f);\ (\x70,\x50);(\x71,\x51);(\x72,\x52);(\x73,\x53);(\ x74,\x54);\ (\x75,\x55);(\x76,\x56);(\x77,\x57);(\x78,\x58);(\ x79,\x59);\ (\x7a,\x5a) # # Alphabetic Upper-case to Lower-case Conversion # tolower (\x41,\x61);(\x42,\x62);(\x43,\x63);(\x44,\x64);(\ x45,\x65);\ (\x46,\x66);(\x47,\x67);(\x48,\x68);(\x49,\x69);(\ x4a,\x6a);\ (\x4b,\x6b);(\x4c,\x6c);(\x4d,\x6d);(\x4e,\x6e);(\ x4f,\x6f);\ (\x50,\x70);(\x51,\x71);(\x52,\x72);(\x53,\x73);(\ x54,\x74);\ (\x55,\x75);(\x56,\x76);(\x57,\x77);(\x58,\x78);(\ x59,\x79);\ (\x5a,\x7a) Which implies and is really the only sensible way to do it a look up array. they had a character and then a comparison value. So for case insensitive you had: In the collating area you have : LC_COLLATE collating-symbol <Weight_2> collating-symbol <Weight_3> collating-symbol <Weight_4> collating-symbol <Weight_5> collating-symbol <Weight_6> collating-symbol <Weight_7> collating-symbol <Weight_8> collating-symbol <Weight_9> collating-symbol <Weight_10> collating-symbol <Weight_11> collating-symbol <Weight_12> collating-symbol <Weight_13> .... collating-symbol <Weight_251> collating-symbol <Weight_252> collating-symbol <Weight_253> collating-symbol <Weight_255> collating-symbol <Weight_256> collating-symbol <Weight_257> order_start forward \x00 <Weight_2> <Weight_3> <Weight_4> <Weight_5> <Weight_6> <Weight_7> <Weight_8> <Weight_9> <Weight_10> <Weight_11> <Weight_12> <Weight_13> <Weight_14> <Weight_15> <Weight_16> <Weight_17> <Weight_18> <Weight_19> <Weight_20> <Weight_21> .... <Weight_250> <Weight_251> <Weight_252> <Weight_253> <Weight_255> <Weight_256> <Weight_257> \x01 <Weight_2> \x02 <Weight_3> \x03 <Weight_4> \x04 <Weight_5> \x05 <Weight_6> \x06 <Weight_7> \x07 <Weight_8> \x08 <Weight_9> \x09 <Weight_10> \x0a <Weight_11> \x0b <Weight_12> \x0c <Weight_13> \x0d <Weight_14> \x0e <Weight_15> \x0f <Weight_16> \x10 <Weight_17> \x11 <Weight_18> .... \xf8 <Weight_249> \xf9 <Weight_250> \xfa <Weight_251> \xfb <Weight_252> \xfc <Weight_253> \xfd <Weight_255> \xfe <Weight_256> \xff <Weight_257> order_end END LC_COLLATE This also implies that they are building some lookup tables for this task. So, to make your own case insensitive locale I would change the weights of the upper and lower case letters to the same values for example: \x41 <Weight_66> # 'A' \x61 <Weight_66> #'a' I just haven't been pushed to the point of trying it. Andrew Hamm wrote: > Bozon: > > I wonder if Informix character sets work in a similar way. I looked > > into it but I didn't get anywhere in the past. It looks like you could > > set up a custom character set to do this but then you could have all > > kinds of issues. There has to be at least 1 expert on this on this forum. > > With NCHAR types? I've been wondering myself if I can create a > case-insensitive English character set and then change our chosen columns > to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the > more my brain starts to hurt. > > If you use NCHAR, then all comparisons and matching have to be fed through > locale-sensitive comparison routines, hence they lose a bit of performance. > > If there was a CASE INSENSITIVE clause on the column definitions, then I > expect a CHAR would have to redirect off at least to a case insensitive > comparator function, while an NCHAR CASE INSENSITIVE would probably get > away with a barely noticable degredation due to the other work going on for > that datatype. > > I expect CHAR CI would suffer some minor performance hits because of this > effect, but it couldn't be major. The work involved adding this attribute > in the engine source is not something I'm willing to ponder; perhaps it > would be more cost-effective for Informix to simply release CI versions of > all their locales rather than dicking with the engine too much. |
| |||
| Andrew Hamm wrote: > Lucky: > >>Check this out: >>http://dev.mysql.com/doc/refman/5.0/...nsitivity.html > > > Yeah, something of that order, but actually the opposite problem! Customers > want to search by name, using "john" to find John john and JOHN. Personally > I would prefer that if a column is declared case insensitive (CI), then > also the strings all compare equal for unique indexing etc. I'm not just > chasing case insensitive comparisons, but storage too. You could in 9.xx/10/11 use the Virtual Index interface to define your own secondary access method to use when creating the index that will automagically convert the inputs. The magic functionality would be supplied in the am_beginscan, am_getnext, and am_rescan functions. The only problem is it will only work if that index is selected. To get it to work for post index comparisons and table scans you'd have to define a primary access method on the table to do the same. Art S. Kagel > Bozon: > >>I wonder if Informix character sets work in a similar way. I looked >>into it but I didn't get anywhere in the past. It looks like you could >>set up a custom character set to do this but then you could have all >>kinds of issues. There has to be at least 1 expert on this on this forum. > > > With NCHAR types? I've been wondering myself if I can create a > case-insensitive English character set and then change our chosen columns > to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the > more my brain starts to hurt. > > Doug: > >>On 9.4, you can do this: > > > Yes, that idea is floating around here, but then all our SQL's will need to > be modified to use the upshift function in all WHERE clauses. That's a > frightening thought when you have millions of lines of code to deal with. > Even if we did this task, the end-users with Crystal Reports etc will > probably complain that there is no automagic support for them too. > > My understanding is that CHAR types use plain old binary comparisons in the > traditional manner, hence are very fast. > > If you use NCHAR, then all comparisons and matching have to be fed through > locale-sensitive comparison routines, hence they lose a bit of performance. > > If there was a CASE INSENSITIVE clause on the column definitions, then I > expect a CHAR would have to redirect off at least to a case insensitive > comparator function, while an NCHAR CASE INSENSITIVE would probably get > away with a barely noticable degredation due to the other work going on for > that datatype. > > I expect CHAR CI would suffer some minor performance hits because of this > effect, but it couldn't be major. The work involved adding this attribute > in the engine source is not something I'm willing to ponder; perhaps it > would be more cost-effective for Informix to simply release CI versions of > all their locales rather than dicking with the engine too much. > |
| |||
| Art S. Kagel wrote: > You could in 9.xx/10/11 use the Virtual Index interface to define your own > secondary access method to use when creating the index that will > automagically convert the inputs. The magic functionality would be > supplied in the am_beginscan, am_getnext, and am_rescan functions. > > The only problem is it will only work if that index is selected. To get > it to work for post index comparisons and table scans you'd have to define > a primary access method on the table to do the same. Hey bloke, thanks for the reply. How are you keeping? This all sounds interesting, can you go slightly more into the "primary access method" bit for someone who's SQL level hasn't advanced much further than V7, functionality-wise? I think the ultimate goal, unless anyone can make a good point, is: 1) storage as-is, values go in and out in their mixed case without transformation. 2) indexes, comparisons and matches all act caseless, a key test being that john and JOHN cannot both coexist in a unique constraint. |
| ||||
| Andrew Hamm wrote: > Art S. Kagel wrote: > >> You could in 9.xx/10/11 use the Virtual Index interface to define your own >> secondary access method to use when creating the index that will >> automagically convert the inputs. The magic functionality would be >> supplied in the am_beginscan, am_getnext, and am_rescan functions. >> >> The only problem is it will only work if that index is selected. To get >> it to work for post index comparisons and table scans you'd have to define >> a primary access method on the table to do the same. > > Hey bloke, thanks for the reply. How are you keeping? > > This all sounds interesting, can you go slightly more into the "primary > access method" bit for someone who's SQL level hasn't advanced much further > than V7, functionality-wise? > > I think the ultimate goal, unless anyone can make a good point, is: > > 1) storage as-is, values go in and out in their mixed case without > transformation. > > 2) indexes, comparisons and matches all act caseless, a key test being that > john and JOHN cannot both coexist in a unique constraint. > On a real tangent, but one of the previous things I was involved in was to make ALL names "soundexed". Strip out vowels, strip out repeating characters, uppercase everything, and only store a maximum of 8 characters (4 from surname, 2 from forename, and two initials from other names) ... (and probably a few other things). So Forename => "Richard" Surname => "Smith" ends up as "SMTHRCHRD", and then input is "soundexed" and compared against the soundexed name index. This provided a more flexible approach, and some "vague" matching which was very useful for call centres taking calls from anywhere in the world. |