Unix Technical Forum

Case Insensitivity

This is a discussion on Case Insensitivity within the Informix forums, part of the Database Server Software category; --> Hi Folks, just checking what the lastest status is on giving us case insensitive searches etc in Informix databases. ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-20-2008, 03:57 PM
Andrew Hamm
 
Posts: n/a
Default Case Insensitivity

Hi Folks,

just checking what the lastest status is on giving us case insensitive
searches etc in Informix databases. The bloody subject won't go away.
Stalled in the planning committee still? Even Orrible 10 is allegedly out
now with CI searches.

Madison, Jonathan, I'm looking in your direction...

And hello to you all, by the way.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-20-2008, 03:57 PM
Doug Lawry
 
Posts: n/a
Default Re: Case Insensitivity

Andrew,

On 9.4, you can do this:

CREATE FUNCTION upshift (value VARCHAR(225))
RETURNING VARCHAR(225) AS value WITH (NOT VARIANT);
RETURN UPPER(value);
END FUNCTION;

CREATE INDEX index_name ON table_name (upshift(column_name));

Note that

CREATE INDEX index_name ON table_name (UPPER(column_name));

results in

SQL Error (-9844): Invalid function (upper) used in a functional key.

hence the need for the SPL function wrapper.

To test that it works:

SET EXPLAIN ON;

SELECT * FROM table_name
WHERE (upshift(column_name)) LIKE 'VALUE%';

The "sqexplain.out" file will show the index being used.

I know this is only a work-around, but I hope it helps.

--
Regards,
Doug Lawry
www.douglawry.webhop.org


"Andrew Hamm" <ahamm@mail.com> wrote in message
news:50jb1dF1gaai2U1@mid.individual.net...
> Hi Folks,
>
> just checking what the lastest status is on giving us case insensitive
> searches etc in Informix databases. The bloody subject won't go away.
> Stalled in the planning committee still? Even Orrible 10 is allegedly out
> now with CI searches.
>
> Madison, Jonathan, I'm looking in your direction...
>
> And hello to you all, by the way.



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-20-2008, 03:57 PM
bozon
 
Posts: n/a
Default Re: Case Insensitivity

Is there any way to do this with character sets? A long time ago when I
was doing Sybase development they had a feature that allowed you to do
case insensitive searches by using a character set that ignored cases
on comparison but still saved the data the same way.

Andrew Hamm wrote:
> Hi Folks,
>
> just checking what the lastest status is on giving us case insensitive
> searches etc in Informix databases. The bloody subject won't go away.
> Stalled in the planning committee still? Even Orrible 10 is allegedly out
> now with CI searches.
>
> Madison, Jonathan, I'm looking in your direction...
>
> And hello to you all, by the way.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-20-2008, 03:57 PM
Lucky
 
Posts: n/a
Default Re: Case Insensitivity

Check this out: http://dev.mysql.com/doc/refman/5.0/...nsitivity.html

Andrew Hamm wrote:
> Hi Folks,
>
> just checking what the lastest status is on giving us case insensitive
> searches etc in Informix databases. The bloody subject won't go away.
> Stalled in the planning committee still? Even Orrible 10 is allegedly out
> now with CI searches.
>
> Madison, Jonathan, I'm looking in your direction...
>
> And hello to you all, by the way.
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-20-2008, 03:57 PM
bozon
 
Posts: n/a
Default Re: Case Insensitivity

I wonder if Informix character sets work in a similar way. I looked
into it but I didn't get anywhere in the past. It looks like you could
set up a custom character set to do this but then you could have all
kinds of issues. There has to be at least 1 expert on this on this
forum.

On Jan 10, 9:24 am, Lucky <l...@lucky-you.com> wrote:
> Check this out:http://dev.mysql.com/doc/refman/5.0/...nsitivity.html
>
> Andrew Hamm wrote:
> > Hi Folks,

>
> > just checking what the lastest status is on giving us case insensitive
> > searches etc in Informix databases. The bloody subject won't go away.
> > Stalled in the planning committee still? Even Orrible 10 is allegedly out
> > now with CI searches.

>
> > Madison, Jonathan, I'm looking in your direction...

>
> > And hello to you all, by the way.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-20-2008, 03:58 PM
Andrew Hamm
 
Posts: n/a
Default Re: Case Insensitivity

Lucky:
> Check this out:
> http://dev.mysql.com/doc/refman/5.0/...nsitivity.html


Yeah, something of that order, but actually the opposite problem! Customers
want to search by name, using "john" to find John john and JOHN. Personally
I would prefer that if a column is declared case insensitive (CI), then
also the strings all compare equal for unique indexing etc. I'm not just
chasing case insensitive comparisons, but storage too.

Bozon:
> I wonder if Informix character sets work in a similar way. I looked
> into it but I didn't get anywhere in the past. It looks like you could
> set up a custom character set to do this but then you could have all
> kinds of issues. There has to be at least 1 expert on this on this forum.


With NCHAR types? I've been wondering myself if I can create a
case-insensitive English character set and then change our chosen columns
to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the
more my brain starts to hurt.

Doug:
> On 9.4, you can do this:


Yes, that idea is floating around here, but then all our SQL's will need to
be modified to use the upshift function in all WHERE clauses. That's a
frightening thought when you have millions of lines of code to deal with.
Even if we did this task, the end-users with Crystal Reports etc will
probably complain that there is no automagic support for them too.

My understanding is that CHAR types use plain old binary comparisons in the
traditional manner, hence are very fast.

If you use NCHAR, then all comparisons and matching have to be fed through
locale-sensitive comparison routines, hence they lose a bit of performance.

If there was a CASE INSENSITIVE clause on the column definitions, then I
expect a CHAR would have to redirect off at least to a case insensitive
comparator function, while an NCHAR CASE INSENSITIVE would probably get
away with a barely noticable degredation due to the other work going on for
that datatype.

I expect CHAR CI would suffer some minor performance hits because of this
effect, but it couldn't be major. The work involved adding this attribute
in the engine source is not something I'm willing to ponder; perhaps it
would be more cost-effective for Informix to simply release CI versions of
all their locales rather than dicking with the engine too much.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-20-2008, 03:58 PM
bozon
 
Posts: n/a
Default Re: Case Insensitivity

My brain hurt too looking at the unicode/locale stuff. I don't think
that the performance is impacted much because the comparison value is
stored in a lookup array. When I looked at the local source files
($INFORMIXDIR/gls/lc11/os/en_US.ISO8859-1.lc for example

You get definitions like

#
# Alphabetic Lower-case to Upper-case Conversion
#
toupper (\x61,\x41);(\x62,\x42);(\x63,\x43);(\x64,\x44);(\ x65,\x45);\
(\x66,\x46);(\x67,\x47);(\x68,\x48);(\x69,\x49);(\ x6a,\x4a);\
(\x6b,\x4b);(\x6c,\x4c);(\x6d,\x4d);(\x6e,\x4e);(\ x6f,\x4f);\
(\x70,\x50);(\x71,\x51);(\x72,\x52);(\x73,\x53);(\ x74,\x54);\
(\x75,\x55);(\x76,\x56);(\x77,\x57);(\x78,\x58);(\ x79,\x59);\
(\x7a,\x5a)

#
# Alphabetic Upper-case to Lower-case Conversion
#
tolower (\x41,\x61);(\x42,\x62);(\x43,\x63);(\x44,\x64);(\ x45,\x65);\
(\x46,\x66);(\x47,\x67);(\x48,\x68);(\x49,\x69);(\ x4a,\x6a);\
(\x4b,\x6b);(\x4c,\x6c);(\x4d,\x6d);(\x4e,\x6e);(\ x4f,\x6f);\
(\x50,\x70);(\x51,\x71);(\x52,\x72);(\x53,\x73);(\ x54,\x74);\
(\x55,\x75);(\x56,\x76);(\x57,\x77);(\x58,\x78);(\ x59,\x79);\
(\x5a,\x7a)

Which implies and is really the only sensible way to do it a look up
array.

they had a character and then a comparison value. So for case
insensitive you had:

In the collating area you have :

LC_COLLATE

collating-symbol <Weight_2>
collating-symbol <Weight_3>
collating-symbol <Weight_4>
collating-symbol <Weight_5>
collating-symbol <Weight_6>
collating-symbol <Weight_7>
collating-symbol <Weight_8>
collating-symbol <Weight_9>
collating-symbol <Weight_10>
collating-symbol <Weight_11>
collating-symbol <Weight_12>
collating-symbol <Weight_13>
....
collating-symbol <Weight_251>
collating-symbol <Weight_252>
collating-symbol <Weight_253>
collating-symbol <Weight_255>
collating-symbol <Weight_256>
collating-symbol <Weight_257>

order_start forward
\x00
<Weight_2>
<Weight_3>
<Weight_4>
<Weight_5>
<Weight_6>
<Weight_7>
<Weight_8>
<Weight_9>
<Weight_10>
<Weight_11>
<Weight_12>
<Weight_13>
<Weight_14>
<Weight_15>
<Weight_16>
<Weight_17>
<Weight_18>
<Weight_19>
<Weight_20>
<Weight_21>
....
<Weight_250>
<Weight_251>
<Weight_252>
<Weight_253>
<Weight_255>
<Weight_256>
<Weight_257>
\x01 <Weight_2>
\x02 <Weight_3>
\x03 <Weight_4>
\x04 <Weight_5>
\x05 <Weight_6>
\x06 <Weight_7>
\x07 <Weight_8>
\x08 <Weight_9>
\x09 <Weight_10>
\x0a <Weight_11>
\x0b <Weight_12>
\x0c <Weight_13>
\x0d <Weight_14>
\x0e <Weight_15>
\x0f <Weight_16>
\x10 <Weight_17>
\x11 <Weight_18>
....
\xf8 <Weight_249>
\xf9 <Weight_250>
\xfa <Weight_251>
\xfb <Weight_252>
\xfc <Weight_253>
\xfd <Weight_255>
\xfe <Weight_256>
\xff <Weight_257>
order_end

END LC_COLLATE

This also implies that they are building some lookup tables for this
task. So, to make your own case insensitive locale I would change the
weights of the upper and lower case letters to the same values for
example:
\x41 <Weight_66> # 'A'
\x61 <Weight_66> #'a'


I just haven't been pushed to the point of trying it.

Andrew Hamm wrote:
> Bozon:
> > I wonder if Informix character sets work in a similar way. I looked
> > into it but I didn't get anywhere in the past. It looks like you could
> > set up a custom character set to do this but then you could have all
> > kinds of issues. There has to be at least 1 expert on this on this forum.

>
> With NCHAR types? I've been wondering myself if I can create a
> case-insensitive English character set and then change our chosen columns
> to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the
> more my brain starts to hurt.
>


> If you use NCHAR, then all comparisons and matching have to be fed through
> locale-sensitive comparison routines, hence they lose a bit of performance.
>
> If there was a CASE INSENSITIVE clause on the column definitions, then I
> expect a CHAR would have to redirect off at least to a case insensitive
> comparator function, while an NCHAR CASE INSENSITIVE would probably get
> away with a barely noticable degredation due to the other work going on for
> that datatype.
>
> I expect CHAR CI would suffer some minor performance hits because of this
> effect, but it couldn't be major. The work involved adding this attribute
> in the engine source is not something I'm willing to ponder; perhaps it
> would be more cost-effective for Informix to simply release CI versions of
> all their locales rather than dicking with the engine too much.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-20-2008, 03:58 PM
Art S. Kagel
 
Posts: n/a
Default Re: Case Insensitivity

Andrew Hamm wrote:
> Lucky:
>
>>Check this out:
>>http://dev.mysql.com/doc/refman/5.0/...nsitivity.html

>
>
> Yeah, something of that order, but actually the opposite problem! Customers
> want to search by name, using "john" to find John john and JOHN. Personally
> I would prefer that if a column is declared case insensitive (CI), then
> also the strings all compare equal for unique indexing etc. I'm not just
> chasing case insensitive comparisons, but storage too.


You could in 9.xx/10/11 use the Virtual Index interface to define your own
secondary access method to use when creating the index that will
automagically convert the inputs. The magic functionality would be supplied
in the am_beginscan, am_getnext, and am_rescan functions.

The only problem is it will only work if that index is selected. To get it
to work for post index comparisons and table scans you'd have to define a
primary access method on the table to do the same.

Art S. Kagel

> Bozon:
>
>>I wonder if Informix character sets work in a similar way. I looked
>>into it but I didn't get anywhere in the past. It looks like you could
>>set up a custom character set to do this but then you could have all
>>kinds of issues. There has to be at least 1 expert on this on this forum.

>
>
> With NCHAR types? I've been wondering myself if I can create a
> case-insensitive English character set and then change our chosen columns
> to NCHAR and NVARCHAR, but the more I read about UNICODE and locales, the
> more my brain starts to hurt.
>
> Doug:
>
>>On 9.4, you can do this:

>
>
> Yes, that idea is floating around here, but then all our SQL's will need to
> be modified to use the upshift function in all WHERE clauses. That's a
> frightening thought when you have millions of lines of code to deal with.
> Even if we did this task, the end-users with Crystal Reports etc will
> probably complain that there is no automagic support for them too.
>
> My understanding is that CHAR types use plain old binary comparisons in the
> traditional manner, hence are very fast.
>
> If you use NCHAR, then all comparisons and matching have to be fed through
> locale-sensitive comparison routines, hence they lose a bit of performance.
>
> If there was a CASE INSENSITIVE clause on the column definitions, then I
> expect a CHAR would have to redirect off at least to a case insensitive
> comparator function, while an NCHAR CASE INSENSITIVE would probably get
> away with a barely noticable degredation due to the other work going on for
> that datatype.
>
> I expect CHAR CI would suffer some minor performance hits because of this
> effect, but it couldn't be major. The work involved adding this attribute
> in the engine source is not something I'm willing to ponder; perhaps it
> would be more cost-effective for Informix to simply release CI versions of
> all their locales rather than dicking with the engine too much.
>

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-20-2008, 03:58 PM
Andrew Hamm
 
Posts: n/a
Default Re: Case Insensitivity

Art S. Kagel wrote:

> You could in 9.xx/10/11 use the Virtual Index interface to define your own
> secondary access method to use when creating the index that will
> automagically convert the inputs. The magic functionality would be
> supplied in the am_beginscan, am_getnext, and am_rescan functions.
>
> The only problem is it will only work if that index is selected. To get
> it to work for post index comparisons and table scans you'd have to define
> a primary access method on the table to do the same.


Hey bloke, thanks for the reply. How are you keeping?

This all sounds interesting, can you go slightly more into the "primary
access method" bit for someone who's SQL level hasn't advanced much further
than V7, functionality-wise?

I think the ultimate goal, unless anyone can make a good point, is:

1) storage as-is, values go in and out in their mixed case without
transformation.

2) indexes, comparisons and matches all act caseless, a key test being that
john and JOHN cannot both coexist in a unique constraint.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-20-2008, 03:58 PM
TBP
 
Posts: n/a
Default Re: Case Insensitivity

Andrew Hamm wrote:
> Art S. Kagel wrote:
>
>> You could in 9.xx/10/11 use the Virtual Index interface to define your own
>> secondary access method to use when creating the index that will
>> automagically convert the inputs. The magic functionality would be
>> supplied in the am_beginscan, am_getnext, and am_rescan functions.
>>
>> The only problem is it will only work if that index is selected. To get
>> it to work for post index comparisons and table scans you'd have to define
>> a primary access method on the table to do the same.

>
> Hey bloke, thanks for the reply. How are you keeping?
>
> This all sounds interesting, can you go slightly more into the "primary
> access method" bit for someone who's SQL level hasn't advanced much further
> than V7, functionality-wise?
>
> I think the ultimate goal, unless anyone can make a good point, is:
>
> 1) storage as-is, values go in and out in their mixed case without
> transformation.
>
> 2) indexes, comparisons and matches all act caseless, a key test being that
> john and JOHN cannot both coexist in a unique constraint.
>


On a real tangent, but one of the previous things I was involved in was
to make ALL names "soundexed".

Strip out vowels, strip out repeating characters, uppercase everything,
and only store a maximum of 8 characters (4 from surname, 2 from
forename, and two initials from other names) ... (and probably a few
other things).

So

Forename => "Richard"
Surname => "Smith"
ends up as "SMTHRCHRD", and then input is "soundexed" and compared
against the soundexed name index.

This provided a more flexible approach, and some "vague" matching which
was very useful for call centres taking calls from anywhere in the world.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:54 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com