vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| HI all, Can someone explain this sorting issue? we are using V8 on Windows, but database *might* have been created with V7 (I can find out if it really matters) I have a table with one column, data VARCHAR(255) . It contains three values : 1729_at, 1773_at and 177_at. If I issue this SQL: SELECT * FROM mytable ORDER BY data I get: 1729_at 177_at 1773_at This seems to imply that in the active collating sequence, whatever it happens to be, the '_' character comes before the '3' character. Am I right? Why is that? In ASCII (and therefore also in UTF-8, I think) the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is this at all possible, that someone created the database using EBCDIC on Windows? If I do the same sort in Java (uses Unicode) I get the right result, ie 1729_at 1773_at 177_at What am I missing? Thanks, Alejandrina |
| |||
| apattin wrote: > HI all, > > Can someone explain this sorting issue? > > we are using V8 on Windows, but database *might* have been created > with V7 (I can find out if it really matters) > > I have a table with one column, data VARCHAR(255) . It contains three > values : 1729_at, 1773_at and 177_at. > > If I issue this SQL: > > SELECT * FROM mytable ORDER BY data > > I get: > > 1729_at > 177_at > 1773_at > > This seems to imply that in the active collating sequence, whatever it > happens to be, the '_' character comes before the '3' character. Am I > right? Why is that? In ASCII (and therefore also in UTF-8, I think) > the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is > this at all possible, that someone created the database using EBCDIC > on Windows? > > If I do the same sort in Java (uses Unicode) I get the right result, > ie > > 1729_at > 1773_at > 177_at > > What am I missing? I'm also interested in why this is the case. I've tested 3 different databases (2 V8, 1 V9) with different codepage etc. All databases created in V8: LUW V8 fixpak 14: Database territory = SE Database code page = 1252 Database code set = IBM-1252 Database country/region code = 46 Database collating sequence = UNIQUE Alternate collating sequence (ALT_COLLATE) = db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" DATA 2 3 ------- ----------- ----------- 1729_at 50 57 177_at 55 95 1773_at 55 51 LUV V8 fixpak 13: Database territory = SE Database code page = 819 Database code set = ISO8859-1 Database country/region code = 46 Database collating sequence = UNIQUE Alternate collating sequence (ALT_COLLATE) = db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" DATA 2 3 ------- ----------- ----------- 1729_at 50 57 177_at 55 95 1773_at 55 51 LUW V9 fixpak 0: Database territory = C Database code page = 1208 Database code set = UTF-8 Database country/region code = 1 Database collating sequence = UCA400_NO Alternate collating sequence (ALT_COLLATE) = db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" DATA 2 3 ------- ----------- ----------- 1729_at 50 57 177_at 55 95 1773_at 55 51 /Lennart |
| |||
| On Wed, 06 Jun 2007 22:01:15 +0200, Lennart <erik.lennart.jonsson@gmail.com> wrote: >apattin wrote: >> HI all, >> >> Can someone explain this sorting issue? >> >> we are using V8 on Windows, but database *might* have been created >> with V7 (I can find out if it really matters) >> >> I have a table with one column, data VARCHAR(255) . It contains three >> values : 1729_at, 1773_at and 177_at. >> >> If I issue this SQL: >> >> SELECT * FROM mytable ORDER BY data >> >> I get: >> >> 1729_at >> 177_at >> 1773_at >> >> This seems to imply that in the active collating sequence, whatever it >> happens to be, the '_' character comes before the '3' character. Am I >> right? Why is that? In ASCII (and therefore also in UTF-8, I think) >> the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is >> this at all possible, that someone created the database using EBCDIC >> on Windows? >> >> If I do the same sort in Java (uses Unicode) I get the right result, >> ie >> >> 1729_at >> 1773_at >> 177_at >> >> What am I missing? > >I'm also interested in why this is the case. I've tested 3 different >databases (2 V8, 1 V9) with different codepage etc. All databases >created in V8: > >LUW V8 fixpak 14: > > Database territory = SE > Database code page = 1252 > Database code set = IBM-1252 > Database country/region code = 46 > Database collating sequence = UNIQUE > Alternate collating sequence (ALT_COLLATE) = > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > >DATA 2 3 >------- ----------- ----------- >1729_at 50 57 >177_at 55 95 >1773_at 55 51 > > >LUV V8 fixpak 13: > > Database territory = SE > Database code page = 819 > Database code set = ISO8859-1 > Database country/region code = 46 > Database collating sequence = UNIQUE > Alternate collating sequence (ALT_COLLATE) = > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > >DATA 2 3 >------- ----------- ----------- >1729_at 50 57 >177_at 55 95 >1773_at 55 51 > > >LUW V9 fixpak 0: > > Database territory = C > Database code page = 1208 > Database code set = UTF-8 > Database country/region code = 1 > Database collating sequence = UCA400_NO > Alternate collating sequence (ALT_COLLATE) = > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > >DATA 2 3 >------- ----------- ----------- >1729_at 50 57 >177_at 55 95 >1773_at 55 51 > > >/Lennart To see the order: DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1)) BEGIN ATOMIC DECLARE A INT DEFAULT 0; WHILE A < 257 DO INSERT INTO SESSION.Charmap(Num, Digit) VALUES (A, CHR(A)); SET A = A + 1; END WHILE; END SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit DROP TABLE SESSION.Charmap The ORDER BY Digit will show the order used. B. |
| |||
| On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail.com> wrote: >HI all, > >Can someone explain this sorting issue? > >we are using V8 on Windows, but database *might* have been created >with V7 (I can find out if it really matters) > >I have a table with one column, data VARCHAR(255) . It contains three >values : 1729_at, 1773_at and 177_at. > >If I issue this SQL: > >SELECT * FROM mytable ORDER BY data > >I get: > >1729_at >177_at >1773_at > >This seems to imply that in the active collating sequence, whatever it >happens to be, the '_' character comes before the '3' character. Am I >right? Why is that? In ASCII (and therefore also in UTF-8, I think) >the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is >this at all possible, that someone created the database using EBCDIC >on Windows? > >If I do the same sort in Java (uses Unicode) I get the right result, >ie > >1729_at >1773_at >177_at > >What am I missing? > >Thanks, > >Alejandrina If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data); Otherwise, a different order is used. B. |
| |||
| Brian Tkatch wrote: > On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail.com> wrote: > >>HI all, >> >>Can someone explain this sorting issue? >> >>we are using V8 on Windows, but database *might* have been created >>with V7 (I can find out if it really matters) >> >>I have a table with one column, data VARCHAR(255) . It contains three >>values : 1729_at, 1773_at and 177_at. >> >>If I issue this SQL: >> >>SELECT * FROM mytable ORDER BY data >> >>I get: >> >>1729_at >>177_at >>1773_at >> >>This seems to imply that in the active collating sequence, whatever it >>happens to be, the '_' character comes before the '3' character. Am I >>right? Why is that? In ASCII (and therefore also in UTF-8, I think) >>the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is >>this at all possible, that someone created the database using EBCDIC >>on Windows? >> >>If I do the same sort in Java (uses Unicode) I get the right result, >>ie >> >>1729_at >>1773_at >>177_at >> >>What am I missing? >> >>Thanks, >> >>Alejandrina > > If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data); > > Otherwise, a different order is used. > I get the same sortorder using order by ascii(data) db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select * from T order by ascii(data)" DATA ------- 1729_at 177_at 1773_at |
| |||
| Brian Tkatch wrote: [...] > To see the order: > > DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1)) > > BEGIN ATOMIC > DECLARE A INT DEFAULT 0; > WHILE A < 257 DO > INSERT INTO > SESSION.Charmap(Num, Digit) > VALUES (A, CHR(A)); > SET A = A + 1; > END WHILE; > END > > > SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num > SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit > > DROP TABLE SESSION.Charmap > > The ORDER BY Digit will show the order used. > > B. Thanks, another variant without a session table: with charmap(num, digit) as (values (30,chr(30)) union all select num+1, chr(num+1) from charmap where num < 100) select * from charmap order by digit I used a smaller interval to avoid scrambling of the screen |
| |||
| On Jun 6, 5:55 pm, Brian Tkatch <N/A> wrote: > On Wed, 06 Jun 2007 22:01:15 +0200, Lennart > > > > <erik.lennart.jons...@gmail.com> wrote: > >apattin wrote: > >> HI all, > > >> Can someone explain this sorting issue? > > >> we are using V8 on Windows, but database *might* have been created > >> with V7 (I can find out if it really matters) > > >> I have a table with one column, data VARCHAR(255) . It contains three > >> values : 1729_at, 1773_at and 177_at. > > >> If I issue this SQL: > > >> SELECT * FROM mytable ORDER BY data > > >> I get: > > >> 1729_at > >> 177_at > >> 1773_at > > >> This seems to imply that in the active collating sequence, whatever it > >> happens to be, the '_' character comes before the '3' character. Am I > >> right? Why is that? In ASCII (and therefore also in UTF-8, I think) > >> the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is > >> this at all possible, that someone created the database using EBCDIC > >> on Windows? > > >> If I do the same sort in Java (uses Unicode) I get the right result, > >> ie > > >> 1729_at > >> 1773_at > >> 177_at > > >> What am I missing? > > >I'm also interested in why this is the case. I've tested 3 different > >databases (2 V8, 1 V9) with different codepage etc. All databases > >created in V8: > > >LUW V8 fixpak 14: > > > Database territory = SE > > Database code page = 1252 > > Database code set = IBM-1252 > > Database country/region code = 46 > > Database collating sequence = UNIQUE > > Alternate collating sequence (ALT_COLLATE) = > > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select > >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > > >DATA 2 3 > >------- ----------- ----------- > >1729_at 50 57 > >177_at 55 95 > >1773_at 55 51 > > >LUV V8 fixpak 13: > > > Database territory = SE > > Database code page = 819 > > Database code set = ISO8859-1 > > Database country/region code = 46 > > Database collating sequence = UNIQUE > > Alternate collating sequence (ALT_COLLATE) = > > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select > >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > > >DATA 2 3 > >------- ----------- ----------- > >1729_at 50 57 > >177_at 55 95 > >1773_at 55 51 > > >LUW V9 fixpak 0: > > > Database territory = C > > Database code page = 1208 > > Database code set = UTF-8 > > Database country/region code = 1 > > Database collating sequence = UCA400_NO > > Alternate collating sequence (ALT_COLLATE) = > > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select > >data, ascii(substr(data,3,1)), ascii(substr(data,4,1)) from T order by data" > > >DATA 2 3 > >------- ----------- ----------- > >1729_at 50 57 > >177_at 55 95 > >1773_at 55 51 > > >/Lennart > > To see the order: > > DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1)) > > BEGIN ATOMIC > DECLARE A INT DEFAULT 0; > WHILE A < 257 DO > INSERT INTO > SESSION.Charmap(Num, Digit) > VALUES (A, CHR(A)); > SET A = A + 1; > END WHILE; > END > > SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num > SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit > > DROP TABLE SESSION.Charmap > > The ORDER BY Digit will show the order used. > > B. Thanks, Brian. I guess I want to know is why is this specific collating sequence being used, as ASCII or Unicode appear to be different. Alejandrina |
| |||
| Collating sequence is defined at the time of creating DATABASE and can't change later. CREATE DATABASE ..... COLLATE USING SYSTEM | COMPATIBILITY | IDENTITY | IDENTITY_16BIT | UCA400_NO | UCA400_LTH | NLSCHAR SYSTEM (Default) Collating sequence based on the database territory. This option cannot be specified when creating a Unicode database. IDENTITY Identity collating sequence, in which strings are compared byte for byte. ------------------- Commands Entered -------------------- SELECT data, HEX(data) FROM (VALUES '1729_at', '1773_at', '177_at' ) AS mytable(data) ORDER BY HEX(data); -------------------------------------------------------- DATA 2 ------- -------------- 1729_at 313732395F6174 1773_at 313737335F6174 177_at 3137375F6174 3 record(s) selected. |
| |||
| On Thu, 07 Jun 2007 00:16:36 +0200, Lennart <erik.lennart.jonsson@gmail.com> wrote: >Brian Tkatch wrote: >[...] >> To see the order: >> >> DECLARE GLOBAL TEMPORARY TABLE Charmap (Num INT, Digit CHAR(1)) >> >> BEGIN ATOMIC >> DECLARE A INT DEFAULT 0; >> WHILE A < 257 DO >> INSERT INTO >> SESSION.Charmap(Num, Digit) >> VALUES (A, CHR(A)); >> SET A = A + 1; >> END WHILE; >> END >> >> >> SELECT Num, Digit FROM SESSION.Charmap ORDER BY Num >> SELECT Num, Digit FROM SESSION.Charmap ORDER BY Digit >> >> DROP TABLE SESSION.Charmap >> >> The ORDER BY Digit will show the order used. >> >> B. > >Thanks, another variant without a session table: > >with charmap(num, digit) as (values (30,chr(30)) union all select num+1, >chr(num+1) from charmap where num < 100) select * from charmap order by >digit Silly me. I *still* haven't integrated recursive WITHs into my noggin. Thanx for the lesson! > >I used a smaller interval to avoid scrambling of the screen Considering the order changes a lot, showing every character may be advantageous. B. B. |
| ||||
| On Thu, 07 Jun 2007 00:08:09 +0200, Lennart <erik.lennart.jonsson@gmail.com> wrote: >Brian Tkatch wrote: >> On Wed, 06 Jun 2007 10:38:34 -0700, apattin <apattin@gmail.com> wrote: >> >>>HI all, >>> >>>Can someone explain this sorting issue? >>> >>>we are using V8 on Windows, but database *might* have been created >>>with V7 (I can find out if it really matters) >>> >>>I have a table with one column, data VARCHAR(255) . It contains three >>>values : 1729_at, 1773_at and 177_at. >>> >>>If I issue this SQL: >>> >>>SELECT * FROM mytable ORDER BY data >>> >>>I get: >>> >>>1729_at >>>177_at >>>1773_at >>> >>>This seems to imply that in the active collating sequence, whatever it >>>happens to be, the '_' character comes before the '3' character. Am I >>>right? Why is that? In ASCII (and therefore also in UTF-8, I think) >>>the 3 comes before the _. In EBCDIC, the _ comes before the 3, but is >>>this at all possible, that someone created the database using EBCDIC >>>on Windows? >>> >>>If I do the same sort in Java (uses Unicode) I get the right result, >>>ie >>> >>>1729_at >>>1773_at >>>177_at >>> >>>What am I missing? >>> >>>Thanks, >>> >>>Alejandrina >> >> If you want to ORDER BY the ASCII values, use ORDER BY ASCII(data); >> >> Otherwise, a different order is used. >> > >I get the same sortorder using order by ascii(data) > >db2 "with T (data) as (values ('1729_at'),('177_at'),('1773_at')) select >* from T order by ascii(data)" > >DATA >------- >1729_at >177_at >1773_at My mistake. My example in the other thread used one character, and in that case ASCII will obviously work. With more than one char, ASCII returns only the first char's value, as the docs say "Returns the ASCII code value of the leftmost character of the argument as an integer." Which is pretty useless here. Tonkuma's reply uses HEX(), which does an ASCII on all the chars, which is what we want. Thanx for the catch. B. |