This is a discussion on Help: Join a "glossary" table and a "data" table referring to termsin the glossary within the Oracle Database forums, part of the Database Server Software category; --> I have a "glossary" table that holds a unique identifier (key) for each "text" string used elsewhere in the ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a "glossary" table that holds a unique identifier (key) for each "text" string used elsewhere in the app. The fields in the data tables that should hold the text are then populated with the "key" instead of the text - helps clients substitute their own "glossary" for the default one, I suppose. In some cases, there may be more than one key in a given field (separated by commas). And the objective here is to extract a list of all the keys used in a given data table. I've tried to describe the scenario with a simplified example below: Given the following data: -- --------------------- -- This is my "glossary" create table vt_del_lib (key varchar2(100), text varchar2(100)); insert into vt_del_lib values ('010001', 'One'); insert into vt_del_lib values ('010002', 'Two'); insert into vt_del_lib values ('010003', 'Three'); insert into vt_del_lib values ('010004', 'Four'); insert into vt_del_lib values ('010005', 'Five'); insert into vt_del_lib values ('010006', 'Six'); insert into vt_del_lib values ('010007', 'Seven'); insert into vt_del_lib values ('010008', 'Eight'); insert into vt_del_lib values ('010009', 'Nine'); insert into vt_del_lib values ('010010', 'Ten'); commit; -- This is my "data" table create table vt_del_keys (val1 varchar2(1000)); insert into vt_del_keys values('010001'); insert into vt_del_keys values('010004,010005,010008,010010'); insert into vt_del_keys values('010005,010008'); commit; -- --------------------- Here's the query I've finally decided to use. I know it works, but not how - which version is better - is there a better way - etc. It gets pretty slow as the data table grows and I'm having trouble trying to improve its performance. -- --------------------- select distinct l.key from vt_del_lib l join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; -- OR select distinct l.key from vt_del_keys d join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; -- --------------------- Sorry for the cryptic subject line - I couldn't classify the problem any better TIA! Vamsi. P.S.: Some sample output... SQL> --Get used keys SQL> select l.key 2 from 3 vt_del_lib l 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; KEY -------------------------------------------------------------------------------- 010001 010004 010005 010008 010010 010005 010008 7 rows selected Executed in 0.375 seconds SQL> select l.key 2 from 3 vt_del_keys d 4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; KEY -------------------------------------------------------------------------------- 010001 010004 010005 010005 010008 010008 010010 7 rows selected Executed in 0.375 seconds SQL> --Get distinct used kyes SQL> select distinct l.key 2 from 3 vt_del_lib l 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; KEY -------------------------------------------------------------------------------- 010001 010004 010005 010008 010010 |
| |||
| On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > I have a "glossary" table that holds a unique identifier (key) for > each "text" string used elsewhere in the app. > The fields in the data tables that should hold the text are then > populated with the "key" instead of the text - helps clients > substitute their own "glossary" for the default one, I suppose. In > some cases, there may be more than one key in a given field (separated > by commas). And the objective here is to extract a list of all the > keys used in a given data table. > > I've tried to describe the scenario with a simplified example below: > > Given the following data: > > -- --------------------- > -- This is my "glossary" > create table vt_del_lib (key varchar2(100), text varchar2(100)); > > insert into vt_del_lib values ('010001', 'One'); > insert into vt_del_lib values ('010002', 'Two'); > insert into vt_del_lib values ('010003', 'Three'); > insert into vt_del_lib values ('010004', 'Four'); > insert into vt_del_lib values ('010005', 'Five'); > insert into vt_del_lib values ('010006', 'Six'); > insert into vt_del_lib values ('010007', 'Seven'); > insert into vt_del_lib values ('010008', 'Eight'); > insert into vt_del_lib values ('010009', 'Nine'); > insert into vt_del_lib values ('010010', 'Ten'); > commit; > > -- This is my "data" table > create table vt_del_keys (val1 varchar2(1000)); > > insert into vt_del_keys values('010001'); > insert into vt_del_keys values('010004,010005,010008,010010'); > insert into vt_del_keys values('010005,010008'); > commit; > -- --------------------- > > Here's the query I've finally decided to use. I know it works, but not > how > - which version is better - is there a better way - etc. > > It gets pretty slow as the data table grows and I'm having trouble > trying to improve its performance. > > -- --------------------- > select distinct l.key > from > * * * * vt_del_lib l > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > -- OR > > select distinct l.key > from > * * * * vt_del_keys d > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > -- --------------------- > > Sorry for the cryptic subject line - I couldn't classify the problem > any better > > TIA! > Vamsi. > > P.S.: Some sample output... > > SQL> --Get used keys > SQL> select l.key > * 2 *from > * 3 * * vt_del_lib l > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > ---------------------------------------------------------------------------*----- > 010001 > 010004 > 010005 > 010008 > 010010 > 010005 > 010008 > > 7 rows selected > > Executed in 0.375 seconds > SQL> select l.key > * 2 *from > * 3 * * vt_del_keys d > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > ---------------------------------------------------------------------------*----- > 010001 > 010004 > 010005 > 010005 > 010008 > 010008 > 010010 > > 7 rows selected > > Executed in 0.375 seconds > SQL> --Get distinct used kyes > SQL> select distinct l.key > * 2 *from > * 3 * * vt_del_lib l > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > ---------------------------------------------------------------------------*----- > 010001 > 010004 > 010005 > 010008 > 010010 I do not have a system in front of me that I can test with but if you run explain plan on these queries I expect that both tables are being read via full table scans since you concatenate a comma before and after you key columns rendering any index useless unless you have built function based indexes on both set of keys. Because the run time is exactly equal I suspect Oracle is using the same plan for both orderings of the table name in the query. Is the goal to find the key value or to get the descriptions for the keys? -- Mark D Powell -- |
| |||
| On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote: > On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > > > > > > > I have a "glossary" table that holds a unique identifier (key) for > > each "text" string used elsewhere in the app. > > The fields in the data tables that should hold the text are then > > populated with the "key" instead of the text - helps clients > > substitute their own "glossary" for the default one, I suppose. In > > some cases, there may be more than one key in a given field (separated > > by commas). And the objective here is to extract a list of all the > > keys used in a given data table. > > > I've tried to describe the scenario with a simplified example below: > > > Given the following data: > > > -- --------------------- > > -- This is my "glossary" > > create table vt_del_lib (key varchar2(100), text varchar2(100)); > > > insert into vt_del_lib values ('010001', 'One'); > > insert into vt_del_lib values ('010002', 'Two'); > > insert into vt_del_lib values ('010003', 'Three'); > > insert into vt_del_lib values ('010004', 'Four'); > > insert into vt_del_lib values ('010005', 'Five'); > > insert into vt_del_lib values ('010006', 'Six'); > > insert into vt_del_lib values ('010007', 'Seven'); > > insert into vt_del_lib values ('010008', 'Eight'); > > insert into vt_del_lib values ('010009', 'Nine'); > > insert into vt_del_lib values ('010010', 'Ten'); > > commit; > > > -- This is my "data" table > > create table vt_del_keys (val1 varchar2(1000)); > > > insert into vt_del_keys values('010001'); > > insert into vt_del_keys values('010004,010005,010008,010010'); > > insert into vt_del_keys values('010005,010008'); > > commit; > > -- --------------------- > > > Here's the query I've finally decided to use. I know it works, but not > > how > > - which version is better - is there a better way - etc. > > > It gets pretty slow as the data table grows and I'm having trouble > > trying to improve its performance. > > > -- --------------------- > > select distinct l.key > > from > > * * * * vt_del_lib l > > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > -- OR > > > select distinct l.key > > from > > * * * * vt_del_keys d > > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > -- --------------------- > > > Sorry for the cryptic subject line - I couldn't classify the problem > > any better > > > TIA! > > Vamsi. > > > P.S.: Some sample output... > > > SQL> --Get used keys > > SQL> select l.key > > * 2 *from > > * 3 * * vt_del_lib l > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > KEY > > ---------------------------------------------------------------------------**----- > > 010001 > > 010004 > > 010005 > > 010008 > > 010010 > > 010005 > > 010008 > > > 7 rows selected > > > Executed in 0.375 seconds > > SQL> select l.key > > * 2 *from > > * 3 * * vt_del_keys d > > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > KEY > > ---------------------------------------------------------------------------**----- > > 010001 > > 010004 > > 010005 > > 010005 > > 010008 > > 010008 > > 010010 > > > 7 rows selected > > > Executed in 0.375 seconds > > SQL> --Get distinct used kyes > > SQL> select distinct l.key > > * 2 *from > > * 3 * * vt_del_lib l > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > KEY > > ---------------------------------------------------------------------------**----- > > 010001 > > 010004 > > 010005 > > 010008 > > 010010 > > I do not have a system in front of me that I can test with but if you > run explain plan on these queries I expect that both tables are being > read via full table scans since you concatenate a comma before and > after you key columns rendering any index useless unless you have > built function based indexes on both set of keys. > > Because the run time is exactly equal I suspect Oracle is using the > same plan for both orderings of the table name in the query. > > Is the goal to find the key value or to get the descriptions for the > keys? > > -- Mark D Powell --- Hide quoted text - > > - Show quoted text - Thanks Mark D Powell The goal is to extract the keys values. Yes, the explain plan shows that both tables undergo full table scan. I didn't think of creating an index on the data table because I thought the LIKE clause doesn't allow for an index to be used. Can you suggest what the function-based index should be like on the data table? - or the glossary table, for that matter Thanks. Vamsi. |
| |||
| On Feb 5, 8:13*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote: > > On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > > > > I have a "glossary" table that holds a unique identifier (key) for > > > each "text" string used elsewhere in the app. > > > The fields in the data tables that should hold the text are then > > > populated with the "key" instead of the text - helps clients > > > substitute their own "glossary" for the default one, I suppose. In > > > some cases, there may be more than one key in a given field (separated > > > by commas). And the objective here is to extract a list of all the > > > keys used in a given data table. > > > > I've tried to describe the scenario with a simplified example below: > > > > Given the following data: > > > > -- --------------------- > > > -- This is my "glossary" > > > create table vt_del_lib (key varchar2(100), text varchar2(100)); > > > > insert into vt_del_lib values ('010001', 'One'); > > > insert into vt_del_lib values ('010002', 'Two'); > > > insert into vt_del_lib values ('010003', 'Three'); > > > insert into vt_del_lib values ('010004', 'Four'); > > > insert into vt_del_lib values ('010005', 'Five'); > > > insert into vt_del_lib values ('010006', 'Six'); > > > insert into vt_del_lib values ('010007', 'Seven'); > > > insert into vt_del_lib values ('010008', 'Eight'); > > > insert into vt_del_lib values ('010009', 'Nine'); > > > insert into vt_del_lib values ('010010', 'Ten'); > > > commit; > > > > -- This is my "data" table > > > create table vt_del_keys (val1 varchar2(1000)); > > > > insert into vt_del_keys values('010001'); > > > insert into vt_del_keys values('010004,010005,010008,010010'); > > > insert into vt_del_keys values('010005,010008'); > > > commit; > > > -- --------------------- > > > > Here's the query I've finally decided to use. I know it works, but not > > > how > > > - which version is better - is there a better way - etc. > > > > It gets pretty slow as the data table grows and I'm having trouble > > > trying to improve its performance. > > > > -- --------------------- > > > select distinct l.key > > > from > > > * * * * vt_del_lib l > > > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > -- OR > > > > select distinct l.key > > > from > > > * * * * vt_del_keys d > > > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > > -- --------------------- > > > > Sorry for the cryptic subject line - I couldn't classify the problem > > > any better > > > > TIA! > > > Vamsi. > > > > P.S.: Some sample output... > > > > SQL> --Get used keys > > > SQL> select l.key > > > * 2 *from > > > * 3 * * vt_del_lib l > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > KEY > > > ---------------------------------------------------------------------------***----- > > > 010001 > > > 010004 > > > 010005 > > > 010008 > > > 010010 > > > 010005 > > > 010008 > > > > 7 rows selected > > > > Executed in 0.375 seconds > > > SQL> select l.key > > > * 2 *from > > > * 3 * * vt_del_keys d > > > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > > KEY > > > ---------------------------------------------------------------------------***----- > > > 010001 > > > 010004 > > > 010005 > > > 010005 > > > 010008 > > > 010008 > > > 010010 > > > > 7 rows selected > > > > Executed in 0.375 seconds > > > SQL> --Get distinct used kyes > > > SQL> select distinct l.key > > > * 2 *from > > > * 3 * * vt_del_lib l > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > KEY > > > ---------------------------------------------------------------------------***----- > > > 010001 > > > 010004 > > > 010005 > > > 010008 > > > 010010 > > > I do not have a system in front of me that I can test with but if you > > run explain plan on these queries I expect that both tables are being > > read via full table scans since you concatenate a comma before and > > after you key columns rendering any index useless unless you have > > built function based indexes on both set of keys. > > > Because the run time is exactly equal I suspect Oracle is using the > > same plan for both orderings of the table name in the query. > > > Is the goal to find the key value or to get the descriptions for the > > keys? > > > -- Mark D Powell --- Hide quoted text - > > > - Show quoted text - > > Thanks Mark D Powell > > The goal is to extract the keys values. > > Yes, the explain plan shows that both tables undergo full table scan. > I didn't think of creating an index on the data table because I > thought the LIKE clause doesn't allow for an index to be used. Can you > suggest what the function-based index should be like on the data > table? - or the glossary table, for that matter > > Thanks. > Vamsi Thanks for posting the DDL and DML to create and populate the tables. Try a bit of experimentation working with some of Mark's suggestions. In addition to the rows that you inserted into the tables, I excuted the following to add an additional 900,000 rows into one of the tables: NSERT INTO VT_DEL_LIB SELECT TO_CHAR(ROWNUM+10010,'000000'), 'N'||TO_CHAR(ROWNUM+10010,'000000') FROM DUAL CONNECT BY LEVEL<=900000; COMMIT; Now, let's gather stats on the tables and any indexes: EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>TRUE); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_KEYS',CASCADE=>TRUE); select distinct l.key from vt_del_lib l join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||', %'; The DBMS Xplan: ------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------ | 1 | HASH UNIQUE | | 1 | 2 | 5 | 00:00:01.35 | 8217 | 968K| 968K| 740K (0)| | 2 | NESTED LOOPS | | 1 | 2 | 7 | 00:00:01.35 | 8217 | | | | | 3 | TABLE ACCESS FULL| VT_DEL_KEYS | 1 | 3 | 3 | 00:00:00.10 | 3 | | | | |* 4 | TABLE ACCESS FULL| VT_DEL_LIB | 3 | 1 | 7 | 00:00:01.25 | 8214 | | | | ------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%') The above completed in 1.35 seconds. Now, an experimentation by creating a unique index on the VT_DEL_LIB table, modifying the KEY column so that it cannot be NULL, and gathering table and index stats: CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY); ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>TRUE); select distinct l.key from vt_del_lib l join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||', %'; --------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | --------------------------------------------------------------------------------------------------------------------------------------- | 1 | HASH UNIQUE | | 1 | 135K| 5 |00:00:01.40 | 6816 | 3 | 968K| 968K| 742K (0)| | 2 | NESTED LOOPS | | 1 | 135K| 7 |00:00:01.40 | 6816 | 3 | | | | | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | 0 | | | | |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001 | 7 |00:00:01.40 | 6813 | 3 | | | | --------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%') The query is now using the index that was just created, but the execution time increased by 0.05 seconds. SELECT DISTINCT L.KEY FROM VT_DEL_LIB L, VT_DEL_KEYS D WHERE D.VAL1 LIKE '%'||L.KEY||'%'; ------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------------ | 1 | HASH UNIQUE | | 1 | 135K| 5 |00:00:00.81 | 6816 | 968K| 968K| 713K (0)| | 2 | NESTED LOOPS | | 1 | 135K| 7 |00:00:00.81 | 6816 | | | | | 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | | | | |* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001 | 7 |00:00:00.80 | 6813 | | | | ------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%') The above completed in 0.81 seconds, which is a nice improvement, but still not great. Now, let's experiment with regular expressions (an adaptation of a previous solution by Maxim Demenko for splitting a sentence into words): SELECT DISTINCT REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY FROM VT_DEL_KEYS CONNECT BY REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL; KEY ------ 010005 010008 010010 010004 010001 If we can take the above results and drive back into into the VT_DEL_LIB table using the index that we previously created, we obtain an efficient plan: SELECT DISTINCT L.KEY FROM (SELECT DISTINCT REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY FROM VT_DEL_KEYS CONNECT BY REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D, VT_DEL_LIB L WHERE L.KEY=D.KEY; ---------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E- Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------------------------------- | 1 | HASH UNIQUE | | 1 | 3 | 5 |00:00:00.01 | 15 | 968K| 968K| 717K (0)| | 2 | NESTED LOOPS | | 1 | 3 | 5 |00:00:00.01 | 15 | | | | | 3 | VIEW | | 1 | 3 | 5 |00:00:00.01 | 3 | | | | | 4 | HASH UNIQUE | | 1 | 3 | 5 |00:00:00.01 | 3 | 936K| 936K| 738K (0)| | 5 | CONNECT BY WITHOUT FILTERING| | 1 | | 21 |00:00:00.01 | 3 | | | | | 6 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3 | 3 |00:00:00.01 | 3 | | | | |* 7 | INDEX UNIQUE SCAN | IND_VT_DEL_LIB | 5 | 1 | 5 |00:00:00.01 | 12 | | | | ---------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 7 - access("L"."KEY"="D"."KEY") The above completed in 0.01 seconds. As the data size increases, it may be necessary to add an /*+ ORDERED */ hint to make certain that the plan does not change significantly. Experiment to see what happens as the data size increases. Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc. |
| |||
| On Feb 5, 10:11*pm, Charles Hooper <hooperc2...@yahoo.com> wrote: > On Feb 5, 8:13*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > > > On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote: > > > On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote: > > > > > I have a "glossary" table that holds a unique identifier (key) for > > > > each "text" string used elsewhere in the app. > > > > The fields in the data tables that should hold the text are then > > > > populated with the "key" instead of the text - helps clients > > > > substitute their own "glossary" for the default one, I suppose. In > > > > some cases, there may be more than one key in a given field (separated > > > > by commas). And the objective here is to extract a list of all the > > > > keys used in a given data table. > > > > > I've tried to describe the scenario with a simplified example below: > > > > > Given the following data: > > > > > -- --------------------- > > > > -- This is my "glossary" > > > > create table vt_del_lib (key varchar2(100), text varchar2(100)); > > > > > insert into vt_del_lib values ('010001', 'One'); > > > > insert into vt_del_lib values ('010002', 'Two'); > > > > insert into vt_del_lib values ('010003', 'Three'); > > > > insert into vt_del_lib values ('010004', 'Four'); > > > > insert into vt_del_lib values ('010005', 'Five'); > > > > insert into vt_del_lib values ('010006', 'Six'); > > > > insert into vt_del_lib values ('010007', 'Seven'); > > > > insert into vt_del_lib values ('010008', 'Eight'); > > > > insert into vt_del_lib values ('010009', 'Nine'); > > > > insert into vt_del_lib values ('010010', 'Ten'); > > > > commit; > > > > > -- This is my "data" table > > > > create table vt_del_keys (val1 varchar2(1000)); > > > > > insert into vt_del_keys values('010001'); > > > > insert into vt_del_keys values('010004,010005,010008,010010'); > > > > insert into vt_del_keys values('010005,010008'); > > > > commit; > > > > -- --------------------- > > > > > Here's the query I've finally decided to use. I know it works, but not > > > > how > > > > - which version is better - is there a better way - etc. > > > > > It gets pretty slow as the data table grows and I'm having trouble > > > > trying to improve its performance. > > > > > -- --------------------- > > > > select distinct l.key > > > > from > > > > * * * * vt_del_lib l > > > > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > > -- OR > > > > > select distinct l.key > > > > from > > > > * * * * vt_del_keys d > > > > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > > > -- --------------------- > > > > > Sorry for the cryptic subject line - I couldn't classify the problem > > > > any better > > > > > TIA! > > > > Vamsi. > > > > > P.S.: Some sample output... > > > > > SQL> --Get used keys > > > > SQL> select l.key > > > > * 2 *from > > > > * 3 * * vt_del_lib l > > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > > KEY > > > > ---------------------------------------------------------------------------****----- > > > > 010001 > > > > 010004 > > > > 010005 > > > > 010008 > > > > 010010 > > > > 010005 > > > > 010008 > > > > > 7 rows selected > > > > > Executed in 0.375 seconds > > > > SQL> select l.key > > > > * 2 *from > > > > * 3 * * vt_del_keys d > > > > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > > > > KEY > > > > ---------------------------------------------------------------------------****----- > > > > 010001 > > > > 010004 > > > > 010005 > > > > 010005 > > > > 010008 > > > > 010008 > > > > 010010 > > > > > 7 rows selected > > > > > Executed in 0.375 seconds > > > > SQL> --Get distinct used kyes > > > > SQL> select distinct l.key > > > > * 2 *from > > > > * 3 * * vt_del_lib l > > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > > > > KEY > > > > ---------------------------------------------------------------------------****----- > > > > 010001 > > > > 010004 > > > > 010005 > > > > 010008 > > > > 010010 > > > > I do not have a system in front of me that I can test with but if you > > > run explain plan on these queries I expect that both tables are being > > > read via full table scans since you concatenate a comma before and > > > after you key columns rendering any index useless unless you have > > > built function based indexes on both set of keys. > > > > Because the run time is exactly equal I suspect Oracle is using the > > > same plan for both orderings of the table name in the query. > > > > Is the goal to find the key value or to get the descriptions for the > > > keys? > > > > -- Mark D Powell --- Hide quoted text - > > > > - Show quoted text - > > > Thanks Mark D Powell > > > The goal is to extract the keys values. > > > Yes, the explain plan shows that both tables undergo full table scan. > > I didn't think of creating an index on the data table because I > > thought the LIKE clause doesn't allow for an index to be used. Can you > > suggest what the function-based index should be like on the data > > table? - or the glossary table, for that matter > > > Thanks. > > Vamsi > > Thanks for posting the DDL and DML to create and populate the tables. > Try a bit of experimentation working with some of Mark's suggestions. > In addition to the rows that you inserted into the tables, I excuted > the following to add an additional 900,000 rows into one of the > tables: > NSERT INTO > * VT_DEL_LIB > SELECT > * TO_CHAR(ROWNUM+10010,'000000'), > * 'N'||TO_CHAR(ROWNUM+10010,'000000') > FROM > * DUAL > CONNECT BY > * LEVEL<=900000; > > COMMIT; > > Now, let's gather stats on the tables and any indexes: > EXEC > DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>*TRUE); > EXEC > DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_KEYS',CASCADE=*>TRUE); > > select distinct l.key > from > * * * * vt_del_lib l > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||', > %'; > > The DBMS Xplan: > ---------------------------------------------------------------------------*--------------------------------------------- > | Id *| Operation * * * * * | Name * * * *| Starts | E-Rows | A-Rows > | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem | > ---------------------------------------------------------------------------*--------------------------------------------- > | * 1 | *HASH UNIQUE * * * *| * * * * * * | * * *1 | * * *2 | * * *5 | > 00:00:01.35 | * *8217 | * 968K| * 968K| *740K (0)| > | * 2 | * NESTED LOOPS * * *| * * * * * * | * * *1 | * * *2 | * * *7 | > 00:00:01.35 | * *8217 | * * * | * * * | * * * * *| > | * 3 | * *TABLE ACCESS FULL| VT_DEL_KEYS | * * *1 | * * *3 | * * *3 | > 00:00:00.10 | * * * 3 | * * * | * * * | * * * * *| > |* *4 | * *TABLE ACCESS FULL| VT_DEL_LIB *| * * *3 | * **1 | * * *7 | > 00:00:01.25 | * *8214 | * * * | * * * | * * * * *| > ---------------------------------------------------------------------------*--------------------------------------------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > * *4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%') > > The above completed in 1.35 seconds. > > Now, an experimentation by creating a unique index on the VT_DEL_LIB > table, modifying the KEY column so that it cannot be NULL, and > gathering table and index stats: > CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY); > ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL); > EXEC > DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>*TRUE); > > select distinct l.key > from > * * * * vt_del_lib l > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||', > %'; > > ---------------------------------------------------------------------------*------------------------------------------------------------ > | Id *| Operation * * * * * * *| Name * * * * * | Starts | E-Rows | A- > Rows | * A-Time * | Buffers | Reads *| *OMem | *1Mem | Used-Mem | > ---------------------------------------------------------------------------*------------------------------------------------------------ > | * 1 | *HASH UNIQUE * * * * * | * * * * * * **| * * *1 | > 135K| * * *5 |00:00:01.40 | * *6816 | * * *3 | * 968K| * 968K| *742K > (0)| > | * 2 | * NESTED LOOPS * * * * | * * * * * * * *| * * *1 | > 135K| * * *7 |00:00:01.40 | * *6816 | * * *3 | * * *| > | * * * * *| > | * 3 | * *TABLE ACCESS FULL * | VT_DEL_KEYS * *| * * *1| * * *3 > | * * *3 |00:00:00.01 | * * * 3 | * * *0 | * * * |* * * | * * * * *| > |* *4 | * *INDEX FAST FULL SCAN| IND_VT_DEL_LIB | * * *3 | *45001 > | * * *7 |00:00:01.40 | * *6813 | * * *3 | * * * | * * * | * * * * *| > ---------------------------------------------------------------------------*------------------------------------------------------------ > > Predicate Information (identified by operation id): > --------------------------------------------------- > * *4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%') > > The query is now using the index that was just created, but the > execution time increased by 0.05 seconds. > > SELECT DISTINCT > * L.KEY > FROM > * VT_DEL_LIB L, > * VT_DEL_KEYS D > WHERE > * D.VAL1 LIKE '%'||L.KEY||'%'; > > ---------------------------------------------------------------------------*--------------------------------------------------- > | Id *| Operation * * * * * * *| Name * * * * * | Starts | E-Rows | A- > Rows | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem | > ---------------------------------------------------------------------------*--------------------------------------------------- > | * 1 | *HASH UNIQUE * * * * * | * * * * * * **| * * *1 | > 135K| * * *5 |00:00:00.81 | * *6816 | * 968K| * 968K| *713K (0)| > | * 2 | * NESTED LOOPS * * * * | * * * * * * * *| * * *1 | > 135K| * * *7 |00:00:00.81 | * *6816 | * * * | * * * | * * * * *| > | * 3 | * *TABLE ACCESS FULL * | VT_DEL_KEYS * *| * * *1| * * *3 > | * * *3 |00:00:00.01 | * * * 3 | * * * | * * * | * * * * *| > |* *4 | * *INDEX FAST FULL SCAN| IND_VT_DEL_LIB | * * *3 | *45001 > | * * *7 |00:00:00.80 | * *6813 | * * * | * * * | * * * * *| > ---------------------------------------------------------------------------*--------------------------------------------------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > * *4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%') > > The above completed in 0.81 seconds, which is a nice improvement, but > still not great. > > Now, let's experiment with regular expressions (an adaptation of a > previous solution by Maxim Demenko for splitting a sentence into > words): > SELECT DISTINCT > * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY > FROM > * VT_DEL_KEYS > CONNECT BY > * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL; > > KEY > ------ > 010005 > 010008 > 010010 > 010004 > 010001 > > If we can take the above results and drive back into into the > VT_DEL_LIB table using the index that we previously created, we obtain > an efficient plan: > SELECT DISTINCT > * L.KEY > FROM > * (SELECT DISTINCT > * * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY > * FROM > * * VT_DEL_KEYS > * CONNECT BY > * * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D, > * VT_DEL_LIB L > WHERE > * L.KEY=D.KEY; > > ---------------------------------------------------------------------------*------------------------------------------------------------- > | Id *| Operation * * * * * * * * * * * *| Name * * * * * | Starts | E- > Rows | A-Rows | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem | > ---------------------------------------------------------------------------*------------------------------------------------------------- > | * 1 | *HASH UNIQUE * * * * * * * * * * | * ** * * * * *| * * *1 > | * * *3 | * * *5 |00:00:00.01 | * * *15 | * 968K| *968K| *717K (0)| > | * 2 | * NESTED LOOPS * * * * * * * * * | * * * * * * * *| * * *1 > | * * *3 | * * *5 |00:00:00.01 | * * *15 | * * * |* * * | * * * * *| > | * 3 | * *VIEW * * * * * * * * * * * * *|* * * * * * * *| * * *1 > | * * *3 | * * *5 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *| > | * 4 | * * HASH UNIQUE * * * * * * * * *| * ** * * * * *| * * *1 > | * * *3 | * * *5 |00:00:00.01 | * * * 3 | * 936K| *936K| *738K (0)| > | * 5 | * * *CONNECT BY WITHOUT FILTERING| * * * * * ** *| * * *1 > | * * * *| * * 21 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *| > | * 6 | * * * TABLE ACCESS FULL * * * * *| VT_DEL_KEYS * *| * * *1 > | * * *3 | * * *3 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *| > |* *7 | * *INDEX UNIQUE SCAN * * * * * * | IND_VT_DEL_LIB | * * *5 > | * * *1 | * * *5 |00:00:00.01 | * * *12 | * * * |* * * | * * * * *| > ---------------------------------------------------------------------------*------------------------------------------------------------- > > Predicate Information (identified by operation id): > --------------------------------------------------- > * *7 - access("L"."KEY"="D"."KEY") > > The above completed in 0.01 seconds. *As the data size increases, it > may be necessary to add an /*+ ORDERED */ hint to make certain that > the plan does not change significantly. *Experiment to see what > happens as the data size increases. > > Charles Hooper > IT Manager/Oracle DBA > K&M Machine-Fabricating, Inc. Thank you *very* much for that amazing analysis, Charles Hooper! I get the impression that the regex. support is not available in Oracle 9i - is it? We're still on 9i. But, I'm going to adapt the basic idea and try it out using a "lookup" function I wrote a while back. Something like this: SELECT DISTINCT bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') FROM VT_DEL_KEYS CONNECT BY bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') IS NOT NULL; Since the objective is to extract only the keys, I don't need to join with the LIB table anymore - the *real* data goes thro' validations that ensure that all keys used are actually valid!! :-D Removing the DISTINCT gives me strange results, but, I'll try to figure that out. (I always thought there should be support for functionality that is the reverse of GROUP BY - thanks so much for introducing it) I'll post here when I've a working design (if I can apply the outline above to my real queries) or if I run into problems. Thanks! Vamsi. |
| ||||
| On Feb 5, 3:24 pm, Vamsidhar <tvamsid...@gmail.com> wrote: > I have a "glossary" table that holds a unique identifier (key) for > each "text" string used elsewhere in the app. > The fields in the data tables that should hold the text are then > populated with the "key" instead of the text - helps clients > substitute their own "glossary" for the default one, I suppose. In > some cases, there may be more than one key in a given field (separated > by commas). And the objective here is to extract a list of all the > keys used in a given data table. > > I've tried to describe the scenario with a simplified example below: > > Given the following data: > > -- --------------------- > -- This is my "glossary" > create table vt_del_lib (key varchar2(100), text varchar2(100)); > > insert into vt_del_lib values ('010001', 'One'); > insert into vt_del_lib values ('010002', 'Two'); > insert into vt_del_lib values ('010003', 'Three'); > insert into vt_del_lib values ('010004', 'Four'); > insert into vt_del_lib values ('010005', 'Five'); > insert into vt_del_lib values ('010006', 'Six'); > insert into vt_del_lib values ('010007', 'Seven'); > insert into vt_del_lib values ('010008', 'Eight'); > insert into vt_del_lib values ('010009', 'Nine'); > insert into vt_del_lib values ('010010', 'Ten'); > commit; > > -- This is my "data" table > create table vt_del_keys (val1 varchar2(1000)); > > insert into vt_del_keys values('010001'); > insert into vt_del_keys values('010004,010005,010008,010010'); > insert into vt_del_keys values('010005,010008'); > commit; > -- --------------------- > > Here's the query I've finally decided to use. I know it works, but not > how > - which version is better - is there a better way - etc. > > It gets pretty slow as the data table grows and I'm having trouble > trying to improve its performance. > > -- --------------------- > select distinct l.key > from > vt_del_lib l > join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > -- OR > > select distinct l.key > from > vt_del_keys d > join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > -- --------------------- > > Sorry for the cryptic subject line - I couldn't classify the problem > any better > > TIA! > Vamsi. > > P.S.: Some sample output... > > SQL> --Get used keys > SQL> select l.key > 2 from > 3 vt_del_lib l > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > -------------------------------------------------------------------------------- > 010001 > 010004 > 010005 > 010008 > 010010 > 010005 > 010008 > > 7 rows selected > > Executed in 0.375 seconds > SQL> select l.key > 2 from > 3 vt_del_keys d > 4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > -------------------------------------------------------------------------------- > 010001 > 010004 > 010005 > 010005 > 010008 > 010008 > 010010 > > 7 rows selected > > Executed in 0.375 seconds > SQL> --Get distinct used kyes > SQL> select distinct l.key > 2 from > 3 vt_del_lib l > 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%'; > > KEY > -------------------------------------------------------------------------------- > 010001 > 010004 > 010005 > 010008 > 010010 This will only work if all keys are the same length. replace your first insert "insert into vt_del_lib values ('010001', 'One');" with this one "insert into vt_del_lib values ('100', 'One');" As you have said that you've simplified the scenario, this probably explains why the result is useless. I know that this is not what you are asking, but you would make your life so much easier if you normalized the vt_del_keys table. Mike |