Unix Technical Forum

Help: Join a "glossary" table and a "data" table referring to termsin the glossary

This is a discussion on Help: Join a "glossary" table and a "data" table referring to termsin the glossary within the Oracle Database forums, part of the Database Server Software category; --> I have a "glossary" table that holds a unique identifier (key) for each "text" string used elsewhere in the ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-26-2008, 08:52 AM
Vamsidhar
 
Posts: n/a
Default Help: Join a "glossary" table and a "data" table referring to termsin the glossary


I have a "glossary" table that holds a unique identifier (key) for
each "text" string used elsewhere in the app.
The fields in the data tables that should hold the text are then
populated with the "key" instead of the text - helps clients
substitute their own "glossary" for the default one, I suppose. In
some cases, there may be more than one key in a given field (separated
by commas). And the objective here is to extract a list of all the
keys used in a given data table.

I've tried to describe the scenario with a simplified example below:

Given the following data:

-- ---------------------
-- This is my "glossary"
create table vt_del_lib (key varchar2(100), text varchar2(100));

insert into vt_del_lib values ('010001', 'One');
insert into vt_del_lib values ('010002', 'Two');
insert into vt_del_lib values ('010003', 'Three');
insert into vt_del_lib values ('010004', 'Four');
insert into vt_del_lib values ('010005', 'Five');
insert into vt_del_lib values ('010006', 'Six');
insert into vt_del_lib values ('010007', 'Seven');
insert into vt_del_lib values ('010008', 'Eight');
insert into vt_del_lib values ('010009', 'Nine');
insert into vt_del_lib values ('010010', 'Ten');
commit;

-- This is my "data" table
create table vt_del_keys (val1 varchar2(1000));

insert into vt_del_keys values('010001');
insert into vt_del_keys values('010004,010005,010008,010010');
insert into vt_del_keys values('010005,010008');
commit;
-- ---------------------

Here's the query I've finally decided to use. I know it works, but not
how I was wondering if someone could explain - how/why does it work
- which version is better - is there a better way - etc.

It gets pretty slow as the data table grows and I'm having trouble
trying to improve its performance.

-- ---------------------
select distinct l.key
from
vt_del_lib l
join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

-- OR

select distinct l.key
from
vt_del_keys d
join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

-- ---------------------


Sorry for the cryptic subject line - I couldn't classify the problem
any better


TIA!
Vamsi.


P.S.: Some sample output...


SQL> --Get used keys
SQL> select l.key
2 from
3 vt_del_lib l
4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

KEY
--------------------------------------------------------------------------------
010001
010004
010005
010008
010010
010005
010008

7 rows selected

Executed in 0.375 seconds
SQL> select l.key
2 from
3 vt_del_keys d
4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

KEY
--------------------------------------------------------------------------------
010001
010004
010005
010005
010008
010008
010010

7 rows selected

Executed in 0.375 seconds
SQL> --Get distinct used kyes
SQL> select distinct l.key
2 from
3 vt_del_lib l
4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

KEY
--------------------------------------------------------------------------------
010001
010004
010005
010008
010010

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-26-2008, 08:52 AM
Mark D Powell
 
Posts: n/a
Default Re: Help: Join a "glossary" table and a "data" table referring toterms in the glossary

On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote:
> I have a "glossary" table that holds a unique identifier (key) for
> each "text" string used elsewhere in the app.
> The fields in the data tables that should hold the text are then
> populated with the "key" instead of the text - helps clients
> substitute their own "glossary" for the default one, I suppose. In
> some cases, there may be more than one key in a given field (separated
> by commas). And the objective here is to extract a list of all the
> keys used in a given data table.
>
> I've tried to describe the scenario with a simplified example below:
>
> Given the following data:
>
> -- ---------------------
> -- This is my "glossary"
> create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> insert into vt_del_lib values ('010001', 'One');
> insert into vt_del_lib values ('010002', 'Two');
> insert into vt_del_lib values ('010003', 'Three');
> insert into vt_del_lib values ('010004', 'Four');
> insert into vt_del_lib values ('010005', 'Five');
> insert into vt_del_lib values ('010006', 'Six');
> insert into vt_del_lib values ('010007', 'Seven');
> insert into vt_del_lib values ('010008', 'Eight');
> insert into vt_del_lib values ('010009', 'Nine');
> insert into vt_del_lib values ('010010', 'Ten');
> commit;
>
> -- This is my "data" table
> create table vt_del_keys (val1 varchar2(1000));
>
> insert into vt_del_keys values('010001');
> insert into vt_del_keys values('010004,010005,010008,010010');
> insert into vt_del_keys values('010005,010008');
> commit;
> -- ---------------------
>
> Here's the query I've finally decided to use. I know it works, but not
> how I was wondering if someone could explain - how/why does it work
> - which version is better - is there a better way - etc.
>
> It gets pretty slow as the data table grows and I'm having trouble
> trying to improve its performance.
>
> -- ---------------------
> select distinct l.key
> from
> * * * * vt_del_lib l
> * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> -- OR
>
> select distinct l.key
> from
> * * * * vt_del_keys d
> * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> -- ---------------------
>
> Sorry for the cryptic subject line - I couldn't classify the problem
> any better
>
> TIA!
> Vamsi.
>
> P.S.: Some sample output...
>
> SQL> --Get used keys
> SQL> select l.key
> * 2 *from
> * 3 * * vt_del_lib l
> * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> ---------------------------------------------------------------------------*-----
> 010001
> 010004
> 010005
> 010008
> 010010
> 010005
> 010008
>
> 7 rows selected
>
> Executed in 0.375 seconds
> SQL> select l.key
> * 2 *from
> * 3 * * vt_del_keys d
> * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> ---------------------------------------------------------------------------*-----
> 010001
> 010004
> 010005
> 010005
> 010008
> 010008
> 010010
>
> 7 rows selected
>
> Executed in 0.375 seconds
> SQL> --Get distinct used kyes
> SQL> select distinct l.key
> * 2 *from
> * 3 * * vt_del_lib l
> * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> ---------------------------------------------------------------------------*-----
> 010001
> 010004
> 010005
> 010008
> 010010


I do not have a system in front of me that I can test with but if you
run explain plan on these queries I expect that both tables are being
read via full table scans since you concatenate a comma before and
after you key columns rendering any index useless unless you have
built function based indexes on both set of keys.

Because the run time is exactly equal I suspect Oracle is using the
same plan for both orderings of the table name in the query.

Is the goal to find the key value or to get the descriptions for the
keys?

-- Mark D Powell --
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-26-2008, 08:52 AM
Vamsidhar
 
Posts: n/a
Default Re: Help: Join a "glossary" table and a "data" table referring toterms in the glossary

On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote:
> On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote:
>
>
>
>
>
> > I have a "glossary" table that holds a unique identifier (key) for
> > each "text" string used elsewhere in the app.
> > The fields in the data tables that should hold the text are then
> > populated with the "key" instead of the text - helps clients
> > substitute their own "glossary" for the default one, I suppose. In
> > some cases, there may be more than one key in a given field (separated
> > by commas). And the objective here is to extract a list of all the
> > keys used in a given data table.

>
> > I've tried to describe the scenario with a simplified example below:

>
> > Given the following data:

>
> > -- ---------------------
> > -- This is my "glossary"
> > create table vt_del_lib (key varchar2(100), text varchar2(100));

>
> > insert into vt_del_lib values ('010001', 'One');
> > insert into vt_del_lib values ('010002', 'Two');
> > insert into vt_del_lib values ('010003', 'Three');
> > insert into vt_del_lib values ('010004', 'Four');
> > insert into vt_del_lib values ('010005', 'Five');
> > insert into vt_del_lib values ('010006', 'Six');
> > insert into vt_del_lib values ('010007', 'Seven');
> > insert into vt_del_lib values ('010008', 'Eight');
> > insert into vt_del_lib values ('010009', 'Nine');
> > insert into vt_del_lib values ('010010', 'Ten');
> > commit;

>
> > -- This is my "data" table
> > create table vt_del_keys (val1 varchar2(1000));

>
> > insert into vt_del_keys values('010001');
> > insert into vt_del_keys values('010004,010005,010008,010010');
> > insert into vt_del_keys values('010005,010008');
> > commit;
> > -- ---------------------

>
> > Here's the query I've finally decided to use. I know it works, but not
> > how I was wondering if someone could explain - how/why does it work
> > - which version is better - is there a better way - etc.

>
> > It gets pretty slow as the data table grows and I'm having trouble
> > trying to improve its performance.

>
> > -- ---------------------
> > select distinct l.key
> > from
> > * * * * vt_del_lib l
> > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > -- OR

>
> > select distinct l.key
> > from
> > * * * * vt_del_keys d
> > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > -- ---------------------

>
> > Sorry for the cryptic subject line - I couldn't classify the problem
> > any better

>
> > TIA!
> > Vamsi.

>
> > P.S.: Some sample output...

>
> > SQL> --Get used keys
> > SQL> select l.key
> > * 2 *from
> > * 3 * * vt_del_lib l
> > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > KEY
> > ---------------------------------------------------------------------------**-----
> > 010001
> > 010004
> > 010005
> > 010008
> > 010010
> > 010005
> > 010008

>
> > 7 rows selected

>
> > Executed in 0.375 seconds
> > SQL> select l.key
> > * 2 *from
> > * 3 * * vt_del_keys d
> > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > KEY
> > ---------------------------------------------------------------------------**-----
> > 010001
> > 010004
> > 010005
> > 010005
> > 010008
> > 010008
> > 010010

>
> > 7 rows selected

>
> > Executed in 0.375 seconds
> > SQL> --Get distinct used kyes
> > SQL> select distinct l.key
> > * 2 *from
> > * 3 * * vt_del_lib l
> > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > KEY
> > ---------------------------------------------------------------------------**-----
> > 010001
> > 010004
> > 010005
> > 010008
> > 010010

>
> I do not have a system in front of me that I can test with but if you
> run explain plan on these queries I expect that both tables are being
> read via full table scans since you concatenate a comma before and
> after you key columns rendering any index useless unless you have
> built function based indexes on both set of keys.
>
> Because the run time is exactly equal I suspect Oracle is using the
> same plan for both orderings of the table name in the query.
>
> Is the goal to find the key value or to get the descriptions for the
> keys?
>
> -- Mark D Powell --- Hide quoted text -
>
> - Show quoted text -


Thanks Mark D Powell

The goal is to extract the keys values.

Yes, the explain plan shows that both tables undergo full table scan.
I didn't think of creating an index on the data table because I
thought the LIKE clause doesn't allow for an index to be used. Can you
suggest what the function-based index should be like on the data
table? - or the glossary table, for that matter

Thanks.
Vamsi.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-26-2008, 08:52 AM
Charles Hooper
 
Posts: n/a
Default Re: Help: Join a "glossary" table and a "data" table referring toterms in the glossary

On Feb 5, 8:13*pm, Vamsidhar <tvamsid...@gmail.com> wrote:
> On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote:
> > On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote:

>
> > > I have a "glossary" table that holds a unique identifier (key) for
> > > each "text" string used elsewhere in the app.
> > > The fields in the data tables that should hold the text are then
> > > populated with the "key" instead of the text - helps clients
> > > substitute their own "glossary" for the default one, I suppose. In
> > > some cases, there may be more than one key in a given field (separated
> > > by commas). And the objective here is to extract a list of all the
> > > keys used in a given data table.

>
> > > I've tried to describe the scenario with a simplified example below:

>
> > > Given the following data:

>
> > > -- ---------------------
> > > -- This is my "glossary"
> > > create table vt_del_lib (key varchar2(100), text varchar2(100));

>
> > > insert into vt_del_lib values ('010001', 'One');
> > > insert into vt_del_lib values ('010002', 'Two');
> > > insert into vt_del_lib values ('010003', 'Three');
> > > insert into vt_del_lib values ('010004', 'Four');
> > > insert into vt_del_lib values ('010005', 'Five');
> > > insert into vt_del_lib values ('010006', 'Six');
> > > insert into vt_del_lib values ('010007', 'Seven');
> > > insert into vt_del_lib values ('010008', 'Eight');
> > > insert into vt_del_lib values ('010009', 'Nine');
> > > insert into vt_del_lib values ('010010', 'Ten');
> > > commit;

>
> > > -- This is my "data" table
> > > create table vt_del_keys (val1 varchar2(1000));

>
> > > insert into vt_del_keys values('010001');
> > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > insert into vt_del_keys values('010005,010008');
> > > commit;
> > > -- ---------------------

>
> > > Here's the query I've finally decided to use. I know it works, but not
> > > how I was wondering if someone could explain - how/why does it work
> > > - which version is better - is there a better way - etc.

>
> > > It gets pretty slow as the data table grows and I'm having trouble
> > > trying to improve its performance.

>
> > > -- ---------------------
> > > select distinct l.key
> > > from
> > > * * * * vt_del_lib l
> > > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > -- OR

>
> > > select distinct l.key
> > > from
> > > * * * * vt_del_keys d
> > > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > -- ---------------------

>
> > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > any better

>
> > > TIA!
> > > Vamsi.

>
> > > P.S.: Some sample output...

>
> > > SQL> --Get used keys
> > > SQL> select l.key
> > > * 2 *from
> > > * 3 * * vt_del_lib l
> > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > KEY
> > > ---------------------------------------------------------------------------***-----
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010
> > > 010005
> > > 010008

>
> > > 7 rows selected

>
> > > Executed in 0.375 seconds
> > > SQL> select l.key
> > > * 2 *from
> > > * 3 * * vt_del_keys d
> > > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > KEY
> > > ---------------------------------------------------------------------------***-----
> > > 010001
> > > 010004
> > > 010005
> > > 010005
> > > 010008
> > > 010008
> > > 010010

>
> > > 7 rows selected

>
> > > Executed in 0.375 seconds
> > > SQL> --Get distinct used kyes
> > > SQL> select distinct l.key
> > > * 2 *from
> > > * 3 * * vt_del_lib l
> > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > KEY
> > > ---------------------------------------------------------------------------***-----
> > > 010001
> > > 010004
> > > 010005
> > > 010008
> > > 010010

>
> > I do not have a system in front of me that I can test with but if you
> > run explain plan on these queries I expect that both tables are being
> > read via full table scans since you concatenate a comma before and
> > after you key columns rendering any index useless unless you have
> > built function based indexes on both set of keys.

>
> > Because the run time is exactly equal I suspect Oracle is using the
> > same plan for both orderings of the table name in the query.

>
> > Is the goal to find the key value or to get the descriptions for the
> > keys?

>
> > -- Mark D Powell --- Hide quoted text -

>
> > - Show quoted text -

>
> Thanks Mark D Powell
>
> The goal is to extract the keys values.
>
> Yes, the explain plan shows that both tables undergo full table scan.
> I didn't think of creating an index on the data table because I
> thought the LIKE clause doesn't allow for an index to be used. Can you
> suggest what the function-based index should be like on the data
> table? - or the glossary table, for that matter
>
> Thanks.
> Vamsi


Thanks for posting the DDL and DML to create and populate the tables.
Try a bit of experimentation working with some of Mark's suggestions.
In addition to the rows that you inserted into the tables, I excuted
the following to add an additional 900,000 rows into one of the
tables:
NSERT INTO
VT_DEL_LIB
SELECT
TO_CHAR(ROWNUM+10010,'000000'),
'N'||TO_CHAR(ROWNUM+10010,'000000')
FROM
DUAL
CONNECT BY
LEVEL<=900000;

COMMIT;

Now, let's gather stats on the tables and any indexes:
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>TRUE);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_KEYS',CASCADE=>TRUE);

select distinct l.key
from
vt_del_lib l
join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
%';

The DBMS Xplan:
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows
| A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------
| 1 | HASH UNIQUE | | 1 | 2 | 5 |
00:00:01.35 | 8217 | 968K| 968K| 740K (0)|
| 2 | NESTED LOOPS | | 1 | 2 | 7 |
00:00:01.35 | 8217 | | | |
| 3 | TABLE ACCESS FULL| VT_DEL_KEYS | 1 | 3 | 3 |
00:00:00.10 | 3 | | | |
|* 4 | TABLE ACCESS FULL| VT_DEL_LIB | 3 | 1 | 7 |
00:00:01.25 | 8214 | | | |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')

The above completed in 1.35 seconds.

Now, an experimentation by creating a unique index on the VT_DEL_LIB
table, modifying the KEY column so that it cannot be NULL, and
gathering table and index stats:
CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>TRUE);

select distinct l.key
from
vt_del_lib l
join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
%';

---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-
Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 1 | HASH UNIQUE | | 1 |
135K| 5 |00:00:01.40 | 6816 | 3 | 968K| 968K| 742K
(0)|
| 2 | NESTED LOOPS | | 1 |
135K| 7 |00:00:01.40 | 6816 | 3 | |
| |
| 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3
| 3 |00:00:00.01 | 3 | 0 | | | |
|* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001
| 7 |00:00:01.40 | 6813 | 3 | | | |
---------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')

The query is now using the index that was just created, but the
execution time increased by 0.05 seconds.

SELECT DISTINCT
L.KEY
FROM
VT_DEL_LIB L,
VT_DEL_KEYS D
WHERE
D.VAL1 LIKE '%'||L.KEY||'%';

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-
Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 1 | HASH UNIQUE | | 1 |
135K| 5 |00:00:00.81 | 6816 | 968K| 968K| 713K (0)|
| 2 | NESTED LOOPS | | 1 |
135K| 7 |00:00:00.81 | 6816 | | | |
| 3 | TABLE ACCESS FULL | VT_DEL_KEYS | 1 | 3
| 3 |00:00:00.01 | 3 | | | |
|* 4 | INDEX FAST FULL SCAN| IND_VT_DEL_LIB | 3 | 45001
| 7 |00:00:00.80 | 6813 | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')

The above completed in 0.81 seconds, which is a nice improvement, but
still not great.

Now, let's experiment with regular expressions (an adaptation of a
previous solution by Maxim Demenko for splitting a sentence into
words):
SELECT DISTINCT
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
FROM
VT_DEL_KEYS
CONNECT BY
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;

KEY
------
010005
010008
010010
010004
010001

If we can take the above results and drive back into into the
VT_DEL_LIB table using the index that we previously created, we obtain
an efficient plan:
SELECT DISTINCT
L.KEY
FROM
(SELECT DISTINCT
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
FROM
VT_DEL_KEYS
CONNECT BY
REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
VT_DEL_LIB L
WHERE
L.KEY=D.KEY;

----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-
Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
| 1 | HASH UNIQUE | | 1
| 3 | 5 |00:00:00.01 | 15 | 968K| 968K| 717K (0)|
| 2 | NESTED LOOPS | | 1
| 3 | 5 |00:00:00.01 | 15 | | | |
| 3 | VIEW | | 1
| 3 | 5 |00:00:00.01 | 3 | | | |
| 4 | HASH UNIQUE | | 1
| 3 | 5 |00:00:00.01 | 3 | 936K| 936K| 738K (0)|
| 5 | CONNECT BY WITHOUT FILTERING| | 1
| | 21 |00:00:00.01 | 3 | | | |
| 6 | TABLE ACCESS FULL | VT_DEL_KEYS | 1
| 3 | 3 |00:00:00.01 | 3 | | | |
|* 7 | INDEX UNIQUE SCAN | IND_VT_DEL_LIB | 5
| 1 | 5 |00:00:00.01 | 12 | | | |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
7 - access("L"."KEY"="D"."KEY")

The above completed in 0.01 seconds. As the data size increases, it
may be necessary to add an /*+ ORDERED */ hint to make certain that
the plan does not change significantly. Experiment to see what
happens as the data size increases.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-26-2008, 08:52 AM
Vamsidhar
 
Posts: n/a
Default Re: Help: Join a "glossary" table and a "data" table referring toterms in the glossary

On Feb 5, 10:11*pm, Charles Hooper <hooperc2...@yahoo.com> wrote:
> On Feb 5, 8:13*pm, Vamsidhar <tvamsid...@gmail.com> wrote:
>
> > On Feb 5, 7:15*pm, Mark D Powell <Mark.Pow...@eds.com> wrote:
> > > On Feb 5, 6:24*pm, Vamsidhar <tvamsid...@gmail.com> wrote:

>
> > > > I have a "glossary" table that holds a unique identifier (key) for
> > > > each "text" string used elsewhere in the app.
> > > > The fields in the data tables that should hold the text are then
> > > > populated with the "key" instead of the text - helps clients
> > > > substitute their own "glossary" for the default one, I suppose. In
> > > > some cases, there may be more than one key in a given field (separated
> > > > by commas). And the objective here is to extract a list of all the
> > > > keys used in a given data table.

>
> > > > I've tried to describe the scenario with a simplified example below:

>
> > > > Given the following data:

>
> > > > -- ---------------------
> > > > -- This is my "glossary"
> > > > create table vt_del_lib (key varchar2(100), text varchar2(100));

>
> > > > insert into vt_del_lib values ('010001', 'One');
> > > > insert into vt_del_lib values ('010002', 'Two');
> > > > insert into vt_del_lib values ('010003', 'Three');
> > > > insert into vt_del_lib values ('010004', 'Four');
> > > > insert into vt_del_lib values ('010005', 'Five');
> > > > insert into vt_del_lib values ('010006', 'Six');
> > > > insert into vt_del_lib values ('010007', 'Seven');
> > > > insert into vt_del_lib values ('010008', 'Eight');
> > > > insert into vt_del_lib values ('010009', 'Nine');
> > > > insert into vt_del_lib values ('010010', 'Ten');
> > > > commit;

>
> > > > -- This is my "data" table
> > > > create table vt_del_keys (val1 varchar2(1000));

>
> > > > insert into vt_del_keys values('010001');
> > > > insert into vt_del_keys values('010004,010005,010008,010010');
> > > > insert into vt_del_keys values('010005,010008');
> > > > commit;
> > > > -- ---------------------

>
> > > > Here's the query I've finally decided to use. I know it works, but not
> > > > how I was wondering if someone could explain - how/why does it work
> > > > - which version is better - is there a better way - etc.

>
> > > > It gets pretty slow as the data table grows and I'm having trouble
> > > > trying to improve its performance.

>
> > > > -- ---------------------
> > > > select distinct l.key
> > > > from
> > > > * * * * vt_del_lib l
> > > > * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > > -- OR

>
> > > > select distinct l.key
> > > > from
> > > > * * * * vt_del_keys d
> > > > * * * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > > -- ---------------------

>
> > > > Sorry for the cryptic subject line - I couldn't classify the problem
> > > > any better

>
> > > > TIA!
> > > > Vamsi.

>
> > > > P.S.: Some sample output...

>
> > > > SQL> --Get used keys
> > > > SQL> select l.key
> > > > * 2 *from
> > > > * 3 * * vt_del_lib l
> > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > > KEY
> > > > ---------------------------------------------------------------------------****-----
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010
> > > > 010005
> > > > 010008

>
> > > > 7 rows selected

>
> > > > Executed in 0.375 seconds
> > > > SQL> select l.key
> > > > * 2 *from
> > > > * 3 * * vt_del_keys d
> > > > * 4 * * join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > > KEY
> > > > ---------------------------------------------------------------------------****-----
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010005
> > > > 010008
> > > > 010008
> > > > 010010

>
> > > > 7 rows selected

>
> > > > Executed in 0.375 seconds
> > > > SQL> --Get distinct used kyes
> > > > SQL> select distinct l.key
> > > > * 2 *from
> > > > * 3 * * vt_del_lib l
> > > > * 4 * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';

>
> > > > KEY
> > > > ---------------------------------------------------------------------------****-----
> > > > 010001
> > > > 010004
> > > > 010005
> > > > 010008
> > > > 010010

>
> > > I do not have a system in front of me that I can test with but if you
> > > run explain plan on these queries I expect that both tables are being
> > > read via full table scans since you concatenate a comma before and
> > > after you key columns rendering any index useless unless you have
> > > built function based indexes on both set of keys.

>
> > > Because the run time is exactly equal I suspect Oracle is using the
> > > same plan for both orderings of the table name in the query.

>
> > > Is the goal to find the key value or to get the descriptions for the
> > > keys?

>
> > > -- Mark D Powell --- Hide quoted text -

>
> > > - Show quoted text -

>
> > Thanks Mark D Powell

>
> > The goal is to extract the keys values.

>
> > Yes, the explain plan shows that both tables undergo full table scan.
> > I didn't think of creating an index on the data table because I
> > thought the LIKE clause doesn't allow for an index to be used. Can you
> > suggest what the function-based index should be like on the data
> > table? - or the glossary table, for that matter

>
> > Thanks.
> > Vamsi

>
> Thanks for posting the DDL and DML to create and populate the tables.
> Try a bit of experimentation working with some of Mark's suggestions.
> In addition to the rows that you inserted into the tables, I excuted
> the following to add an additional 900,000 rows into one of the
> tables:
> NSERT INTO
> * VT_DEL_LIB
> SELECT
> * TO_CHAR(ROWNUM+10010,'000000'),
> * 'N'||TO_CHAR(ROWNUM+10010,'000000')
> FROM
> * DUAL
> CONNECT BY
> * LEVEL<=900000;
>
> COMMIT;
>
> Now, let's gather stats on the tables and any indexes:
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>*TRUE);
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_KEYS',CASCADE=*>TRUE);
>
> select distinct l.key
> from
> * * * * vt_del_lib l
> * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
> %';
>
> The DBMS Xplan:
> ---------------------------------------------------------------------------*---------------------------------------------
> | Id *| Operation * * * * * | Name * * * *| Starts | E-Rows | A-Rows
> | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem |
> ---------------------------------------------------------------------------*---------------------------------------------
> | * 1 | *HASH UNIQUE * * * *| * * * * * * | * * *1 | * * *2 | * * *5 |
> 00:00:01.35 | * *8217 | * 968K| * 968K| *740K (0)|
> | * 2 | * NESTED LOOPS * * *| * * * * * * | * * *1 | * * *2 | * * *7 |
> 00:00:01.35 | * *8217 | * * * | * * * | * * * * *|
> | * 3 | * *TABLE ACCESS FULL| VT_DEL_KEYS | * * *1 | * * *3 | * * *3 |
> 00:00:00.10 | * * * 3 | * * * | * * * | * * * * *|
> |* *4 | * *TABLE ACCESS FULL| VT_DEL_LIB *| * * *3 | * **1 | * * *7 |
> 00:00:01.25 | * *8214 | * * * | * * * | * * * * *|
> ---------------------------------------------------------------------------*---------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> * *4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The above completed in 1.35 seconds.
>
> Now, an experimentation by creating a unique index on the VT_DEL_LIB
> table, modifying the KEY column so that it cannot be NULL, and
> gathering table and index stats:
> CREATE UNIQUE INDEX IND_VT_DEL_LIB ON VT_DEL_LIB (KEY);
> ALTER TABLE VT_DEL_LIB MODIFY(KEY NOT NULL);
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAM E=>'VT_DEL_LIB',CASCADE=>*TRUE);
>
> select distinct l.key
> from
> * * * * vt_del_lib l
> * * * * join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',
> %';
>
> ---------------------------------------------------------------------------*------------------------------------------------------------
> | Id *| Operation * * * * * * *| Name * * * * * | Starts | E-Rows | A-
> Rows | * A-Time * | Buffers | Reads *| *OMem | *1Mem | Used-Mem |
> ---------------------------------------------------------------------------*------------------------------------------------------------
> | * 1 | *HASH UNIQUE * * * * * | * * * * * * **| * * *1 |
> 135K| * * *5 |00:00:01.40 | * *6816 | * * *3 | * 968K| * 968K| *742K
> (0)|
> | * 2 | * NESTED LOOPS * * * * | * * * * * * * *| * * *1 |
> 135K| * * *7 |00:00:01.40 | * *6816 | * * *3 | * * *|
> | * * * * *|
> | * 3 | * *TABLE ACCESS FULL * | VT_DEL_KEYS * *| * * *1| * * *3
> | * * *3 |00:00:00.01 | * * * 3 | * * *0 | * * * |* * * | * * * * *|
> |* *4 | * *INDEX FAST FULL SCAN| IND_VT_DEL_LIB | * * *3 | *45001
> | * * *7 |00:00:01.40 | * *6813 | * * *3 | * * * | * * * | * * * * *|
> ---------------------------------------------------------------------------*------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> * *4 - filter(','||"D"."VAL1"||',' LIKE '%,'||"L"."KEY"||',%')
>
> The query is now using the index that was just created, but the
> execution time increased by 0.05 seconds.
>
> SELECT DISTINCT
> * L.KEY
> FROM
> * VT_DEL_LIB L,
> * VT_DEL_KEYS D
> WHERE
> * D.VAL1 LIKE '%'||L.KEY||'%';
>
> ---------------------------------------------------------------------------*---------------------------------------------------
> | Id *| Operation * * * * * * *| Name * * * * * | Starts | E-Rows | A-
> Rows | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem |
> ---------------------------------------------------------------------------*---------------------------------------------------
> | * 1 | *HASH UNIQUE * * * * * | * * * * * * **| * * *1 |
> 135K| * * *5 |00:00:00.81 | * *6816 | * 968K| * 968K| *713K (0)|
> | * 2 | * NESTED LOOPS * * * * | * * * * * * * *| * * *1 |
> 135K| * * *7 |00:00:00.81 | * *6816 | * * * | * * * | * * * * *|
> | * 3 | * *TABLE ACCESS FULL * | VT_DEL_KEYS * *| * * *1| * * *3
> | * * *3 |00:00:00.01 | * * * 3 | * * * | * * * | * * * * *|
> |* *4 | * *INDEX FAST FULL SCAN| IND_VT_DEL_LIB | * * *3 | *45001
> | * * *7 |00:00:00.80 | * *6813 | * * * | * * * | * * * * *|
> ---------------------------------------------------------------------------*---------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> * *4 - filter("D"."VAL1" LIKE '%'||"L"."KEY"||'%')
>
> The above completed in 0.81 seconds, which is a nice improvement, but
> still not great.
>
> Now, let's experiment with regular expressions (an adaptation of a
> previous solution by Maxim Demenko for splitting a sentence into
> words):
> SELECT DISTINCT
> * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> FROM
> * VT_DEL_KEYS
> CONNECT BY
> * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL;
>
> KEY
> ------
> 010005
> 010008
> 010010
> 010004
> 010001
>
> If we can take the above results and drive back into into the
> VT_DEL_LIB table using the index that we previously created, we obtain
> an efficient plan:
> SELECT DISTINCT
> * L.KEY
> FROM
> * (SELECT DISTINCT
> * * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) KEY
> * FROM
> * * VT_DEL_KEYS
> * CONNECT BY
> * * REGEXP_SUBSTR(VAL1,'\w+',1,LEVEL) IS NOT NULL) D,
> * VT_DEL_LIB L
> WHERE
> * L.KEY=D.KEY;
>
> ---------------------------------------------------------------------------*-------------------------------------------------------------
> | Id *| Operation * * * * * * * * * * * *| Name * * * * * | Starts | E-
> Rows | A-Rows | * A-Time * | Buffers | *OMem | *1Mem | Used-Mem |
> ---------------------------------------------------------------------------*-------------------------------------------------------------
> | * 1 | *HASH UNIQUE * * * * * * * * * * | * ** * * * * *| * * *1
> | * * *3 | * * *5 |00:00:00.01 | * * *15 | * 968K| *968K| *717K (0)|
> | * 2 | * NESTED LOOPS * * * * * * * * * | * * * * * * * *| * * *1
> | * * *3 | * * *5 |00:00:00.01 | * * *15 | * * * |* * * | * * * * *|
> | * 3 | * *VIEW * * * * * * * * * * * * *|* * * * * * * *| * * *1
> | * * *3 | * * *5 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *|
> | * 4 | * * HASH UNIQUE * * * * * * * * *| * ** * * * * *| * * *1
> | * * *3 | * * *5 |00:00:00.01 | * * * 3 | * 936K| *936K| *738K (0)|
> | * 5 | * * *CONNECT BY WITHOUT FILTERING| * * * * * ** *| * * *1
> | * * * *| * * 21 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *|
> | * 6 | * * * TABLE ACCESS FULL * * * * *| VT_DEL_KEYS * *| * * *1
> | * * *3 | * * *3 |00:00:00.01 | * * * 3 | * * * |* * * | * * * * *|
> |* *7 | * *INDEX UNIQUE SCAN * * * * * * | IND_VT_DEL_LIB | * * *5
> | * * *1 | * * *5 |00:00:00.01 | * * *12 | * * * |* * * | * * * * *|
> ---------------------------------------------------------------------------*-------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> * *7 - access("L"."KEY"="D"."KEY")
>
> The above completed in 0.01 seconds. *As the data size increases, it
> may be necessary to add an /*+ ORDERED */ hint to make certain that
> the plan does not change significantly. *Experiment to see what
> happens as the data size increases.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.


Thank you *very* much for that amazing analysis, Charles Hooper!

I get the impression that the regex. support is not available in
Oracle 9i - is it? We're still on 9i. But, I'm going to adapt the
basic idea and try it out using a "lookup" function I wrote a while
back.

Something like this:

SELECT DISTINCT
bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',')
FROM
VT_DEL_KEYS
CONNECT BY
bpr_app.bpr_tools.getTokenN(VAL1, LEVEL, ',') IS NOT NULL;

Since the objective is to extract only the keys, I don't need to join
with the LIB table anymore - the *real* data goes thro' validations
that ensure that all keys used are actually valid!! :-D Removing the
DISTINCT gives me strange results, but, I'll try to figure that out.
(I always thought there should be support for functionality that is
the reverse of GROUP BY - thanks so much for introducing it)

I'll post here when I've a working design (if I can apply the outline
above to my real queries) or if I run into problems.

Thanks!
Vamsi.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-26-2008, 08:52 AM
JMoo
 
Posts: n/a
Default Re: Help: Join a "glossary" table and a "data" table referring toterms in the glossary

On Feb 5, 3:24 pm, Vamsidhar <tvamsid...@gmail.com> wrote:
> I have a "glossary" table that holds a unique identifier (key) for
> each "text" string used elsewhere in the app.
> The fields in the data tables that should hold the text are then
> populated with the "key" instead of the text - helps clients
> substitute their own "glossary" for the default one, I suppose. In
> some cases, there may be more than one key in a given field (separated
> by commas). And the objective here is to extract a list of all the
> keys used in a given data table.
>
> I've tried to describe the scenario with a simplified example below:
>
> Given the following data:
>
> -- ---------------------
> -- This is my "glossary"
> create table vt_del_lib (key varchar2(100), text varchar2(100));
>
> insert into vt_del_lib values ('010001', 'One');
> insert into vt_del_lib values ('010002', 'Two');
> insert into vt_del_lib values ('010003', 'Three');
> insert into vt_del_lib values ('010004', 'Four');
> insert into vt_del_lib values ('010005', 'Five');
> insert into vt_del_lib values ('010006', 'Six');
> insert into vt_del_lib values ('010007', 'Seven');
> insert into vt_del_lib values ('010008', 'Eight');
> insert into vt_del_lib values ('010009', 'Nine');
> insert into vt_del_lib values ('010010', 'Ten');
> commit;
>
> -- This is my "data" table
> create table vt_del_keys (val1 varchar2(1000));
>
> insert into vt_del_keys values('010001');
> insert into vt_del_keys values('010004,010005,010008,010010');
> insert into vt_del_keys values('010005,010008');
> commit;
> -- ---------------------
>
> Here's the query I've finally decided to use. I know it works, but not
> how I was wondering if someone could explain - how/why does it work
> - which version is better - is there a better way - etc.
>
> It gets pretty slow as the data table grows and I'm having trouble
> trying to improve its performance.
>
> -- ---------------------
> select distinct l.key
> from
> vt_del_lib l
> join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> -- OR
>
> select distinct l.key
> from
> vt_del_keys d
> join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> -- ---------------------
>
> Sorry for the cryptic subject line - I couldn't classify the problem
> any better
>
> TIA!
> Vamsi.
>
> P.S.: Some sample output...
>
> SQL> --Get used keys
> SQL> select l.key
> 2 from
> 3 vt_del_lib l
> 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> --------------------------------------------------------------------------------
> 010001
> 010004
> 010005
> 010008
> 010010
> 010005
> 010008
>
> 7 rows selected
>
> Executed in 0.375 seconds
> SQL> select l.key
> 2 from
> 3 vt_del_keys d
> 4 join vt_del_lib l on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> --------------------------------------------------------------------------------
> 010001
> 010004
> 010005
> 010005
> 010008
> 010008
> 010010
>
> 7 rows selected
>
> Executed in 0.375 seconds
> SQL> --Get distinct used kyes
> SQL> select distinct l.key
> 2 from
> 3 vt_del_lib l
> 4 join vt_del_keys d on ','||d.val1||',' like '%,'||l.key||',%';
>
> KEY
> --------------------------------------------------------------------------------
> 010001
> 010004
> 010005
> 010008
> 010010


This will only work if all keys are the same length. replace your
first insert
"insert into vt_del_lib values ('010001', 'One');" with this one
"insert into vt_del_lib values ('100', 'One');"

As you have said that you've simplified the scenario, this probably
explains why the result is useless. I know that this is not what you
are asking, but you would make your life so much easier if you
normalized the vt_del_keys table.
Mike
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:49 AM.