Unix Technical Forum

10g - Need advise on large lookup table and optimizing io

This is a discussion on 10g - Need advise on large lookup table and optimizing io within the Oracle Database forums, part of the Database Server Software category; --> Folks, I inherited an application that has 1 very large lookup table. This lookup table has a 90/10 read/write ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-25-2008, 01:29 AM
peter
 
Posts: n/a
Default 10g - Need advise on large lookup table and optimizing io

Folks,
I inherited an application that has 1 very large lookup table. This
lookup table has
a 90/10 read/write ratio. Most of the time it's read heavily, but
during uploads
records that don't currently exist in that table get inserted.
This table looks like this.

USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID

------------------------------------------------------------------------
1000000 'email1' 12312 NULL
1000001 'email2' 13434 1

Anyway this table has 400 million records an is not partitioned. He
way 2 main types of
queries against this table.

SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id =
?

The other type of query executed is a SELECT where this table is
simply joined on
USER_PROF_ID to pull the email address.

SELECT A.EMAIL, B.COL1,B.COL2
FROM DEMOGRAPHICS B, THISTABLE A
WHERE B.USER_PROF_ID = A.USER_PROF_ID

We now have indexes on the "THISTABLE" so that the actual data is never
read. Only the indexes in place are used to satisfy all queries. The
above SELECT query like many typical queries on our system pulls lots
of data (500K records to 1 million records).

We have one specific index on the columns
USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID
that we use to satisfy all the large select queries...these queries do
an INDEX RANGE SCAN on the index columns I just described ..and so
never need to access the table rows.

Because the table/indexes have nearly 400 million records in them, and
this index is read my all our "larger" select queries, this index is
contented for and the IO waits are high.

I know that this data model is not great, but i inherited it and can't
really change too mch now. But I'm wondering what I can do to speed up
the lookups during the SELECT queries against that very large index (
Index now has 3 levels).

I've been doing some testing with a single table hash cluster, but I'm
not sure if the benefits of a hash cluster outperform the benefits (in
terms of IO) of looking up the rows in the large index and not having
to read the table data.

If anyone out there has any input, I'd greatly appreciate it.

--peter

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-25-2008, 01:29 AM
DA Morgan
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io

peter wrote:
> Folks,
> I inherited an application that has 1 very large lookup table. This
> lookup table has
> a 90/10 read/write ratio. Most of the time it's read heavily, but
> during uploads
> records that don't currently exist in that table get inserted.
> This table looks like this.
>
> USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID
>
> ------------------------------------------------------------------------
> 1000000 'email1' 12312 NULL
> 1000001 'email2' 13434 1
>
> Anyway this table has 400 million records an is not partitioned. He
> way 2 main types of
> queries against this table.
>
> SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id =
> ?
>
> The other type of query executed is a SELECT where this table is
> simply joined on
> USER_PROF_ID to pull the email address.
>
> SELECT A.EMAIL, B.COL1,B.COL2
> FROM DEMOGRAPHICS B, THISTABLE A
> WHERE B.USER_PROF_ID = A.USER_PROF_ID
>
> We now have indexes on the "THISTABLE" so that the actual data is never
> read. Only the indexes in place are used to satisfy all queries. The
> above SELECT query like many typical queries on our system pulls lots
> of data (500K records to 1 million records).
>
> We have one specific index on the columns
> USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID
> that we use to satisfy all the large select queries...these queries do
> an INDEX RANGE SCAN on the index columns I just described ..and so
> never need to access the table rows.
>
> Because the table/indexes have nearly 400 million records in them, and
> this index is read my all our "larger" select queries, this index is
> contented for and the IO waits are high.
>
> I know that this data model is not great, but i inherited it and can't
> really change too mch now. But I'm wondering what I can do to speed up
> the lookups during the SELECT queries against that very large index (
> Index now has 3 levels).
>
> I've been doing some testing with a single table hash cluster, but I'm
> not sure if the benefits of a hash cluster outperform the benefits (in
> terms of IO) of looking up the rows in the large index and not having
> to read the table data.
>
> If anyone out there has any input, I'd greatly appreciate it.
>
> --peter


A couple of things I'd like to know about the tables ... how many
different user_ids and user_prof_ids in those 400M records. Is the
data skewed? And I'd be inclined to take a very harsh look at
"DUPLICATE_ID IS NULL" and this is forcing a full table scan.

Starting with DUPLICATE_ID ... can you use a default value in place
NULL? If not consider a function based index and query for the value
created by the function used to build the index.

With respect to the USER_ID = ... my first instinct would be to convert
the table to either a sorted hash cluster (10g) or Index-Organized
Table.

With respect ot USER_PROF_ID ... I'd need to know more about the data.

All of the above is just pure instinct and I wouldn't do any of it
without tracing and testing extensively.

HTH
--
Daniel A. Morgan
http://www.psoug.org
damorgan@x.washington.edu
(replace x with u to respond)
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-25-2008, 01:30 AM
Zhu Chao
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io

duplicate_id is null is ok in this case, as it is used in composite
index.

you used user_prof_id and user_id in different two querys, are they
same column?

if you are doing index only query, rebuiild the index does help if
there is also dml on this table. to help improve sga efficiency.

index level =3 should be fine.

can you list a full structure of the two tables, and the SQLs?

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-25-2008, 01:31 AM
Joel Garry
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io

Can you put the index in the keep pool and add to your SGA
(DB_KEEP_CACHE_SIZE) the number of blocks v$bh shows it has been using?
Something under a gig, I'd guess from your post, and if it is that
heavily used, you might be able to shrink your regular buffers some.
Helps to post version and platform and appropriate configuration
parameters.

jg
--
@home.com is bogus.
http://catless.ncl.ac.uk/Risks/24.14.html#subj11

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-25-2008, 01:52 AM
peter
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io

Here's a quick update...
This 400MM record table is mostly used in an equality join to pull the
EMAIL address of a user.
So every customer of ours has a DATA table with a USER_ID, and this
lookup to pull the EMAIL is
typically an equality join in a 3 table join... ie

DRIVING_TABLE a, DATA_TABLE b, LARGE_LOOKUP_TABLE c
WHERE a.userid = b.userid and
a.userid = c.userid

Every query in the system (large queries which pull millions of
records) pull the EMAIL
from that large lookup table.. and typically the wait time on those
datafiles avg about 30-150 ms.
I was able to convince my management to move the EMAIL address into
each customers DATA_TABLE so that everyone is no longer contenting for
that LARGE_LOOKUP_TABLE and the indexes used to satisfy queries... I
think this simple change will dramatically decrease contention, plus it
breaks that dependency on a large lookup table to individual smaller
tables. This gives us some flexibility as we grow (either multiple
customer databases or RAC).

-- thanks to everyone for the great feedback as always.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-25-2008, 01:53 AM
Jonathan Lewis
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io


"peter" <p_msantos@yahoo.com> wrote in message
news:1136566558.089817.287670@g49g2000cwa.googlegr oups.com...
> Folks,
> I inherited an application that has 1 very large lookup table. This
> lookup table has
> a 90/10 read/write ratio. Most of the time it's read heavily, but
> during uploads
> records that don't currently exist in that table get inserted.
> This table looks like this.
>
> USER_ID| EMAIL | CLIENT_ID | DUPLICATE_ID
>
> ------------------------------------------------------------------------
> 1000000 'email1' 12312 NULL
> 1000001 'email2' 13434 1
>
> Anyway this table has 400 million records an is not partitioned. He
> way 2 main types of
> queries against this table.
>
> SELECT email from THISTABLE where DUPLICATE_ID IS NULL and user_id =
> ?
>
> The other type of query executed is a SELECT where this table is
> simply joined on
> USER_PROF_ID to pull the email address.
>
> SELECT A.EMAIL, B.COL1,B.COL2
> FROM DEMOGRAPHICS B, THISTABLE A
> WHERE B.USER_PROF_ID = A.USER_PROF_ID
>
> We now have indexes on the "THISTABLE" so that the actual data is never
> read. Only the indexes in place are used to satisfy all queries. The
> above SELECT query like many typical queries on our system pulls lots
> of data (500K records to 1 million records).
>
> We have one specific index on the columns
> USER_ID|EMAIL|DUPLICATE_ID|CLIENT_ID
> that we use to satisfy all the large select queries...these queries do
> an INDEX RANGE SCAN on the index columns I just described ..and so
> never need to access the table rows.
>
> Because the table/indexes have nearly 400 million records in them, and
> this index is read my all our "larger" select queries, this index is
> contented for and the IO waits are high.
>
> I know that this data model is not great, but i inherited it and can't
> really change too mch now. But I'm wondering what I can do to speed up
> the lookups during the SELECT queries against that very large index (
> Index now has 3 levels).
>
> I've been doing some testing with a single table hash cluster, but I'm
> not sure if the benefits of a hash cluster outperform the benefits (in
> terms of IO) of looking up the rows in the large index and not having
> to read the table data.
>
> If anyone out there has any input, I'd greatly appreciate it.
>
> --peter
>


You have shown a query on user_prof_id, which is a
column that doesn't appear to exist, so I assume it is
really user_id.

Assuming you have a primary key (user_id, email_id seems
to be the probably key) this looks like a good candidate for
a hash partitioned IOT. Gets rid of the table, automatically
satisfies the indexed access requirement, and takes the heat
off the root block.

I am, of course, guessing about the intent and statistics.


--
Regards

Jonathan Lewis

http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ

http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
Cost Based Oracle: Fundamentals

http://www.jlcomp.demon.co.uk/appearances.html
Public Appearances - schedule updated 10th Jan 2006


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-25-2008, 01:54 AM
peter
 
Posts: n/a
Default Re: 10g - Need advise on large lookup table and optimizing io

Thanks for the feedback Jonathan.
I'm currently testing this right now, and it seems like a good idea.
However I'm testing another solution also which I think might be even
better.

This large lookup table is joined into a 3 table total query that pulls
lots of data out of our database. The index is currently there to
assist this so that the table does not have to read, but because our
queries select an avg of 50-70 million records in a 24hr time span,
this index is very contened for... The average waits on these index
files avg from 30ms - 150ms.

If I move the column EMAIL into a customer specific table..then this
large table is not needed and now every customer query is only 2 table
without the need to join on this large table... It performs better.

I've even tested setting a parallel degree of 2 for each table in the 2
table SELECT query and the performance and IO reductions from the 3
table query is enourmous... We have CPU queues that are generally at
7-8 while the machine has 12 CPUs...

But I do have a few concerns about the memory requirements,PGA and how
memory is allocated since these queries come from shared/MTS
connections while other things in our system are a mix of
dedicate/shared connections..

everyone's feedback is always appreciated..

BTW, read your "Snark Research" posting on your website. Really good
stuff, but some pieces of memory allocation,shared/MTS connections and
(in my case parallelism) are still unclear.
I posted a thread about parallelism concerns...if you have some time to
quickly browse it I'd appreciate it (posted Jan 25 3:06 pm)

http://groups.google.com/group/comp....08a7fb9b977212

thanks again.
-peter

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:30 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com