Unix Technical Forum

Table Design Question

This is a discussion on Table Design Question within the Informix forums, part of the Database Server Software category; --> I've got this little 30million row table, the entry point wrt to the app will be via a char(12) ...


Go Back   Unix Technical Forum > Database Server Software > Informix

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:01 PM
Paul Watson
 
Posts: n/a
Default Table Design Question

I've got this little 30million row table, the entry point wrt to
the app will be via a char(12) unique field. No real hassles, a little
big of fragmentation and it's all fine but ...

The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%],
and to make life more awkward it has to treat certain characters
as identical so, for example, 0 [zero] = O [letter] and vice versa.
But at least it's not case sensitive.

It has a major update applied once per week (about 10%) but apart from
that it is static.

All smart arse ideas accepted:-))


--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:01 PM
Art S. Kagel
 
Posts: n/a
Default Re: Table Design Question

On Fri, 23 Jan 2004 06:28:18 -0500, Paul Watson wrote:

> I've got this little 30million row table, the entry point wrt to the app
> will be via a char(12) unique field. No real hassles, a little big of
> fragmentation and it's all fine but ...
>
> The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%], and to
> make life more awkward it has to treat certain characters as identical so,
> for example, 0 [zero] = O [letter] and vice versa. But at least it's not
> case sensitive.
>
> It has a major update applied once per week (about 10%) but apart from that
> it is static.
>
> All smart arse ideas accepted:-))
>
>

Create a functional index to map the equivalent chars for that part. For the
partial searches, You might use two mapping tables (or one) containing
trailing and contained substrings with the full key column. Then a functional
index on the substring column to effect the equivalence mappings. Lead
substrings are simple and handled by the functional index on the original
column directly.

Art S. Kagel
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 09:01 PM
Obnoxio The Clown
 
Posts: n/a
Default Re: Table Design Question

Paul Watson wrote:

> I've got this little 30million row table, the entry point wrt to
> the app will be via a char(12) unique field. No real hassles, a little
> big of fragmentation and it's all fine but ...
>
> The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%],
> and to make life more awkward it has to treat certain characters
> as identical so, for example, 0 [zero] = O [letter] and vice versa.
> But at least it's not case sensitive.
>
> It has a major update applied once per week (about 10%) but apart from
> that it is static.
>
> All smart arse ideas accepted:-))


Create a UDT that matches 0 to o and any other stupidities like that. You
might want to use Excalibur text index or VTS to do the indexing.

--
"C'est pas parce qu'on n'a rien à dire qu'il faut fermer sa gueule"
- Coluche
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 09:02 PM
Paul Watson
 
Posts: n/a
Default Re: Table Design Question

To get round the *<str> I was going to store the string backwards,
index and then search <rts>*, this is blindingly fast. Functional
indexes get round the odd character mapping, but the problem query is
still *<str>* which unfortunately is the main search type.

OTC suggest the textblade, which might be an idea and needs to be
investigated further

"Art S. Kagel" wrote:
>
> On Fri, 23 Jan 2004 06:28:18 -0500, Paul Watson wrote:
>
> > I've got this little 30million row table, the entry point wrt to the app
> > will be via a char(12) unique field. No real hassles, a little big of
> > fragmentation and it's all fine but ...
> >
> > The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%], and to
> > make life more awkward it has to treat certain characters as identical so,
> > for example, 0 [zero] = O [letter] and vice versa. But at least it's not
> > case sensitive.
> >
> > It has a major update applied once per week (about 10%) but apart from that
> > it is static.
> >
> > All smart arse ideas accepted:-))
> >
> >

> Create a functional index to map the equivalent chars for that part. For the
> partial searches, You might use two mapping tables (or one) containing
> trailing and contained substrings with the full key column. Then a functional
> index on the substring column to effect the equivalence mappings. Lead
> substrings are simple and handled by the functional index on the original
> column directly.
>
> Art S. Kagel


--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
Mob: +44 7818 003457 #
www.oninit.com #
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 09:02 PM
Curtis Crowson
 
Posts: n/a
Default Re: Table Design Question

Paul Watson <paul@oninit.com> wrote in message news:<40110552.6EB20A52@oninit.com>...
> I've got this little 30million row table, the entry point wrt to
> the app will be via a char(12) unique field. No real hassles, a little
> big of fragmentation and it's all fine but ...
>
> The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%],
> and to make life more awkward it has to treat certain characters
> as identical so, for example, 0 [zero] = O [letter] and vice versa.
> But at least it's not case sensitive.
>
> It has a major update applied once per week (about 10%) but apart from
> that it is static.
>
> All smart arse ideas accepted:-))



To be really clever about this one would have to know more about the
data and queries. I'll suggest some ideas some will be quite insane
based on the realities of the situation, but one might work for you.
Or it might give you an idea for something that would work.

Idea one call it cache searching: if there are repeated queires with
the same

*<string>* values then cache the results in a relationship table:

create table cache_query_results(
fk_id integer,
str varchar(255)
) ;

create unique index cache_query_results_1x on cache_query_results(
str, fk_id ) ; -- Only store one STR for each fk_id.

This can of course be prepopulated with values that you know are
searched for often. Your cqr table can be of a manageable size and
since you are mostly static this might work for you. Do some
calculations and you should probably limit the size of what you store:
for example don't store *e* in your cqr table, don't store words over
a certain size in the table or break the *<string>* into words so that
you have a dictionary of where things are, etc.
you might only want to store strings above some number of characters
that appears to keep the size down. Of course if you don't have
predictable queries this idea is quite insane. You can even add a
cache management table like the following:

create table cache(
str varchar(255)
count integer
) ;

to trim the cqr table of things that aren't used much or recently (
change count to timestamp )



another idea is to index on some subset of the key I have used all 2
letter combinations before ( in your data there may not be as many
valid two letter combinations as you might think especially because
your data is not case sensitive and some letters are treated the same)
Of course again this can be awful depending on the data so if each key
has 60 2 letter combos you would have a table with 60*30,000,000 rows
ouch!( it might still work depending on your hardware) But I think in
practice that would not be the case you will have to run counts to see
if it is manageable.

create table cqr(
fk_id int,
char_comb char(2)
) ;

create unique index cqr_1x( char_comb, fk_id ) ;

So a query of *John Smith* would look for:

would look like:

"Jo" "hn" " S" "mi" "th" ( you can add the other two letter
combinations but this should narrow your searh just fine )

select * from cqr where char_comb = "Jo" and char_comb = "hn and
char_comb = " S" and char_comb = "mi" and char_comb = "th" and fk_id
= id and key matches( "*John Smith*" )

again this is crazy under a lot of circumstances but can be managed
with some limits to reduce your search.

I know that these ideas are hacks but you are trying to speed up
something that is hard to speed up.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 09:02 PM
Curtis Crowson
 
Posts: n/a
Default Re: Table Design Question

Paul Watson <paul@oninit.com> wrote in message news:<40110552.6EB20A52@oninit.com>...
> I've got this little 30million row table, the entry point wrt to
> the app will be via a char(12) unique field. No real hassles, a little
> big of fragmentation and it's all fine but ...
>
> The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%],
> and to make life more awkward it has to treat certain characters
> as identical so, for example, 0 [zero] = O [letter] and vice versa.
> But at least it's not case sensitive.
>
> It has a major update applied once per week (about 10%) but apart from
> that it is static.
>
> All smart arse ideas accepted:-))



To be really clever about this one would have to know more about the
data and queries. I'll suggest some ideas some will be quite insane
based on the realities of the situation, but one might work for you.
Or it might give you an idea for something that would work.

Idea one call it cache searching: if there are repeated queires with
the same

*<string>* values then cache the results in a relationship table:

create table cache_query_results(
fk_id integer,
str varchar(255)
) ;

create unique index cache_query_results_1x on cache_query_results(
str, fk_id ) ; -- Only store one STR for each fk_id.

This can of course be prepopulated with values that you know are
searched for often. Your cqr table can be of a manageable size and
since you are mostly static this might work for you. Do some
calculations and you should probably limit the size of what you store:
for example don't store *e* in your cqr table, don't store words over
a certain size in the table or break the *<string>* into words so that
you have a dictionary of where things are, etc.
you might only want to store strings above some number of characters
that appears to keep the size down. Of course if you don't have
predictable queries this idea is quite insane. You can even add a
cache management table like the following:

create table cache(
str varchar(255)
count integer
) ;

to trim the cqr table of things that aren't used much or recently (
change count to timestamp )



another idea is to index on some subset of the key I have used all 2
letter combinations before ( in your data there may not be as many
valid two letter combinations as you might think especially because
your data is not case sensitive and some letters are treated the same)
Of course again this can be awful depending on the data so if each key
has 60 2 letter combos you would have a table with 60*30,000,000 rows
ouch!( it might still work depending on your hardware) But I think in
practice that would not be the case you will have to run counts to see
if it is manageable.

create table cqr(
fk_id int,
char_comb char(2)
) ;

create unique index cqr_1x( char_comb, fk_id ) ;

So a query of *John Smith* would look for:

would look like:

"Jo" "hn" " S" "mi" "th" ( you can add the other two letter
combinations but this should narrow your searh just fine )

select * from cqr where char_comb = "Jo" and char_comb = "hn and
char_comb = " S" and char_comb = "mi" and char_comb = "th" and fk_id
= id and key matches( "*John Smith*" )

again this is crazy under a lot of circumstances but can be managed
with some limits to reduce your search.

I know that these ideas are hacks but you are trying to speed up
something that is hard to speed up.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 09:02 PM
Curtis Crowson
 
Posts: n/a
Default Re: Table Design Question

Paul Watson <paul@oninit.com> wrote in message news:<40110552.6EB20A52@oninit.com>...
> I've got this little 30million row table, the entry point wrt to
> the app will be via a char(12) unique field. No real hassles, a little
> big of fragmentation and it's all fine but ...
>
> The typical search is <str>* [25%] *<str> [25%] and *<str>* [50%],
> and to make life more awkward it has to treat certain characters
> as identical so, for example, 0 [zero] = O [letter] and vice versa.
> But at least it's not case sensitive.
>
> It has a major update applied once per week (about 10%) but apart from
> that it is static.
>
> All smart arse ideas accepted:-))


My earlier query in the post was wrong for the 2 letter selects.

select t1.fk_id from real_table rt, cqr t1, cqr t2, cqr t3, cqr t4, cqr t6 where
t1.fk_id = t2.fk_id and
t2.fk_id = t3.fk_id and
t3.fk_id = t4.fk_id and
t5.fk_id = t6.fk_id and
t1.keypart = 'Jo' and
t2.keypart = 'hn' and
t3.keypart = ' S' and
t4.keypart = 'mi' and
t5.keypart = 'th' and
rt.key matches('*John Smith*') and
rt.id = t1.fk_id and
rt.id = t2.fk_id and -- Not strictly needed
rt.id = t3.fk_id and -- " " "
rt.id = t4.fk_id and -- " " "
rt.id = t5.fk_id and -- " " "
rt.id = t6.fk_id -- " " "
;
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:18 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com