vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I have a database of about 130 million records. I need to be able to retrieve a random subset of those records (say random 100,000 records). I tried the simplest SELECT * FROM Table ORDER BY RAND() type of solutions but that is prohibitively slow with the database of this size. Are there any other solutions available that could help me with this task? If it's of any help the front end is PHP. Thanks, Marek |
| |||
| "mkarbarz@gmail.com" <mkarbarz@gmail.com> wrote in news:1170638836.739194.95150@q2g2000cwa.googlegrou ps.com: > I have a database of about 130 million records. I need to be able to > retrieve a random subset of those records (say random 100,000 > records). Can you give some more info on the table structure? -- felix |
| |||
| On 5 Feb, 09:28, Felix Geerinckx <felix.geerin...@gmail.com> wrote: > "mkarb...@gmail.com" <mkarb...@gmail.com> wrote innews:1170638836.739194.95150@q2g2000cwa.googlegr oups.com: > > > I have a database of about 130 million records. I need to be able to > > retrieve a random subset of those records (say random 100,000 > > records). > > Can you give some more info on the table structure? > > -- > felix And just how slow IS 'prohibitively'? |
| |||
| On Feb 5, 6:45 am, "strawberry" <zac.ca...@gmail.com> wrote: > On 5 Feb, 09:28, Felix Geerinckx <felix.geerin...@gmail.com> wrote: > > > "mkarb...@gmail.com" <mkarb...@gmail.com> wrote innews:1170638836.739194.95150@q2g2000cwa.googlegr oups.com: > > > > I have a database of about 130 million records. I need to be able to > > > retrieve a random subset of those records (say random 100,000 > > > records). > > > Can you give some more info on the table structure? > > > -- > > felix > > And just how slow IS 'prohibitively'? Table structure is very simple - a single table with about 20 columns in it (the database has already been normalized and there is no logical way to split it into multiple tables). Columns are either numeric or bit values. There are maybe 2 or 3 that are varchars but never longer than 10 characters. As for what is "prohibitively" slow, for a query where about 1.5 million possible rows are returned (after the WHEREs are considered) it takes in excess of 15 minutes to return 100,000 rows even if I only try to return a single bit column. |
| |||
| "mkarbarz@gmail.com" <mkarbarz@gmail.com> wrote in news:1170684292.236619.235980@h3g2000cwc.googlegro ups.com: > Table structure is very simple - a single table with about 20 columns > in it (the database has already been normalized and there is no > logical way to split it into multiple tables). Columns are either > numeric or bit values. There are maybe 2 or 3 that are varchars but > never longer than 10 characters. What is the primary key? Can you randomize on that? -- felix |
| ||||
| On 5 Feb, 14:04, "mkarb...@gmail.com" <mkarb...@gmail.com> wrote: > On Feb 5, 6:45 am, "strawberry" <zac.ca...@gmail.com> wrote: > > > On 5 Feb, 09:28, Felix Geerinckx <felix.geerin...@gmail.com> wrote: > > > > "mkarb...@gmail.com" <mkarb...@gmail.com> wrote innews:1170638836.739194.95150@q2g2000cwa.googlegr oups.com: > > > > > I have a database of about 130 million records. I need to be able to > > > > retrieve a random subset of those records (say random 100,000 > > > > records). > > > > Can you give some more info on the table structure? > > > > -- > > > felix > > > And just how slow IS 'prohibitively'? > > Table structure is very simple - a single table with about 20 columns > in it (the database has already been normalized and there is no > logical way to split it into multiple tables). Columns are either > numeric or bit values. There are maybe 2 or 3 that are varchars but > never longer than 10 characters. > > As for what is "prohibitively" slow, for a query where about 1.5 > million possible rows are returned (after the WHEREs are considered) > it takes in excess of 15 minutes to return 100,000 rows even if I only > try to return a single bit column. Does MySQL actually say something like '100000 total, Query took 900 sec'? |