Unix Technical Forum

Looking for a large database for testing

#1
04-18-2008, 12:12 PM
Sebastian Hennebrueder
Looking for a large database for testing

Hello,

I would like to test the performance of my Java/PostgreSQL applications,
especially when doing full text searches.
For this I am looking for a database of 50 to 300 MB with text fields,
e.g. a table of books with fields holding a comment, a table of contents,
sample chapters or whatever else.

Does anybody know where I can find a database like this, or does anyone
even have something like this?

--
Best Regards / Viele Grüße

Sebastian Hennebrueder

----

http://www.laliluna.de

Tutorials for JSP, JavaServer Faces, Struts, Hibernate and EJB

Get support, education and consulting for these technologies - uncomplicated and cheap.


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#2
04-18-2008, 12:12 PM
Tino Wildenhain
Re: Looking for a large database for testing

You can download the Wikipedia content; just browse the Wikimedia site.
It's some work to convert the data so that it can be imported into Postgres,
but at least you get a lot of real-world data, in many languages.
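One way to do that conversion step, sketched under the assumption that you write the extracted text to a file for PostgreSQL's COPY command in its default text format (tab-separated columns, \N for NULL). The class name and the row layout are hypothetical; the escaping rules themselves (backslash, newline, carriage return and tab must be backslash-escaped) are those of the COPY text format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class CopyTextFormat {

    /** Escapes one column value for COPY's text format; null becomes \N. */
    static String escape(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // a literal backslash must be doubled
                case '\n': sb.append("\\n"); break;  // a raw newline would end the row
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;  // a raw tab would start a new column
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Joins the escaped values with tabs into one COPY line (no trailing newline). */
    static String toCopyLine(List<String> values) {
        StringJoiner line = new StringJoiner("\t");
        for (String v : values) {
            line.add(escape(v));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // hypothetical row: id, title, article text containing a newline, NULL comment
        String line = toCopyLine(Arrays.asList("1", "Example", "Line one\nLine two", null));
        System.out.println(line);
    }
}
```

Writing one such line per row and loading the file with psql's \copy is usually much faster than sending individual INSERT statements.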



#3
04-18-2008, 12:12 PM
Sebastian Hennebrueder
Re: Looking for a large database for testing

Tino Wildenhain wrote:

> You can download the wikipedia content. Just browse the wikimedia site.
> Its some work to change the data to be able to import into postgres,
> but at least you have a lot real world data - in many languages.


I have just found it. Here is the link:
http://download.wikimedia.org/
They have content in multiple languages and dumps of up to 20 GB.

--
Best Regards / Viele Grüße

Sebastian Hennebrueder


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#4
04-18-2008, 12:12 PM
Mark Rae
Re: Looking for a large database for testing

On Tue, Aug 16, 2005 at 09:29:32AM +0200, Sebastian Hennebrueder wrote:
> I would like to test the performance of my Java/PostgreSQL applications
> especially when making full text searches.
> For this I am looking for a database with 50 to 300 MB having text fields.
> e.g. A table with books with fields holding a comment, table of content
> or example chapters
> or what ever else.


You could try the OMIM database, which is currently 100 MB.
It contains both journal references and large sections of
'plain' text. It also contains a large number of technical
terms, which will really test any kind of soundex matching
if you are using that.

http://www.ncbi.nlm.nih.gov/Omim/omimfaq.html#download

Unfortunately it only comes as a flat text file, but it is
very easy to parse.

And if you start reading it, you'll probably learn quite
a lot of things you really didn't want to know!! :-D
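A small parser sketch for that flat file. The record layout here is an assumption, not taken from the OMIM documentation: each record starts with a *RECORD* line, each field starts with a line such as *FIELD* TI (the code after *FIELD* names the field, e.g. NO, TI, TX), and the file ends with *THEEND*. Check it against the actual download before relying on it; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OmimParser {

    // assumed markers of the flat-file layout
    private static final String RECORD = "*RECORD*";
    private static final String FIELD = "*FIELD*";
    private static final String END = "*THEEND*";

    /** Parses the flat file into one map per record: field code -> field text. */
    static List<Map<String, String>> parse(Reader in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> record = null;   // record currently being filled
        String field = null;                 // current field code, e.g. "TI"
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(RECORD) || line.startsWith(END)) {
                store(record, field, text);  // close the last field of the previous record
                field = null;
                record = null;
                if (line.startsWith(RECORD)) {
                    record = new LinkedHashMap<>();
                    records.add(record);
                }
            } else if (line.startsWith(FIELD)) {
                store(record, field, text);
                field = line.substring(FIELD.length()).trim();
            } else if (field != null) {
                text.append(line).append('\n');
            }
        }
        store(record, field, text);          // handles a file without an end marker
        return records;
    }

    /** Saves the buffered text under the field code, then clears the buffer. */
    private static void store(Map<String, String> record, String field, StringBuilder text) {
        if (record != null && field != null) {
            record.put(field, text.toString().trim());
        }
        text.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded flat file
        try (Reader in = new FileReader(args[0])) {
            List<Map<String, String>> records = parse(in);
            System.out.println(records.size() + " records");
        }
    }
}
```

Each map can then be written as one row, with the long text fields going into text columns for the full text search tests.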

-Mark

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#5
04-18-2008, 12:12 PM
Oleg Bartunov
Re: Looking for a large database for testing

Sebastian,

you can try a document generator. I used
http://www.cs.rmit.edu.au/~jz/resources/finnegan.zip
You can play with the word frequencies and the document length distribution.
Also, I have SentenceGenerator.java, which could be used to
generate synthetic texts.
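The idea behind such generators can be sketched in a few lines. This is a hypothetical example, not finnegan or SentenceGenerator.java: word frequencies follow a Zipf-like law (the word of rank r gets weight 1/r^s), document lengths are drawn uniformly from a range, and the placeholder words w1, w2, ... stand in for a real vocabulary.

```java
import java.util.Random;

public class SyntheticTextGenerator {
    private final String[] vocabulary;
    private final double[] cumulative; // cumulative probability up to each rank
    private final Random random;

    SyntheticTextGenerator(int vocabularySize, double exponent, long seed) {
        vocabulary = new String[vocabularySize];
        cumulative = new double[vocabularySize];
        random = new Random(seed);
        double sum = 0;
        for (int rank = 1; rank <= vocabularySize; rank++) {
            sum += 1.0 / Math.pow(rank, exponent); // Zipf weight of this rank
            cumulative[rank - 1] = sum;
            vocabulary[rank - 1] = "w" + rank;     // placeholder; swap in a real word list
        }
        for (int i = 0; i < vocabularySize; i++) {
            cumulative[i] /= sum;                  // normalise so the last entry is 1.0
        }
    }

    /** Draws one word: binary search for the first rank whose cumulative value >= u. */
    String nextWord() {
        double u = random.nextDouble();
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return vocabulary[lo];
    }

    /** Builds one document whose length is uniform in [minWords, maxWords]. */
    String nextDocument(int minWords, int maxWords) {
        int length = minWords + random.nextInt(maxWords - minWords + 1);
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                doc.append(' ');
            }
            doc.append(nextWord());
        }
        return doc.toString();
    }

    public static void main(String[] args) {
        SyntheticTextGenerator gen = new SyntheticTextGenerator(10_000, 1.0, 42L);
        System.out.println(gen.nextDocument(5, 20));
    }
}
```

Raising the exponent makes a few words dominate the text; a fixed seed keeps the test data reproducible between runs.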

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

#6
04-18-2008, 12:13 PM
Sebastian Hennebrueder
Re: Looking for a large database for testing

In case anybody wants to import the Wikipedia data: I had considerable
problems getting the encoding right. I downloaded the German
content from Wikipedia, which is a dump of a Unicode-encoded (UTF-8)
MySQL database.

I used MySQL 4.1 on Windows 2000 to load the dump and then copied the
data to PostgreSQL with a small application.

In the MySQL option file (my.ini on Windows), in the [mysqld] section,
you should configure the setting

max_allowed_packet = 10M

I set it to 10M, which worked. Otherwise you cannot import the dump into
MySQL; the error message was something like "lost connection ...".
The default encoding of MySQL was latin1, which worked.
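To choose a max_allowed_packet value before importing, it can help to know how large the biggest statement in the dump is. A minimal sketch, assuming each INSERT statement sits on a single line (as the extended INSERTs in mysqldump output do), so that the longest line approximates the largest packet the client will send. The class name is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LongestStatement {

    /** Returns the length in bytes of the longest line, not counting the newline byte. */
    static long longestLineBytes(InputStream in) throws IOException {
        long max = 0;
        long current = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                max = Math.max(max, current);
                current = 0;
            } else {
                current++;
            }
        }
        return Math.max(max, current); // the last line may lack a trailing newline
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the unzipped dump file; bytes are counted, not decoded
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            long bytes = longestLineBytes(in);
            System.out.printf("longest line: %d bytes (%.1f MB)%n", bytes, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Counting raw bytes rather than decoded characters matters here, because the packet limit is in bytes and UTF-8 text uses more than one byte per non-ASCII character. Set max_allowed_packet comfortably above the reported size.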

Then I imported the dump:

mysql -uYourUserName -pPassword --default-character-set=utf8 database < downloadedAndUnzippedFile

The --default-character-set option is very important.
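Why the character set matters: if UTF-8 bytes are treated as latin1, every non-ASCII character turns into two or more garbled ones. A small sketch reproducing the effect in Java and reversing it; the helper names are made up for illustration. Note that MySQL's latin1 actually corresponds to Windows-1252, so repairing real data may need that charset instead of ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    /** Simulates the mistake: UTF-8 bytes decoded as if they were latin1 (ISO-8859-1). */
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Undoes misread(): ISO-8859-1 maps each char back to its original byte, then decode as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String city = "M\u00fcnchen";        // "München"
        String garbled = misread(city);      // the string is now "MÃ¼nchen"
        System.out.println(garbled);
        System.out.println(repair(garbled)); // back to "München"
    }
}
```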

Create the table in PostgreSQL (not with all the columns):

CREATE SEQUENCE public.cur_cur_id_seq; -- referenced by the cur_id default below
CREATE TABLE content
(
cur_id int4 NOT NULL DEFAULT nextval('public.cur_cur_id_seq'::text),
cur_namespace int2 NOT NULL DEFAULT (0)::smallint,
cur_title varchar(255) NOT NULL DEFAULT ''::character varying,
cur_text text NOT NULL,
cur_comment text,
cur_user int4 NOT NULL DEFAULT 0,
cur_user_text varchar(255) NOT NULL DEFAULT ''::character varying,
cur_timestamp varchar(14) NOT NULL DEFAULT ''::character varying
);

After this I copied the data from MySQL to PostgreSQL with a small Java
application. The code is not beautiful.

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class WikiCopy {

    private void copyEntries() throws Exception {
        Class.forName("org.postgresql.Driver");
        Class.forName("com.mysql.jdbc.Driver");
        try (Connection conMySQL = DriverManager.getConnection(
                     "jdbc:mysql://localhost/wikidb", "root", "mysql");
             Connection conPostgreSQL = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/wiki", "postgres", "p");
             Statement selectStatement = conMySQL.createStatement();
             PreparedStatement insertStatement = conPostgreSQL.prepareStatement(
                     "insert into content (cur_id, cur_namespace, cur_title, cur_text, "
                     + "cur_comment, cur_user, cur_user_text, cur_timestamp) "
                     + "values (?,?,?,?,?,?,?,?)")) {

            // commit once per page of 2000 rows instead of once per row
            conPostgreSQL.setAutoCommit(false);

            // get total rows
            ResultSet countResult = selectStatement.executeQuery("select count(*) from cur");
            countResult.next();
            int iMax = countResult.getInt(1);
            countResult.close();

            int i = 0;
            while (i < iMax) {
                int rowsInPage = 0;
                // list the columns explicitly and order by the key so that
                // "limit offset, count" returns stable, non-overlapping pages
                ResultSet resultSet = selectStatement.executeQuery(
                        "select cur_id, cur_namespace, cur_title, cur_text, cur_comment, "
                        + "cur_user, cur_user_text, cur_timestamp from cur "
                        + "order by cur_id limit " + i + ", 2000");
                while (resultSet.next()) {
                    i++;
                    rowsInPage++;
                    if (i % 100 == 0)
                        System.out.println(i + " of " + iMax);
                    insertStatement.setInt(1, resultSet.getInt(1));
                    insertStatement.setInt(2, resultSet.getInt(2));
                    insertStatement.setString(3, resultSet.getString(3));
                    insertStatement.setString(4, resultSet.getString(4));
                    // this blob field is utf-8 encoded; cur_comment may be NULL
                    byte[] comment = resultSet.getBytes(5);
                    insertStatement.setString(5,
                            comment == null ? null : new String(comment, StandardCharsets.UTF_8));
                    insertStatement.setInt(6, resultSet.getInt(6));
                    insertStatement.setString(7, resultSet.getString(7));
                    insertStatement.setString(8, resultSet.getString(8));
                    insertStatement.addBatch();
                }
                resultSet.close();
                insertStatement.executeBatch();
                conPostgreSQL.commit();
                // guard against an endless loop if rows were deleted after counting
                if (rowsInPage == 0)
                    break;
            }
        }
    }
}

--
Best Regards / Viele Grüße

Sebastian Hennebrueder


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#7
04-18-2008, 12:16 PM
Jim C. Nasby
Re: Looking for a large database for testing


Most benchmarks (such as dbt* and pgbench) have data generators you
could use.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com 512-569-9461
