Unix Technical Forum

Integrity on large sites

This is a discussion on Integrity on large sites within the Pgsql General forums, part of the PostgreSQL category; --> I'm working in a project at the moment that is using MySQL, and people keep making assertions like this ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 05:23 PM
Naz Gassiep
 
Posts: n/a
Default Integrity on large sites

I'm working in a project at the moment that is using MySQL, and people
keep making assertions like this one:

"*Really* big sites don't ever have referential integrity. Or if the few
spots they do (like with financial transactions) it's implemented on the
application level (via, say, optimistic locking), never the database level."

This sounds to me like MySQLish. A large DB working with no RI would
give me nightmares. Is it really true that large sites turn RI off to
improve performance, or is that just a MySQL thing where it gets turned
off just because MySQL allows you to turn it off and improve
performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
MSSQL allow you to turn it off? Am I just being naive in thinking that
everyone runs their DBs with RI in production?

- Naz

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 05:23 PM
Stuart Cooper
 
Posts: n/a
Default Re: Integrity on large sites

> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."


Some large sites don't even use data types!

http://www.thestar.com/News/article/189175

"in some cases the field for the social insurance number was instead
filled in with a birth date."

(search the archives for "OT: Canadian Tax Database")

Cheers,
Stuart.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 05:23 PM
Richard P. Welty
 
Posts: n/a
Default Re: Integrity on large sites

Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance,

i know from having worked in the shop that handles it that the databases
used in processing of NYS Personal Income Tax (Informix) most assuredly
use referential integrity.

anything else would be suicide.

certain shops do turn it off for large databases. that doesn't make it a
good idea,
or the industry norm.

richard


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 05:23 PM
Ron Johnson
 
Posts: n/a
Default Re: Integrity on large sites

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/22/07 21:12, Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?


Allow you to turn it off???

RI as in foreign keys or RI as in primary keys?

FKs are not implemented on our big transactional systems that use
Rdb/VMS. Originally this was because the extra load would slow down
a system that needed every ounce of speed back on late 1990s technology.

Now we have (some) faster hardware, but even higher posting volumes.

- --
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGU6gRS9HxQb37XmcRAjSnAJwN8XhCxsHyeJHqxzi/k0Dj6O8fVACdGxrd
R1hfrTh9ifDivr51AGt1NNQ=
=CSLd
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 05:23 PM
Ben
 
Posts: n/a
Default Re: Integrity on large sites

Not using foreign keys makes sense for some applications. WORM
applications where you know you are loading accurate data, for
example. Or times when it doesn't matter if an application bug
corrupts your data.

But if you care about your data and if you can't trust your client to
edit it correctly, you'd better have referential integrity. Size is
irrelevant to that equation.

On May 22, 2007, at 7:12 PM, Naz Gassiep wrote:

> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if
> the few
> spots they do (like with financial transactions) it's implemented
> on the
> application level (via, say, optimistic locking), never the
> database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets
> turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle,
> DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?
>
> - Naz
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-09-2008, 05:24 PM
Scott Ribe
 
Posts: n/a
Default Re: Integrity on large sites

> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."


Pure, utter, unadulterated bullshit. Speaking as someone who had years of
experience with Sybase SQL Server before either MySQL or PostgreSQL were
even created...

Some big sites do of course juggle performance vs in-database run-time
checks, but the statements as typically presented by MySQL partisans, that
it's never done in the database level, is just wrong. Whether it's a direct
deception, iow speaker knows it to be false, or an indirect deception, iow
speaker is implying a level of expertise he does not possess, either way I
would categorize it as a lie.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-09-2008, 05:24 PM
Scott Marlowe
 
Posts: n/a
Default Re: Integrity on large sites

Naz Gassiep wrote:
> I'm working in a project at the moment that is using MySQL, and people
> keep making assertions like this one:
>
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."
>
> This sounds to me like MySQLish. A large DB working with no RI would
> give me nightmares. Is it really true that large sites turn RI off to
> improve performance, or is that just a MySQL thing where it gets turned
> off just because MySQL allows you to turn it off and improve
> performance? Can you even turn RI off in PostgreSQL? Does Oracle, DB2 or
> MSSQL allow you to turn it off? Am I just being naive in thinking that
> everyone runs their DBs with RI in production?


Someone's been drinking the MySQL 3.23 kool aide.

1: The bigger the amount of data you have to store, the more likely you
are to NEED referential integrity to make sure it's not getting all
messed up. Not just financial data either. What about applications
like trouble ticketing systems? Can you imagine having tickets go
orphan in a system to keep track of issues? What about parts inventory
systems? Hospital medication tracking? Transportation scheduling?
Fantasy Football? All of those systems are likely to need RI to make
sure that the data inside them stays coherent. We don't want to have
two customers thinking they have the same quarterback / taxi /
penicillen dosage / broken network router / water pump etc...

2: Handling RI in the application doesn't scale. If everything you do
requires you to check in the app, lock the whole table to prevent race
conditions, and then commit, you'll never scale to any real number of
users. You can have reliability and performance if you do RI in the
database. You only get to pick one if you're gonna do RI in the
application.

3: Of course you can turn off RI in PostgreSQL. Either remove the FK
triggers or disable them db wide. You can the same thing in Oracle as
well. This is normally done during maintenance windows to allow data
that is known to be coherent to be imported quickly. Doing so while
processing transactions is suicidal.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-09-2008, 05:24 PM
Alexander Staubo
 
Posts: n/a
Default Re: Integrity on large sites

On 5/23/07, Naz Gassiep <naz@mira.net> wrote:
> "*Really* big sites don't ever have referential integrity. Or if the few
> spots they do (like with financial transactions) it's implemented on the
> application level (via, say, optimistic locking), never the database level."


It's not just the big ones. Try using Ruby on Rails -- and its ORM,
ActiveRecord -- at some point, and you will notice the rampant
ignorance of referential integrity. ActiveRecord bears signs of having
been designed for MySQL.

For example, you need a plugin to add programmatic support for
foreign-key declarations to your schema code, and foreign key
relationships have to be explicitly defined using directives such as
"has_many". The unit test framework assumes it can delete rows in any
order, irrespective of foreign-key references. And so on.

Interestingly, ActiveRecord's support for polymorphic object
associations -- which allow you define a reference to an object in an
arbitrary table -- violates RI *per definition*. There's no support
for setting up the check constraints that would be appropriate for
such attributes. All the more annoying, since such associations are
extremely useful.

Alexander.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-09-2008, 05:24 PM
PFC
 
Posts: n/a
Default Re: Integrity on large sites


> Some big sites do of course juggle performance vs in-database run-time
> checks, but the statements as typically presented by MySQL partisans,


Live from the front :

This freshly created database has had to endure a multithreaded query
assault for about 2 hours.
It gave up.

TABLE `posts` (
`post_id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
etc...

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
| 591257 |
+--------------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
BASTARD',666);
ERROR 1062 (23000): Duplicate entry '591257' for key 1

mysql> CHECK TABLE posts;
+-------------------+-------+----------+-----------------------------+
| Table | Op | Msg_type | Msg_text |
+-------------------+-------+----------+-----------------------------+
| forum_bench.posts | check | warning | Table is marked as crashed |
| forum_bench.posts | check | error | Found 588137 keys of 588135 |
| forum_bench.posts | check | error | Corrupt |
+-------------------+-------+----------+-----------------------------+

mysql> REPAIR TABLE posts;
+-------------------+--------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-------------------+--------+----------+----------+
| forum_bench.posts | repair | status | OK |
+-------------------+--------+----------+----------+

mysql> INSERT INTO posts (topic_id,post_text,user_id) VALUES (1,'DIE
BASTARD',666);
Query OK, 1 row affected, 1 warning (0.10 sec)

mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------+
| Warning | 1364 | Field 'post_time' doesn't have a default value |
+---------+------+------------------------------------------------+

mysql> SELECT max(post_id) FROM posts;
+--------------+
| max(post_id) |
+--------------+
| 591257 |
+--------------+

mysql> SELECT count(*) FROM posts UNION ALL SELECT sum( topic_post_count )
FROM topics;
+----------+
| count(*) |
+----------+
| 588137 |
| 588145 |
+----------+

mysql> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT topic_id
FROM posts);
+----------+
| count(*) |
+----------+
| 11583 |
+----------+

(Note : there cannot be a topic without a post in it, ha !)

Try Postgres :

forum_bench=> SELECT count(*) FROM posts UNION ALL SELECT
sum( topic_post_count ) FROM topics;
count
--------
536108
536108
(2 lignes)

forum_bench=> SELECT count(*) FROM topics WHERE topic_id NOT IN (SELECT
topic_id FROM posts);
count
-------
0
(1 ligne)



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-09-2008, 05:24 PM
Richard P. Welty
 
Posts: n/a
Default Re: Integrity on large sites

Scott Marlowe wrote:
> 2: Handling RI in the application doesn't scale. If everything you do
> requires you to check in the app, lock the whole table to prevent race
> conditions, and then commit, you'll never scale to any real number of
> users. You can have reliability and performance if you do RI in the
> database. You only get to pick one if you're gonna do RI in the
> application.

the other risk for RI in the app is the possibility of incompatible
implementation
across different app versions or different apps that access the same data.

not at all a fun place to be, that.

richard


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:27 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com