vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Greetings -- I have an UPDATE query updating a 100 million row table, and allocate enough memory via shared_buffers=1500MB. However, I see two processes in top, the UPDATE process eating about 850 MB and the writer process eating about 750 MB. The box starts paging. Why is there the writer taking almost as much space as the UPDATE, and how can I shrink it? Cheers, Alexy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov <deliverable@gmail.com> wrote: > Greetings -- I have an UPDATE query updating a 100 million row table, and > allocate enough memory via shared_buffers=1500MB. However, I see two > processes in top, the UPDATE process eating about 850 MB and the writer > process eating about 750 MB. The box starts paging. Why is there the > writer taking almost as much space as the UPDATE, and how can I shrink it? Shared_buffers is NOT the main memory pool for all operations in pgsql, it is simply the buffer pool used to hold data being operated on. Things like sorts etc. use other memory and can exhaust your machine. However, I'd like to see the output of vmstat 1 or top while this is happening. How much memory does this machine have? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On May 2, 2008, at 12:30 PM, Scott Marlowe wrote: > On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov > <deliverable@gmail.com> wrote: >> Greetings -- I have an UPDATE query updating a 100 million row >> table, and >> allocate enough memory via shared_buffers=1500MB. However, I see two >> processes in top, the UPDATE process eating about 850 MB and the >> writer >> process eating about 750 MB. The box starts paging. Why is there >> the >> writer taking almost as much space as the UPDATE, and how can I >> shrink it? > > Shared_buffers is NOT the main memory pool for all operations in > pgsql, it is simply the buffer pool used to hold data being operated > on. > > Things like sorts etc. use other memory and can exhaust your machine. > However, I'd like to see the output of vmstat 1 or top while this is > happening. > > How much memory does this machine have? It's a 2GB RAM MacBook. Here's the top for postgres Processes: 117 total, 2 running, 6 stuck, 109 sleeping... 459 threads 12 :34:27 Load Avg: 0.27, 0.24, 0.32 CPU usage: 8.41% user, 11.06% sys, 80.53% idle SharedLibs: num = 15, resident = 40M code, 2172K data, 3172K linkedit. MemRegions: num = 20719, resident = 265M + 12M private, 1054M shared. PhysMem: 354M wired, 1117M active, 551M inactive, 2022M used, 19M free. VM: 26G + 373M 1176145(160) pageins, 1446482(2) pageouts PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE 51775 postgres 6.8% 2:40.16 1 9 39 1504K 896M 859M+ 1562M 51767 postgres 0.0% 0:39.74 1 8 28 752K 896M 752M 1560M the first is the UPDATE, the second is the writer. The query is very simple, netflix=> create index movs_mid_idx on movs(mid); CREATE INDEX netflix=> update ratings set offset1=avg-rating from movs where mid=movie_id; where the table ratings has about 100 million rows, movs has about 20,000. I randomly increased values in postgresql.conf to shared_buffers = 1500MB max_fsm_pages = 2000000 max_fsm_relations = 10000 Should I set the background writer parameters somehow to decrease the RAM consumed by the writer? Cheers, Alexy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov <deliverable@gmail.com> wrote: > > > On May 2, 2008, at 12:30 PM, Scott Marlowe wrote: > > > > On Fri, May 2, 2008 at 1:24 PM, Alexy Khrabrov <deliverable@gmail.com> > wrote: > > > > > Greetings -- I have an UPDATE query updating a 100 million row table, > and > > > allocate enough memory via shared_buffers=1500MB. However, I see two > > > processes in top, the UPDATE process eating about 850 MB and the writer > > > process eating about 750 MB. The box starts paging. Why is there the > > > writer taking almost as much space as the UPDATE, and how can I shrink > it? > > > > > > > Shared_buffers is NOT the main memory pool for all operations in > > pgsql, it is simply the buffer pool used to hold data being operated > > on. > > > > Things like sorts etc. use other memory and can exhaust your machine. > > However, I'd like to see the output of vmstat 1 or top while this is > > happening. > > > > How much memory does this machine have? > > > > It's a 2GB RAM MacBook. Here's the top for postgres > > Processes: 117 total, 2 running, 6 stuck, 109 sleeping... 459 threads > 12:34:27 > Load Avg: 0.27, 0.24, 0.32 CPU usage: 8.41% user, 11.06% sys, 80.53% > idle > SharedLibs: num = 15, resident = 40M code, 2172K data, 3172K linkedit. > MemRegions: num = 20719, resident = 265M + 12M private, 1054M shared. > PhysMem: 354M wired, 1117M active, 551M inactive, 2022M used, 19M free. > VM: 26G + 373M 1176145(160) pageins, 1446482(2) pageouts > > PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE > 51775 postgres 6.8% 2:40.16 1 9 39 1504K 896M 859M+ > 1562M > 51767 postgres 0.0% 0:39.74 1 8 28 752K 896M 752M > 1560M SOME snipping here. > I randomly increased values in postgresql.conf to > > shared_buffers = 1500MB > max_fsm_pages = 2000000 > max_fsm_relations = 10000 On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high. > Should I set the background writer parameters somehow to decrease the RAM > consumed by the writer? No, the background writer reads through the shared buffers for dirty ones and writes them out. so, it's not really using MORE memory, it's just showing that it's attached to the ginormous shared_buffer pool you've set up. Lower your shared_buffers to about 512M or so and see how it works. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| "Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Fri, May 2, 2008 at 1:38 PM, Alexy Khrabrov <deliverable@gmail.com> wrote: >> I randomly increased values in postgresql.conf to >> >> shared_buffers = 1500MB >> max_fsm_pages = 2000000 >> max_fsm_relations = 10000 > On a laptop with 2G ram, 1.5Gig shared buffers is probably WAY too high. s/probably/definitely/, especially seeing that OS X is a bit of a memory hog itself. I don't think you should figure on more than 1GB being usefully available to Postgres, and you can't give all or even most of that space to shared_buffers. > No, the background writer reads through the shared buffers for dirty > ones and writes them out. so, it's not really using MORE memory, it's > just showing that it's attached to the ginormous shared_buffer pool > you've set up. Yeah. You have to be aware of top's quirky behavior for shared memory: on most platforms it will count the shared memory against *each* process, but only as much of the shared memory as that process has touched so far. So over time the reported size of any PG process will tend to climb to something over the shared memory size, but most of that isn't "real". I haven't directly checked whether OS X's top behaves that way, but given your report I think it does. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On Fri, 2 May 2008, Alexy Khrabrov wrote: > I have an UPDATE query updating a 100 million row table, and > allocate enough memory via shared_buffers=1500MB. In addition to reducing that as you've been advised, you'll probably need to increase checkpoint_segments significantly from the default (3) in order to get good performance on an update that large. Something like 30 would be a reasonable starting point. I'd suggest doing those two things, seeing how things go, and reporting back if you still think performance is unacceptable. We'd need to know your PostgreSQL version in order to really target future suggestions. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On May 2, 2008, at 1:13 PM, Tom Lane wrote: > I don't think you should figure on more than 1GB being > usefully available to Postgres, and you can't give all or even most of > that space to shared_buffers. So how should I divide say a 512 MB between shared_buffers and, um, what else? (new to pg tuning I naively thought that if I have a 100,000,000 row table, of the form (integer,integer,smallint,date), and add a real coumn to it, it will scroll through the memory reasonably fast. Yet when I had shared_buffers=128 MB, it was hanging there 8 hours before I killed it, and now with 1500MB is paging again for several hours with no end in sight. Why can't it just add a column to a row at a time and be done with it soon enough? FORTRAN or even python program and there's no index usage for this table, a sequential scan, why all the paging? Cheers, Alexy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On May 2, 2008, at 1:22 PM, Greg Smith wrote: > On Fri, 2 May 2008, Alexy Khrabrov wrote: > >> I have an UPDATE query updating a 100 million row table, and >> allocate enough memory via shared_buffers=1500MB. > > In addition to reducing that as you've been advised, you'll probably > need to increase checkpoint_segments significantly from the default > (3) in order to get good performance on an update that large. > Something like 30 would be a reasonable starting point. > > I'd suggest doing those two things, seeing how things go, and > reporting back if you still think performance is unacceptable. We'd > need to know your PostgreSQL version in order to really target > future suggestions. PostgreSQL 8.3.1, compiled from source on Mac OSX 10.5.2 (Leopard). Saw the checkpoint_segments warning every ~20sec and increased it to 100 already. Will see what 512 MB buys me, but 128 MB was paging miserably. Cheers, Alexy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| On May 2, 2008, at 1:40 PM, Scott Marlowe wrote: > Again, a database protects your data from getting scrambled should the > program updating it quit halfway through etc... Right -- but this is a data mining work, I add a derived column to a row, and it's computed from that very row and a small second table which should fit in RAM. > Have you been vacuuming between these update attempts? Each one has > created millions of dead rows and bloated your data store. vacuum > full / cluster / reindex may be needed. I've read postgresql.conf better and see autovacuum = on is commented out, so it's on. That explains why shutting down was taking so long to shut autovacuum down too. Basically, the derived data is not critical at all, -- can I turn (1) off transactional behavior for an UPDATE, (2) should I care about vacuuming being done on the fly when saving RAM, or need I defer it/ manage it manually? I wonder what MySQL would do here on MyISAM tables without transactional behavior -- perhaps this is the case more suitable for them? Cheers, Alexy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| ||||
| On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov <deliverable@gmail.com> wrote: > I naively thought that if I have a 100,000,000 row table, of the form > (integer,integer,smallint,date), and add a real coumn to it, it will scroll > through the memory reasonably fast. In Postgres, an update is the same as a delete/insert. That means that changing the data in one column rewrites ALL of the columns for that row, and you end up with a table that's 50% dead space, which you then have to vacuum. Sometimes if you have a "volatile" column that goes with several "static" columns, you're far better off to create a second table for the volatile data, duplicating the primary key in both tables. In your case, it would mean the difference between 10^8 inserts of (int, float), very fast, compared to what you're doing now, which is 10^8 insert and 10^8 deletes of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow). The down side of this design is that later on, it requires a join to fetch all the data for each key. You do have a primary key on your data, right? Or some sort of index? Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |