vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, I think I'm about to ask a traditional (almost religious) question, but I haven't been able to find a crystal clear answer in the mailing lists so far. Thus, here is my question: I need to store a large number of images in a PostgreSQL database. In my application, this represents a few hundreds of thousands of images. The size of each image is about 100-200 Ko. There is a large turnover in my database, i.e. each image stays about 1 week in the database, then it is deleted. Of course, I need to have a relatively fast access to each one of these images. But more importantly, I need to periodically delete a large number of images in batch process. Moreover, the disk space that is used on the hard-disk to store the images should be kept as small as possible: Precisely, after the aforementioned batch deletions, the table that contains the images should be immediately compacted (I cannot afford the internal use of a "to be deleted" flag, because of the large amount of disk space my database requires). I have three possible implementation choices in PostgreSQL: 1) Storing the images directly on the disk, and storing an URI in the database tables (but this would require a more tricky implementation, and ACID-ity would be difficult to ensure -- after all, a database should abstract the internal storage of data, may it be images). 2) Storing the images in a "bytea" column (but what about the access times, and the batch deletion process?). 3) Storing the images as large objects (this sounds like the best solution to me, but the documentation lacks clarity about the scope of these large objects). Finally, my question is what method would you recommend to me? I thank you much in advance for your answers! __________________________________________________ _________________________ Yahoo! Mail réinvente le mail ! Découvrez le nouveau Yahoo! Mail et son interface révolutionnaire. http://fr.mail.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| > Hello, > > I think I'm about to ask a traditional (almost > religious) question, but I haven't been able to find a > crystal clear answer in the mailing lists so far. > Thus, here is my question: > > I need to store a large number of images in a > PostgreSQL database. In my application, this > represents a few hundreds of thousands of images. The > size of each image is about 100-200 Ko. There is a > large turnover in my database, i.e. each image stays > about 1 week in the database, then it is deleted. > > Of course, I need to have a relatively fast access to > each one of these images. But more importantly, I need > to periodically delete a large number of images in > batch process. Moreover, the disk space that is used > on the hard-disk to store the images should be kept as > small as possible: Precisely, after the aforementioned > batch deletions, the table that contains the images > should be immediately compacted (I cannot afford the > internal use of a "to be deleted" flag, because of the > large amount of disk space my database requires). > > I have three possible implementation choices in > PostgreSQL: > > 1) Storing the images directly on the disk, and > storing an URI in the database tables (but this would > require a more tricky implementation, and ACID-ity > would be difficult to ensure -- after all, a database > should abstract the internal storage of data, may it > be images). > > 2) Storing the images in a "bytea" column (but what > about the access times, and the batch deletion > process?). > > 3) Storing the images as large objects (this sounds > like the best solution to me, but the documentation > lacks clarity about the scope of these large objects). > > Finally, my question is what method would you > recommend to me? > > I thank you much in advance for your answers! > > > > > > > __________________________________________________ _________________________ > Yahoo! Mail réinvente le mail ! Découvrez le nouveau Yahoo! Mail et son > interface révolutionnaire. > http://fr.mail.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > Encode the image on base64 and inseert on a text field if you use Bytea it needs to be encoded and the size stored will be more than base64 encoded if you store the image on disk you need to keep the consistency between the database and the file system leonel ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| TIJod wrote: > I need to store a large number of images in a > PostgreSQL database. In my application, this > represents a few hundreds of thousands of images. The > size of each image is about 100-200 Ko. There is a > large turnover in my database, i.e. each image stays > about 1 week in the database, then it is deleted. I see little value to storing the images in the database. For me that's a general statement (I'm sure others will disagree); but especially in your case, where you have a high volume and only want to store them for a couple days. Why incur all the overhead of putting them in the DB? You can't search on them or sort on them. I would just store them in the file system and put a reference in the DB. > but this wouldrequire a more tricky implementation, and ACID-ity > would be difficult to ensure -- after all, a database > should abstract the internal storage of data, may it > be images). I can't get excited about this. First, given the amount of overhead you'll be avoiding, checking the return code from storing the image in the file system seems relatively trivial. Store the image first, and if you get a failure code, don't store the rest of the data in the DB; you've just implemented data consistency. That assumes, of course, that the image is the only meaningful data you have, which in most situations is not the case. Meaning you'd want to store the rest of the data anyway with a messages saying "image not available." -- Guy Rouillier ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Oct 4, 2006, at 12:56 PM, Guy Rouillier wrote: > TIJod wrote: >> I need to store a large number of images in a >> PostgreSQL database. In my application, this >> represents a few hundreds of thousands of images. The >> size of each image is about 100-200 Ko. There is a >> large turnover in my database, i.e. each image stays >> about 1 week in the database, then it is deleted. > > I see little value to storing the images in the database. For me > that's > a general statement (I'm sure others will disagree); but especially in > your case, where you have a high volume and only want to store them > for > a couple days. Why incur all the overhead of putting them in the DB? > You can't search on them or sort on them. I would just store them in > the file system and put a reference in the DB. > >> but this wouldrequire a more tricky implementation, and ACID-ity >> would be difficult to ensure -- after all, a database >> should abstract the internal storage of data, may it >> be images). > > I can't get excited about this. First, given the amount of overhead > you'll be avoiding, checking the return code from storing the image in > the file system seems relatively trivial. Store the image first, > and if > you get a failure code, don't store the rest of the data in the DB; > you've just implemented data consistency. That assumes, of course, > that > the image is the only meaningful data you have, which in most > situations > is not the case. Meaning you'd want to store the rest of the data > anyway with a messages saying "image not available." Combine that with an on delete trigger that adds the filename to a deletion queue (within the transaction) and a separate process that runs through the deletion queue occasionally and you get something quite useable, while still being able to use sendfile() to throw the image over the wire rather than squeezing all that data through the database. Cheers, Steve ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On 10/4/06, TIJod <tijod@yahoo.fr> wrote: > I think I'm about to ask a traditional (almost > religious) question, but I haven't been able to find a > crystal clear answer in the mailing lists so far. I think the key in deciding this, in your case, is your requirement for space reclamation: > There is a > large turnover in my database, i.e. each image stays > about 1 week in the database, then it is deleted. > ... But more importantly, I need > to periodically delete a large number of images in > batch process. Moreover, the disk space that is used > on the hard-disk to store the images should be kept as > small as possible: Precisely, after the aforementioned > batch deletions, the table that contains the images > should be immediately compacted (I cannot afford the > internal use of a "to be deleted" flag, because of the > large amount of disk space my database requires). If I understand what postgresql is doing, then DELETE will not reclaim the space immediately. What happens internally is not all that different from marking the space as deleted. A VACUUM will allow that space to be reused, (assuming your free space map is big enough), and a VACUUM FULL would be necessary to compress the space away. All of these seem incompatible with your requirements. I agree with another responder who suggested using the filesystem for your images. Jack Orenstein ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Hi. I can provide some "insight" on the difference between the two interfaces. AFAIK, the difference is in size of the file you can store, and in the interface you have when you want to access. The size is not important (I think), since you are far below the limit. For the interface, the bytea gives you a "query" based interfaces, while largeobject are able to provide a file based interface. With Large Object, you can avoid reading the whole object with one read, or you can even move inside the Large Object, which can be useful if you have large files stored. I think there are differences also in how the space is reclaimed, but my PostgreSQL - Fu stops here. Regards Marco -- Marco Bizzarri http://notenotturne.blogspot.com/ ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On 10/4/06, Guy Rouillier <guyr@masergy.com> wrote: > TIJod wrote: > > I need to store a large number of images in a > > PostgreSQL database. In my application, this > > represents a few hundreds of thousands of images. The > > size of each image is about 100-200 Ko. There is a > > large turnover in my database, i.e. each image stays > > about 1 week in the database, then it is deleted. > > I see little value to storing the images in the database. For me that's > a general statement (I'm sure others will disagree); but especially in > your case, where you have a high volume and only want to store them for > a couple days. Why incur all the overhead of putting them in the DB? > You can't search on them or sort on them. I would just store them in > the file system and put a reference in the DB. no, you can't search or sort on them but you can put metadata on fields and search on that, and you can do things like use RI to delete images that are associated with other things, etc. this would probably fit the OP's methodogy quite nicely. > > but this wouldrequire a more tricky implementation, and ACID-ity > > would be difficult to ensure -- after all, a database > > should abstract the internal storage of data, may it > > be images). > > I can't get excited about this. First, given the amount of overhead > you'll be avoiding, checking the return code from storing the image in > the file system seems relatively trivial. Store the image first, and if > you get a failure code, don't store the rest of the data in the DB; > you've just implemented data consistency. That assumes, of course, that > the image is the only meaningful data you have, which in most situations > is not the case. Meaning you'd want to store the rest of the data > anyway with a messages saying "image not available." i think this topic is interesting and deserves better treatment than assumptions. postgresql will toast all images over a cerain size which is actually pretty efficient although can be a problem if your images are really big. on the downside you have more vacuuming overhead and postgresql can't match filesystem speed for raw writing. also you can pretty much forget decent performance if your code that does the actual insertion is not in c/c++ and uses the paramaterized api. on the flip side, you have a central interface, single point of failure and you don't have to deal with thousands or millions of image files which can become it's own problem (although solvable). also you don't have to write plumbing code to get something like atomicity. PostgreSQL is getting more and more efficeint at moving large streams in and out of the database and the answer here is not as cut and try as you might think (historically, it was insane to even attempt it). i'm wondering if anybody has ever attempted to manage large collections of binary objects inside the database and has advice here. merlin ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| "Merlin Moncure" <mmoncure@gmail.com> writes: > ... postgresql will toast all images over a cerain size which > is actually pretty efficient although can be a problem if your images > are really big. But any reasonable image format is compressed already (or at least, if you are using an uncompressed format while worried about disk space then you need some time at image processing boot camp). TOAST isn't going to accomplish anything. I think the arguments for keeping stuff inside the database are (a) far easier to maintain transactional semantics for insert/delete, and (b) easier to serve the data out to clients that aren't on the same machine. You aren't going to find a performance win though. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Oct 5, 2006, at 16:18 , Merlin Moncure wrote: >> I see little value to storing the images in the database. For me >> that's >> a general statement (I'm sure others will disagree); but >> especially in >> your case, where you have a high volume and only want to store >> them for >> a couple days. Why incur all the overhead of putting them in the DB? >> You can't search on them or sort on them. I would just store them in >> the file system and put a reference in the DB. > > no, you can't search or sort on them but you can put metadata on > fields and search on that, and you can do things like use RI to delete > images that are associated with other things, etc. this would > probably fit the OP's methodogy quite nicely. I second this sentiment; there is a lot to be said for keeping your data together in a unified storage/retrieval system with ACID semantics. There is nothing inherently wrong about this model. [...] > i'm wondering if anybody has ever attempted to manage large > collections of binary objects inside the database and has advice here. We have a production system containing 10,000 images (JPEG and PNG of various sizes) totaling roughly 4GBs. We have Lighttpd running against a couple of Rails processes which crop, scale and convert images on the fly using ImageMagick; converted images are cached in the file system and subsequently served directly by Lighttpd. Functionally I have absolutely no quibbles with this system; PostgreSQL stores the data smoothly and everything works as designed. Performance-wise, I'm not sure; the amount of data seems to put a certain load on the database server, though it's impossible to tell how much. Backups are hell, taking hours and hours to do just a single dump of the database. Rails' PostgreSQL adapter uses SQL for inserts and quotes every byte as an octal escape sequence; storing a single image can take several seconds. Single-image retrieval is similarly slow, but since the adapter uses bindings that talk directly to libpq4, I believe it's caused by the overall load on the database. Because of this, we see no recourse but to move the images into the file system. Since our cluster consists of three separate machines all running the same Rails application, with no dedicated box handling the image storage, such a solution requires the use of NFS or other type of shared storage for centralized image storage; we're not sure yet about what we will end up with. Alexander. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| On 10/5/06, Alexander Staubo <alex@purefiction.net> wrote: > On Oct 5, 2006, at 16:18 , Merlin Moncure wrote: > > i'm wondering if anybody has ever attempted to manage large > > collections of binary objects inside the database and has advice here. > > We have a production system containing 10,000 images (JPEG and PNG of > various sizes) totaling roughly 4GBs. We have Lighttpd running > against a couple of Rails processes which crop, scale and convert > images on the fly using ImageMagick; converted images are cached in > the file system and subsequently served directly by Lighttpd. > Functionally I have absolutely no quibbles with this system; > PostgreSQL stores the data smoothly and everything works as designed. > > Performance-wise, I'm not sure; the amount of data seems to put a > certain load on the database server, though it's impossible to tell > how much. Backups are hell, taking hours and hours to do just a i admit, backups could be a problem. maybe pitr is the answer. (dump style backups are a problem for any big database) > single dump of the database. Rails' PostgreSQL adapter uses SQL for > inserts and quotes every byte as an octal escape sequence; storing a ouch...the only way to do this quickly imo is to send in raw binary data directly to the database using parameterized...this eliminates both the escaping and the unescaping step. likewise the data should be pulled out binary (this will liekly be several times faster). > single image can take several seconds. Single-image retrieval is > similarly slow, but since the adapter uses bindings that talk > directly to libpq4, I believe it's caused by the overall load on the > database. > > Because of this, we see no recourse but to move the images into the > file system. Since our cluster consists of three separate machines > all running the same Rails application, with no dedicated box > handling the image storage, such a solution requires the use of NFS > or other type of shared storage for centralized image storage; we're > not sure yet about what we will end up with. cant fault you for that decision, web applications are usually pretty aggressive on caching. they also ususally fit pretty well in the various replication technlogies as well...something to consider. merlin ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |