Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 11:41 AM
TIJod
 
Posts: n/a
Default Storing images in PostgreSQL databases (again)

Hello,

I think I'm about to ask a traditional (almost
religious) question, but I haven't been able to find a
crystal clear answer in the mailing lists so far.
Thus, here is my question:

I need to store a large number of images in a
PostgreSQL database. In my application, this
represents a few hundreds of thousands of images. The
size of each image is about 100-200 Ko. There is a
large turnover in my database, i.e. each image stays
about 1 week in the database, then it is deleted.

Of course, I need to have a relatively fast access to
each one of these images. But more importantly, I need
to periodically delete a large number of images in
batch process. Moreover, the disk space that is used
on the hard-disk to store the images should be kept as
small as possible: Precisely, after the aforementioned
batch deletions, the table that contains the images
should be immediately compacted (I cannot afford the
internal use of a "to be deleted" flag, because of the
large amount of disk space my database requires).

I have three possible implementation choices in
PostgreSQL:

1) Storing the images directly on the disk, and
storing an URI in the database tables (but this would
require a more tricky implementation, and ACID-ity
would be difficult to ensure -- after all, a database
should abstract the internal storage of data, may it
be images).

2) Storing the images in a "bytea" column (but what
about the access times, and the batch deletion
process?).

3) Storing the images as large objects (this sounds
like the best solution to me, but the documentation
lacks clarity about the scope of these large objects).

Finally, my question is what method would you
recommend to me?

I thank you much in advance for your answers!






__________________________________________________ _________________________
Yahoo! Mail réinvente le mail ! Découvrez le nouveau Yahoo! Mail et son interface révolutionnaire.
http://fr.mail.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 11:41 AM
Leonel Nunez
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

> Hello,
>
> I think I'm about to ask a traditional (almost
> religious) question, but I haven't been able to find a
> crystal clear answer in the mailing lists so far.
> Thus, here is my question:
>
> I need to store a large number of images in a
> PostgreSQL database. In my application, this
> represents a few hundreds of thousands of images. The
> size of each image is about 100-200 Ko. There is a
> large turnover in my database, i.e. each image stays
> about 1 week in the database, then it is deleted.
>
> Of course, I need to have a relatively fast access to
> each one of these images. But more importantly, I need
> to periodically delete a large number of images in
> batch process. Moreover, the disk space that is used
> on the hard-disk to store the images should be kept as
> small as possible: Precisely, after the aforementioned
> batch deletions, the table that contains the images
> should be immediately compacted (I cannot afford the
> internal use of a "to be deleted" flag, because of the
> large amount of disk space my database requires).
>
> I have three possible implementation choices in
> PostgreSQL:
>
> 1) Storing the images directly on the disk, and
> storing an URI in the database tables (but this would
> require a more tricky implementation, and ACID-ity
> would be difficult to ensure -- after all, a database
> should abstract the internal storage of data, may it
> be images).
>
> 2) Storing the images in a "bytea" column (but what
> about the access times, and the batch deletion
> process?).
>
> 3) Storing the images as large objects (this sounds
> like the best solution to me, but the documentation
> lacks clarity about the scope of these large objects).
>
> Finally, my question is what method would you
> recommend to me?
>
> I thank you much in advance for your answers!
>
>
>
>
>
>
> __________________________________________________ _________________________
> Yahoo! Mail réinvente le mail ! Découvrez le nouveau Yahoo! Mail et son
> interface révolutionnaire.
> http://fr.mail.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>




Encode the image on base64 and inseert on a text field

if you use Bytea it needs to be encoded and the size stored will be
more than base64 encoded

if you store the image on disk you need to keep the consistency between
the database and the file system


leonel



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 11:41 AM
Guy Rouillier
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

TIJod wrote:
> I need to store a large number of images in a
> PostgreSQL database. In my application, this
> represents a few hundreds of thousands of images. The
> size of each image is about 100-200 Ko. There is a
> large turnover in my database, i.e. each image stays
> about 1 week in the database, then it is deleted.


I see little value to storing the images in the database. For me that's
a general statement (I'm sure others will disagree); but especially in
your case, where you have a high volume and only want to store them for
a couple days. Why incur all the overhead of putting them in the DB?
You can't search on them or sort on them. I would just store them in
the file system and put a reference in the DB.

> but this wouldrequire a more tricky implementation, and ACID-ity
> would be difficult to ensure -- after all, a database
> should abstract the internal storage of data, may it
> be images).


I can't get excited about this. First, given the amount of overhead
you'll be avoiding, checking the return code from storing the image in
the file system seems relatively trivial. Store the image first, and if
you get a failure code, don't store the rest of the data in the DB;
you've just implemented data consistency. That assumes, of course, that
the image is the only meaningful data you have, which in most situations
is not the case. Meaning you'd want to store the rest of the data
anyway with a messages saying "image not available."

--
Guy Rouillier

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 11:41 AM
Steve Atkins
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)


On Oct 4, 2006, at 12:56 PM, Guy Rouillier wrote:

> TIJod wrote:
>> I need to store a large number of images in a
>> PostgreSQL database. In my application, this
>> represents a few hundreds of thousands of images. The
>> size of each image is about 100-200 Ko. There is a
>> large turnover in my database, i.e. each image stays
>> about 1 week in the database, then it is deleted.

>
> I see little value to storing the images in the database. For me
> that's
> a general statement (I'm sure others will disagree); but especially in
> your case, where you have a high volume and only want to store them
> for
> a couple days. Why incur all the overhead of putting them in the DB?
> You can't search on them or sort on them. I would just store them in
> the file system and put a reference in the DB.
>
>> but this wouldrequire a more tricky implementation, and ACID-ity
>> would be difficult to ensure -- after all, a database
>> should abstract the internal storage of data, may it
>> be images).

>
> I can't get excited about this. First, given the amount of overhead
> you'll be avoiding, checking the return code from storing the image in
> the file system seems relatively trivial. Store the image first,
> and if
> you get a failure code, don't store the rest of the data in the DB;
> you've just implemented data consistency. That assumes, of course,
> that
> the image is the only meaningful data you have, which in most
> situations
> is not the case. Meaning you'd want to store the rest of the data
> anyway with a messages saying "image not available."


Combine that with an on delete trigger that adds the filename
to a deletion queue (within the transaction) and a separate
process that runs through the deletion queue occasionally
and you get something quite useable, while still being able
to use sendfile() to throw the image over the wire rather than
squeezing all that data through the database.

Cheers,
Steve


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-09-2008, 11:41 AM
Jack Orenstein
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

On 10/4/06, TIJod <tijod@yahoo.fr> wrote:
> I think I'm about to ask a traditional (almost
> religious) question, but I haven't been able to find a
> crystal clear answer in the mailing lists so far.


I think the key in deciding this, in your case, is your requirement for space
reclamation:

> There is a
> large turnover in my database, i.e. each image stays
> about 1 week in the database, then it is deleted.
> ... But more importantly, I need
> to periodically delete a large number of images in
> batch process. Moreover, the disk space that is used
> on the hard-disk to store the images should be kept as
> small as possible: Precisely, after the aforementioned
> batch deletions, the table that contains the images
> should be immediately compacted (I cannot afford the
> internal use of a "to be deleted" flag, because of the
> large amount of disk space my database requires).


If I understand what postgresql is doing, then DELETE will not
reclaim the space immediately. What happens internally is not
all that different from marking the space as deleted. A VACUUM
will allow that space to be reused, (assuming your free space map
is big enough), and a VACUUM FULL would be necessary to compress
the space away. All of these seem incompatible with your requirements.

I agree with another responder who suggested using the filesystem
for your images.

Jack Orenstein

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-09-2008, 11:41 AM
Marco Bizzarri
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

Hi.

I can provide some "insight" on the difference between the two interfaces.

AFAIK, the difference is in size of the file you can store, and in the
interface you have when you want to access.

The size is not important (I think), since you are far below the limit.

For the interface, the bytea gives you a "query" based interfaces,
while largeobject are able to provide a file based interface.

With Large Object, you can avoid reading the whole object with one
read, or you can even move inside the Large Object, which can be
useful if you have large files stored.

I think there are differences also in how the space is reclaimed, but
my PostgreSQL - Fu stops here.

Regards
Marco

--
Marco Bizzarri
http://notenotturne.blogspot.com/

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-09-2008, 11:41 AM
Merlin Moncure
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

On 10/4/06, Guy Rouillier <guyr@masergy.com> wrote:
> TIJod wrote:
> > I need to store a large number of images in a
> > PostgreSQL database. In my application, this
> > represents a few hundreds of thousands of images. The
> > size of each image is about 100-200 Ko. There is a
> > large turnover in my database, i.e. each image stays
> > about 1 week in the database, then it is deleted.

>
> I see little value to storing the images in the database. For me that's
> a general statement (I'm sure others will disagree); but especially in
> your case, where you have a high volume and only want to store them for
> a couple days. Why incur all the overhead of putting them in the DB?
> You can't search on them or sort on them. I would just store them in
> the file system and put a reference in the DB.


no, you can't search or sort on them but you can put metadata on
fields and search on that, and you can do things like use RI to delete
images that are associated with other things, etc. this would
probably fit the OP's methodogy quite nicely.

> > but this wouldrequire a more tricky implementation, and ACID-ity
> > would be difficult to ensure -- after all, a database
> > should abstract the internal storage of data, may it
> > be images).

>
> I can't get excited about this. First, given the amount of overhead
> you'll be avoiding, checking the return code from storing the image in
> the file system seems relatively trivial. Store the image first, and if
> you get a failure code, don't store the rest of the data in the DB;
> you've just implemented data consistency. That assumes, of course, that
> the image is the only meaningful data you have, which in most situations
> is not the case. Meaning you'd want to store the rest of the data
> anyway with a messages saying "image not available."


i think this topic is interesting and deserves better treatment than
assumptions. postgresql will toast all images over a cerain size which
is actually pretty efficient although can be a problem if your images
are really big. on the downside you have more vacuuming overhead and
postgresql can't match filesystem speed for raw writing. also you can
pretty much forget decent performance if your code that does the
actual insertion is not in c/c++ and uses the paramaterized api.

on the flip side, you have a central interface, single point of
failure and you don't have to deal with thousands or millions of image
files which can become it's own problem (although solvable). also you
don't have to write plumbing code to get something like atomicity.
PostgreSQL is getting more and more efficeint at moving large streams
in and out of the database and the answer here is not as cut and try
as you might think (historically, it was insane to even attempt it).

i'm wondering if anybody has ever attempted to manage large
collections of binary objects inside the database and has advice here.

merlin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-09-2008, 11:41 AM
Tom Lane
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

"Merlin Moncure" <mmoncure@gmail.com> writes:
> ... postgresql will toast all images over a cerain size which
> is actually pretty efficient although can be a problem if your images
> are really big.


But any reasonable image format is compressed already (or at least, if
you are using an uncompressed format while worried about disk space then
you need some time at image processing boot camp). TOAST isn't going to
accomplish anything.

I think the arguments for keeping stuff inside the database are
(a) far easier to maintain transactional semantics for insert/delete,
and (b) easier to serve the data out to clients that aren't on the same
machine. You aren't going to find a performance win though.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-09-2008, 11:41 AM
Alexander Staubo
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

On Oct 5, 2006, at 16:18 , Merlin Moncure wrote:

>> I see little value to storing the images in the database. For me
>> that's
>> a general statement (I'm sure others will disagree); but
>> especially in
>> your case, where you have a high volume and only want to store
>> them for
>> a couple days. Why incur all the overhead of putting them in the DB?
>> You can't search on them or sort on them. I would just store them in
>> the file system and put a reference in the DB.

>
> no, you can't search or sort on them but you can put metadata on
> fields and search on that, and you can do things like use RI to delete
> images that are associated with other things, etc. this would
> probably fit the OP's methodogy quite nicely.


I second this sentiment; there is a lot to be said for keeping your
data together in a unified storage/retrieval system with ACID
semantics. There is nothing inherently wrong about this model.

[...]
> i'm wondering if anybody has ever attempted to manage large
> collections of binary objects inside the database and has advice here.


We have a production system containing 10,000 images (JPEG and PNG of
various sizes) totaling roughly 4GBs. We have Lighttpd running
against a couple of Rails processes which crop, scale and convert
images on the fly using ImageMagick; converted images are cached in
the file system and subsequently served directly by Lighttpd.
Functionally I have absolutely no quibbles with this system;
PostgreSQL stores the data smoothly and everything works as designed.

Performance-wise, I'm not sure; the amount of data seems to put a
certain load on the database server, though it's impossible to tell
how much. Backups are hell, taking hours and hours to do just a
single dump of the database. Rails' PostgreSQL adapter uses SQL for
inserts and quotes every byte as an octal escape sequence; storing a
single image can take several seconds. Single-image retrieval is
similarly slow, but since the adapter uses bindings that talk
directly to libpq4, I believe it's caused by the overall load on the
database.

Because of this, we see no recourse but to move the images into the
file system. Since our cluster consists of three separate machines
all running the same Rails application, with no dedicated box
handling the image storage, such a solution requires the use of NFS
or other type of shared storage for centralized image storage; we're
not sure yet about what we will end up with.

Alexander.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-09-2008, 11:41 AM
Merlin Moncure
 
Posts: n/a
Default Re: Storing images in PostgreSQL databases (again)

On 10/5/06, Alexander Staubo <alex@purefiction.net> wrote:
> On Oct 5, 2006, at 16:18 , Merlin Moncure wrote:
> > i'm wondering if anybody has ever attempted to manage large
> > collections of binary objects inside the database and has advice here.

>
> We have a production system containing 10,000 images (JPEG and PNG of
> various sizes) totaling roughly 4GBs. We have Lighttpd running
> against a couple of Rails processes which crop, scale and convert
> images on the fly using ImageMagick; converted images are cached in
> the file system and subsequently served directly by Lighttpd.
> Functionally I have absolutely no quibbles with this system;
> PostgreSQL stores the data smoothly and everything works as designed.
>
> Performance-wise, I'm not sure; the amount of data seems to put a
> certain load on the database server, though it's impossible to tell
> how much. Backups are hell, taking hours and hours to do just a


i admit, backups could be a problem. maybe pitr is the answer. (dump
style backups are a problem for any big database)

> single dump of the database. Rails' PostgreSQL adapter uses SQL for
> inserts and quotes every byte as an octal escape sequence; storing a


ouch...the only way to do this quickly imo is to send in raw binary
data directly to the database using parameterized...this eliminates
both the escaping and the unescaping step. likewise the data should be
pulled out binary (this will liekly be several times faster).

> single image can take several seconds. Single-image retrieval is
> similarly slow, but since the adapter uses bindings that talk
> directly to libpq4, I believe it's caused by the overall load on the
> database.
>
> Because of this, we see no recourse but to move the images into the
> file system. Since our cluster consists of three separate machines
> all running the same Rails application, with no dedicated box
> handling the image storage, such a solution requires the use of NFS
> or other type of shared storage for centralized image storage; we're
> not sure yet about what we will end up with.


cant fault you for that decision, web applications are usually pretty
aggressive on caching. they also ususally fit pretty well in the
various replication technlogies as well...something to consider.

merlin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:35 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438