Re: MDC Question about Disk Space On Thu, 14 Aug 2003 08:35:23 -0700, Carlos wrote:
> Hello,
>
> We are trying to use MDC tables in our DataWareHouse, but the tests
> shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte
> of source data use 10 or 100 bytes in the tablespace not including the
> space used on indexes).
>
> We did this:
>
> CREATE TABLE SALES(
> TIME INTEGER NOT NULL,
> STORE INTEGER NOT NULL,
> PRODUCT INTEGER NOT NULL,
> MEASURE1 BIGINT,
> MEASURE2 BIGINT,
> MEASURE3 BIGINT
> ) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT)
> IN DWHSPACE1
> INDEX IN INDEXSPACE1
>
> ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT);
> CREATE INDEX I_SALES_TIME ON SALES
> CREATE INDEX I_SALES_STORE ON SALES
> CREATE INDEX I_SALES_PRODUCT ON SALES
>
> For every dimension we have a separate index.
>
> First we tryed a little file with 10 rows. With a normal table it
> uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this
> is not a good example because the sample was not representative, etc,
> etc.
> So we tried to load a sample from a production source. We selected a
> month with 5,382,851 rows. Normally it uses 200MB, so a 10GB
> tablespace will be enough.
> When we tried to load after a while, an error appears indicating that
> there is no more space in the tablespace, in desperation we gave 20GB
> to the tablespace, having the same results.
>
> Later we did this change to the table, because of the high cardinality
> of time:
> ... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)...
> Obtaining the same results, finally this:
> ... ORGANIZE BY DIMENSIONS(STORE)...
>
> but I am writing to the news group.
>
> ¿What are we doing wrong?
> ¿Is there somebody with a successfull example to follow?
>
>
> Greetings,
>
> Carlos Farias
Carlos,
DB2 will allocate an extent of storage for each intersection of your
dimensions. So using all of the keys in your primary key as dimensions
means you will allocate as many extents as you have rows in your table.
So you don't want to use columns with high cardinality (and time and
product are likely to be such).
If you want to include a time dimension (BTW, why are you not using a DB2
supplied TIME or TIMESTAMP for this) then I'd suggest using a derived
value (e.g. YEAR(TIME_COLUMN) + MONTH(TIME_COLUMN)) to group together
everything for a month (you could do similar things with weeks, but beware
the fact that the last few days of one year and the first few of the next
are given separate week numbers by the DB2 week function).
Phil |