vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| When profiling our data sources - we discover what OLTP sources have what percentages of NULL values for given columns. My rule of thumb was, if the % of null values was high - even past 50% - it wasn't too valuable to ETL in the DW. However when smaller % of values was null, just to 'N/A' the columns into the DW. I'm sort of going around with some users about wether to include certain columns in the data warehouse. It seemed a no brainer to me in this example: We have many columns but one for example is called "Project Name". 100% of the rows have NULL at moment as the Project Name column is something new recently rolled out. The users can put a value into here but after a month of being in production -there still aren't any values. But it's possible they will start populating it. My take is NOT to include it at all - even with 'N/A'. Until the % of colunmns reaches something much higher it is sort of useless in the data warehouse. We have 20 or so columns similar to this - that I don't think should be included until there is a small % of nulls. Is my thinking out of line here? |