This is a discussion on Merging a data mess within the DB2 forums, part of the Database Server Software category; --> I've been handed a mess of related data in the form of multiple spead sheets. Each set of data ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've been handed a mess of related data in the form of multiple spead sheets. Each set of data contains part or parts of the what is needed in the final database but there is no common usable column that is usable for selecting data from all sources and no two sets contain all the rows of any other. By that, I mean that one list will have member names as (last, first, middle) while another omits the middle name and a third contains only the full name, including suffix. One is formatted mixed case, another is all upper case, a third is a mix of both. Each set contains a subset of names from one master list, but not necessarily the same names as any other set. The case issue is simple - a word processor and a quick macro on the .del file solves that problem in a few minutes but I'm still left with the problem of reconciling all these disparate sources. The ambiguities (how many David Smiths can there be in 80k entries?) can be partially resolved with address matches, etc but will still have to be flagged. Anybody got a decent plan for attacking a mess like this? Ideally, the end result will be something like 3 or 4 tables with a common data column to use for linkages. So far, I've come up with several schemes but they all appear to invoke cartesian searches and run forever so I need some expert advice or a pointer to a tutorial source somewhere. -- Will Honea |