Unix Technical Forum

Merging a data mess

This is a discussion on Merging a data mess within the DB2 forums, part of the Database Server Software category; --> I've been handed a mess of related data in the form of multiple spead sheets. Each set of data ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-27-2008, 03:46 AM
Will Honea
 
Posts: n/a
Default Merging a data mess

I've been handed a mess of related data in the form of multiple spead
sheets. Each set of data contains part or parts of the what is needed
in the final database but there is no common usable column that is
usable for selecting data from all sources and no two sets contain all
the rows of any other. By that, I mean that one list will have member
names as (last, first, middle) while another omits the middle name and
a third contains only the full name, including suffix. One is
formatted mixed case, another is all upper case, a third is a mix of
both. Each set contains a subset of names from one master list, but
not necessarily the same names as any other set. The case issue is
simple - a word processor and a quick macro on the .del file solves
that problem in a few minutes but I'm still left with the problem of
reconciling all these disparate sources. The ambiguities (how many
David Smiths can there be in 80k entries?) can be partially resolved
with address matches, etc but will still have to be flagged.

Anybody got a decent plan for attacking a mess like this? Ideally,
the end result will be something like 3 or 4 tables with a common data
column to use for linkages. So far, I've come up with several schemes
but they all appear to invoke cartesian searches and run forever so I
need some expert advice or a pointer to a tutorial source somewhere.

--
Will Honea
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:36 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com