Unix Technical Forum

Merging rows within same table

This is a discussion on Merging rows within same table within the SQL Server forums, part of the Microsoft SQL Server category; --> I need to populate a table from several sources of raw data. For a given security (stock) it is ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-28-2008, 06:11 PM
Jason
 
Posts: n/a
Default Merging rows within same table

I need to populate a table from several sources of raw data. For a
given security (stock) it is possible to only receive PARTS of
information from each of the different sources. It is also possible
to have conflicting data.

I am looking to make a composite picture of a given security using the
following rules:

1) The goal is to replace all NULL and Blank values with data

2) Order of precedence (from highest to lowest) is Non-NULL Non-Blank
--> Blank --> NULL

3) In the case of Non-NULL Non-Blank values that conflict (are
different) leave existing value (even if NULL or Blank)

For example:

Given the following rows:

Symbol Identity IdSource Exchange Type SubType Name
-------- ------------ --------- --------- ------- ---------
------------------
TZA 901145102 CUSIP XNYS Stock NULL TV AZTECA
TZA 901145102 NULL NULL NULL NULL

WSM 969904101 CUSIP XNYS Stock NULL WILLIAMS
SONOMA
WSM 969904101 NULL XNYS Stock NULL
WILLIAMS-SONOMA
WSM CUSIP XNYS Stock Common NULL
WSM NULL CUSIP XASE Stock NULL WILLIAMS
SONOMA

TYC 902124106 CUSIP XNYS Stock NULL TYCO
TYC 902124106 CUSIP XNYS Stock NULL TYCO
INTERNATIONAL


I am looking for the following results ('*' indicates changed value)

Symbol Identity IdSource Exchange Type SubType Name
-------- ------------ --------- --------- ------- ---------
------------------
TZA 901145102 CUSIP XNYS Stock NULL TV AZTECA
TZA 901145102 *CUSIP *XNYS *Stock NULL *TV AZTECA

WSM 969904101 CUSIP XNYS Stock *Common WILLIAMS
SONOMA
WSM 969904101 *CUSIP XNYS Stock *Common
WILLIAMS-SONOMA
WSM *969904101 CUSIP NULL Stock Common NULL
WSM *969904101 CUSIP XASE Stock *Common WILLIAMS
SONOMA

TYC 902124106 CUSIP XNYS Stock NULL TYCO
TYC 902124106 CUSIP XNYS Stock NULL TYCO
INTERNATIONAL
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-28-2008, 06:11 PM
David Portas
 
Posts: n/a
Default Re: Merging rows within same table


SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(type,''))
WHEN 1 THEN MAX(type) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, type, subtype, xname)
ON S.symbol = T.symbol

Help others to help you by posting DDL for your table(s) and including
sample data as INSERT statements. That way people can test results and don't
have to guess at datatypes, constraints and keys:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)

INSERT INTO Stocks VALUES
('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
INSERT INTO Stocks VALUES
('TZA', '901145102', NULL, NULL, '', NULL, NULL)
INSERT INTO Stocks VALUES
('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
INSERT INTO Stocks VALUES
('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
INSERT INTO Stocks VALUES
('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
INSERT INTO Stocks VALUES
('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')

--
David Portas
------------
Please reply only to the newsgroup
--


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-28-2008, 06:13 PM
Jason
 
Posts: n/a
Default Re: Merging rows within same table

"David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:<4uydnUP5vLe0C-CiRVn-tQ@giganews.com>...
> SELECT S.symbol,
> COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
> COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
> COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
> FROM Stocks AS S
> JOIN
> (SELECT symbol,
> CASE COUNT(DISTINCT NULLIF(xidentity,''))
> WHEN 1 THEN MAX(xidentity) END,
> CASE COUNT(DISTINCT NULLIF(idsource,''))
> WHEN 1 THEN MAX(idsource) END,
> CASE COUNT(DISTINCT NULLIF(exchange,''))
> WHEN 1 THEN MAX(exchange) END,
> CASE COUNT(DISTINCT NULLIF(type,''))
> WHEN 1 THEN MAX(type) END,
> CASE COUNT(DISTINCT NULLIF(subtype,''))
> WHEN 1 THEN MAX(subtype) END,
> CASE COUNT(DISTINCT NULLIF(xname,''))
> WHEN 1 THEN MAX(xname) END
> FROM Stocks
> GROUP BY symbol) AS T
> (symbol, xidentity, idsource, exchange, type, subtype, xname)
> ON S.symbol = T.symbol
>
> Help others to help you by posting DDL for your table(s) and including
> sample data as INSERT statements. That way people can test results and don't
> have to guess at datatypes, constraints and keys:
>
> CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
> idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
> CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)
>
> INSERT INTO Stocks VALUES
> ('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
> INSERT INTO Stocks VALUES
> ('TZA', '901145102', NULL, NULL, '', NULL, NULL)
> INSERT INTO Stocks VALUES
> ('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
> INSERT INTO Stocks VALUES
> ('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
> INSERT INTO Stocks VALUES
> ('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
> INSERT INTO Stocks VALUES
> ('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
> INSERT INTO Stocks VALUES
> ('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
> INSERT INTO Stocks VALUES
> ('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')


Thank you very much David. Your solution was clean and efficient. I
thought of using a join but did not even think about that mixture of
case, nullif, max, etc.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-28-2008, 06:13 PM
Jason
 
Posts: n/a
Default Re: Merging rows within same table

"David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:<4uydnUP5vLe0C-CiRVn-tQ@giganews.com>...
> SELECT S.symbol,
> COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
> COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
> COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
> FROM Stocks AS S
> JOIN
> (SELECT symbol,
> CASE COUNT(DISTINCT NULLIF(xidentity,''))
> WHEN 1 THEN MAX(xidentity) END,
> CASE COUNT(DISTINCT NULLIF(idsource,''))
> WHEN 1 THEN MAX(idsource) END,
> CASE COUNT(DISTINCT NULLIF(exchange,''))
> WHEN 1 THEN MAX(exchange) END,
> CASE COUNT(DISTINCT NULLIF(type,''))
> WHEN 1 THEN MAX(type) END,
> CASE COUNT(DISTINCT NULLIF(subtype,''))
> WHEN 1 THEN MAX(subtype) END,
> CASE COUNT(DISTINCT NULLIF(xname,''))
> WHEN 1 THEN MAX(xname) END
> FROM Stocks
> GROUP BY symbol) AS T
> (symbol, xidentity, idsource, exchange, type, subtype, xname)
> ON S.symbol = T.symbol
>
> Help others to help you by posting DDL for your table(s) and including
> sample data as INSERT statements. That way people can test results and don't
> have to guess at datatypes, constraints and keys:
>
> CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
> idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
> CHAR(6) NULL, xname CHAR(20) NULL /* PRIMARY KEY ??? */)
>
> INSERT INTO Stocks VALUES
> ('TZA', '901145102', 'CUSIP', 'XNYS', 'Stock', NULL, 'TV AZTECA')
> INSERT INTO Stocks VALUES
> ('TZA', '901145102', NULL, NULL, '', NULL, NULL)
> INSERT INTO Stocks VALUES
> ('WSM', '969904101', 'CUSIP', 'XNYS', 'Stock', NULL, 'WILLIAMS SONOMA')
> INSERT INTO Stocks VALUES
> ('WSM', '969904101', NULL, 'XNYS', 'Stock', NULL, 'WILLIAMS-SONOMA')
> INSERT INTO Stocks VALUES
> ('WSM', '', 'CUSIP', 'XNYS', 'Stock', 'Common', NULL)
> INSERT INTO Stocks VALUES
> ('WSM', NULL, 'CUSIP', 'XASE', 'Stock', NULL, 'WILLIAMS SONOMA')
> INSERT INTO Stocks VALUES
> ('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO')
> INSERT INTO Stocks VALUES
> ('TYC', '902124106', 'CUSIP', 'XNYS', 'Stock', NULL, 'TYCO INTERNATIONAL')


I did forget to ask a question. When I run your code I receive the
following message: 'Warning: Null value is eliminated by an aggregate
or other SET operation.'

Should I care about this?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-28-2008, 06:13 PM
Erland Sommarskog
 
Posts: n/a
Default Re: Merging rows within same table

Jason (JayCallas@hotmail.com) writes:
> I did forget to ask a question. When I run your code I receive the
> following message: 'Warning: Null value is eliminated by an aggregate
> or other SET operation.'
>
> Should I care about this?


No. The cause are these expressions:

COUNT(DISTINCT NULLIF(subtype,''))

You might be able to rewrite this, but I leave that to David. :-) If the
messages bother you, you can embed the query with SET ANSI_WARNINGS OFF and
SET ANSI_WARNINGS ON.



--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-28-2008, 06:13 PM
Jason
 
Posts: n/a
Default Re: Merging rows within same table

JayCallas@hotmail.com (Jason) wrote in message news:<f01a7c89.0310061203.24c943e@posting.google.c om>...
> "David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org> wrote in message news:<4uydnUP5vLe0C-CiRVn-tQ@giganews.com>...
> > SELECT S.symbol,
> > COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
> > COALESCE(T.exchange,S.exchange), COALESCE(T.type,S.type),
> > COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
> > FROM Stocks AS S
> > JOIN
> > (SELECT symbol,
> > CASE COUNT(DISTINCT NULLIF(xidentity,''))
> > WHEN 1 THEN MAX(xidentity) END,
> > CASE COUNT(DISTINCT NULLIF(idsource,''))
> > WHEN 1 THEN MAX(idsource) END,
> > CASE COUNT(DISTINCT NULLIF(exchange,''))
> > WHEN 1 THEN MAX(exchange) END,
> > CASE COUNT(DISTINCT NULLIF(type,''))
> > WHEN 1 THEN MAX(type) END,
> > CASE COUNT(DISTINCT NULLIF(subtype,''))
> > WHEN 1 THEN MAX(subtype) END,
> > CASE COUNT(DISTINCT NULLIF(xname,''))
> > WHEN 1 THEN MAX(xname) END
> > FROM Stocks
> > GROUP BY symbol) AS T
> > (symbol, xidentity, idsource, exchange, type, subtype, xname)
> > ON S.symbol = T.symbol
> >


Hit another small issue. For SOME (in this case [type]) columns I need
to set a priority. If two rows have conflicting data (where COUNT > 1)
on a particular column, I want to use the value from the first row in
the set. (I would make sure that rows get inserted in the order I of
priority.) I thought of using TOP 1 somehow but cannot figure out how
to replace the MAX function with it (I know MAX is a function while
TOP is a statement).
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-28-2008, 06:14 PM
David Portas
 
Posts: n/a
Default Re: Merging rows within same table

> on a particular column, I want to use the value from the first row in
> the set. (I would make sure that rows get inserted in the order I of
> priority.) I thought of using TOP 1 somehow but cannot figure out how


A table has no inherent ordering so you will have to add a column to
identify the sequence. Here's an example using Seq_No as a sequence number:

CREATE TABLE Stocks (symbol CHAR(4) NOT NULL, xidentity CHAR(9) NULL,
idsource CHAR(5) NULL, exchange CHAR(4) NULL, type CHAR(5) NULL, subtype
CHAR(6) NULL, xname CHAR(20) NULL, seq_no INTEGER NOT NULL UNIQUE /* PRIMARY
KEY ??? */)

I guess that you actually want the to take the value from the first row
which has a *populated* value for the column:

SELECT S.symbol,
COALESCE(T.xidentity,S.xidentity), COALESCE(T.idsource,S.idsource),
COALESCE(T.exchange,S.exchange), M.type,
COALESCE(T.subtype,S.subtype), COALESCE(T.xname,S.xname)
FROM Stocks AS S
JOIN
(SELECT symbol,
CASE COUNT(DISTINCT NULLIF(xidentity,''))
WHEN 1 THEN MAX(xidentity) END,
CASE COUNT(DISTINCT NULLIF(idsource,''))
WHEN 1 THEN MAX(idsource) END,
CASE COUNT(DISTINCT NULLIF(exchange,''))
WHEN 1 THEN MAX(exchange) END,
CASE COUNT(DISTINCT NULLIF(subtype,''))
WHEN 1 THEN MAX(subtype) END,
CASE COUNT(DISTINCT NULLIF(xname,''))
WHEN 1 THEN MAX(xname) END,
MIN(CASE WHEN type>'' THEN seq_no END)
FROM Stocks
GROUP BY symbol) AS T
(symbol, xidentity, idsource, exchange, subtype, xname, seq_no)
ON S.symbol = T.symbol
LEFT JOIN Stocks AS M
ON T.seq_no = M.seq_no

--
David Portas
------------
Please reply only to the newsgroup
--


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:00 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com