vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello! I have data exported from a Reference Manager 11, and need to import it into and SQL database. Each record has different number of fields. It is used to cite journal articles. (more about the format at http://www.adeptscience.co.uk/kb/article/A626) The format is very strange: TY - RPRT A1 - Esparza,J. T1 - Report of a WHO workshop on the measurement and significance of neutralizing antibody to HIV and SIV, London, 3-5 October 1988 Y1 - 1990 VL - 4 SP - 269 EP - 275 RP - Not In File CY - San Francisco CA PB - UC Berkeley KW - HIV KW - SIV KW - AIDS T3 - World Health Organisation Global Programme on AIDS ER - TY - CHAP A1 - Franks,L.M. T1 - Preface by an AIDS Victim Y1 - 1991 VL - 10 SP - vii EP - viii RP - Not In File T2 - Cancer, HIV and AIDS. CY - Berkeley CA PB - Berkeley Press KW - HIV KW - AIDS M1 - 1 M2 - 1 SN - 0-679-40110-5 ER - TY - CASE A1 - Cary,A. A1 - Friedenrich,W. T1 - Redman v. State of California Y1 - 1988/10/7 VL - 201 IS - 32 SP - 220 EP - 240 RP - Not In File CY - ATLA Law Reporter PB - San Diego County 45th Judicial District, California KW - AIDS KW - litigation KW - AIDS litigation KW - rape U1 - ISSN 0456-8125 N1 - Raped inmate can press case against officials for contracting AIDS ER - It looks like some of the columns are separted by CR but part of the same colum. For instance 'KW' is seen multiple times per record but should be one field called 'KW'. Any idea how I would import this to SQL Server with the DTS? |
| |||
| "Tmuld" <tmuldoon@spliced.com> wrote in message news:1113921439.658252.91360@g14g2000cwa.googlegro ups.com... > Hello! > > I have data exported from a Reference Manager 11, and need to import it > into and SQL database. > > Each record has different number of fields. It is used to cite journal > articles. > (more about the format at > http://www.adeptscience.co.uk/kb/article/A626) > > The format is very strange: <snip> > It looks like some of the columns are separted by CR but part of the > same colum. For instance 'KW' is seen multiple times per record but > should be one field called 'KW'. > > Any idea how I would import this to SQL Server with the DTS? The format looks rather awkward to handle in DTS transformations, so you'll probably need some other solution. One would be to load the data into a staging table, then use stored procedures to clean it up and INSERT it into the destination tables. A better option is probably to parse the data outside the database, using C#/Perl or whatever your preferred language is, then load it. You might be able to transform it into XML, for example, then use OPENXML or SQLXML Bulk Load to import it: http://www.sqlxml.org/faqs.aspx?faq=13 Whatever solution you decide on, you could still use a DTS package to manage all the steps, whether they are stored procedures or external programs. Simon |
| ||||
| I think that your best bet would be to loop through the file in a VB application which handles the logic. Usually, files that hold data in a "record" format have the columns going across the file and rows going down. They also usually have a set number of columns. While this format isn't unheard of (I've worked with similar formats in the past) most of the standard tools for importing files (BCP, DTS, BULK INSERT) are not designed to deal with it. BCP might be able to get around it if you had a fixed number of columns, but the variable number of columns would present a problem there. If you have no skill in a programming language like VB and you don't have access to a resource who has those skills then you could BCP the file into a staging table with an IDENTITY column. Make sure that the ordering of the records in the file is preserved in the rows of the table. From there you can loop through the rows sequentially with a cursor. This is one of those VERY rare instances where I think a cursor is called for. As I said though, I think you're better off using VB or something similar if you can. By looping through the records you can create a new row in your base table every time that you hit a new record tag in the file then you can create rows in children tables for each repeatable column type. Good luck, -Tom. |