This is a discussion on How to Parse Sentences into Words within the SQL Server forums, part of the Microsoft SQL Server category; --> Hi all, I have a table of text and associated data. I want to break apart the text into ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all, I have a table of text and associated data. I want to break apart the text into individual words, yet retain the data in other columns. For example: Sentence: Chapter: -------------------------- I like cats. 1 Joe likes dogs. 2 Should become: Word: Chapter: -------------------------- I 1 like 1 cats 1 Joe 2 likes 2 dogs. 2 Are there built-in SQL parsing functions, If not, what text handling features would be most useful for building them? Thanks! |
| |||
| "HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message news:Mlzhc.3777$eZ5.2539@newsread1.news.pas.earthl ink.net... > Hi all, > > I have a table of text and associated data. I want to break apart the text > into individual words, yet retain the data in other columns. For example: > > Sentence: Chapter: > -------------------------- > I like cats. 1 > Joe likes dogs. 2 > > Should become: > > Word: Chapter: > -------------------------- > I 1 > like 1 > cats 1 > Joe 2 > likes 2 > dogs. 2 > > Are there built-in SQL parsing functions, If not, what text handling > features would be most useful for building them? > > Thanks! > > There's no built-in-in SPLIT() function or whatever, but this may help: http://www.sommarskog.se/arrays-in-sql.html#iterative In general, the string functions are limited in SQL, and it's often easier to dump the data to a flat file and use an external script to do text manipulation, especially if you need advanced features such as regular expression support. Simon |
| |||
| "HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message news:<Mlzhc.3777$eZ5.2539@newsread1.news.pas.earth link.net>... > Hi all, > > I have a table of text and associated data. I want to break apart the text > into individual words, yet retain the data in other columns. For example: > > Sentence: Chapter: > -------------------------- > I like cats. 1 > Joe likes dogs. 2 > > Should become: > > Word: Chapter: > -------------------------- > I 1 > like 1 > cats 1 > Joe 2 > likes 2 > dogs. 2 > > Are there built-in SQL parsing functions, If not, what text handling > features would be most useful for building them? > > Thanks! Here's a recursive solution: CREATE PROCEDURE Split (@Sentence AS VARCHAR(1024), @CHAPTER AS INTEGER) AS BEGIN DECLARE @Position AS INTEGER, @Str AS VARCHAR(50) SET @Position = CHARINDEX(' ', @Sentence) IF (@Position = 0) INSERT INTO Words (Word, Chapter) VALUES (@Sentence, @Chapter) ELSE BEGIN SET @Str = LEFT(@Sentence, @Position-1) INSERT INTO Words (Word, Chapter) VALUES (@Str, @Chapter) SET @Str = RIGHT(@Sentence, LEN(@Sentence) - @Position) EXEC Split @Str, @Chapter END END This solution only works for sentences of 32 words or fewer since that is the maximum nesting depth for MS-SQLServer. Joe Melville |
| |||
| Wow. This looks like a great way of handling it. If anything is longer than 32 words, I should be able to find a way of stopping at 32 and running another pass later. I'll post the final code if it adds anything to what you have written here. Thanks! "Joe Melville" <jemelville@yahoo.com> wrote in message news:2b9fdb8c.0404220811.42088e22@posting.google.c om... > "HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message news:<Mlzhc.3777$eZ5.2539@newsread1.news.pas.earth link.net>... > > Hi all, > > > > I have a table of text and associated data. I want to break apart the text > > into individual words, yet retain the data in other columns. For example: > > > > Sentence: Chapter: > > -------------------------- > > I like cats. 1 > > Joe likes dogs. 2 > > > > Should become: > > > > Word: Chapter: > > -------------------------- > > I 1 > > like 1 > > cats 1 > > Joe 2 > > likes 2 > > dogs. 2 > > > > Are there built-in SQL parsing functions, If not, what text handling > > features would be most useful for building them? > > > > Thanks! > > Here's a recursive solution: > CREATE PROCEDURE Split (@Sentence AS VARCHAR(1024), @CHAPTER AS > INTEGER) AS > BEGIN > DECLARE @Position AS INTEGER, @Str AS VARCHAR(50) > > SET @Position = CHARINDEX(' ', @Sentence) > > IF (@Position = 0) > INSERT INTO Words (Word, Chapter) VALUES (@Sentence, @Chapter) > ELSE > BEGIN > SET @Str = LEFT(@Sentence, @Position-1) > INSERT INTO Words (Word, Chapter) VALUES (@Str, @Chapter) > SET @Str = RIGHT(@Sentence, LEN(@Sentence) - @Position) > EXEC Split @Str, @Chapter > END > END > > This solution only works for sentences of 32 words or fewer since that > is the maximum nesting depth for MS-SQLServer. > > Joe Melville |
| ||||
| >Wow. This looks like a great way of handling it. If anything is longer than >32 words, I should be able to find a way of stopping at 32 and running >another pass later. I'll post the final code if it adds anything to what you >have written here. > >Thanks! drop table tmpTable Go create Table tmpTable (WordString VARCHAR(500)) Insert into tmpTable (WordString) Values ('Test of some words') Insert into tmpTable (WordString) Values ('Another Test of some words') Go Drop Table tmpwords Go Create Table tmpWords (Words VARCHAR(500)) BEGIN TRAN DECLARE @PCStatus Integer Declare @TmpWords VARCHAR(500) Declare @newWord Varchar(500) /* cursor to read each certificates ssn & certificateID */ DECLARE PCcursor CURSOR FOR SELECT WordString FROM TmpTable OPEN PCcursor /* read the 1st row */ SET @PCStatus = 0 While @PCStatus = 0 Begin FETCH NEXT FROM PCcursor INTO @TmpWords SET @PCStatus = @@Fetch_status IF @PCstatus <> 0 break Set @TmpWords = @tmpWords + ' ' WHILE CHARINDEX(' ', @TmpWords) > 0 Begin SET @NewWord = LEFT(@TmpWords, CHARINDEX(' ', @TmpWords)) SET @TmpWords = SUBSTRING(@TmpWords,CHARINDEX(' ', @TmpWords)+1, len(@TmpWOrds)) insert into tmpWords (words) values (@NewWord) End End Close PcCursor commit tran deallocate PCcursor select * from tmpwords Randy http://members.aol.com/rsmeiner |
| Thread Tools | |
| Display Modes | |
|
|