Unix Technical Forum

How to Parse Sentences into Words

This is a discussion on How to Parse Sentences into Words within the SQL Server forums, part of the Microsoft SQL Server category; --> Hi all, I have a table of text and associated data. I want to break apart the text into ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-29-2008, 03:34 AM
HumanJHawkins
 
Posts: n/a
Default How to Parse Sentences into Words

Hi all,

I have a table of text and associated data. I want to break apart the text
into individual words, yet retain the data in other columns. For example:

Sentence: Chapter:
--------------------------
I like cats. 1
Joe likes dogs. 2

Should become:

Word: Chapter:
--------------------------
I 1
like 1
cats 1
Joe 2
likes 2
dogs. 2

Are there built-in SQL parsing functions, If not, what text handling
features would be most useful for building them?

Thanks!


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-29-2008, 03:34 AM
Simon Hayes
 
Posts: n/a
Default Re: How to Parse Sentences into Words


"HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message
news:Mlzhc.3777$eZ5.2539@newsread1.news.pas.earthl ink.net...
> Hi all,
>
> I have a table of text and associated data. I want to break apart the text
> into individual words, yet retain the data in other columns. For example:
>
> Sentence: Chapter:
> --------------------------
> I like cats. 1
> Joe likes dogs. 2
>
> Should become:
>
> Word: Chapter:
> --------------------------
> I 1
> like 1
> cats 1
> Joe 2
> likes 2
> dogs. 2
>
> Are there built-in SQL parsing functions, If not, what text handling
> features would be most useful for building them?
>
> Thanks!
>
>


There's no built-in-in SPLIT() function or whatever, but this may help:

http://www.sommarskog.se/arrays-in-sql.html#iterative

In general, the string functions are limited in SQL, and it's often easier
to dump the data to a flat file and use an external script to do text
manipulation, especially if you need advanced features such as regular
expression support.

Simon


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-29-2008, 03:35 AM
Joe Melville
 
Posts: n/a
Default Re: How to Parse Sentences into Words

"HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message news:<Mlzhc.3777$eZ5.2539@newsread1.news.pas.earth link.net>...
> Hi all,
>
> I have a table of text and associated data. I want to break apart the text
> into individual words, yet retain the data in other columns. For example:
>
> Sentence: Chapter:
> --------------------------
> I like cats. 1
> Joe likes dogs. 2
>
> Should become:
>
> Word: Chapter:
> --------------------------
> I 1
> like 1
> cats 1
> Joe 2
> likes 2
> dogs. 2
>
> Are there built-in SQL parsing functions, If not, what text handling
> features would be most useful for building them?
>
> Thanks!


Here's a recursive solution:
CREATE PROCEDURE Split (@Sentence AS VARCHAR(1024), @CHAPTER AS
INTEGER) AS
BEGIN
DECLARE @Position AS INTEGER, @Str AS VARCHAR(50)

SET @Position = CHARINDEX(' ', @Sentence)

IF (@Position = 0)
INSERT INTO Words (Word, Chapter) VALUES (@Sentence, @Chapter)
ELSE
BEGIN
SET @Str = LEFT(@Sentence, @Position-1)
INSERT INTO Words (Word, Chapter) VALUES (@Str, @Chapter)
SET @Str = RIGHT(@Sentence, LEN(@Sentence) - @Position)
EXEC Split @Str, @Chapter
END
END

This solution only works for sentences of 32 words or fewer since that
is the maximum nesting depth for MS-SQLServer.

Joe Melville
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-29-2008, 03:35 AM
HumanJHawkins
 
Posts: n/a
Default Re: How to Parse Sentences into Words

Wow. This looks like a great way of handling it. If anything is longer than
32 words, I should be able to find a way of stopping at 32 and running
another pass later. I'll post the final code if it adds anything to what you
have written here.

Thanks!

"Joe Melville" <jemelville@yahoo.com> wrote in message
news:2b9fdb8c.0404220811.42088e22@posting.google.c om...
> "HumanJHawkins" <JHawkins@HumanitiesSoftware.Com> wrote in message

news:<Mlzhc.3777$eZ5.2539@newsread1.news.pas.earth link.net>...
> > Hi all,
> >
> > I have a table of text and associated data. I want to break apart the

text
> > into individual words, yet retain the data in other columns. For

example:
> >
> > Sentence: Chapter:
> > --------------------------
> > I like cats. 1
> > Joe likes dogs. 2
> >
> > Should become:
> >
> > Word: Chapter:
> > --------------------------
> > I 1
> > like 1
> > cats 1
> > Joe 2
> > likes 2
> > dogs. 2
> >
> > Are there built-in SQL parsing functions, If not, what text handling
> > features would be most useful for building them?
> >
> > Thanks!

>
> Here's a recursive solution:
> CREATE PROCEDURE Split (@Sentence AS VARCHAR(1024), @CHAPTER AS
> INTEGER) AS
> BEGIN
> DECLARE @Position AS INTEGER, @Str AS VARCHAR(50)
>
> SET @Position = CHARINDEX(' ', @Sentence)
>
> IF (@Position = 0)
> INSERT INTO Words (Word, Chapter) VALUES (@Sentence, @Chapter)
> ELSE
> BEGIN
> SET @Str = LEFT(@Sentence, @Position-1)
> INSERT INTO Words (Word, Chapter) VALUES (@Str, @Chapter)
> SET @Str = RIGHT(@Sentence, LEN(@Sentence) - @Position)
> EXEC Split @Str, @Chapter
> END
> END
>
> This solution only works for sentences of 32 words or fewer since that
> is the maximum nesting depth for MS-SQLServer.
>
> Joe Melville



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-29-2008, 03:35 AM
RSMEINER
 
Posts: n/a
Default Re: How to Parse Sentences into Words

>Wow. This looks like a great way of handling it. If anything is longer than
>32 words, I should be able to find a way of stopping at 32 and running
>another pass later. I'll post the final code if it adds anything to what you
>have written here.
>
>Thanks!


drop table tmpTable
Go
create Table tmpTable
(WordString VARCHAR(500))
Insert into tmpTable (WordString) Values ('Test of some words')
Insert into tmpTable (WordString) Values ('Another Test of some words')
Go
Drop Table tmpwords
Go
Create Table tmpWords
(Words VARCHAR(500))

BEGIN TRAN


DECLARE @PCStatus Integer
Declare @TmpWords VARCHAR(500)
Declare @newWord Varchar(500)

/* cursor to read each certificates ssn & certificateID */
DECLARE PCcursor CURSOR FOR
SELECT WordString
FROM TmpTable
OPEN PCcursor
/* read the 1st row */
SET @PCStatus = 0
While @PCStatus = 0
Begin
FETCH NEXT FROM PCcursor INTO @TmpWords
SET @PCStatus = @@Fetch_status
IF @PCstatus <> 0
break

Set @TmpWords = @tmpWords + ' '
WHILE CHARINDEX(' ', @TmpWords) > 0
Begin
SET @NewWord = LEFT(@TmpWords, CHARINDEX(' ', @TmpWords))
SET @TmpWords = SUBSTRING(@TmpWords,CHARINDEX(' ', @TmpWords)+1,
len(@TmpWOrds))
insert into tmpWords (words) values (@NewWord)
End
End
Close PcCursor
commit tran
deallocate PCcursor

select * from tmpwords

Randy
http://members.aol.com/rsmeiner
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:06 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com