Unix Technical Forum

identify duplicate addresses?

#1
02-28-2008, 10:30 AM
lawpoop@gmail.com
identify duplicate addresses?

Hello all -

I'm working on a survey system where a user can register to win a
prize. Since we don't want people gaming the system, we want to remove
any duplicate entries made by the same person to increase their odds.

My thought is that the best way to do this is to look at the street
mailing address, since we are mailing a certificate to the winner.
(Of course this rules out two people living at the same address, but
we can only allow one entry per address in the official rules.) I
think that focusing on the number and street name might be pretty
good at identifying duplicate addresses.

For instance, these two are the same address:
100 West Oak Street
100 W. Oak St

If I take out the 'West' and 'Street', then I have a numeric part,
'100', and also 'Oak'. (This does pose a problem for two people at
two different dwellings, one at 100 Oak St and one at 100-1/2 Oak St,
but we might want to consider that a duplicate address.)

I could compose a list of all street address extras, such as 'East',
'West', 'E', 'W', '.', 'Street', 'St', etc. Once those are removed, I
should have a numeric address and the street name.
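
A minimal Python sketch of that stripping step, assuming small hand-made
word lists. The names `strip_extras`, `DIRECTIONS` and `STREET_TYPES` are
made up for illustration, and the lists are far from complete.

```python
import re

# Illustrative, incomplete lists (assumptions, not an official vocabulary).
DIRECTIONS = {"north", "south", "east", "west", "n", "s", "e", "w",
              "ne", "nw", "se", "sw"}
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd", "drive", "dr"}

def strip_extras(address):
    """Return (house_number, street_name) after dropping directions and street types."""
    # Lowercase, turn periods and commas into spaces, then split on whitespace.
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    if not tokens:
        return ("", "")
    # Assume the first token is the house number.
    number, rest = tokens[0], tokens[1:]
    kept = [t for t in rest if t not in DIRECTIONS and t not in STREET_TYPES]
    return (number, " ".join(kept))

print(strip_extras("100 West Oak Street") == strip_extras("100 W. Oak St"))  # True
```

One thing this makes visible: an input like "123 E St. NW" comes out with
an empty street name, because every token after the number is on one of
the lists.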

Then I have to make equivalences amongst the different ways of writing
ordinal numbers -- e.g. 'first' == 'First' == '1st' == '1 st'.
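
The ordinal equivalences could be handled along these lines in Python.
This is only a sketch: `normalize_ordinals` and the `ORDINAL_WORDS` table
are hypothetical names, and the table stops at "tenth".

```python
import re

# Small illustrative table (an assumption, not exhaustive).
ORDINAL_WORDS = {"first": "1", "second": "2", "third": "3", "fourth": "4",
                 "fifth": "5", "sixth": "6", "seventh": "7", "eighth": "8",
                 "ninth": "9", "tenth": "10"}

def normalize_ordinals(text):
    """Rewrite 'First', '1st' and '1 st' as the bare number '1'."""
    text = text.lower()
    # '1st', '2 nd', '3rd', '4 th' -> '1', '2', '3', '4'
    text = re.sub(r"\b(\d+)\s*(?:st|nd|rd|th)\b", r"\1", text)
    # Spelled-out ordinals -> digits; all other tokens pass through unchanged.
    return " ".join(ORDINAL_WORDS.get(tok, tok) for tok in text.split())

forms = ["first", "First", "1st", "1 st"]
print({normalize_ordinals(f) for f in forms})  # {'1'}
```

The '1 st' form is ambiguous with a house number followed by 'St' (as in
"100 St James Rd"), so this is probably only safe to apply to the street
name portion, after the house number has been split off.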

Does this make sense? Is there anything I've overlooked?

#2
02-28-2008, 10:30 AM
Jerry Stuckle
Re: identify duplicate addresses?

Rik Wasmus wrote:
> On Mon, 19 Nov 2007 16:47:36 +0100, <lawpoop@gmail.com> wrote:
>
>> Hello all -
>>
>> I'm working on a survey system where a user can register to win a
>> prize. Since we don't want people gaming the system, we want to remove
>> any duplicate entries made by the same person to increase their odds.
>>
>> My thought is that the best way to do this is to look at the street
>> mailing address, since we are mailing a certificate to the winner.
>> ( Of course this rules out two people living at the same address, but
>> we can only allow one entry per address in the official rules). I
>> think by focusing on the number and street name, it might be pretty
>> good at identifying duplicate addresses.
>>
>> For instance, these two are the same address
>> 100 West Oak Street
>> 100 W. Oak St

>
>> Does this make sense? Is there anything I've overlooked?

>
> Euhm, in Holland a _postal code_ + house number is unique. Is this not
> true for the US/UK/wherever you're from?
>


Rik,

Nope. Here in the U.S. a 9-digit postal code identifies an area of a few
square blocks, which may have the same house number on different streets.
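
To illustrate with made-up data: if a 9-digit code can cover more than one
street, then code plus house number is not a safe uniqueness key, and the
street has to be part of it. Everything in this Python sketch (the entries,
the ZIP value, the `group_by` helper) is hypothetical.

```python
from collections import defaultdict

# Hypothetical entries: (zip9, house_number, street). The values are invented.
entries = [
    ("12345-6789", "100", "oak"),
    ("12345-6789", "100", "elm"),   # same ZIP+4 and number, different street
    ("12345-6789", "100", "oak"),   # a genuine repeat of the first entry
]

def group_by(entries, key):
    """Group entries under key(entry); keep only groups with more than one member."""
    groups = defaultdict(list)
    for e in entries:
        groups[key(e)].append(e)
    return {k: v for k, v in groups.items() if len(v) > 1}

by_zip_and_number = group_by(entries, lambda e: (e[0], e[1]))
by_full_key = group_by(entries, lambda e: e)

print(len(by_zip_and_number[("12345-6789", "100")]))   # 3: the Elm entry is wrongly included
print(len(by_full_key[("12345-6789", "100", "oak")]))  # 2: only the real repeat
```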

What about apartments?

> If you can spare the money, validating the address against a database
> for illegal '-1','appt. c', etc. additions is often also possible. Such
> a service/database can usually be provided by the/a postal company or
> other third parties. The whole database usually doesn't come cheap
> though, for a low traffic site you might be better off with a 'check per
> address' deal.



--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================


#3
02-28-2008, 10:30 AM
Jerry Stuckle
Re: identify duplicate addresses?

lawpoop@gmail.com wrote:
> On Nov 19, 12:24 pm, "Rik Wasmus" <luiheidsgoe...@hotmail.com> wrote:
>
>> Euhm, in Holland a _postal code_ + house number is unique. Is this not
>> true for the US/UK/wherever you're from?

>
> I'm from the US. With the old postal code, that's not true. However,
> the USPS introduced extra zip code digits that do make a unique
> address, but not everyone is aware of them. I think that the extended
> zip code by itself identifies a unique address. But I can't rely on my
> users knowing it.
>
>
>


No, it does not. It identifies an area of (at most) a few blocks.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================


#4
02-28-2008, 10:30 AM
Peter H. Coffin
Re: identify duplicate addresses?

On Mon, 19 Nov 2007 10:37:46 -0800 (PST), lawpoop@gmail.com wrote:
> On Nov 19, 12:15 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>
>>
>> And what do you do with something like 123 E St. NW?

>
> I'm not from a large city, but where I grew up, we didn't have street
> names that were cardinal directions. We had street names that were
> ordinals, such as "East 4th St.", but no "North Street"s. So, is this
> a realistic example? Are there actually East streets in American
> cities?
>
> In the example you gave above, the street name is "East", right? So I
> would have to first strip out the street type ( street, avenue, etc)
> 123 E St. NW -> 123 E NW
>
> Then, if the street has two cardinal directions in the name, I'll take
> the first as the street name, and then treat the second as extra
> information. E.g. Decide that the street name is "East", and not
> "NW".
> 123 E NW -> 123 East
>
> This reasoning assumes that street names never are said to be like
> "123 NW East St." -- which may be a faulty assumption. Do you know of
> such a street?


Certainly. There's one about a mile from my house. 43°00'39"N 88°13'36"W
is where the intersection of South Street and North East Avenue is.
South East Avenue begins about a half-mile further south. There's also a
West Avenue about a half-mile from East Avenue, but doesn't run as far
north as South Street. There's also a North Street that comes in East
and West varieties at 43°00'47"N 88°14'09"W, but the street itself runs
NE and SW, and it's only contrasting convention that prevents it from
being labelled North North Street and South North Street.

Next, take a look at the fine example at 41°53'13" N 87°37'41" W. That's
the division between East Wacker Drive and West Wacker Drive. Now, over
at 41°52'55"N 87°38'13"W, there's the division between North Wacker
Drive and South Wacker Drive. So, 120 Wacker Drive without the
direction could be any of four distinct addresses. To complicate
matters further, Wacker also has a subterranean aspect: there's also Lower
Wacker Drive that runs UNDERNEATH Wacker Drive from part of South Wacker
Drive all the way through North and West Wacker Drives and some distance
into East Wacker Drive, meaning that there's actually *eight* places
that could be "120 Wacker Drive". Quantum probability indicates there's
also a Charm Wacker Drive and Strange Wacker Drive that we haven't the
math to adequately draw maps of....
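
One way to respect the Wacker Drive case is to canonicalize direction words
instead of deleting them, so that 'North' and 'N' compare equal but 'N' and
'E' do not. A minimal Python sketch follows; the tables and the name
`address_key` are assumptions for illustration, and it deliberately does
not try to handle streets whose name is itself a direction.

```python
import re

# Illustrative tables (assumptions, not an official abbreviation list).
DIRECTION_ABBREV = {"north": "n", "south": "s", "east": "e", "west": "w"}
TYPE_ABBREV = {"street": "st", "avenue": "ave", "drive": "dr"}

def address_key(address):
    """Build a comparison key that abbreviates, rather than removes, directions and types."""
    tokens = re.sub(r"[.,]", " ", address.lower()).split()
    canon = [DIRECTION_ABBREV.get(t, TYPE_ABBREV.get(t, t)) for t in tokens]
    return " ".join(canon)

print(address_key("120 North Wacker Drive") == address_key("120 N. Wacker Dr"))  # True
print(address_key("120 North Wacker Drive") == address_key("120 E Wacker Dr"))   # False
```

Because unknown words pass through unchanged, "120 Lower Wacker Drive"
also keeps a key of its own.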

--
Time is a great teacher, but unfortunately it kills all its pupils.
-- Hector Berlioz

#5
02-28-2008, 10:30 AM
Rik Wasmus
Re: identify duplicate addresses?

On Mon, 19 Nov 2007 20:51:31 +0100, Jerry Stuckle
<jstucklex@attglobal.net> wrote:
> Rik Wasmus wrote:
>> On Mon, 19 Nov 2007 16:47:36 +0100, <lawpoop@gmail.com> wrote:
>>
>>> Hello all -
>>>
>>> I'm working on a survey system where a user can register to win a
>>> prize. Since we don't want people gaming the system, we want to remove
>>> any duplicate entries made by the same person to increase their odds.
>>>
>>> My thought is that the best way to do this is to look at the street
>>> mailing address, since we are mailing a certificate to the winner.
>>> ( Of course this rules out two people living at the same address, but
>>> we can only allow one entry per address in the official rules). I
>>> think by focusing on the number and street name, it might be pretty
>>> good at identifying duplicate addresses.
>>>
>>> For instance, these two are the same address
>>> 100 West Oak Street
>>> 100 W. Oak St

>>
>>> Does this make sense? Is there anything I've overlooked?

>> Euhm, in Holland a _postal code_ + house number is unique. Is this not
>> true for the US/UK/wherever you're from?

>
> Nope. Here in the U.S. a 9 digit postal code will identify a few square
> block area, which may have the same house number on different streets.


Which is a shame. Then again, the US is a lot bigger, so the code would have
to be quite long to accomplish that. Here 4 digits followed by 2 letters
(0000AA) is all it takes. I'd assume a street is unique in a postal code though :P

> What about apartments?


For convenience, I meant that as 'part of a house number'. Official
addresses like apartment numbers are also stored with the postal
company (and local government), so validating those is still possible
(naturally, more expensive). Making something up or splitting a house is
possible, but if you don't explicitly ask for another address it's up to
the good nature of the postal worker (it's what they're known for...)
whether this 'illegal' address is honored. Not a problem if it's 'just a
few', but people tend to get a little pissed if their route is actually a
couple of dozen addresses longer than their employer has in the books.

It's total bureaucracy here, but it does come with its advantages...
--
Rik Wasmus

#6
02-28-2008, 10:30 AM
lawpoop@gmail.com
Re: identify duplicate addresses?

On Nov 19, 1:49 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:

> No, in this case it is E street. There is also F, G, H, and so on. And
> depending on which quadrant they're in, it can be NE, NW, SE or SW.


Well, the pattern might be that alphabet streets only have spelled out
preceding directions, or that the direction follows the street type

E.g. where the street name is "E"

East E st.
or
E St. NW

but not

NW E St.
nor
E E St.

>
> And they're all in Washington, DC.
>
> And yes, there are streets named East, West, North and South. Raleigh,
> NC has them.
>
> > In the example you gave above, the street name is "East", right? So I
> > would have to first strip out the street type ( street, avenue, etc)
> > 123 E St. NW -> 123 E NW

>
> > Then, if the street has two cardinal directions in the name, I'll take
> > the first as the street name, and then treat the second as extra
> > information. E.g. Decide that the street name is "East", and not
> > "NW".
> > 123 E NW -> 123 East

>
> > This reasoning assumes that street names never are said to be like
> > "123 NW East St." -- which may be a faulty assumption. Do you know of
> > such a street?

>
> But that could be true, also. Not off hand, but I know of a lot of
> cities which have NW before the street name.


I think there might be a pattern that I can tease out. I'll bet that
if the cardinal direction precedes the street name, then the street
name is not a cardinal direction. Actually, more generally, I'll bet
that if the street name is a cardinal direction, it will be spelled
out; otherwise it's a letter name.

In other words:
123 E NW -- street name is "E"
123 North East St. -- street name is "East"
123 N E street -- street name is "E"

I'd guess that anyone who lives on a street that has a cardinal
direction for the name never abbreviates the name of the street. For
example, if they live on East street, and there is an east East street
and a west East street, they might write

E East St or
W East St

but never

E E St nor
W E St

Another way to go about it is to assume that there is only one
cardinal direction in the street name. For instance,
E St. NW is
E Street NorthWest

while
N E St. is
North E St.
Actually, it looks like I could simply use the "cardinal direction
street names are never abbreviated" rule to get the same output.
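
The rules above can be sketched as a positional parse in Python: find the
street type, treat what sits just before it as the street name, a direction
word earlier than that as a prefix, and a direction word after the type as a
suffix. The name is kept as written, so "east" (East Street) and "e"
(E Street) stay distinct. The word lists and `parse_street` are assumptions
for illustration, and the sketch needs the street type to be present, so it
cannot decide a type-less input such as "123 E NW".

```python
import re

# Illustrative word lists (assumptions, not exhaustive).
STREET_TYPES = {"street": "st", "st": "st", "avenue": "ave", "ave": "ave"}
DIRECTIONS = {"north": "n", "south": "s", "east": "e", "west": "w",
              "northeast": "ne", "northwest": "nw",
              "southeast": "se", "southwest": "sw",
              "n": "n", "s": "s", "e": "e", "w": "w",
              "ne": "ne", "nw": "nw", "se": "se", "sw": "sw"}

def parse_street(street):
    """Split 'N E St.' into (pre_direction, name, street_type, post_direction)."""
    tokens = re.sub(r"[.,]", " ", street.lower()).split()
    # Position of the last street-type word; raises ValueError if there is none.
    type_idx = max(i for i, t in enumerate(tokens) if t in STREET_TYPES)
    before, after = tokens[:type_idx], tokens[type_idx + 1:]
    pre = ""
    # A direction word counts as a prefix only if something is left to be the name.
    if len(before) > 1 and before[0] in DIRECTIONS:
        pre, before = DIRECTIONS[before[0]], before[1:]
    post = DIRECTIONS.get(after[0], "") if after else ""
    return (pre, " ".join(before), STREET_TYPES[tokens[type_idx]], post)

print(parse_street("E St. NW"))   # ('', 'e', 'st', 'nw')
print(parse_street("N E St."))    # ('n', 'e', 'st', '')
print(parse_street("E East St"))  # ('e', 'east', 'st', '')
```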

Are there streets that have two cardinal directions in the address?
Are any of them named after letters or cardinal directions? Would I
ever get

East West St. NW

or East E St. SE?




#7
02-28-2008, 10:30 AM
Peter H. Coffin
Re: identify duplicate addresses?

On Mon, 19 Nov 2007 13:20:19 -0800 (PST), lawpoop@gmail.com wrote:
> On Nov 19, 1:49 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>
>> No, in this case it is E street. There is also F, G, H, and so on. And
>> depending on which quadrant they're in, it can be NE, NW, SE or SW.

>
> Well, the pattern might be that alphabet streets only have spelled out
> preceding directions, or that the direction follows the street type
>
> E.g. where the street name is "E"
>
> East E st.
> or
> E St. NW
>
> but not
>
> NW E St.
> nor
> E E St.
>
>>
>> And they're all in Washington, DC.


1905 E ST SE
WASHINGTON DC 20003-2593

is a standardized address per USPS standards.


--
Time is a great teacher, but unfortunately it kills all its pupils.
-- Hector Berlioz

#8
02-28-2008, 10:30 AM
Jerry Stuckle
Re: identify duplicate addresses?

Rik Wasmus wrote:
> On Mon, 19 Nov 2007 20:51:31 +0100, Jerry Stuckle
> <jstucklex@attglobal.net> wrote:
>> Rik Wasmus wrote:
>>> On Mon, 19 Nov 2007 16:47:36 +0100, <lawpoop@gmail.com> wrote:
>>>
>>>> Hello all -
>>>>
>>>> I'm working on a survey system where a user can register to win a
>>>> prize. Since we don't want people gaming the system, we want to remove
>>>> any duplicate entries made by the same person to increase their odds.
>>>>
>>>> My thought is that the best way to do this is to look at the street
>>>> mailing address, since we are mailing a certificate to the winner.
>>>> ( Of course this rules out two people living at the same address, but
>>>> we can only allow one entry per address in the official rules). I
>>>> think by focusing on the number and street name, it might be pretty
>>>> good at identifying duplicate addresses.
>>>>
>>>> For instance, these two are the same address
>>>> 100 West Oak Street
>>>> 100 W. Oak St
>>>
>>>> Does this make sense? Is there anything I've overlooked?
>>> Euhm, in Holland a _postal code_ + house number is unique. Is this
>>> not true for the US/UK/wherever you're from?

>>
>> Nope. Here in the U.S. a 9 digit postal code will identify a few
>> square block area, which may have the same house number on different
>> streets.

>
> Which is a shame. Then again, the US is a lot bigger, so the code would
> have to be quite long to accomplish that. Here 4 digits followed by 2
> letters (0000AA) is all it takes. I'd assume a street is unique in a
> postal code though :P
>


That would be a nice idea, but we have a lot of streets only 1/2 block
long with a circle and about a half-dozen houses. It might be able to
be done, but that would take intelligence. And this is a
quasi-governmental entity, after all. :-)

>> What about apartments?

>
> For convenience, I meant that as 'part of a house number'. Official
> addresses like apartment numbers are also stored with the postal
> company (and local government), so validating those is still possible
> (naturally, more expensive). Making something up or splitting a house is
> possible, but if you don't explicitly ask for another address it's up to
> the good nature of the postal worker (it's what they're known for...)
> whether this 'illegal' address is honored. Not a problem if it's 'just
> a few', but people tend to get a little pissed if their route is
> actually a couple of dozen addresses longer than their employer has in
> the books.
>
> It's total bureaucracy here, but it does come with its advantages...



--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================


#9
02-28-2008, 10:30 AM
strawberry
Re: identify duplicate addresses?

On Nov 19, 7:35 pm, "Peter H. Coffin" <hell...@ninehells.com> wrote:
> On Mon, 19 Nov 2007 10:37:46 -0800 (PST), lawp...@gmail.com wrote:
> > On Nov 19, 12:15 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:

>
> >> And what do you do with something like 123 E St. NW?

>
> > I'm not from a large city, but where I grew up, we didn't have street
> > names that were cardinal directions. We had street names that were
> > ordinals, such as "East 4th St.", but no "North Street"s. So, is this
> > a realistic example? Are there actually East streets in American
> > cities?

>
> > In the example you gave above, the street name is "East", right? So I
> > would have to first strip out the street type ( street, avenue, etc)
> > 123 E St. NW -> 123 E NW

>
> > Then, if the street has two cardinal directions in the name, I'll take
> > the first as the street name, and then treat the second as extra
> > information. E.g. Decide that the street name is "East", and not
> > "NW".
> > 123 E NW -> 123 East

>
> > This reasoning assumes that street names never are said to be like
> > "123 NW East St." -- which may be a faulty assumption. Do you know of
> > such a street?

>
> Certainly. There's one about a mile from my house. 43°00'39"N 88°13'36"W
> is where the intersection of South Street and North East Avenue is.
> South East Avenue begins about a half-mile further south. There's also a
> West Avenue about a half-mile from East Avenue, but doesn't run as far
> north as South Street. There's also a North Street that comes in East
> and West varieties at 43°00'47"N 88°14'09"W, but the street itself runs
> NE and SW, and it's only contrasting convention that prevents it from
> being labelled North North Street and South North Street.
>
> Next, take a look at the fine example at 41°53'13" N 87°37'41" W. That's
> the division between East Wacker Drive and West Wacker Drive. Now, over
> at 41°52'55"N 87°38'13"W, there's the division between North Wacker
> Drive and South Wacker Drive. So, 120 Wacker Drive without the
> direction could be any of four distinct addresses. To complicate
> matters further, Wacker also has a subterranean aspect: there's also Lower
> Wacker Drive that runs UNDERNEATH Wacker Drive from part of South Wacker
> Drive all the way through North and West Wacker Drives and some distance
> into East Wacker Drive, meaning that there's actually *eight* places
> that could be "120 Wacker Drive". Quantum probability indicates there's
> also a Charm Wacker Drive and Strange Wacker Drive that we haven't the
> math to adequately draw maps of....
>
> --
> Time is a great teacher, but unfortunately it kills all its pupils.
> -- Hector Berlioz


Did no one think that this might get confusing?

#10
02-28-2008, 10:30 AM
Peter H. Coffin
Re: identify duplicate addresses?

On Mon, 19 Nov 2007 15:13:20 -0800 (PST), strawberry wrote:
>> Next, take a look at the fine example at 41°53'13" N 87°37'41" W. That's
>> the division between East Wacker Drive and West Wacker Drive. Now, over
>> at 41°52'55"N 87°38'13"W, there's the division between North Wacker
>> Drive and South Wacker Drive. So, 120 Wacker Drive without the
>> direction could be any of four distinct addresses. To complicate
>> matters further, Wacker also has a subterranean aspect: there's also Lower
>> Wacker Drive that runs UNDERNEATH Wacker Drive from part of South Wacker
>> Drive all the way through North and West Wacker Drives and some distance
>> into East Wacker Drive, meaning that there's actually *eight* places
>> that could be "120 Wacker Drive". Quantum probability indicates there's
>> also a Charm Wacker Drive and Strange Wacker Drive that we haven't the
>> math to adequately draw maps of....

>
> Did no one think that this might get confusing?


It just growed that way.

--
100. Finally, to keep my subjects permanently locked in a mindless trance, I
will provide each of them with free unlimited Internet access.
--Peter Anspach's list of things to do as an Evil Overlord


All times are GMT.

