grep 2 at signs: a discussion in the Sco Unix forums, part of the Unix Operating Systems category.
Hello all,

bob@vodka.com
jpr@jane.com
jeff@needshelp.net
joey@test.net
jeff@waylon@bank.com
    ^------^------------ 2 @ signs

I want to locate any email with 2 "at" signs inside file "file_list".
Keep in mind there is only ONE email per line and there are
10's of thousands of email addresses in "file_list".

# grep "*@*@" file_list        Displays every line
# grep "*\@*\@" file_list      Displays every line

Thanks in advance,
Jeff H
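The task as stated (one address per line, keep the ones with two @ signs) does not strictly need a regular expression. The following is a minimal Python sketch of a counting approach. It is offered as an alternative and was not proposed by anyone in the thread. The function names `has_two_ats` and `lines_with_two_ats` are hypothetical. The sketch keeps any line with at least two '@' characters, which is the same set of lines that the '@.*@' style patterns discussed below select.

```python
from typing import Iterable, Iterator


def has_two_ats(address: str) -> bool:
    """Return True if the address contains two or more '@' characters."""
    return address.count("@") >= 2


def lines_with_two_ats(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line (newline stripped) that has two or more '@' signs.

    Accepts any iterable of lines, including an open file object, so a
    list of tens of thousands of addresses is streamed, not loaded at once.
    """
    for line in lines:
        if has_two_ats(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = [
        "bob@vodka.com\n",
        "jpr@jane.com\n",
        "jeff@needshelp.net\n",
        "joey@test.net\n",
        "jeff@waylon@bank.com\n",
    ]
    for address in lines_with_two_ats(sample):
        print(address)  # prints jeff@waylon@bank.com
```

To run it against the real list, open "file_list" with `with open("file_list") as fh:` and iterate over `lines_with_two_ats(fh)`.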
----
Jeff Hyman wrote (on Mon, Jul 10, 2006 at 04:46:54PM -0400):
| I want to locate any email with 2 "at" signs inside file "file_list".
| Keep in mind there is only ONE email per line and there are
| 10's of thousands of email addresses in "file_list".
|
| # grep "*@*@" file_list        Displays every line
| # grep "*\@*\@" file_list      Displays every line

How about:

  egrep '.*@.*@' file_list
or
  awk '/.*@.*@/ { print }'file_list

Bob (who would use mawk 'cause it'd be fastest)
--
Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor
----
Jeff Hyman typed (on Mon, Jul 10, 2006 at 04:46:54PM -0400):
| I want to locate any email with 2 "at" signs inside file "file_list".
| Keep in mind there is only ONE email per line and there are
| 10's of thousands of email addresses in "file_list".
|
| # grep "*@*@" file_list        Displays every line
| # grep "*\@*\@" file_list      Displays every line

Really?? Neither of those commands should find any match whatsoever in that list. Do you have some other odd filename in your current directory that the shell is expanding before passing grep its argument?

In a regular expression, '*' is not a wild card. It indicates any number of matches (including none) of whatever precedes it.

In a regular expression, "." stands for any single character, and ".*" for any number of any characters, including none.

Add 'two@@any.org' to your list, just for kicks. Then try:

  grep "@.*@" file_list

--
JP ==> http://www.frappr.com/cusm <==
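JP's two points can be checked with a short sketch. It uses Python's `re` module as a stand-in for grep. That is an assumption: Python's regex syntax is close to, but not identical to, a POSIX basic regular expression. In particular, Python rejects a pattern that begins with `*`. POSIX BRE rules instead treat a leading `*` as a literal asterisk, so grep should read "*@*@" as "a literal `*`, then zero or more `@`, then one `@`". The escaped pattern below reproduces that reading. The `two@@any.org` entry is JP's suggested test case.

```python
import re

# Addresses from the original post, plus JP's 'two@@any.org' test case.
addresses = [
    "bob@vodka.com",
    "jpr@jane.com",
    "jeff@needshelp.net",
    "joey@test.net",
    "jeff@waylon@bank.com",
    "two@@any.org",
]

# '.' is any single character and '.*' is any run of characters, including
# an empty run, so '@.*@' also catches two adjacent '@' signs.
two_ats = re.compile(r"@.*@")
regex_hits = [a for a in addresses if two_ats.search(a)]
print(regex_hits)  # ['jeff@waylon@bank.com', 'two@@any.org']

# How a POSIX BRE reads "*@*@": a literal '*', zero or more '@', then '@'.
# No address contains a '*', so nothing matches, as JP expected.
bre_reading = re.compile(r"\*@*@")
glob_hits = [a for a in addresses if bre_reading.search(a)]
print(glob_hits)  # []

# Python itself refuses an unescaped leading '*': there is nothing to repeat.
try:
    re.compile("*@*@")
except re.error as exc:
    print("rejected:", exc)
```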
----
Bob Stockler typed (on Mon, Jul 10, 2006 at 05:16:48PM -0400):
| How about:
|
|   egrep '.*@.*@' file_list
| or
|   awk '/.*@.*@/ { print }'file_list
|
| Bob (who would use mawk 'cause it'd be fastest)

When it comes to commands, JP goes by the shibboleth that 'shorter is better', so he'd scratch one each of ".", "*", " ", "{", " ", "p", "r", "i", "n", "t", " ", "}" and just type:

  mawk '/@.*@/' file_list

--
JP ==> http://www.frappr.com/cusm <==
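JP's trimming is safe because grep, egrep and awk all select a line if the pattern matches anywhere in it. A leading `.*` only stretches the match back toward the start of the line and never changes whether the line is selected. The following small Python check of that claim again uses `re.search` as an assumed stand-in for unanchored grep/awk matching.

```python
import re

addresses = [
    "bob@vodka.com",
    "jeff@needshelp.net",
    "jeff@waylon@bank.com",
    "two@@any.org",
]

long_form = re.compile(r".*@.*@")   # Bob's pattern
short_form = re.compile(r"@.*@")    # JP's shorter pattern

long_hits = [a for a in addresses if long_form.search(a)]
short_hits = [a for a in addresses if short_form.search(a)]
print(long_hits == short_hits)  # True: the same lines are selected

# Only the extent of the match differs: the leading '.*' pulls in the user part.
print(long_form.search("jeff@waylon@bank.com").group())   # jeff@waylon@
print(short_form.search("jeff@waylon@bank.com").group())  # @waylon@
```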
----
Jean-Pierre Radley wrote (on Mon, Jul 10, 2006 at 06:05:44PM -0400):
| When it comes to commands, JP goes by the shibboleth that
| 'shorter is better', so he'd scratch one each of ".", "*",
| " ", "{", " ", "p", "r", "i", "n", "t", " ", "}" and just type:
|
|   mawk '/@.*@/' file_list

More elegant . . . less informative to those less knowledgeable.

OTOH it does, in fact, inform (or remind) us of the subtleties of AWK, which by default prints any matched line.

Bob
--
Bob Stockler +-+ bob@trebor.iglou.com +-+ http://members.iglou.com/trebor
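The subtlety Bob mentions is that an awk rule consisting of a pattern with no action behaves as if the action were `{ print }`. The sketch below models a single awk rule in Python to make that explicit. `awk_rule` is a hypothetical helper, not part of any library. It treats each input line as the whole record (awk's `$0`) and uses Python's `re` in place of awk's regex engine.

```python
import re
from typing import Callable, Iterable, Iterator, Optional


def awk_rule(pattern: str,
             records: Iterable[str],
             action: Optional[Callable[[str], str]] = None) -> Iterator[str]:
    """Model one awk rule of the form /pattern/ { action }.

    When action is None, each matching record is emitted unchanged, which is
    what awk does for a bare /pattern/ (an implicit '{ print }').
    """
    regex = re.compile(pattern)
    for record in records:
        if regex.search(record):
            yield record if action is None else action(record)


addresses = ["bob@vodka.com", "jeff@waylon@bank.com", "two@@any.org"]

# mawk '/@.*@/' file_list : pattern only, default action
implicit = list(awk_rule(r"@.*@", addresses))

# awk '/.*@.*@/ { print }' file_list : explicit print of the whole record
explicit = list(awk_rule(r".*@.*@", addresses, action=lambda rec: rec))

print(implicit == explicit)  # True
```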
----
Bob Stockler typed (on Mon, Jul 10, 2006 at 06:22:12PM -0400):
| More elegant . . . less informative to those less knowledgeable.
|
| OTOH it does, in fact, inform (or remind) us of the subtleties
| of AWK, which by default prints any matched line.

Well guys, here are the results:

  egrep '.*@.*@' list            # Works, but slow and prints twice
                                 # real 0m1.19s
                                 # user 0m1.16s
                                 # sys  0m0.02

  awk '/.*@.*@/ { print }'list   # does not work... just hangs
                                 # ps shows it's doing something

  grep "@.*@" list               # Works & fast
                                 # real 0m0.07s
                                 # user 0m0.07s
                                 # sys  0m0.01

  mawk '/@.*@/' list             # Works & Fastest
                                 # real 0m0.01s
                                 # user 0m0.01s
                                 # sys  0m0.01

You guys (as always) have been a great help and I thank you!

Jeff H
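The awk line that "just hangs" is most likely a quoting slip rather than an awk problem. There is no space between the closing quote and `list`, and the same slip appears in Bob's original suggestion. The shell glues the quoted program and the filename into a single argument. Awk therefore receives a program but no input file and sits waiting on standard input, which would fit `ps` showing it as still running. Python's `shlex.split` follows POSIX shell quoting rules, so it can show how the two command lines are tokenized:

```python
import shlex

# Tokenize each command line the way a POSIX shell would split it into argv.
as_posted = shlex.split("awk '/.*@.*@/ { print }'list")
with_space = shlex.split("awk '/.*@.*@/ { print }' list")

# No space: a single argument holding the program text plus 'list', no file operand.
print(as_posted)   # ['awk', '/.*@.*@/ { print }list']

# With a space: awk gets its program and a separate file operand.
print(with_space)  # ['awk', '/.*@.*@/ { print }', 'list']
```

With the space restored (`'/.*@.*@/ { print }' list`), awk gets its file operand.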
----
On Tue, Jul 11, 2006, Jeff Hyman wrote:
>Bob Stockler typed (on Mon, Jul 10, 2006 at 06:22:12PM -0400):
>| Jean-Pierre Radley wrote (on Mon, Jul 10, 2006 at 06:05:44PM -0400):
>| ....
>| | | | I want to locate any email with 2 "at" signs inside file "file_list".
>| | | | Keep in mind there is only ONE email per line and there are
>| | | | 10's of thousands of email addresses in "file_list".
....
>Well guys, here's the results:
>
> egrep '.*@.*@' list           # Works, but slow and prints twice
> awk '/.*@.*@/ { print }'list  # does not work... just hangs
> grep "@.*@" list              # Works & fast
> mawk '/@.*@/' list            # Works & Fastest
> ....

I would think these might have problems given the greedy nature of regular expressions (e.g. .*@ matches the longest string ending in @). This might be a better solution:

  #!/usr/bin/env python3
  import re

  pattern = re.compile(r'.*?@.*@')  # the '?' makes the first '.*' non-greedy

  with open('file_list') as fh:
      for line in fh:
          if pattern.search(line):
              print(line, end='')  # each line already ends in a newline

--
Bill
--
INTERNET:   bill@Celestial.COM  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/
----
In article <mailman.0.1152648317.26573.sco-misc@lists.celestial.com>,
Bill Campbell <bill@celestial.com> wrote:
>...
>>| | | | I want to locate any email with 2 "at" signs inside file "file_list".
>>| | | | Keep in mind there is only ONE email per line and there are
>>| | | | 10's of thousands of email addresses in "file_list".
>...
>
>I would think these might have problems given the greedy nature
>of regular expressions (e.g. .*@ matches the longest string
>ending in @).

Greedy expressions will never prevent a match; "greedy" affects only how much is matched.

Another solution, about as fast as mawk, is to use 'pgrep' (supplied with gwxlibs).

John
--
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
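John's point can be checked directly. The sketch uses Python's `re` because Bill's script was Python. POSIX basic and extended regular expressions, as used by grep and awk, have no non-greedy operator at all. The greedy and non-greedy versions of the pattern select exactly the same lines. Only the matched text differs, and only visibly when a line has three or more `@` signs. The `a@b@c@d` string is a made-up test value, not from the thread.

```python
import re

# Three lines from the thread plus a made-up line with three '@' signs.
lines = ["bob@vodka.com", "jeff@waylon@bank.com", "two@@any.org", "a@b@c@d"]

patterns = {
    "greedy": re.compile(r".*@.*@"),   # Bob's pattern
    "bill": re.compile(r".*?@.*@"),    # Bill's pattern: first '.*' is non-greedy
    "short": re.compile(r"@.*@"),      # JP's pattern
}

# Whether a line matches does not depend on greediness...
selected = {name: [s for s in lines if rx.search(s)] for name, rx in patterns.items()}
for name, hits in selected.items():
    print(name, hits)

# ...but how much of the line is matched does.
greedy_span = re.search(r"@.*@", "a@b@c@d").group()   # runs to the last '@'
lazy_span = re.search(r"@.*?@", "a@b@c@d").group()    # stops at the next '@'
print(greedy_span, lazy_span)  # @b@c@ @b@
```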
----
John DuBois wrote:
> Another solution, about as fast as mawk, is to use 'pgrep' (supplied
> with gwxlibs).

That's actually `pcregrep`. gwxlibs is unwise (at best) to provide the `pgrep` alias link. On Linux systems:

  $ apropos pgrep; apropos pcregrep
  pgrep, pkill - look up or signal processes based on name and other attributes
  pcregrep - a grep with Perl-compatible regular expressions

>Bela<