Hey Everybody,

I've been searching high and low for how to grep for two different strings at once, and I'm not sure that it can be done. Ideas?

I've got a bunch of files and I'm searching for ones that have both the word "cat" and "dog" in them, not necessarily on the same line. I've tried using something like this:

  find /tmp -exec grep -q -E "cat" {} \; -print

but I can't seem to do both cat and dog at the same time.

  find /tmp -exec grep -q -E "cat.\dog" {} \; -print

this is closer, but it only works if cat and dog are on the same line.

Any ideas? It's OSR507 (I don't think it matters).
The people who read this newsgroup are so clever.

Thanks,
Kevin
| |||
On Thu, Jan 26, 2006, Kevin Fleming wrote:
>Hey Everybody,
>
>I've been searching high and low for how to grep for two different
>strings at once, and I'm not sure that it can be done. Ideas?
>I've got a bunch of files and I'm searching for ones that have both the
>word "cat" and "dog" in them, not necessarily on the same line.
>I've tried using something like this:

You want egrep, not grep:

  egrep 'pattern1|pattern2' ...

If you want files with both these patterns, then things are more complicated, and I'm not sure that there's a program in the grep family that will do it in one pass. I would probably do it this way.

  find ... | xargs grep -l 'pattern1' > /tmp/list1
  xargs grep -l 'pattern2' < /tmp/list1 > /tmp/listfinal

Bill
--
INTERNET: bill@Celestial.COM  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186  Mercer Island, WA 98040-0820; (206) 236-1676

``Virtually everything is under federal control nowadays except the federal budget.''
    -- Herman E. Talmadge, 1975
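The two-pass idea can be tried on a few throwaway files. This is only a sketch: the temporary directory (made with `mktemp -d`, assumed to be available) and the sample file names are invented for the demo, and the list files are kept inside that directory rather than in /tmp.

```shell
#!/bin/sh
# Sample files; names and contents are made up for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\n' > "$demo/both.txt"
printf 'cat\n'      > "$demo/cat_only.txt"
printf 'dog\n'      > "$demo/dog_only.txt"

# Pass 1: names of files that contain "cat".
find "$demo" -type f -name '*.txt' | xargs grep -l 'cat' > "$demo/list1"

# Pass 2: of those, names of files that also contain "dog".
xargs grep -l 'dog' < "$demo/list1" > "$demo/listfinal"

result=$(cat "$demo/listfinal")
printf '%s\n' "$result"
rm -rf "$demo"
```

Only the file holding both words should survive the second pass. File names containing spaces would need extra care with plain `xargs`.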
| |||
Kevin Fleming wrote:

> I've been searching high and low for how to grep for two different
> strings at once, and I'm not sure that it can be done. Ideas?
> I've got a bunch of files and I'm searching for ones that have both the
> word "cat" and "dog" in them, not necessarily on the same line.
> I've tried using something like this:
>
> find /tmp -exec grep -q -E "cat" {} \; -print
>
> but I can't seem to do both cat and dog at the same time.
>
> find /tmp -exec grep -q -E "cat.\dog" {} \; -print
>
> this is closer, but it only works if cat and dog are on the same line.

Start by learning how to search a single file for multiple strings; get rid of the `find` part of this equation.

You are using `grep -E`, which is the newfangled name for `egrep`. Either will work and I'm going to use `egrep` here because I think it shows the differences more clearly. I'll search /etc/termcap and I'll use "cat" and "man" because both of those strings appear in /etc/termcap.

So. Regular `grep` (and `egrep` and `fgrep`) will search for multiple expressions given as multiple lines in the search string:

  $ grep 'cat
man' /etc/termcap

`egrep` adds alternation:

  $ egrep 'cat|man' /etc/termcap

This should produce the same results, but (in the case of OSR5) alternation is a lot slower. That's due to an ancient library bug which was never fixed in OSR5. I don't know if OSR6 is better. Because of this bug, I always use the separate-lines syntax for simple alternation. For complex alternation: "(cat|man).*bites.*(rat|dog)", I use the '|' syntax and live with the lame performance.

Putting `find` back in the mix:

  $ find /tmp -exec grep -l 'cat
dog' {} \;

`grep -l` means "print only the names of matching files". This should have the same effect you were trying to get with `grep -q` followed by "-print", but seems more direct to me.

>Bela<
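To see what the separate-lines syntax does on its own, here is a small sketch against a throwaway file (the contents are invented for the demo). It selects every line that contains either string, and should give the same lines as passing the patterns with separate `-e` options, which POSIX grep also accepts. Note that the second pattern starts in column one; any indentation would become part of the pattern.

```shell
#!/bin/sh
# Throwaway input file; the three lines are made up for illustration.
sample=$(mktemp)
printf 'a cat sat\nno match here\nman page\n' > "$sample"

# Two patterns separated by a newline inside one quoted argument.
by_newline=$(grep 'cat
man' "$sample")

# The same two patterns, each given with its own -e option.
by_e=$(grep -e 'cat' -e 'man' "$sample")

printf '%s\n' "$by_newline"
rm -f "$sample"
```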
| |||
----- Original Message -----
From: "Kevin Fleming" <kevintickle@gmail.com>
Newsgroups: comp.unix.sco.misc
To: <distro@jpr.com>
Sent: Thursday, January 26, 2006 5:08 PM
Subject: help with grep looking for cats and dogs

> Hey Everybody,
>
> I've been searching high and low for how to grep for two different
> strings at once, and I'm not sure that it can be done. Ideas?
> I've got a bunch of files and I'm searching for ones that have both the
> word "cat" and "dog" in them, not necessarily on the same line.
> I've tried using something like this:
>
> find /tmp -exec grep -q -E "cat" {} \; -print
>
> but I can't seem to do both cat and dog at the same time.
>
> find /tmp -exec grep -q -E "cat.\dog" {} \; -print
>
> this is closer, but it only works if cat and dog are on the same line.
>
> Any ideas? It's OSR507 (I don't think it matters).
> The people who read this newsgroup are so clever.
>
> Thanks,
> Kevin

find /tmp -type f |xargs fgrep -l cat |xargs fgrep -l dog

or

find /tmp -type f |xargs -n 1 awk '{if($0~"cat")C=1;if($0~"dog")D=1;if(C+D==2){print ARGV[1];exit}}'

Explanation of the first way:
find produces a list of files (only files, thanks to -type f).
The first xargs runs grep as many times as necessary to process all the files, putting as many filenames in each command as possible.
The output of the first xargs/grep is a list of files that have cat.
This list goes into the second xargs, which runs grep as many times as necessary, grepping for dog, so all the cat files get searched a second time for dog.
The output of the second xargs/grep is the filenames that have cat and dog.
fgrep is used instead of grep just because it's faster, and it works as long as the search is for a simple string and not a regular expression.

Explanation of the second way:
A more efficient way that doesn't involve two passes through some of the files is possible using awk instead of grep.
A more readable version of the same awk code, placed into a separate script file:

cat myscript
#!/usr/bin/awk -f
if ($0~"cat") C=1
if ($0~"dog") D=1
if (C+D==2) {print ARGV[1] ; exit }

And you feed that filenames one at a time with xargs -n 1:

find /tmp -type f |xargs -n 1 myscript

Awk works on records; each line in the input file causes the script to run once. Variables retain their value across records, so if line one has cat but no dog, then C=1 but D is still blank, so the C+D==2 test fails and the next record of the file is read. If the next line has neither cat nor dog, nothing changes and the next line is read. If the next line has dog but no cat, then D=1, C is still 1 from before, and so this time the C+D==2 test passes, the filename is printed, and the script exits. No sense reading through the rest of the file. xargs then runs the script again for the next file in the list that find produced.

The first way is shorter to type and easier to look at and understand, but the second way might be more efficient.

And by now, after I hit submit, I bet one of the _real_ geniuses will have posted some simple little egrep or perl syntax that puts this to shame.

Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
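The record-by-record behaviour described above can be traced on a pair of sample files. This is a rough sketch, not a tuned solution: the directory and file names are invented, `mktemp -d` is assumed to exist, and the awk statements are wrapped in one `{ ... }` block so that they run as an action on every line.

```shell
#!/bin/sh
# Sample files; names and contents are made up for illustration.
demo=$(mktemp -d)
printf 'cat\nbird\ndog\nfish\n' > "$demo/both.txt"
printf 'cat\nbird\n'            > "$demo/cat_only.txt"

# xargs -n 1 starts one awk per file, so C and D begin empty for each file.
# ARGV[1] is the single file name that xargs handed to that awk process.
result=$(find "$demo" -type f | xargs -n 1 awk '
{
    if ($0 ~ "cat") C = 1                    # stays set on later lines
    if ($0 ~ "dog") D = 1
    if (C + D == 2) { print ARGV[1]; exit }  # both seen: stop reading early
}')

printf '%s\n' "$result"
rm -rf "$demo"
```

In both.txt the test first succeeds on the third line, so the fourth line is never read; cat_only.txt reaches end of file with D still empty and prints nothing.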
| |||
----- Original Message -----
From: "Bela Lubkin" <filbo@armory.com>
Newsgroups: comp.unix.sco.misc
To: <distro@jpr.com>
Sent: Thursday, January 26, 2006 6:07 PM
Subject: Re: help with grep looking for cats and dogs

> Kevin Fleming wrote:
>
>> I've been searching high and low for how to grep for two different
>> strings at once, and I'm not sure that it can be done. Ideas?
>> I've got a bunch of files and I'm searching for ones that have both the
>> word "cat" and "dog" in them, not necessarily on the same line.
>> I've tried using something like this:
>>
>> find /tmp -exec grep -q -E "cat" {} \; -print
>>
>> but I can't seem to do both cat and dog at the same time.
>>
>> find /tmp -exec grep -q -E "cat.\dog" {} \; -print
>>
>> this is closer, but it only works if cat and dog are on the same line.
>
> Start by learning how to search a single file for multiple strings; get
> rid of the `find` part of this equation.
>
> You are using `grep -E`, which is the newfangled name for `egrep`.
> Either will work and I'm going to use `egrep` here because I think it
> shows the differences more clearly. I'll search /etc/termcap and I'll
> use "cat" and "man" because both of those strings appear in
> /etc/termcap.
>
> So. Regular `grep` (and `egrep` and `fgrep`) will search for multiple
> expressions given as multiple lines in the search string:
>
> $ grep 'cat
> man' /etc/termcap
>
> `egrep` adds alternation:
>
> $ egrep 'cat|man' /etc/termcap
>
> This should produce the same results, but (in the case of OSR5)
> alternation is a lot slower. That's due to an ancient library bug which
> was never fixed in OSR5. I don't know if OSR6 is better. Because of
> this bug, I always use the separate-lines syntax for simple alternation.
> For complex alternation: "(cat|man).*bites.*(rat|dog)", I use the '|'
> syntax and live with the lame performance.
>
> Putting `find` back in the mix:
>
> $ find /tmp -exec grep -l 'cat
> dog' {} \;
>
> `grep -l` means "print only the names of matching files". This should
> have the same effect you were trying to get with `grep -q` followed by
> "-print", but seems more direct to me.
>
>>Bela<

*sigh* I predicted this would happen.
(see my response to this post, in case you get this out of order and it doesn't make sense)

Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!
| |||
Brian K. White wrote:

> From: "Kevin Fleming" <kevintickle@gmail.com>

> > I've been searching high and low for how to grep for two different
> > strings at once, and I'm not sure that it can be done. Ideas?
> > I've got a bunch of files and I'm searching for ones that have both the
> > word "cat" and "dog" in them, not necessarily on the same line.
> > I've tried using something like this:
> >
> > find /tmp -exec grep -q -E "cat" {} \; -print
> >
> > but I can't seem to do both cat and dog at the same time.
> >
> > find /tmp -exec grep -q -E "cat.\dog" {} \; -print
> >
> > this is closer, but it only works if cat and dog are on the same line.

> find /tmp -type f |xargs fgrep -l cat |xargs fgrep -l dog
>
> or
>
> find /tmp -type f |xargs -n 1 awk
> '{if($0~"cat")C=1;if($0~"dog")D=1;if(C+D==2){print ARGV[1];exit}}'

Whoops... my response would only find files with both words on the same line. It is not my usual habit to misread things like that. Oh well.

Both of your ways should work. I think `awk` will process this equivalent code slightly more efficiently:

  awk '/cat/ { C = 1 }
       /dog/ { D = 1 }
       C + D == 1 { print ARGV[1]; exit }'

> A more readable version of the same awk code, placed into a separate script
> file
>
> cat myscript
> #!/usr/bin/awk -f
> if ($0~"cat") C=1
> if ($0~"dog") D=1
> if (C+D==2) {print ARGV[1] ; exit }

Almost ... you need to put braces around all the code:

#!/usr/bin/awk -f
{
if ($0~"cat") C=1
if ($0~"dog") D=1
if (C+D==2) {print ARGV[1] ; exit }
}

>Bela<
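The reason for the braces is that an awk program is a list of `pattern { action }` rules, and statements such as `if` are only legal inside an action. A quick way to check this on a given awk is sketched below; the variable names `bare_if` and `braced_if` are just labels for the demo.

```shell
#!/bin/sh
# A bare `if` at the top level is not a rule, so the parser should refuse it.
if awk 'if ($0 ~ "cat") C = 1' /dev/null 2>/dev/null; then
    bare_if=accepted
else
    bare_if=rejected
fi

# Inside braces the same statement is an action that runs on every line.
if awk '{ if ($0 ~ "cat") C = 1 }' /dev/null 2>/dev/null; then
    braced_if=accepted
else
    braced_if=rejected
fi

echo "bare: $bare_if, braced: $braced_if"
```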
| |||
"Brian K. White" <brian@aljex.com> wrote in message news:00df01c622d2$954fcf70$951fa8c0@venti...
>
> ----- Original Message -----
> From: "Bela Lubkin" <filbo@armory.com>
> Newsgroups: comp.unix.sco.misc
> To: <distro@jpr.com>
> Sent: Thursday, January 26, 2006 6:07 PM
> Subject: Re: help with grep looking for cats and dogs
>
>
> > Kevin Fleming wrote:
> >
> >> I've been searching high and low for how to grep for two different
> >> strings at once, and I'm not sure that it can be done. Ideas?
> >> I've got a bunch of files and I'm searching for ones that have both the
> >> word "cat" and "dog" in them, not necessarily on the same line.
> >> I've tried using something like this:
> >>
> >> find /tmp -exec grep -q -E "cat" {} \; -print
> >>
> >> but I can't seem to do both cat and dog at the same time.
> >>
> >> find /tmp -exec grep -q -E "cat.\dog" {} \; -print
> >>
> >> this is closer, but it only works if cat and dog are on the same line.
> >
> > Start by learning how to search a single file for multiple strings; get
> > rid of the `find` part of this equation.
> >
> > You are using `grep -E`, which is the newfangled name for `egrep`.
> > Either will work and I'm going to use `egrep` here because I think it
> > shows the differences more clearly. I'll search /etc/termcap and I'll
> > use "cat" and "man" because both of those strings appear in
> > /etc/termcap.
> >
> > So. Regular `grep` (and `egrep` and `fgrep`) will search for multiple
> > expressions given as multiple lines in the search string:
> >
> > $ grep 'cat
> > man' /etc/termcap
> >
> > `egrep` adds alternation:
> >
> > $ egrep 'cat|man' /etc/termcap
> >
> > This should produce the same results, but (in the case of OSR5)
> > alternation is a lot slower. That's due to an ancient library bug which
> > was never fixed in OSR5. I don't know if OSR6 is better. Because of
> > this bug, I always use the separate-lines syntax for simple alternation.
> > For complex alternation: "(cat|man).*bites.*(rat|dog)", I use the '|'
> > syntax and live with the lame performance.
> >
> > Putting `find` back in the mix:
> >
> > $ find /tmp -exec grep -l 'cat
> > dog' {} \;
> >
> > `grep -l` means "print only the names of matching files". This should
> > have the same effect you were trying to get with `grep -q` followed by
> > "-print", but seems more direct to me.
> >
> >>Bela<
>
> *sigh* I predicted this would happen.
> (see my response to this post, in case you get this out of order and it
> doesn't make sense)
>

Actually, Bela's solution gives a list of all files that contain "cat OR dog". The original poster was looking for a list of all files that contain "cat AND dog", (anywhere in the file) and your solution (given in the other post) is correct.

Bob
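The OR/AND difference is easy to see side by side. The sketch below assumes `mktemp -d` is available and uses three invented sample files; the variable names `either` and `both` are only labels.

```shell
#!/bin/sh
# Sample files; names and contents are made up for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\n' > "$demo/both.txt"
printf 'cat\n'      > "$demo/cat_only.txt"
printf 'bird\n'     > "$demo/neither.txt"

# OR: newline-separated patterns list every file with at least one match.
either=$(grep -l 'cat
dog' "$demo"/*.txt)

# AND: only the files that matched "cat" are searched again for "dog".
both=$(grep -l 'cat' "$demo"/*.txt | xargs grep -l 'dog')

printf 'either:\n%s\nboth:\n%s\n' "$either" "$both"
rm -rf "$demo"
```

The first list should include cat_only.txt as well as both.txt, while the second should name both.txt alone.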
| |||
In article <200601261711.aa12334@deepthought.armory.com>,
Bela Lubkin <filbo@armory.com> wrote:
>> From: "Kevin Fleming" <kevintickle@gmail.com>
>> > I've been searching high and low for how to grep for two different
>> > strings at once, and I'm not sure that it can be done. Ideas?
>> > I've got a bunch of files and I'm searching for ones that have both the
>> > word "cat" and "dog" in them, not necessarily on the same line.
>
>Both of your ways should work. I think `awk` will process this
>equivalent code slightly more efficiently:
>
>  awk '/cat/ { C = 1 }
>       /dog/ { D = 1 }
>       C + D == 1 { print ARGV[1]; exit }'
                 ^
                 ^ s/b 2

I say:

find /tmp -type f |xargs awk '
FNR == 1 { c = d = 0 }
/cat/ { c = 1 }
/dog/ { d = 1 }
c && d { print FILENAME; next }'

John
--
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
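One thing to watch with the `next` form: `next` only skips to the following line of the same file, and `c` and `d` stay set until the next file begins, so a file with more lines after the second word is found would have its name printed once per remaining line. A possible variant that prints each name once is sketched here. The `shown` flag is a made-up addition, not part of the post above, and the sample files are invented.

```shell
#!/bin/sh
# Sample files; names and contents are made up for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\nfish\nbird\n' > "$demo/both.txt"
printf 'cat\n'                  > "$demo/cat_only.txt"

# One awk handles many files, so the flags are reset at each file's first line.
# `shown` (hypothetical) records that this file's name was already printed.
result=$(find "$demo" -type f | xargs awk '
FNR == 1         { c = d = shown = 0 }
/cat/            { c = 1 }
/dog/            { d = 1 }
c && d && !shown { print FILENAME; shown = 1 }')

printf '%s\n' "$result"
rm -rf "$demo"
```

This keeps the advantage of starting far fewer awk processes than `xargs -n 1`, at the cost of still reading each matching file to the end.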
| |||
Bela Lubkin wrote:
> Brian K. White wrote:
>
> > From: "Kevin Fleming" <kevintickle@gmail.com>
>
> > > I've been searching high and low for how to grep for two different
> > > strings at once, and I'm not sure that it can be done. Ideas?
> > > I've got a bunch of files and I'm searching for ones that have both the
> > > word "cat" and "dog" in them, not necessarily on the same line.
> > > I've tried using something like this:
> > >
> > > find /tmp -exec grep -q -E "cat" {} \; -print
> > >
> > > but I can't seem to do both cat and dog at the same time.
> > >
> > > find /tmp -exec grep -q -E "cat.\dog" {} \; -print
> > >
> > > this is closer, but it only works if cat and dog are on the same line.
>
> > find /tmp -type f |xargs fgrep -l cat |xargs fgrep -l dog
> >
> > or
> >
> > find /tmp -type f |xargs -n 1 awk
> > '{if($0~"cat")C=1;if($0~"dog")D=1;if(C+D==2){print ARGV[1];exit}}'
>
> Whoops... my response would only find files with both words on the same
> line. It is not my usual habit to misread things like that. Oh well.
>
> Both of your ways should work. I think `awk` will process this
> equivalent code slightly more efficiently:
>
>   awk '/cat/ { C = 1 }
>        /dog/ { D = 1 }
>        C + D == 1 { print ARGV[1]; exit }'
>
> > A more readable version of the same awk code, placed into a separate script
> > file
> >
> > cat myscript
> > #!/usr/bin/awk -f
> > if ($0~"cat") C=1
> > if ($0~"dog") D=1
> > if (C+D==2) {print ARGV[1] ; exit }
>
> Almost ... you need to put braces around all the code:
>
> #!/usr/bin/awk -f
> {
> if ($0~"cat") C=1
> if ($0~"dog") D=1
> if (C+D==2) {print ARGV[1] ; exit }
> }
>
> >Bela<

Simple and crude but easy:

  grep -l dog `grep -l cat /tmp/*`

or

  grep -l dog /tmp/* | xargs grep -l cat

or if you must find

  find /tmp -name "*" -exec grep -l dog {} \; | xargs grep -l cat

Regards...Dan.
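Staying with `find` alone is also possible, in the style of the original `grep -q ... -print` attempt: `find` combines its tests with an implicit AND, and an `-exec ... \;` test is true only when the command exits 0. The sketch below uses invented sample files (one with a space in its name, which this form copes with because no file list passes through `xargs`) and assumes `mktemp -d` is available.

```shell
#!/bin/sh
# Sample files; names and contents are made up for illustration.
demo=$(mktemp -d)
printf 'cat\ndog\n' > "$demo/both words.txt"
printf 'cat\n'      > "$demo/cat_only.txt"
printf 'dog\n'      > "$demo/dog_only.txt"

# Tests are evaluated left to right: the second grep runs only if the first
# one found a match, and -print runs only if both did.
result=$(find "$demo" -type f \
    -exec grep -q 'cat' {} \; \
    -exec grep -q 'dog' {} \; \
    -print)

printf '%s\n' "$result"
rm -rf "$demo"
```

It starts one or two grep processes per file, so it is likely slower than the `xargs` pipelines on large trees.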
| ||||
jdanskinner wrote:
> Bela Lubkin wrote:
> > Brian K. White wrote:
> >
> > > From: "Kevin Fleming" <kevintickle@gmail.com>
> >
> > > > I've been searching high and low for how to grep for two different
> > > > strings at once, and I'm not sure that it can be done. Ideas?
> > > > I've got a bunch of files and I'm searching for ones that have both the
> > > > word "cat" and "dog" in them, not necessarily on the same line.
> > > > I've tried using something like this:
> > > >
> > > > find /tmp -exec grep -q -E "cat" {} \; -print
> > > >
> > > > but I can't seem to do both cat and dog at the same time.
> > > >
> > > > find /tmp -exec grep -q -E "cat.\dog" {} \; -print
> > > >
> > > > this is closer, but it only works if cat and dog are on the same line.
> >
> > > find /tmp -type f |xargs fgrep -l cat |xargs fgrep -l dog
> > >
> > > or
> > >
> > > find /tmp -type f |xargs -n 1 awk
> > > '{if($0~"cat")C=1;if($0~"dog")D=1;if(C+D==2){print ARGV[1];exit}}'
> >
> > Whoops... my response would only find files with both words on the same
> > line. It is not my usual habit to misread things like that. Oh well.
> >
> > Both of your ways should work. I think `awk` will process this
> > equivalent code slightly more efficiently:
> >
> >   awk '/cat/ { C = 1 }
> >        /dog/ { D = 1 }
> >        C + D == 1 { print ARGV[1]; exit }'
> >
> > > A more readable version of the same awk code, placed into a separate script
> > > file
> > >
> > > cat myscript
> > > #!/usr/bin/awk -f
> > > if ($0~"cat") C=1
> > > if ($0~"dog") D=1
> > > if (C+D==2) {print ARGV[1] ; exit }
> >
> > Almost ... you need to put braces around all the code:
> >
> > #!/usr/bin/awk -f
> > {
> > if ($0~"cat") C=1
> > if ($0~"dog") D=1
> > if (C+D==2) {print ARGV[1] ; exit }
> > }
> >
> > >Bela<
>
> Simple and crude but easy:
> grep -l dog `grep -l cat /tmp/*`
> or
> grep -l dog /tmp/* | xargs grep -l cat
> or if you must find
> find /tmp -name "*" -exec grep -l dog {} \; | xargs grep -l cat
>
> Regards...Dan.

Thanks to everyone for their help on this. I'm still learning how scripts and the syntax work...

Brian's suggestion:

  find /tmp -type f |xargs fgrep -l cat |xargs fgrep -l dog

was the simplest for me to understand, even if it wasn't the most efficient. I promise to spend some time learning about shell scripts, and I appreciate the explanations with the examples.

Thanks again,
Kevin