REGEXpert help needed!
December 14, 2006 10:45 AM

Regex experts: I need a regular expression to help trim some lines in a text file. I haven't done regex in some time and I'm not having any luck with this. Hope it'll be easy for a wiz.

I have a text file with almost 3,000 e-mail addresses. The format per line is supposed to be:
[address], [name]

But many are:
[address], [address]

The system I'm importing to will not allow two addresses on a line, so I have to trim those instances to simply be:
[address]
posted by Tubes to Computers & Internet (10 answers total)
 
If you can assume that all of the e-mail addresses have @ signs and none of the names do, then you can do this:

perl -pe 's/^([^@]+@[^@]+), .*@.*$/$1/'
posted by grouse at 10:50 AM on December 14, 2006
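For anyone not using Perl, here is a minimal Python translation of grouse's one-liner. It makes the same assumption: every address contains an @ and no name does. The function name `trim_line` and the sample addresses are made up for illustration. Note that Python's replacement syntax uses `\1` where Perl idiomatically uses `$1`.

```python
import re

# Same pattern as the Perl one-liner: if a second "@" appears after ", ",
# keep only the first address (group 1). Assumes names never contain "@".
DOUBLE_ADDRESS = re.compile(r"^([^@]+@[^@]+), .*@.*$")


def trim_line(line: str) -> str:
    """Return just the first address if the line holds two; otherwise unchanged."""
    return DOUBLE_ADDRESS.sub(r"\1", line)


# Hypothetical sample lines: the first is already valid, the second has two addresses.
sample = ["ann@example.com, Ann Smith", "bob@example.com, bob@example.org"]
for line in sample:
    print(trim_line(line))
```

The valid line passes through untouched and the second is trimmed to `bob@example.com`.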


Here's a quick-and-dirty stab at that:

s/(\S+@\S+),\s*\S+@\S+/$1/g

\S means "any non-whitespace character".
posted by Khalad at 10:51 AM on December 14, 2006
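Khalad's substitution translates to Python almost verbatim. This is a sketch only; `drop_second_address` is a hypothetical name, and the lines are processed one at a time to mirror how a per-line editor or `perl -p` applies the pattern.

```python
import re

# Khalad's pattern: \S matches any non-whitespace character.
# Perl's $1 in the replacement becomes \1 in Python.
SECOND_ADDRESS = re.compile(r"(\S+@\S+),\s*\S+@\S+")


def drop_second_address(line: str) -> str:
    """Replace 'addr1, addr2' with 'addr1'; other lines pass through."""
    return SECOND_ADDRESS.sub(r"\1", line)


# Apply line by line: run over a whole-file string, \s* could match a
# newline and merge two lines when one of them ends with a comma.
lines = ["ann@example.com, Ann Smith", "bob@example.com,bob@example.org"]
for line in lines:
    print(drop_second_address(line))
```

Because `\s*` allows zero spaces, this version also catches `addr1,addr2` with no space after the comma, which grouse's `", "` literal would miss.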


Which address would you keep, under those circumstances? Always the first one?

And are you using sed?
posted by cerebus19 at 10:52 AM on December 14, 2006


And as a naughty, naughty tag-on: does anyone have a quick and easy way to strip out everything in a file except the URLs? That would make exporting Tab Mix Plus's saved sessions a doddle.
posted by dance at 11:57 AM on December 14, 2006


dance, here's the regular expression I use for finding URLs:

\b[a-z]+:\d*//(?:[&.?!:]?[\w#~+=;%@\-/]+)*

That's a start.
posted by Khalad at 12:09 PM on December 14, 2006
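To "strip out everything except URLs", one option is to collect every match of Khalad's pattern and write them out one per line. A minimal Python sketch follows; the session text below is a hypothetical stand-in, since the actual Tab Mix Plus file format isn't described here, and `extract_urls` is an illustrative name.

```python
import re
from typing import List

# Khalad's URL pattern, copied verbatim. It has no capturing groups,
# so re.findall returns each whole match.
URL_PATTERN = re.compile(r"\b[a-z]+:\d*//(?:[&.?!:]?[\w#~+=;%@\-/]+)*")


def extract_urls(text: str) -> List[str]:
    """Return every URL-like substring in text, in order of appearance."""
    return URL_PATTERN.findall(text)


# Hypothetical session dump; real saved-session files will look different.
session_dump = 'tab1 "http://example.com/a?b=1" tab2 "https://www.example.org/path#frag".'
print("\n".join(extract_urls(session_dump)))
```

Because punctuation such as `.` or `?` is only consumed when more URL characters follow it, a trailing full stop after a URL is left out of the match.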


dance: from Regexp::Common::URI::http.pm on CPAN, with over 18k test cases:
(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)

posted by moift at 3:08 PM on December 14, 2006


If """^([^,]*?),[^@]*@.*$""" matches (that is, an @ appears after the first comma), replace the line with the captured group.
posted by cmiller at 3:48 PM on December 14, 2006
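A minimal Python sketch of the match-then-replace approach. The pattern here is an assumption: it requires an @ somewhere after the first comma, so only [address], [address] lines are rewritten and [address], [name] lines pass through unchanged. `clean_line` is a hypothetical name.

```python
import re

# Group 1 is everything before the first comma; the line only matches
# if an "@" appears somewhere after that comma.
TWO_ADDRESSES = re.compile(r"^([^,]*?),[^@]*@.*$")


def clean_line(line: str) -> str:
    """If the line holds two addresses, keep the captured first one."""
    m = TWO_ADDRESSES.match(line)
    return m.group(1) if m else line


for line in ["ann@example.com, Ann Smith", "bob@example.com, bob@example.org"]:
    print(clean_line(line))
```

Testing for a match first and only then replacing makes the "leave everything else alone" rule explicit, which can be easier to follow than a single substitution.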


Response by poster: Ugh. Thanks much, gang, but I'm working in NoteTab Pro, which supports regex to some degree but apparently isn't fully compatible with the Unix implementations. I ended up moving everything into Excel temporarily and doing some funky sorting and search/replace to clean it up. But I sure appreciate the effort... AskMeFi continues to rock!
posted by Tubes at 9:49 PM on December 14, 2006


Wow, thanks. I had no idea there was a library of regexes. Now to figure out how to get TextWrangler/BBEdit to dump the search results into a new doc...
posted by dance at 12:20 PM on December 15, 2006


And Tubes, thanks for hosting my tag-on so graciously!
posted by dance at 12:21 PM on December 15, 2006

