REGEXpert help needed!
December 14, 2006 10:45 AM

Regex experts: I need a regular expression to help trim some lines in a text file. I haven't done regex in some time and I'm not having any luck with this. Hope it'll be easy for a wiz.

I have a text file with almost 3,000 e-mail addresses. The format per line is supposed to be:
[address], [name]

But many are:
[address], [address]

The system I'm importing to will not allow two addresses on a line, so I have to trim those instances to simply be:
[address]
posted by Tubes to computers & internet (10 comments total)
If you can assume that all of the e-mail addresses have @ signs and none of the names do, then you can do this:

perl -pe 's/^([^@]+@[^@]+), .*@.*$/$1/'
posted by grouse at 10:50 AM on December 14, 2006


Here's a rough and dirty stab at that:

s/(\S+@\S+),\s*\S+@\S+/$1/g

\S means "any non-whitespace character".
posted by Khalad at 10:51 AM on December 14, 2006
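
For anyone whose editor can't run the substitution directly, here is a minimal Python sketch of the same idea. It reuses Khalad's pattern unchanged and assumes, as grouse does above, that addresses always contain an @ and names never do. The name trim_line and the sample addresses are made up for illustration.

```python
import re

# Khalad's pattern: two whitespace-free tokens that each contain "@",
# separated by a comma and optional whitespace.
TWO_ADDRESSES = re.compile(r"(\S+@\S+),\s*\S+@\S+")


def trim_line(line):
    """Keep only the first address when a line holds two; otherwise return it unchanged."""
    return TWO_ADDRESSES.sub(r"\1", line)


# Hypothetical sample lines, one in each of the two formats from the question.
sample = [
    "alice@example.com, Alice Smith",
    "bob@example.com, bob@example.org",
]

for line in sample:
    print(trim_line(line))
```

Lines in the [address], [name] format fall through untouched because the second token has no @, so the pattern never matches them.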


Which address would you keep, under those circumstances? Always the first one?

And are you using sed?
posted by cerebus19 at 10:52 AM on December 14, 2006


And as a naughty, naughty tag-on: does anyone have a quick and easy way to strip everything except the URLs out of a file? This would make exporting Tab Mix Plus's saved sessions a doddle.
posted by dance at 11:57 AM on December 14, 2006


dance, here's the regular expression I use for finding URLs:

\b[a-z]+:\d*//(?:[&.?!:]?[\w#~+=;%@\-/]+)*

That's a start.
posted by Khalad at 12:09 PM on December 14, 2006 [1 favorite]
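
A rough sketch of how that pattern could be used to keep only the URLs, again in Python. The pattern is Khalad's, copied as posted; the function name extract_urls and the sample text are assumptions for illustration, and this has not been tried against a real Tab Mix Plus session file.

```python
import re

# Khalad's URL pattern: a lowercase scheme, ":", optional digits, "//",
# then runs of URL characters optionally joined by one punctuation mark.
URL_PATTERN = re.compile(r"\b[a-z]+:\d*//(?:[&.?!:]?[\w#~+=;%@\-/]+)*")


def extract_urls(text):
    """Return every substring of text that the pattern matches, in order."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return URL_PATTERN.findall(text)


# Hypothetical input mixing URLs with other text.
sample = 'tab1 http://example.com/page?id=1 and "https://www.metafilter.com/" end.'

for url in extract_urls(sample):
    print(url)
```

Because a punctuation mark only counts when more URL characters follow it, a full stop at the end of a sentence is left out of the match.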


dance: from Regexp::Common::URI::http.pm on CPAN, with over 18k test cases:


(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)


posted by moift at 3:08 PM on December 14, 2006


If """^([^,]*?),.*@.*$""" matches, replace the line with the captured result.
posted by cmiller at 3:48 PM on December 14, 2006


Ugh - thanks much gang, but I'm working within NoteTab Pro which can do regex to some degree but it's apparently not totally compatible with the Unix implementation. I had to take it all into Excel temporarily and do some funky sorting and search/replace to clean it up. But I sure appreciate the efforts... AskMeFi continues to rock!
posted by Tubes at 9:49 PM on December 14, 2006


Wow, thanks - I had no idea there was a library of regular expressions. Now to figure out how to get TextWrangler/BBEdit to dump the results of a search into a new doc...
posted by dance at 12:20 PM on December 15, 2006


and Tubes, thanks for hosting my tag-on so graciously!
posted by dance at 12:21 PM on December 15, 2006

