Python version of this Perl 4-line regexp script?
November 11, 2006 4:46 PM

Convert Perl to Python: I am very comfortable with regular expressions but I need help converting an implementation of them from Perl to Python. I have an example Perl script below that opens a file, scans through it one line at a time, does an s/// replace, then matches a pattern against the result and prints the captured part. I would love to just have a working Python version of this for my edification, and if you have any "Perl to Python" tutorial links I'd love to see them.

This code doesn't do much but it's a prime example of the sort of cruft I write in Perl every day for little things. I know about re.match and re.search but how do I do the rest?


open(L,"/tmp/A");
while(<L>) {
s/\#\#\#//;
if(/^ {1,3}[0-9]{1,3}\. http\:\/\/(.*)/) { print $1; }
}
posted by neustile to Computers & Internet (11 answers total)
 
Something like...

import re

for line in open("/tmp/A"):
    # count=1 mirrors Perl's s/###// (no /g): only the first "###" is removed
    m = re.match(r"^ {1,3}\d{1,3}\. http://(.*)", line.replace("###", "", 1))
    if m is not None: print(m.group(1))
posted by cmiller at 5:07 PM on November 11, 2006


Untested:
import re
for line in open("/tmp/A"):
    line = line.replace("###", "", 1)  # like Perl's s/###// without /g: first match only
    m = re.match(r"^ {1,3}[0-9]{1,3}\. http://(.*)", line)
    if m: print(m.group(1))

posted by jepler at 5:07 PM on November 11, 2006
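
For reference, here is a minimal sketch of the same idea wrapped in a function so it can be tried without the file. The names `URL_LINE` and `extract_urls` and the sample lines are made up for illustration. It assumes Python 3 and passes `count=1` to `str.replace`, because Perl's `s/###//` without `/g` removes only the first occurrence.

```python
import re

# Compiled once. re.match already anchors at the start of the string,
# so the leading ^ from the Perl pattern is not needed here.
URL_LINE = re.compile(r" {1,3}[0-9]{1,3}\. http://(.*)")

def extract_urls(lines):
    """Yield whatever follows 'http://' on each matching line."""
    for line in lines:
        # Remove only the first "###", as Perl's s/###// does.
        line = line.replace("###", "", 1)
        m = URL_LINE.match(line)
        if m:
            yield m.group(1)

# Hypothetical sample input standing in for the lines of /tmp/A.
sample = [
    " 1. http://example.com/a\n",
    "###  12. http://example.org\n",
    "no match here\n",
]
print(list(extract_urls(sample)))
```

To run it on the real file, pass the open file object instead of the list: a file iterates line by line, just like `while(<L>)` in the Perl version.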


Oh, and in general, read _Dive into Python_
posted by cmiller at 5:09 PM on November 11, 2006


Not an answer to your question, but a comment about your regexes...

Dude, lay off the backslashes! # and : have no special meaning (unless you're using extended regexes) so you could just as well write:

s/###//;
if (m#^ {1,3}[0-9]{1,3}\. http://(.*)#) { ... }

Other languages that have regexes, but not as first-class objects, drive me nuts (e.g. Python, Java). You often have to double-escape everything, and it makes reading an expression hard. I recommend trying to minimize your \ usage. Raw strings in Python help alleviate this.
posted by sbutler at 5:15 PM on November 11, 2006
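
A small sketch of the double-escaping point, assuming Python 3. The variable names and the sample line are invented for illustration. The two string literals below spell the same pattern; only the raw one reads like the Perl original.

```python
import re

# In an ordinary string literal every regex backslash has to be doubled,
# so that Python's string parser hands a single backslash to the re module.
escaped = "^ {1,3}\\d{1,3}\\. http://(.*)"

# A raw string leaves backslashes alone.
raw = r"^ {1,3}\d{1,3}\. http://(.*)"

# Both literals produce exactly the same pattern text.
same = (escaped == raw)

m = re.match(raw, "  42. http://example.com/page")
rest = m.group(1) if m else None
print(same, rest)
```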


sbutler has a point, but of course you may not have written the original regex.

This question might be better phrased as "I have a text file which looks like this, and a Perl regular expression which changes it into this. I want to do it in Python".

I mean, what are we doing here? We have some occurrences of "###" and we're getting rid of them, and then, for all lines which match "at the start, one to three spaces followed by one to three digits, followed by a dot then a space then 'http://'" we're printing out whatever follows the 'http://'?

Seems like the whole thing could be simplified.
posted by AmbroseChapel at 6:52 PM on November 11, 2006
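
The plain-English reading of the pattern above can be written into the regex itself. This is only a sketch, assuming Python 3 and its `re.VERBOSE` flag; the name `URL_LINE` is made up. In verbose mode unescaped whitespace is ignored and `#` starts a comment, so the literal spaces are written as `[ ]`.

```python
import re

URL_LINE = re.compile(r"""
    ^[ ]{1,3}      # one to three spaces at the start of the line
    [0-9]{1,3}     # one to three digits
    \.[ ]          # a dot, then a space
    http://        # the literal scheme
    (.*)           # capture the rest of the line
""", re.VERBOSE)

m = URL_LINE.match(" 7. http://example.com/x")
captured = m.group(1) if m else None
print(captured)
```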


thanks jepler & cmiller!

sbutler: due to some early perl trauma, I tend to escape everything that's not A-Za-z0-9. I also always put $| = 1; at the start of all my scripts and I can't remember why. You can imagine why I'm trying to move to python.

Ambrose: the script was made up to get at most of the things I do in perl, it doesn't do anything at all, really.
posted by neustile at 7:18 PM on November 11, 2006


(derail:)

$| = 1 makes perl output unbuffered. Check out 'man perlvar' and do a search for $| if you're curious, but basically for little shell scripts it means that you see output from your script at the speed of execution instead of having it queued up. It actually runs slower, but you see it faster, and sometimes that's desirable.

Buffered output ($| = 0, the default) won't always display each line as it's printed by the script, and will often wait until you have a few lines printed before you see it on the output.

(end derail)
posted by blacklite at 7:52 PM on November 11, 2006
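
For anyone making the same move to Python: a rough counterpart to `$| = 1` is to flush after each write. The sketch below assumes Python 3.3 or later, where `print` accepts a `flush` argument; the helper name `say` is invented for illustration.

```python
import sys

def say(text, stream=None):
    """Print one line and flush it immediately instead of leaving it buffered."""
    # Resolve sys.stdout at call time so a redirected stdout is respected.
    out = sys.stdout if stream is None else stream
    print(text, file=out, flush=True)

say("step 1 done")
```

As with the Perl setting, flushing on every line trades some throughput for seeing the output sooner.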


Mastering Regular Expressions, published by O'Reilly (can't remember the author's name) is really enjoyable to read, and I believe it does go into the differences between Perl and Python. I don't know if it answers your question, but people should seriously read this book if they do things with regexes.
posted by Deathalicious at 11:57 PM on November 11, 2006


Is your goal to be a good programmer? If this isn't a one-off task and your job is really something else, then my advice is: Don't try to convert Perl to another language. If you think in Perl, you'll be a bad programmer. Learn -- really learn -- a few other languages that are nothing like Perl. Immerse yourself in them. Use them until you have dreams in them; that's how you know that you've learned something and can soon move to the next language.

Try each of Scheme, Forth, C, and Ocaml, and whatever others strike your fancy. In two years, you'll be bad-ass, and will truly deserve the title "Programmer."

(You shouldn't stop, either. Make it a point to learn a new language every year.)
posted by cmiller at 6:56 AM on November 12, 2006


Book suggestion: Perl to Python Migration. Lots of minor typos though. It's also available in Safari Bookshelf.

This may also be useful: PerlPhrasebook
posted by swapspace at 1:55 PM on November 12, 2006


>Ambrose: the script was made up to get at most of the things I do in perl, it doesn't do anything at all, really.

Well, it does, obviously, do a couple of things. I really think you'd be better off giving a real-life example if you have one, and if you don't, making your question more general.

If your actual question is "how do I go through each line of a text file and extract each occurrence of <foo>" then it's better to ask the question that way.
posted by AmbroseChapel at 2:33 PM on November 12, 2006
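
The general form of that question ("go through each line and extract each occurrence of <foo>") might look something like this in Python 3. It is a sketch only: `find_all`, the `http://\S+` pattern, and the sample lines are stand-ins for illustration, not anything from the original script.

```python
import re

def find_all(pattern, lines):
    """Return (line_number, matched_text) for every match on every line."""
    regex = re.compile(pattern)
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # finditer yields every non-overlapping match, not just the first.
        for m in regex.finditer(line):
            hits.append((lineno, m.group(0)))
    return hits

# Hypothetical input; an open file object works the same way.
sample = [
    "see http://a.example and http://b.example\n",
    "nothing here\n",
    "http://c.example\n",
]
hits = find_all(r"http://\S+", sample)
print(hits)
```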

