My First Perl Script (TM)
November 11, 2008 10:30 AM

Please help me make a change to multiple .txt files using this funny computer code thing.

I have an item I need to find and replace in 43 different text files. As I understand it from this, it's possible to create a command that will do this for me. The link mentions some options, but they are either for Windows or assume a level of knowledge that I don't have. Sad as it may sound, I have no idea where to put a command script on a MacBook Pro running OS X Leopard 10.5. I've tried to find a place or option in, say, the TextEdit application that lets you select a bunch of files and put in a command, but I can't find it. What would be a code I could use, and how would I implement it?

For all of you who taught yourself Fortran as precocious eight-year-olds, don't miss this opportunity to show a lady the wonders of technology. It's just like teaching your grandmother to use email, except I'm 34.

Details: I need to change "internet" to "Internet."
posted by foxy_hedgehog to Technology (31 answers total)
 
Best answer: perl -pi -e 's/old/new/g' *.txt
posted by plexi at 10:35 AM on November 11, 2008


Best answer: You're going to want to run Terminal.app, which is in Utilities in Applications. From there you'll want to review any of the many references online regarding unix/linux/osx shells, specifically bash. From there you can run any of those funny looking perl or sed commands.
posted by cellphone at 10:38 AM on November 11, 2008


Best answer: The place to put perl one-liners (like plexi's) is on the command line, in Terminal. Open that, and paste plexi's example there.

You'll have to change "old" to whatever you're looking to replace, and "new" to whatever you're looking to replace it with. So, if you're changing all instances of "internet" to "Internet", then it will be perl -e 's/internet/Internet/g' *.txt . This example assumes that all the files are in the current directory. If not, then either cd to the directory that contains them, or copy those files to your home directory.
posted by philomathoholic at 10:44 AM on November 11, 2008


Oh yeah, don't forget the -pi after the "perl" in my example (like I did). The -p puts a loop around the one-liner, so that it gets executed once for every line of every file, and prints each line afterwards. The -i will edit those files in-place, rather than printing out the new files. The -e lets you write a line of code on the command line, instead of putting it in a file (script).
posted by philomathoholic at 10:49 AM on November 11, 2008


Best answer: Since you're a command-line neophyte, I'd recommend having Perl make backups of the files it changes, which you can do by changing
perl -pi -e 's/internet/Internet/g' *.txt
to
perl -pi.BAK -e 's/internet/Internet/g' *.txt
Perl will put the original contents of foo.txt into foo.txt.BAK and do the substitutions in the original file. After you've decided that Perl didn't screw anything up, you can throw the .BAK files in the trash. (You can use any extension you want in place of .BAK, by the way.)
posted by letourneau at 11:02 AM on November 11, 2008


Response by poster: Wait, so if I pull up the Linux option on the Mac, all I need to do is type

perl -pi -e 's/internet/Internet/g' *.txt

on the Command line?

How will it (the computer, Linux, God) know which text files I am applying the change to?
posted by foxy_hedgehog at 11:02 AM on November 11, 2008


Best answer: How will it (the computer, Linux, God) know which text files I am applying the change to?

The *.txt means "all files in the current directory ending in .txt", which you can get a list of by typing ls *.txt at the command line. What you probably want to do is put all of the files you want to auto-replace stuff in into their own directory, so you can go into that directory and be sure you're working only on them. Suppose you make a subdirectory of your home Documents directory called "fixme" and put the files in there. Then you should do cd ~/Documents/fixme to go into that directory, do ls *.txt for good measure just to confirm that the files listed are the ones you want to change, and then you can run the Perl command described above.
posted by letourneau at 11:08 AM on November 11, 2008
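
To make the *.txt part concrete, here is a sketch in a scratch folder; the folder name fixme comes from the answer above, while the three file names are invented for the example. Putting echo in front of a command shows what the shell would hand to it without running it.

```shell
# Throwaway stand-in for ~/Documents/fixme.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/glob-demo.XXXXXX")
mkdir "$workdir/fixme"
cd "$workdir/fixme" || exit 1

# Three empty files, only two of which end in .txt (hypothetical names).
touch notes.txt draft.txt photo.jpg

# Lists draft.txt and notes.txt, but not photo.jpg.
ls *.txt

# The shell replaces *.txt with the matching names before perl ever starts.
# echo prints the command as perl would receive it, without running it.
echo perl -pi -e 's/internet/Internet/g' *.txt
```

So perl never sees the asterisk at all; it simply gets a list of file names, and that list depends on which directory you are in when you press Return.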


Most text editors will let you do this without code. You're looking for find/replace in files functionality - usually under search or find in the menu. I'm on a PC not a Mac but I have to assume that shouldn't matter. What were you using to create/edit these files?
posted by Wolfie at 11:11 AM on November 11, 2008


If you are uncomfortable using the command line there are graphical alternatives.
posted by phil at 11:13 AM on November 11, 2008


Best answer: Like others have said, you'll need to run the Terminal application to run perl commands.

When you first open up Terminal, it spawns a shell and puts you in your home or user directory. The shell is just another way of interacting with Leopard, except instead of windows and mouse actions you type commands. By default the shell Leopard runs is something called bash. This isn't really important until you start getting into heavy command line usage, but it's good to know for reference, because some instructions will be different for bash, csh, tcsh or whatever shell you're running.

The first thing you do once you open the terminal is change to the directory containing your files. Although you can run the perl command from your home directory, it's probably easier if you're in the same directory as the files. This way, you don't need to include the full path in the perl command.

The cd command lets you change directories (most common shell commands are very short). You can type ls to list the contents of the current directory.

So, if your files are stored in the Documents directory, you can type the following command:

cd Documents

This will change to the Documents directory, which is a subdirectory of your home directory. That just means the Documents directory is inside your home directory.

You can type pwd to see what your current directory is. It will spit out whatever directory you're currently in.

This is VERY important: You should make a copy of all your files in the directory before you attempt to run the perl command. Global search and replace from the command line is risky. I've nuked files myself many times. Regular expressions, the language perl uses to perform search and replace, are very powerful, but also error-prone. You can modify the perl command to back up the files by using perl -pi.bak -e ... instead of perl -pi -e, but it's probably just easier to copy the whole directory or individual files from the GUI.

Once you're in the same directory as your files, you can run the perl command discussed above.
posted by formless at 11:32 AM on November 11, 2008


Best answer: Just to lay a few things out for you: cd means change directory. ls basically lists the contents of the directory (folder) you're currently in. If your folder name has spaces in it, you'll want to put it in quotes. So, if your folder of text files that need fixin' is in Documents/borkedfiles, you can just type cd Documents/borkedfiles after opening up Terminal. If they're in Documents/borked files, you'll need to type cd "Documents/borked files" to switch to that directory.
posted by MadamM at 11:38 AM on November 11, 2008
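
A quick sketch of the quoting rule, run in a scratch folder so it can be tried safely. The folder names are the ones from the answer above; the scratch location is made up.

```shell
# Throwaway stand-in for the home folder.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/space-demo.XXXXXX")
cd "$workdir" || exit 1
mkdir -p "Documents/borked files"

# Quotes keep the two words together as one folder name.
cd "Documents/borked files" || exit 1
pwd

# Go back and do the same thing with a backslash in front of the space.
cd "$workdir" || exit 1
cd Documents/borked\ files || exit 1
pwd
```

Without the quotes or the backslash, cd would receive "Documents/borked" and "files" as two separate words and would not find the folder.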


BK ReplaceEm is essentially a text search-and-replace program. However, unlike the search-replace functionality of a standard text editor, BK ReplaceEm is designed to operate on multiple files at once. And you need not only perform one search-replace operation per file -- you can set up a list of operations to perform. You can also specify a backup file for each file processed just in case the replace operation didn’t do exactly what you wanted.
posted by netbros at 11:42 AM on November 11, 2008


Oops, sorry. Windows only for BK ReplaceEm.
posted by netbros at 11:43 AM on November 11, 2008


Response by poster: Once you're in the same directory as your files, you can run the perl command discussed above.

When I'm in the Linux mode, how do I go from documents to the specific file containing the docs I want to change?

Will every folder be a directory with a different name? Or in other words, will the structure of the directories reflect the visual structure of windows on the Mac, with each separate folder, for example, "Budget 06," "Schoolwork," and "How to Sabotage a Relationship" each a distinct directory?

Letourneau suggested, "What you probably want to do is put all of the files you want to auto-replace stuff in into their own directory, so you can go into that directory and be sure you're working only on them. Suppose you make a subdirectory of your home Documents directory called "fixme" and put the files in there. Then you should do cd ~/Documents/fixme to go into that directory, do ls *.txt for good measure just to confirm that the files listed are the ones you want to change, and then you can run the Perl command described above."

How do I create one of these sub-directories that will show up in Linux?


Most text editors will let you do this without code. You're looking for find/replace in files functionality - usually under search or find in the menu. I'm on a PC not a Mac but I have to assume that shouldn't matter. What were you using to create/edit these files?

I've used TextEdit and TextWrangler. Neither of them seems to have this option; their "Find/Replace" only applies to the particular document you are working on.

I know I sound as dumb as a box of hammers, but this is really interesting and I'd like to be able to understand and implement it. Thank you...
posted by foxy_hedgehog at 11:45 AM on November 11, 2008


Response by poster: So, if your folder of text files that need fixin' is in Documents/borkedfiles, you can just type cd Documents/borkedfiles after opening up Terminal. If they're in Documents/borked files, you'll need to type cd "Documents/borked files" to switch to that directory.

Ah, ok- so you could get more elaborate from there to parallel your folders, eg: cd Documents/epicfailures/textfiles/wordchanges/filesthatspellinternetincorrectly

and so on?
posted by foxy_hedgehog at 11:47 AM on November 11, 2008


Best answer: How do I create one of these sub-directories that will show up in Linux?

You can use the regular OS X Finder to do the initial prep work of moving the documents to be edited into their own directory (i.e. folder). The command line sees the same thing that the Finder sees.

If you use Finder to create a folder inside your main Documents folder called "fixme", and you drag the files you want to edit into that folder, after you open up the command line you can get there by typing cd ~/Documents/fixme and then if you type ls you should see the files you dragged in there. Then you can go ahead with the Perl business.

On preview: are you just editing a set of files, or are you trying to make edits across some files and subfolders with more files in them as well? If the latter, the command line becomes a little more complicated.
posted by letourneau at 11:53 AM on November 11, 2008
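
For completeness, the same prep work can be done from the command line with mkdir (make directory) and mv (move). This is only a sketch: the scratch folder stands in for your real Documents folder, and the two chapter files are invented.

```shell
# Throwaway stand-in for ~/Documents, with two test files in it.
docs=$(mktemp -d "${TMPDIR:-/tmp}/docs-demo.XXXXXX")
cd "$docs" || exit 1
printf 'the internet\n' > chapter1.txt
printf 'more internet\n' > chapter2.txt

# Create the folder; it shows up in Finder like any other folder.
mkdir fixme

# Move the files into it.
mv chapter1.txt chapter2.txt fixme/

# Go in and confirm they arrived.
cd fixme || exit 1
ls
```

Either route ends in the same place, since Finder and the command line are two views of the same folders.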


Response by poster: On preview: are you just editing a set of files, or are you trying to make edits across some files and subfolders with more files in them as well? If the latter, the command line becomes a little more complicated.

Nope. Just a bunch of text files in one plain old folder.

Thanks guys- I am excited to play with my new...er...toy!
posted by foxy_hedgehog at 12:00 PM on November 11, 2008


I've used TextEdit and TextWrangler. Neither of them seem to have this option- their "Find/Replace" only applies to the particular document you are working on.

Actually, this screenshot from TextWrangler looks like it can handle what you want to do. I bet the "Multi-File Search" checkbox at the bottom of the dialog lets you do find/replace across a collection of files at once.
posted by letourneau at 12:01 PM on November 11, 2008


You can also do this on Mac command line: say I am a nice computer
posted by plexi at 12:09 PM on November 11, 2008


Best answer: You might want to make that:

perl -pi.BAK -e 's/\binternet\b/Internet/g' *.txt

The '\b's make it match only 'internet' alone as a full word. So "internetwork" doesn't become "Internetwork" and "thinternet" doesn't become "thInternet", but "the internet, a wild place" still gets changed to "the Internet, a wild place".
posted by zengargoyle at 1:02 PM on November 11, 2008


Best answer: Okay, this is something I don't think anybody's mentioned yet:

If you don't want to type in the path to the folder, just type "cd " (cd + space) and drag in the folder that contains your text files. Terminal will auto-escape every space and other bad character.
posted by 47triple2 at 1:09 PM on November 11, 2008


Slightly more dangerous, but if say you had a bunch of text files in a directory hierarchy, rather than all in the same directory and you wanted to do something to each of them, you can do this:

find . -name '*.txt' | xargs perl -pi.BAK -e 's/\binternet\b/Internet/g'

or to check that it's finding the right files first, do this:

find . -name '*.txt' | xargs echo perl -pi.BAK -e 's/\binternet\b/Internet/g'

The xargs command takes the list generated by the previous command in the pipe and passes its items as arguments to the command that follows it. So the find gets all files ending in .txt, and the output of this command is piped to xargs, which runs the command perl -pi.BAK -e ... on the files that were found.

Unix is fun and time saving, but can be dangerous.
posted by singingfish at 1:15 PM on November 11, 2008
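
Here is a small sketch of what xargs does with the list it is given. No files are needed: printf stands in for find, and the file names and variable names are invented.

```shell
# xargs reads names from the pipe and appends them to the command as arguments.
# With echo in front, the command is printed instead of being run.
preview=$(printf 'a.txt\nb.txt\nc.txt\n' | xargs echo perl -pi.BAK -e 's/\binternet\b/Internet/g')
printf '%s\n' "$preview"

# xargs splits its input on spaces as well as line breaks, so a name
# containing a space is treated as two separate names.
# -n 1 passes one argument per echo, which makes the split visible.
split=$(printf 'my notes.txt\n' | xargs -n 1 echo)
printf '%s\n' "$split"
```

The first part prints a single perl command with all three names at the end. The second part shows why names with spaces in them need extra care with this approach.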


Response by poster: Actually, this screenshot from TextWrangler looks like it can handle what you want to do. I bet the "Multi-File Search" checkbox at the bottom of the dialog lets you do find/replace across a collection of files at once.

Hey, thanks for that- I'm going to try out this Perl thing and see if it just might work.
posted by foxy_hedgehog at 1:16 PM on November 11, 2008


Also, the cool kids will snicker behind your back if you keep calling the OS X terminal "Linux mode". They all call it "a bash shell running in a terminal".

OS X and Linux both have this feature, and it works pretty much the same way on both, but that doesn't mean that OS X's terminal belongs to Linux. Both OS X and Linux inherited it from Unix.
posted by flabdablet at 3:50 PM on November 11, 2008


Best answer: cd Documents/epicfailures/textfiles/wordchanges/filesthatspellinternetincorrectly

Yes, that's the idea.

In fact, Documents is itself a subfolder of your home folder, and the fact that the pathname you've just mentioned after the "cd" doesn't start with a slash means that it's relative to whatever folder you're cd'd into right now. So if you'd done

cd Documents/epicfailures/textfiles/wordchanges/filesthatspellinternetincorrectly

and then you subsequently try

cd Documents/hugesuccesses/textfiles/wordchanges/filesthatspellinternetincorrectly

this will likely fail, because your filesthatspellinternetincorrectly folder probably doesn't have a Documents folder inside it.

Every file on your computer has an absolute pathname, identifying unambiguously where it is. Absolute pathnames start with a slash.

You can find the absolute pathname of the current working directory at any time, using the pwd command.

For example: when you first open a terminal, the current working directory will be your home folder (the one that contains the Documents folder) and typing pwd might show you something like

/Users/foxy_hedgehog

That would mean that the absolute pathname of your filesthatspellinternetincorrectly folder is

/Users/foxy_hedgehog/Documents/epicfailures/textfiles/wordchanges/filesthatspellinternetincorrectly

You could stick that absolute pathname after cd regardless of what your current working directory was, and it would take you to the right place.

To make all this a little less painful, bash (the shell that's running in your terminal) has a few nice features you can use. First off, cd by itself (with no pathname after it) will take you straight back to your home folder.

Second, if you're in the process of entering a command and you hit the Tab key, the filename you're in the middle of typing will be auto-completed for you. You could type

cd Do[Tab]

and you'd instantly see

cd Documents/

appear on the command line, unless you also had a Dogs or Doughnuts folder, in which case you'd hear a beep; type a c and hit Tab again and the auto-completion would then work. You might be able to cd into your filesthatspell... directory with as little typing as

cd Do[Tab]ep[Tab]te[Tab]w[Tab]f[Tab]

Tab-completion totally rocks. It's usually way faster than pointing and clicking.

If you hit the up-arrow key, bash will step backwards through the commands you've recently entered and allow you to re-enter any of them, possibly after editing it first.

If you hit ctrl-R and type a few letters, bash will search backwards through your command history until it hits the most recent command containing those letters; subsequent ctrl-R keystrokes will repeat that search as many times as you like.

Welcome to your computer's literate user interface!
posted by flabdablet at 4:14 PM on November 11, 2008
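
The relative versus absolute distinction can be tried out safely in a scratch folder. In this sketch the variable home is a made-up stand-in for your real home folder, and the folder names are shortened versions of the ones above.

```shell
# Throwaway stand-in for the home folder, with two folders inside Documents.
home=$(mktemp -d "${TMPDIR:-/tmp}/path-demo.XXXXXX")
mkdir -p "$home/Documents/epicfailures" "$home/Documents/hugesuccesses"

# Relative path: works because we start in the folder that contains Documents.
cd "$home" || exit 1
cd Documents/epicfailures || exit 1
pwd

# The same style of relative path now fails, because epicfailures
# has no Documents folder inside it.
cd Documents/hugesuccesses 2>/dev/null || echo "no Documents folder in here"

# Absolute path: starts with a slash, so it works from anywhere.
cd "$home/Documents/hugesuccesses" || exit 1
pwd
```

When a cd fails you simply stay where you were, which is why pwd is worth typing whenever you are unsure.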


I use MassReplaceIt to do just this. It's quick, it's easy and it's Mac.
posted by TheRaven at 4:33 PM on November 11, 2008


If you don't want to type in the path to the folder, just type "cd " (cd + space) and drag in the folder that contains your text files. Terminal will auto-escape every space and other bad character.

Holy shit - this is awesome! Thanks a bunch!
posted by suedehead at 4:55 PM on November 11, 2008


Hey, dragging and dropping files into the Terminal works in Ubuntu as well! Thanks for that tip, 47triple2.

Tab completion will also supply whatever escapes are necessary for spaces and whatnot in file names.
posted by flabdablet at 8:37 PM on November 11, 2008


singingfish: "Slightly more dangerous, but if say you had a bunch of text files in a directory hierarchy, rather than all in the same directory and you wanted to do something to each of them, you can do this:

find . -name '*.txt' | xargs perl -pi.BAK -e 's/\binternet\b/Internet/g'

or to check that it's finding the right files first, do this:

find . -name '*.txt' | xargs echo perl -pi.BAK -e 's/\binternet\b/Internet/g'
...
"

I'd add -print0 so files with spaces in their names get handled well:

find . -name '*.txt' -print0 | xargs -0 perl -pi.BAK -e 's/\binternet\b/Internet/g'
posted by zouhair at 2:30 AM on November 12, 2008


I was thinking about bringing up the -print0 thing but it felt like going overboard in the context of this discussion. But I think by now we've already decided to go down Overboard Road. Here's a way of avoiding having Perl generate .BAK files for files that aren't going to change...

grep -Zrl '\binternet\b' . | xargs -0 perl -pi.BAK -e 's/\binternet\b/Internet/g'

You can drag-and-drop your target directory in place of the . at the end of the grep call and before the | character.
posted by letourneau at 4:32 AM on November 12, 2008


foxy_hedgehog, in reply to your MeMail:

First, the reason you'd have trouble searching and replacing parentheses is because they do indeed have a special meaning. The short answer is you need to use \( or \) wherever you find ( or ) failing to work. Here's the long answer.

Second, learning to use Perl isn't even slightly silly. Perl borrows syntax and concepts from enough other languages that in the process of coming to grips with Perl you'll find yourself picking other things up in self-defence. It's also a pretty good choice for knocking out quick write-only scripts to get specialized jobs done quickly and easily. And despite the best efforts of its many detractors, and despite the existence of newer scripting languages that are arguably cleaner and easier to use, Perl is not going away any time soon.

Don't worry about being "not a programmer". All it takes to learn how to write computer programs is an interest in doing so, an inquiring mind and practice, practice, practice.

As for learning to drive a Unix-family box using the command shell: in my opinion this is one of those things that everybody should learn along with their times tables, so I'm clearly the wrong person to ask for advice about it :-)
posted by flabdablet at 9:01 PM on November 13, 2008

