Questions about cron
February 21, 2007 2:57 PM

Paging Linux geeks -- please teach me the inner mysteries of cron and scripting. Simple questions follow.

Right now I have a Windows machine that downloads information off the web every hour. This computer is a 10-year-old notebook that's about to fall apart, and I have a new computer on the way to replace it. I badly need a Linux system to run some institutional science apps, so it looks like I should go the Linux route.

Currently, my Windows scripting is orchestrated by a simple hand-coded application whose sole job is to call a different Windows batch file at 10 minutes past every hour (GET00.BAT, GET01.BAT .... GET23.BAT). Each batch file calls wget to download whatever data is needed. It works splendidly.

However it's not enough just to call the batch file... my app also has to form the pieces of the date/time and call them as arguments. For example today 2/21/07 4:00pm would be called as:
GET00.BAT 2007 02 21 16 00
(note that some are zero-padded). These date/time values are needed to construct target filenames and are used in some source URLs. For example, the command WGET blah.com/junk.dat -O %1-%2-%3.dat would put my data into 2007-02-21.dat.

So my questions are essentially:

1) Is cron appropriate for this kind of work?
2) How can I set up a cron job where it runs all the time (i.e. when the computer is always on)? I'm not clear on how daemons are set up in a Linux desktop environment.
3) Can scripting devise these required time parameters explained above, or can cron pass them? Or is an ugly C++ program needed?
4) A real newbie question: Does my choice of a shell (ksh, csh, etc) matter much here?

Fortunately the data is not critical, so it's no biggie if I make some mistakes getting this set up. I'm just hoping for a push in the right direction. Thanks.
posted by zek to Computers & Internet (11 answers total)
 
You can use date to create the date arguments to your script inside backticks, like this:

yourscript.sh `date +"%y"` `date +"%m"` `date +"%d"` `date +"%H"` `date +"%M"` `date +"%S"`

That passes the current year, month, day, hour, minute, and second to the shell script.

Each of those bits in backticks can be run independently at the command line to see what they'll give you. Read the man page for the date utility to see what values it can spit out.
posted by letourneau at 3:10 PM on February 21, 2007


Actually, you may need to read the man page for strftime to get the various date format codes. And the "%y" in my example should be capitalized if you want the proper four-digit year. But you get the picture...
posted by letourneau at 3:12 PM on February 21, 2007


I'm not sure which flavor of Linux you're using, but on Ubuntu you can just throw your shell script in /etc/cron.hourly/ and it'll automatically run every hour.

It's also fairly simple to set up to run automatically at 10 past.
posted by dentata at 3:15 PM on February 21, 2007


Is cron appropriate for this kind of work?

Yes.

How can I set up a cron job where it runs all the time (i.e. when the computer is always on)? I'm not clear on how daemons are set up in a Linux desktop environment.

Most Linux distros I've used have had cron turned on by default. You can use crontab -e to edit the cron table. Type man 5 crontab at the shell for documentation of the file format. The entries below run a different script at 10 minutes after each hour, every day of the month, every month, every day of the week.

10 00 * * * $HOME/scripts/get00.sh
10 01 * * * $HOME/scripts/get01.sh
...
10 23 * * * $HOME/scripts/get23.sh


Can scripting devise these required time parameters explained above, or can cron pass them? Or is an ugly C++ program needed?

letourneau gave you a hint on how to do it with scripting only. As a stylistic note I would use $() instead of backticks. It makes it easier to see what is going on when you have to edit the script later:

yourscript.sh $(date +"%Y") $(date +"%m") $(date +"%d") $(date +"%H") $(date +"%M") $(date +"%S")
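
One caveat with calling date six times is that the clock can tick between calls (across midnight, say), so the pieces may not agree with each other. A minimal sketch that reads the clock once and splits the result; split_stamp is a hypothetical helper used only to show the fields, and yourscript.sh is the placeholder name from above:

```shell
#!/bin/bash
# Read the clock once so every field comes from the same instant.
stamp=$(date +"%Y %m %d %H %M %S")

# Hypothetical helper: print the fields of a stamp, one per line.
split_stamp() {
    # Intentionally unquoted: word splitting turns the stamp into six arguments.
    set -- $1
    printf '%s\n' "$@"
}

# In the real job this line would be: yourscript.sh $stamp
split_stamp "$stamp"
```

Leaving $stamp unquoted is deliberate here, since the shell's word splitting is what turns one string into six separate arguments.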

A real newbie question: Does my choice of a shell (ksh, csh, etc) matter much here?

The $() syntax I just showed you works in bash and in other POSIX-style shells such as ksh, but not in csh. If you are getting started, bash is what you should use, because I think most Linux users use it, and it will make following other examples easier.
posted by grouse at 3:24 PM on February 21, 2007


I should also probably note that if you're writing a shell script, you don't necessarily need to pass the various pieces of the date to the master script as arguments: you could use the same backtick (on preview: or $()) technique to generate them as needed inside the script.

Your wget command line, for instance, might be:

wget blah.com/junk.dat -O `date +"%Y-%m-%d.dat"`

...which would get you the output in 2007-02-21.dat without relying on the master script's arguments.
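
Put together as a script, that might look roughly like the sketch below. The function names target_name and fetch are hypothetical, and blah.com/junk.dat is just the placeholder URL from the question:

```shell
#!/bin/bash
# Build the day's output filename, e.g. 2007-02-21.dat.
target_name() {
    date +"%Y-%m-%d.dat"
}

# Download one URL into the dated file.
# -q keeps wget quiet, which matters when the script runs from cron.
fetch() {
    wget -q "$1" -O "$(target_name)"
}

# Usage, with the placeholder URL from the question:
#   fetch blah.com/junk.dat
```

Because the filename is computed inside the script, the cron entry only has to name the script and needs no date arguments at all.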
posted by letourneau at 3:26 PM on February 21, 2007


zek, a note about the shell that may not be obvious to Windows users:

The shell is making two passes over the command. The first pass replaces variables with their values, and also replaces expressions in backticks with the output ("stdout") of that expression when executed.*

So, that's why having date +something in backticks does something like what you want. The output of the date command gets put inline right in your command.


* (The first pass also resolves expressions that have to do with filenames, so unlike in the Windows world, rmdir xx* does not pass the string "xx*" as a parameter to "rmdir". If possible, it replaces the wildcard with the names of matching filesystem entries, doing all the hard work (consistently!) so the program you're using doesn't have to.)
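
A small sketch to make this visible, assuming bash and a scratch directory created with mktemp; show_args is a hypothetical helper that prints each argument it receives in brackets:

```shell
#!/bin/bash
# Print each argument exactly as the program receives it, one per line.
show_args() {
    printf '[%s]\n' "$@"
}

workdir=$(mktemp -d)        # scratch directory so the glob has something to match
cd "$workdir" || exit 1
touch xx1 xx2

show_args xx*               # glob expanded by the shell: [xx1] then [xx2]
show_args "xx*"             # quoting suppresses expansion: [xx*]
show_args "$(date +%Y)"     # command substitution: the current year in brackets
```

The helper never sees the wildcard or the date command in the first and third calls; it only sees what the shell substituted.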
posted by cmiller at 4:37 PM on February 21, 2007


I think you might want to rethink the whole shebang since your DOS batch scripts won't work on linux anyway. It might be better to restart with a modern scripting language such as Python, Perl, or Ruby. While unix shell scripting is certainly more powerful than DOS batch scripts, its syntax is a little hard to grasp since, as you've already seen, it tends to depend on external utilities to do everything.

I'd probably just use one cron entry:

10 * * * * /usr/bin/python /home/mystuff/get_data.py

And do all the time calculation and manipulation in the script itself. Even if you are fetching different data every hour, this would still probably be simpler than having 24 different scripts.
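
The same single-script idea, sketched in bash rather than Python so it matches the other examples in this thread. The function url_for_hour and both URLs are hypothetical placeholders, not real data sources:

```shell
#!/bin/bash
# Hypothetical single entry point: choose what to fetch from the current hour.
url_for_hour() {
    case "$1" in
        00|12) echo "blah.com/twice-daily.dat" ;;   # special-case hours
        *)     echo "blah.com/junk.dat" ;;          # every other hour
    esac
}

hour=$(date +%H)            # zero-padded, 00 through 23
echo "hour $hour -> $(url_for_hour "$hour")"
```

Matching on the zero-padded string keeps hours like 08 and 09 from ever being treated as numbers.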
posted by chairface at 4:45 PM on February 21, 2007


Disagree with chairface - while you certainly could do it in perl or python, it seems like overkill for a couple of wget calls. I'd stick with a simple bash script.
posted by chrisamiller at 5:27 PM on February 21, 2007


I've had to create 24 batch files to cover the day on Windows before and Linux is going to be a lot easier for this. The cron daemon runs all the time as there are system tasks that depend on it: it's a core utility.

You don't say whether the page that gets downloaded can be overwritten (only one page per .dat file), so to have just the latest page retained for that day (note that % is special in a crontab line and has to be escaped as \%, otherwise cron treats it as a newline):

10 * * * * wget $URL -O $(date +\%Y-\%m-\%d).dat

If you want the file to accumulate all of the page pulls for that day, have wget write to stdout and append that to the file:

10 * * * * wget $URL -O - >>$(date +\%Y-\%m-\%d).dat
posted by rhizome at 5:36 PM on February 21, 2007


You are going to want to use -q (--quiet) with wget, or else you'll get an email with the program's output every time it runs. Alternatively you can discard the output (>/dev/null 2>&1). Cron is designed such that success means nothing is output by the job, and so it emails you as a warning if there is any produced. You can also disable this (or set the recipient) with MAILTO="" in the crontab. See also "man 5 crontab".
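
A rough sketch of that convention inside a script: stay silent on success and print one line on failure, so cron only mails you when something actually went wrong. quiet_run is a hypothetical helper name:

```shell
#!/bin/bash
# Run a command with all of its output discarded.
# Print a single line only if it fails, so cron has something to mail.
quiet_run() {
    if ! "$@" >/dev/null 2>&1; then
        echo "FAILED: $*"
        return 1
    fi
}

quiet_run true      # success: no output, so cron sends no mail

# In the real job, with the placeholder URL from the question:
#   quiet_run wget blah.com/junk.dat -O junk.dat
```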

For running every hour, you can just use "*", as everyone has already noted. There is no need to enumerate one line for each hour. In fact cron can do this for you, for example if you wanted every other hour at 10 past you could use "10 */2 * * *". You can also do a list of hours, e.g. "10 2,7,16,22 * * *" means 02:10, 07:10, 16:10, and 22:10. My point here is that if you find yourself repeating a job spec as multiple lines in a crontab you are doing it wrong.
posted by Rhomboid at 6:35 PM on February 21, 2007


Choose bash.

If you post a link to your existing BAT files, I'll write an equivalent bash script that you can use with a crontab entry of

10 * * * * /home/scripts/get-stuff

and post a link back.

I expect it will be much, much shorter than your existing .BAT solution, and that you will be impressed by how many hoops you no longer have to jump through when you move from the crippled command.com to a proper scriptable shell like bash.

It's actually fairly rare to find a need to indulge in convoluted DOS-esque manoeuvres to make bash do what you want it to. Still fun when it happens, though.
posted by flabdablet at 7:28 PM on February 21, 2007

