Why do my Python scripts not finish?
June 11, 2019 12:03 PM
For a hobby project, I run hourly Python scripts via a cron job. These scripts always run correctly (i.e. produce the expected output), but sometimes one of them doesn't terminate and the Python process lives on, hogging memory until manually killed. Can you help me troubleshoot this?
I'm running Python3 (Anaconda build) on an Ubuntu remote server. The script runs Pandas and Matplotlib code to read in CSV data, do some processing, and draw some plots.
If I run a `ps` command after a couple days, I see some of the processes haven't completed:
$ ps -eo pmem,pcpu,etime,vsize,pid,cmd | sort -k 1 -nr | head -5
4.4 0.0 8-01:37:37 801820 15556 /home/msj/miniconda3/bin/python3 /home/msj/mobi/mobi.py --plots /data/mobi/data/
4.4 0.0 4-03:37:37 793808 8619 /home/msj/miniconda3/bin/python3 /home/msj/mobi/mobi.py --plots /data/mobi/data/
4.2 0.0 6-00:37:37 793432 15083 /home/msj/miniconda3/bin/python3 /home/msj/mobi/mobi.py --plots /data/mobi/data/
4.2 0.0 5-02:37:37 793252 26810 /home/msj/miniconda3/bin/python3 /home/msj/mobi/mobi.py --plots /data/mobi/data/
4.0 0.0 7-23:37:37 789124 19969 /home/msj/miniconda3/bin/python3 /home/msj/mobi/mobi.py --plots /data/mobi/data/
I can't recreate this problem outside of the cron job -- when I run the script by hand in the terminal it completes fine. Even the cron jobs complete *most* of the time, only failing to complete ~10% of the time.
I use Python a lot (but am not an expert by any means) and this is the only time I've run into this issue. I have a bunch of other scripts that run via cron jobs in the same environment, and only the one script has this problem.
I know there's not enough information here to really debug this, but if this is something you've seen before, or if you have any suggestions of things to try, I'd really appreciate any advice.
Best answer: Do the scripts fail to complete if you remove all the plotting code? You may need to throw in some `plt.close()` calls (or `plt.close(fig)` if there is more than one figure). I've seen matplotlib launch extra Python processes that stick around.
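To make the suggestion concrete, here is a minimal sketch of closing each figure as soon as it has been saved. The `save_plot` helper, the sample data, and the choice of the non-interactive `Agg` backend are all assumptions for illustration; the thread doesn't say how the original script is structured or which backend it uses.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: a non-interactive backend for a headless cron job
import matplotlib.pyplot as plt


def save_plot(xs, ys, path):
    """Draw one line plot, write it to `path`, then release the figure."""
    fig, ax = plt.subplots()
    try:
        ax.plot(xs, ys)
        fig.savefig(path)
    finally:
        # Close this specific figure even if plotting or saving raises.
        plt.close(fig)


if __name__ == "__main__":
    # Hypothetical usage: write one plot into a throwaway directory.
    with tempfile.TemporaryDirectory() as out_dir:
        save_plot([0, 1, 2], [0, 1, 4], os.path.join(out_dir, "example.png"))
    # pyplot keeps a registry of open figures; this lists what is still open.
    print("open figures:", plt.get_fignums())
```

Closing inside `finally` means a failed `savefig` doesn't leave a figure registered with pyplot. A single `plt.close('all')` at the end of the script is the coarser alternative.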
posted by caek at 2:08 PM on June 11, 2019 [3 favorites]
The first thing to try is to attach to one of the stuck processes in a debugger and see if there are any clues to be had from the backtrace.
If you go this route, you'll probably want to install the Python GDB extensions. In particular, this package adds a "py-bt" command that acts like "backtrace", but extracts a Python-level stack trace from the interpreter's data structures, instead of just showing you what part of the interpreter each stack trace is in. This is likely to be a lot more useful for tracking down the problem.
A simpler way to get a stack trace is to just kill the process with SIGINT, which is equivalent to pressing Ctrl-C in a terminal. (This will give you a stack trace on stderr, but it only works as long as nothing else catches the signal or the resulting KeyboardInterrupt exception, and as long as it isn't stuck in an uninterruptible system call.)
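A related option, different from both gdb and SIGINT, is the standard-library `faulthandler` module, which can be set up inside the script ahead of time. The sketch below is only one way to wire it up: the helper name `install_hang_diagnostics`, the choice of `SIGUSR1`, and the 55-minute limit (picked to fit an hourly cron schedule) are assumptions, and `faulthandler.register` is not available on Windows.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(timeout_seconds=55 * 60, log=sys.stderr):
    """Arrange for Python-level stack traces if the script appears stuck.

    `log` must be a real file object with a file descriptor and must stay
    open for as long as the handlers are installed.
    """
    # On `kill -USR1 <pid>`, write every thread's Python stack to `log`.
    # The process keeps running afterwards.
    faulthandler.register(signal.SIGUSR1, file=log, all_threads=True)

    # Watchdog: if the process is still alive after `timeout_seconds`,
    # write the stacks to `log` and exit immediately.
    faulthandler.dump_traceback_later(timeout_seconds, exit=True, file=log)


def remove_hang_diagnostics():
    """Undo install_hang_diagnostics(), e.g. before closing the log file."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.unregister(signal.SIGUSR1)
```

Calling `install_hang_diagnostics()` at the top of the script would mean a hung run leaves a traceback wherever cron sends stderr, instead of lingering for days. The dump is written at the C level, so it doesn't depend on `KeyboardInterrupt` being raised or on anything in the script handling it.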
posted by teraflop at 3:13 PM on June 11, 2019 [1 favorite]
Response by poster: I appreciate the gdb recommendations since I hadn't heard of it, but adding in a `plt.close('all')` at the end of my script seems to have fixed the problem. Thanks!
posted by no regrets, coyote at 2:55 PM on June 12, 2019
This thread is closed to new comments.
posted by cyanistes at 12:31 PM on June 11, 2019 [2 favorites]