Transcoding video. Lots of video.
May 20, 2010 9:36 AM

I have a huge pile of video that needs to be transcoded. And it needed to be done yesterday.

About 3 terabytes of .VOB files (~600 hours) were just dumped in my lap, and need to be converted into .MP4 files that can be distributed/streamed to a large number of users.

Our streaming infrastructure uses Flash (I know, I know), and we'll eventually want all of the video in that format as well. That's priority #2; for now we just want these files to be accessible. I also don't imagine there's any practical way to transcode directly from VOB to FLV.

Assume I have access to a large number of Windows machines and a private gigabit network. I can quickly burn off enough Linux LiveCDs if that will help. Mac-based solutions are out of the picture. Throwing money at the problem may be possible, although timing remains critical.

What is the best/fastest way to get these files out into the open, and distribute the transcoding jobs among our computing resources? We need to keep setup time and encoding time to a minimum. Ideally, we'd also like to be able to add additional machines to our pool as the job progresses.

Is Kerrighed + Handbrake a valid option? A shell script to shoot encoding jobs off to individual clients? ClusterKnoppix? Instant-Grid/Globus?
posted by schmod to Computers & Internet (13 answers total) 3 users marked this as a favorite
 
If your VOB files are accessible to the individual machines on network storage, and you already know what (presumably command-line) utility you're going to use for the encoding, I think clustering software is probably overkill. Just split your list of files into as many chunks as you have encoding machines, and have each machine work through its chunk sequentially. Each machine can pull the original file off network storage and write the output back to that same storage, so you end up with all the results in one place.
posted by Vulpyne at 9:44 AM on May 20, 2010


Response by poster: That's the idea. Now, how do we automate it?
posted by schmod at 9:46 AM on May 20, 2010


The main problem with DVD sources is the mix of progressive, interlaced, telecined, and in some cases field-blended or badly standards-converted content. Other than that, ffmpeg makes VOB-to-MP4 and VOB-to-FLV conversion fairly painless.

On Win32 machines, you can do a lot with batch files, DGIndex, Avisynth, and x264/LAME/mp4box/ffmpeg. That doesn't get you around the content-mix problem, but the IsCombed() function from the Decomb package for Avisynth (in conjunction with something that reads text files from the command line, which gets into an area beyond my knowledge) can let you programmatically determine, to a reasonable degree of certainty, what kind of content you're dealing with.
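
For reference, the basic conversions are just one ffmpeg command per file. Something along these lines (untested; the exact codec and bitrate flags depend on your ffmpeg build, and interlaced sources will also want a deinterlace filter, per the content-mix caveat above):

  rem hypothetical one-off conversions; tune bitrates to taste
  ffmpeg -i VTS_01_1.VOB -c:v libx264 -b:v 1500k -c:a aac -b:a 128k output.mp4
  ffmpeg -i VTS_01_1.VOB -c:v flv -c:a libmp3lame -b:v 1500k -b:a 128k output.flv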
posted by Inspector.Gadget at 10:08 AM on May 20, 2010


This is kind of difficult to answer without knowing how many computers you have or how fast your 3 TB of storage and your network are.
(I'm assuming you have Windows machines.)
  1. I would manually split your data up into folders of approximately equal size.
  2. Then go to each computer and map a network drive to one of the folders of data (one computer per folder).
  3. Install FFmpeg on each computer.
  4. Start the command prompt and CD to your network drive.
  5. Use ForFiles to pipe files into FFmpeg (see the sketch at the end of this answer).
  6. Use Windows Task Scheduler and a batch file to copy finished files back up to the network storage every hour or so.
FFmpeg should let you convert the files straight from VOB to FLV. You might have some issues if the VOBs come from DVDs in slightly different formats, but you probably care more about getting the files out than about everything being perfect.
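For step 5, a per-machine batch file might look something like this (untested sketch; it assumes FFmpeg is on the PATH, the mapped folder is Z:, and all of that machine's VOBs sit directly in it; this does the MP4 pass, and the FLV pass is the same command with -c:v flv -c:a libmp3lame and an .flv extension):

  rem encode.bat - hypothetical sketch for one machine's folder of VOBs
  rem ForFiles substitutes @path (full quoted path) and @fname (name without extension)
  cd /d Z:\
  forfiles /p Z:\ /m *.vob /c "cmd /c ffmpeg -i @path -c:v libx264 -b:v 1500k -c:a aac -b:a 128k @fname.mp4"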
posted by gregr at 10:18 AM on May 20, 2010


My experience is all in Unix OSes, so I can't help you too much with automating it on Windows. On Windows, I assume you would most likely use a batch file; on Linux you'd use a shell script.

For Linux: Let's assume your transcoding command is "vob2flv <input> <output>" and you have a list of input files in input.list
sed 's@^\(.*\)\.vob$@vob2flv \1.vob \1.flv@g' < input.list >commands.sh
That'll transform the list of filenames into a list of commands to transcode them: an input line of "/path/to/blah.vob" becomes the line "vob2flv /path/to/blah.vob /path/to/blah.flv" in commands.sh. Then you can split that file into as many chunks as you have machines using the split utility. For example, if you had 10 machines and 100 files to encode, you'd want chunks of 10 lines: "split -l 10 commands.sh" generates files named "xaa", "xab", and so on. Then you just copy those to the network storage and run xaa on machine 1, xab on machine 2, and so on.

For Windows, you could generate a batch file in pretty much the same fashion (you could even generate it on a Unix machine). I have no idea of your knowledge level, but some batch/shell scripting knowledge - or access to someone in-house who has it - will help you a lot when automating things.
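
Something roughly like this might work as the generator on the Windows side (untested, and Windows really isn't my area, so treat it as a sketch; Z:\vobs and the .flv output names are just placeholders, and it needs to run from a .bat file - at an interactive prompt you'd use single % signs):

  rem untested sketch: write one ffmpeg command per VOB found under Z:\vobs into commands.bat
  rem %%~dpnF expands to the drive, path, and name of %%F without the extension
  (for /r Z:\vobs %%F in (*.vob) do @echo ffmpeg -i "%%F" "%%~dpnF.flv") > commands.bat

Then chop commands.bat into one chunk per machine, the same way as above (you could even do that with split on a Unix box).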
posted by Vulpyne at 10:29 AM on May 20, 2010


Response by poster: Compiling a list of command lines isn't too difficult, and I can easily make a list of files and a command line to fire off FFmpeg or Handbrake on each machine.

However, I'd rather automate the process of distributing jobs to each machine than manually split up the files (especially since not all the files are the same size, not all of the machines are equally powerful, etc.).

Forfiles is a neat tool. I'll definitely have to add that to my arsenal. Thanks!
posted by schmod at 10:49 AM on May 20, 2010


Have you looked at something like zencoder?

http://www.zencoder.com/
posted by bitdamaged at 11:07 AM on May 20, 2010


dvd::rip has a cluster mode you could probably use to do 90% of the automation you want.
It's going to take some setup, and it's a GUI-based system, so it's not going to be completely automatic, but if the object is to just get the project done, it's probably just as fast as coding and testing a custom shell script.
posted by madajb at 11:15 AM on May 20, 2010


Assuming you have administrator rights on the remote machines, PsExec might be useful for kicking off your encodes rather than running around to all the machines.

You could use Vulpyne's method and PsExec to kick them off remotely: just launch 10 at a time on every machine, then top up the ones that finish faster (or maybe I'm underestimating how many files you have).
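
Something along these lines, maybe (untested; MACHINE01, the DOMAIN\encoder account, and the chunk names are all placeholders):

  rem hypothetical sketch - start one pre-split chunk per machine and don't wait for it (-d)
  rem supplying -u/-p lets the remote process authenticate back to the file share
  psexec \\MACHINE01 -u DOMAIN\encoder -p secret -d cmd /c \\server\share\chunks\chunk01.bat
  psexec \\MACHINE02 -u DOMAIN\encoder -p secret -d cmd /c \\server\share\chunks\chunk02.bat

PsExec will also take a whole list of machines via @machines.txt if every box should run the identical command.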

Failing that, you're going to have to whip up a script of some sort that feeds files to the machines as they become free (if these are a motley collection of office machines, you might be quicker concentrating on the ones with decent CPUs).

On preview: madajb's link looks interesting.
posted by samj at 11:20 AM on May 20, 2010


Response by poster: Zencoder's cool, but unrealistic for bandwidth reasons.

I stumbled across dvd::rip, which also looks interesting, but has a rather involved setup process.

psexec and forfiles are a step in the right direction! I basically just need a version of psexec that passes commands to a pool of computers, rather than to a single predefined machine, and queues up another task as each one completes.
posted by schmod at 11:28 AM on May 20, 2010


I kind of doubt that you will find a piece of software exactly tailored to your problem. Cobbling software together will probably be your only option.

Doing some combination of what people suggest here right now will probably be quicker than trying to find some fully automated solution.
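For instance, the queueing part (each machine grabs the next file when it frees up) can be faked with a shared folder of tiny per-file job scripts and a worker loop like this running on every machine (rough, untested sketch; the share path is a placeholder, and each job .bat is assumed to contain a single ffmpeg command with full paths):

  @echo off
  rem worker.bat - hypothetical pull-model sketch: each machine claims one job file
  rem at a time from a shared queue folder and runs it, until the queue is empty
  :next
  set "JOB="
  for %%J in (\\server\share\queue\*.bat) do (
    set "JOB=%%J"
    goto claim
  )
  echo Queue empty - done.
  goto :eof
  :claim
  rem the move acts as a crude lock: only the machine that wins the move runs the job
  move "%JOB%" "%TEMP%\current_job.bat" >nul 2>&1 && call "%TEMP%\current_job.bat"
  goto next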
posted by gregr at 11:59 AM on May 20, 2010


If you have a huge pipe, a willingness to give in to The Man, and Python skills, Amazon AWS/EC2 would be really well suited to this task. The boto library was built by someone doing the same thing, so its example program is super well suited to transcoding. It would also put the media somewhere you can stream it from quickly, if you want to use S3/CloudFront.
posted by tmcw at 1:02 PM on May 20, 2010


Response by poster: Pipe's big. Not that big. Pretty sure I'd be murdered by our network admins if I tried to stuff 3TB through in a day.

Right now, the project's being held up in some sort of bureaucratic tangle, which frustrates me to no end, because the content of the videos is "actually quite important."

Current gameplan is to try Media Encoding Cluster or dvd::rip if the project lands back in my lap.

As an interesting footnote, Amazon lets you sneakernet huge amounts of data into S3. Unfortunately, the structure of my organization is such that using AWS would require even more bureaucratic tangling, taking it out of the picture for the immediate future.
posted by schmod at 6:02 PM on May 22, 2010

