Line-by-line document merge
May 22, 2011 9:23 PM

Is there any way to merge two Microsoft Word documents containing the same content in different languages, so that each line from one document is immediately followed by the corresponding line from the other? This is for subtitle translation, and I'll work with any word processor that can read .doc documents.

I'm translating a TV series about antiques, and it occurred to me that this would be a fantastic candidate for machine translation. The repetition of long words ("the cobalt-based dyes of Qianlong-period Jingdezhen porcelain") and the time I spend typing things like "Yeah, ok, good" would be great to save.

The problem is, the client wants these in one document, with the English subtitle directly beneath the Chinese subtitle. Google Translator Toolkit gives me a side-by-side comparison, which is fine for me to work on, but not for the client.

I want to go into Translator Toolkit, fix my translation, download my English-language results, then merge the translated document with the original. Is that doable?
posted by saysthis to Computers & Internet (3 answers total)
 
I think you can probably use Excel as an intermediary. Copy each "side" into an Excel worksheet and write a macro combining the two (or just use formulas and then copy/paste the contents). Then copy the result into a Word document. You may need to use Word's extra find-and-replace features for formatting marks to improve the layout further.
posted by carmicha at 9:44 PM on May 22, 2011 [1 favorite]


Totally ugly script time (assuming you have access to an OSX/*nix box). Copy the English into file1 and the translation into file2. You could change the 100000 to a smaller number if you know how many lines there are in your text.

for i in {1..100000}; do sed -n "${i}p" file1 >> output; sed -n "${i}p" file2 >> output; done

This will only work if each line is individual - if it's part of a big paragraph, you'll get matching paragraphs.
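A shorter way to get the same interleaving, as a minimal sketch: the standard paste utility can join two files with a newline as the delimiter, which puts line N of the second file directly under line N of the first. The function name interleave, the file names, and the sample subtitles are made up for illustration, and this assumes both files are plain text with one subtitle per line.

```shell
#!/bin/sh
# Interleave two files line by line:
# line 1 of $1, line 1 of $2, line 2 of $1, line 2 of $2, ...
interleave() {
    paste -d '\n' "$1" "$2"
}

# Hypothetical sample data; replace with your own exported text files.
dir=$(mktemp -d)
printf '你好\n谢谢\n' > "$dir/chinese.txt"
printf 'Hello\nThank you\n' > "$dir/english.txt"

# Chinese first, English directly beneath it.
interleave "$dir/chinese.txt" "$dir/english.txt" > "$dir/merged.txt"
cat "$dir/merged.txt"
```

Swap the two arguments if the English line should come first. If one file is shorter than the other, paste fills the gaps with empty lines, so a mismatch shows up as blank lines near the end of the output.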
posted by coriolisdave at 10:34 PM on May 22, 2011


If you are sure that they contain the same number of lines, such that line X in doc B is the translation of line X in doc A, you can do all of this in Microsoft Word:

0. You have two docs, one with Chinese, the other with English.

1. In each doc, convert the entire text to a single one-columned table. (Select all, then find the command for converting text to table.) Now you have two documents, each with a single one-columned table.

2. Select the entire column in the English document, copy it (ctrl-C), and then switch to the Chinese document.

3. In the Chinese document, at the top of the table, find the insertion point for a table column to the right of the existing Chinese column (I think the cursor will turn into a down arrow when you are in the right place) and then paste (ctrl-V) the entire English column to the right of the existing column.

(Go down to the bottom to check whether everything is actually matched up. They will get out of sync along the way if there is an extra paragraph in one of the docs.)

4. Select the entire two-columned table and find the command for converting a table to text. In the options window that pops up, tell it to separate the converted text with paragraph marks.

Done. Save the doc.
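For anyone who ends up with plain-text exports instead of .doc files, the same table idea can be roughly mirrored on the command line. This is only a sketch under assumptions: the file names and sample subtitles are invented, each file has one subtitle per line, and no subtitle contains a tab character (the tab stands in for the column border).

```shell
#!/bin/sh
# Hypothetical sample data; replace with your own exported text files.
dir=$(mktemp -d)
printf '你好\n谢谢\n' > "$dir/chinese.txt"
printf 'Hello\nThank you\n' > "$dir/english.txt"

# Steps 1-3: build a two-column "table", Chinese on the left and
# English on the right, with a tab between the columns.
paste "$dir/chinese.txt" "$dir/english.txt" > "$dir/table.tsv"

# The check after step 3: look at the last row to see whether the
# two columns still line up at the bottom.
tail -n 1 "$dir/table.tsv"

# Step 4: convert the table to text by turning every column
# separator into a line break.
tr '\t' '\n' < "$dir/table.tsv" > "$dir/merged.txt"
cat "$dir/merged.txt"
```

The intermediate table.tsv can also be opened in a spreadsheet, which makes it easier to spot the row where the two languages drift apart.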

==================================================
Notes:

Whether you want English or Chinese to come first will determine which table you want to paste into. If you want Chinese first, paste the English column into the Chinese doc, so that the left column is Chinese and the right column is English. If you want English first, paste the Chinese column into the English doc.

If there is more than one sentence per paragraph and you need to do this per sentence, you will first need to convert each end-of-sentence marker to a paragraph marker. Find and replace ". " and "? " with ^p everywhere, but watch out for exceptions I might not be thinking of right now. This could get messy, but maybe it's unneeded.
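On plain text, the same sentence split could be sketched with awk. Note one difference from the find-and-replace above: this version keeps the "." or "?" and only turns the space after it into a line break. The function name split_sentences and the sample sentence are hypothetical, and the same exceptions apply (an abbreviation such as "Mr. " would be split too).

```shell
#!/bin/sh
# Put each sentence on its own line by breaking after ". " and "? ".
# The punctuation mark is kept; only the following space is replaced.
split_sentences() {
    awk '{ gsub(/\. /, ".\n"); gsub(/\? /, "?\n"); print }' "$1"
}

# Hypothetical sample paragraph.
dir=$(mktemp -d)
printf 'Is it old? Yes. Very old.\n' > "$dir/english.txt"

split_sentences "$dir/english.txt"
```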

If the translations do NOT contain the exact same number of lines, such that line X in doc A may or may not be the translation of line X in doc B, because sometimes there is an extra paragraph in one doc or the other, you will have to adjust for this before or after you convert to tables, but before you convert the final table to text. Maybe do the docs piece by piece (chapter by chapter, segment by segment) to reduce the chances of things not matching up.
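A quick way to find out whether the two texts have the same number of lines before merging, sketched for plain-text exports. The function name same_line_count and the sample files are invented for illustration. It only compares totals, so it tells you that something is off, not where.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if both files have the same number of lines.
# wc -l counts newline characters, so each file should end with a line break.
same_line_count() {
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}

# Hypothetical sample data: the English file has one extra line.
dir=$(mktemp -d)
printf '你好\n谢谢\n' > "$dir/chinese.txt"
printf 'Hello\nThank you\nGoodbye\n' > "$dir/english.txt"

if same_line_count "$dir/chinese.txt" "$dir/english.txt"; then
    echo "Line counts match."
else
    echo "Line counts differ; fix the extra paragraph before merging."
fi
```

Running this per chapter or segment, as suggested above, narrows down where the extra paragraph is.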

If you want the languages to be in different colors, do this before step 4: select one of the columns, then change the text color of that column. After the conversion, the converted text will retain the new color. This might make it easier to read this thing. You could do this with styles, too -- make one column one style and the other column another style, so that you can manipulate the text of one language afterward by changing the style definition for that language.
posted by pracowity at 11:53 PM on May 22, 2011

