A specific program for copy editing
September 10, 2018 7:03 AM

Does this already exist? Could I build it myself with minimal experience and only reasonable amounts of tearing my hair out? (Please say it already exists somewhere.)

Ok so if you wanted a ...script? Program? Widget thing on the computer to do the following:

1. Pull out a full sentence from a manuscript at random and present it to you
2. (Optional) allow you to edit the sentence in the document (like correct typos or errors)
3. Once you’ve done that, you can mark it as done and call up the next sentence
4. Cycle through each sentence in the manuscript once like so

I want to use this as an additional copy editing pass for manuscripts of 60-100k words. So this would put all the individual sentences in the manuscript into a randomized queue (or whatever), and then allow me to inspect them one by one for errors outside of context. (I find it’s a lot easier to spot dropped words or typos or whatever looking at a sentence on its own, rather than having to read it in the flow of the MS.) If being able to edit inside this little widget adds too much to the complexity I can just run “find” on the offending sentence and then correct in the original document, so 2 is very, very optional.

If the only programming experience you had was a single introductory class approximately a million years ago in a language no one uses, and you have forgotten it all anyway, how would you go about acquiring said widget?

Is this something simple enough I could build it myself over the course of a couple of weeks, starting from actual scratch? Where would you start with this very specific project in mind?

Also! Does this already exist? Is it a plug in somewhere or something? (This would be soooo great.)

I have a MacBook Air, so...limited within those constraints.
posted by schadenfrau to Computers & Internet (19 answers total) 6 users marked this as a favorite
 
If you don't find such a program, I've found that proofreading backwards one sentence at a time also helps. You could further randomize by setting all paragraphs to keep together, printing the document, and shuffling the pages.
posted by Jacqueline at 7:08 AM on September 10, 2018 [2 favorites]


This probably wouldn't be too hard to code up (or pay someone else to) if you're working strictly with plain text files. It would get more complicated if you're working with formatted text.
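
For the plain-text case, a minimal sketch of the random-sentence queue might look like the following. It is an illustration under stated assumptions, not a finished tool: the names split_sentences, review_queue and review_file are made up for this example, and the sentence splitter is deliberately naive.

```python
import random
import re

# Naive sentence pattern: text up to ., ! or ?, plus any closing quotes or
# brackets. Abbreviations such as "Dr." will be split in the wrong place.
SENTENCE = re.compile(r'[^.!?]+[.!?]+["\u201d\u2019)]*|[^.!?]+$')


def split_sentences(text):
    """Return the sentences found in text, stripped of surrounding whitespace."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def review_queue(text, seed=None):
    """Yield (original_position, sentence) pairs in random order, each once."""
    sentences = split_sentences(text)
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    for position in order:
        yield position, sentences[position]


def review_file(path):
    """Show one sentence at a time; press Return to mark it done."""
    with open(path, encoding="utf-8") as handle:
        manuscript = handle.read()
    for position, sentence in review_queue(manuscript):
        print("\n" * 3 + f"[{position + 1}] {sentence}")
        input("Return for the next sentence... ")
```

Calling review_file on a plain-text export of the manuscript (the file name is up to you) walks through every sentence once. The bracketed number is the sentence's original position, which helps when using "find" to locate it in the real document.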

I'm a translator, and what you're describing is similar to many translation-memory tools, which break a document into "segments" (sentences) and then present you those segments in a two-column table, with the source on the left and your translation on the right. They don't do the randomization thing. That might be another avenue to explore.
posted by adamrice at 7:28 AM on September 10, 2018 [2 favorites]


Hi. I'm a professional copy editor. I don't think anything like this exists. (I also think it's a terrible idea, but hey, everybody's got a different working style.) My suggestion is for you to join the Facebook groups "Editors of Earth" or "Fiction Editors of Earth" where a guy named Paul something frequently posts. He's a programmer who makes a set of purchasable macros for editors. He (or someone else in those groups) might know if this existed, or maybe have some ideas about how to implement it.
posted by BlahLaLa at 7:30 AM on September 10, 2018 [4 favorites]


Response by poster: (Don’t want to threadsit but I’m also hella curious about why you think it’s a terrible idea. And thank you for the suggestion!)
posted by schadenfrau at 7:35 AM on September 10, 2018 [1 favorite]


To expand on what adamrice said (I too am a translator) you could actually use a translation memory tool to do this, if you copied all source segments to target (i.e. both your original and translation would be in English) and worked on the "translation" column. You could even sort the segments in various ways, though I'm not sure if it could generate a random list.

The major translation tools (Trados and MemoQ) are quite expensive for your purposes, and they have a pretty significant learning curve - there are cheap and free ones which might be simpler but I'm not familiar with them.
posted by altolinguistic at 8:04 AM on September 10, 2018 [2 favorites]


What text editor are you using?

If it's editing anything more complicated than plain text, then this is the kind of thing that's probably best done inside the editor, using whatever inbuilt scripting facility it has.

On the other hand, if it is plain text, it's the kind of thing that should take under an hour to whip up using Unix text processing tools and a bit of shell script glue.
posted by flabdablet at 8:10 AM on September 10, 2018 [1 favorite]


Response by poster: I work in Scrivener and usually send MS Word files to editors and proofreaders.

For these purposes, though, I’d be willing to put everything in plaintext, use that for my random sentence shuffler spotlight thing, and then when I find a sentence with an error that needs correcting, go back and find the corresponding sentence in the Word file and fix it there.
posted by schadenfrau at 8:17 AM on September 10, 2018


Incidentally, the way I'd tackle this would be slightly different to the way you're thinking about it. I'd write two scripts: one called shuffle that would take a manuscript file and produce a shuffled version with all the sentences in a randomized order, each sentence prefixed with say 50 blank lines and a number identifying its original position within the manuscript.

The blank lines should create enough visual fragmentation to let you work through the shuffled file from top to bottom inside your usual text editor while seeing only each sentence in isolation.

The second script, called unshuffle, would use the number prefixes to restore the original sentence order, then remove the numbers and added blank lines to produce a version of the file that looks like you proofed it in order.
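
For plain text only (not Scrivener's RTF), the shuffle/unshuffle pair described above could be sketched roughly as follows. This is an assumption-laden illustration, not the scripts themselves: the function names, the "N: [sentence]" line format and the 50-line gap are choices made for this example, and the sentence splitting is naive.

```python
import random
import re

GAP = "\n" * 50  # blank lines shown before each sentence in the shuffled file

# A chunk is a sentence plus the whitespace that follows it, so that joining
# the chunks in order reproduces the original text exactly.
CHUNK = re.compile(r'.+?(?:[.!?]+["\u201d)]*\s+|\Z)', re.S)


def encode(chunk):
    """Keep each chunk on one line by escaping backslashes and newlines."""
    return chunk.replace("\\", "\\\\").replace("\n", "\\n")


def decode(line):
    """Reverse encode(), reading escapes left to right."""
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), line)


def shuffle_text(text, seed=None):
    """Return the numbered sentences of text in random order."""
    entries = [f"{i}: [{encode(c)}]" for i, c in enumerate(CHUNK.findall(text), start=1)]
    random.Random(seed).shuffle(entries)
    return "".join(GAP + entry + "\n" for entry in entries)


def unshuffle_text(shuffled):
    """Restore the original order and strip the numbers and blank lines."""
    found = re.findall(r"^(\d+): \[(.*)\]$", shuffled, flags=re.M)
    found.sort(key=lambda pair: int(pair[0]))
    return "".join(decode(body) for _, body in found)
```

The brackets protect trailing spaces that some editors strip on save. Edits made between the brackets survive the round trip as long as the number prefix and the brackets themselves are left alone.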
posted by flabdablet at 8:21 AM on September 10, 2018


I don't have Scrivener or a computer that will run it natively, but a quick googling leads me to believe that it has no internal scripting language but does use an internal document format based on XML and RTF, which would make Scrivener documents relatively easy to manipulate with external tools.

If you email me a sample Scrivener document containing, say, 100 sentences, I'll cast an eye over it and see how much work would be involved in whipping up Mac-compatible versions of shuffle and unshuffle that work against Scrivener documents. If the document format is halfway reasonable, I would not expect this to be at all difficult.

Email address is in profile.
posted by flabdablet at 8:29 AM on September 10, 2018


I'm not saying this would be the best way to do it, but I would turn to Excel in this instance. I would ideally start from a plain text file with each sentence on its own line and paragraphs noted in a specific way. I'd paste those in, add a column with the original order, another with a random number and a third for the edited version. Sort by the random number and work your way down. Once you're done, sort by the original order.

Merging the edits into the original can be done in a few ways. I'd likely just filter for any blanks in the Edit column and put in a formula to make them equal to the original column.
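
For anyone who would rather script it than click through Excel, the same column layout can be sketched in Python. The column names (order, random, original, edit) and the functions below are invented for this illustration; the CSV output is only there so the sheet can still be opened in a spreadsheet.

```python
import csv
import io
import random

COLUMNS = ["order", "random", "original", "edit"]


def build_sheet(sentences, seed=None):
    """One row per sentence, sorted by a random key for proofing."""
    rng = random.Random(seed)
    rows = [
        {"order": i, "random": rng.random(), "original": s, "edit": ""}
        for i, s in enumerate(sentences, start=1)
    ]
    return sorted(rows, key=lambda row: row["random"])


def merge_edits(rows):
    """Back in original order; a blank edit falls back to the original."""
    ordered = sorted(rows, key=lambda row: int(row["order"]))
    return [row["edit"] or row["original"] for row in ordered]


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

merge_edits does what the blank-filter formula does: any row with an empty edit cell simply keeps its original sentence.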
posted by soelo at 9:14 AM on September 10, 2018 [1 favorite]


Sample document received; thank you.

Given that a Scrivener document is actually a folder that can contain multiple RTF files as well as assorted other stuff, it seems to me that the path of least resistance would be to preserve as much of the overall project structure as possible and keep the sentence-shuffling operations internal to each RTF file.

So you might end up proofing sentences randomly re-ordered within a chapter, or within a part, or within some other sub-structure decided on by the document author, rather than randomly re-ordered across the project as a whole.

Is that acceptable? Because I think I can make that work with very little effort.
posted by flabdablet at 9:59 AM on September 10, 2018 [1 favorite]


Response by poster: Oh yeah, for sure. The MS is usually split up into chapter documents within a draft folder, so I could just do it chapter by chapter. (Or, as you say, part by part, depending on the project. I could also just put everything in one RTF file. Options!)

That would be fantastically helpful!
posted by schadenfrau at 10:12 AM on September 10, 2018


I save a second copy (of course), clear all styles, replace every period + space with period + return (so now you have each para containing one sentence). Copy all that, shove into Excel and create several other columns. One column is order of sentences (just run sequential numbers). In another column, go LEN (text cell #) and run it through (gives you length of sentence). In another, go LEFT (text cell #, 15). This gives you first 15 characters. Sort by each column. Use conditional formatting to highlight duplicates in any of the columns. You'll find duplicate sentences and unusual formations.
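
The same columns can be approximated outside Excel. The sketch below is only an illustration of the idea: sentence_table and duplicates are hypothetical names, and the "highlighting" is reduced to returning the rows that share a value in a given column.

```python
from collections import Counter


def sentence_table(sentences):
    """Mimic the spreadsheet columns: order, LEN, LEFT(..., 15) and the text."""
    return [
        {"order": i, "length": len(s), "prefix": s[:15], "text": s}
        for i, s in enumerate(sentences, start=1)
    ]


def duplicates(rows, column):
    """Rows whose value in the given column occurs more than once."""
    counts = Counter(row[column] for row in rows)
    return [row for row in rows if counts[row[column]] > 1]
```

Checking the "text" column finds exact repeats, while "prefix" finds sentences that merely start the same way, which is often where the odd formations show up.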

Open second spreadsheet. Call this master checklist and reuse for every manuscript. Down the first column write the things that have been errors in any manuscript (period double space, space hyphen space, % space) and so on. Across your top row, you put your chapter #. Mark off a cell for each error as you check for each chapter.

Use Grammarly. Compound words seem to be my clients' most common problems. Throw a chapter in at a time to find issues, and add them to your master sheet (find all instances of "well" followed by a space), (find all instances of "ize" if correcting for US spelling), (find ampersands, percent signs or question marks followed by any letter). For any headings that must appear in all chapters, that too goes in your list (intro, bio, ref list). Depending on what level of formatting, you can throw in heading / footer checks, prelim content, if the word "table" appears in the paragraph preceding a table and ditto "figure". Look for the word "pubic" if "public" is a common term (Grammarly, bless, luckily brought that one to my attention, pre submission to publisher).

If you do referencing use reciteworks to catch errors there (it's not perfect in the reference list, but it's really good checking to see if a reference is in text AND ref list).

But do keep that master list, some things are just shockingly common. Oh yeah, see equals signs and make sure spaces either side. There's a nifty word find formula that allows you to find acronyms two words or longer (4am on tablet so you'll need to google).
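
If the chapters are available as plain text, the master checklist can also be run as a small script. This is a rough sketch, assuming each check can be expressed as a regular expression; the check names, the patterns chosen and the run_checks function are all illustrative, and the real list would grow with each manuscript.

```python
import re

# Each entry is one row of the master checklist.
CHECKS = {
    "period + double space": re.compile(r"\.  "),
    "space hyphen space": re.compile(r" - "),
    "&, % or ? followed by a letter": re.compile(r"[&%?][A-Za-z]"),
    "equals sign missing a space": re.compile(r"\S=|=\S"),
    '"pubic"': re.compile(r"\bpubic\b", re.IGNORECASE),
}


def run_checks(chapters):
    """Map each check name to a list of hit counts, one count per chapter."""
    return {
        name: [len(pattern.findall(chapter)) for chapter in chapters]
        for name, pattern in CHECKS.items()
    }
```

The result has the same shape as the spreadsheet: checks down the side, chapters across the top, and a nonzero count wherever something still needs looking at.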

Oh, look (in Word) for quote marks. You could find and replace both and make them highlighted, or just find each one by one. People don't always put the pair in. When looking for single quotes, avoid hitting apostrophes by looking for space '.
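
The unpaired-quote hunt can be roughed out in code as well. The sketch below assumes plain text with paragraphs separated by blank lines and only looks at double quotes; the function name is made up. It will also flag dialogue that deliberately runs across paragraphs without a closing quote, so treat the output as a list of places to look rather than a list of errors.

```python
def unbalanced_quote_paragraphs(text):
    """Return the 1-based numbers of paragraphs with unpaired double quotes."""
    flagged = []
    for number, para in enumerate(text.split("\n\n"), start=1):
        # Straight quotes should come in pairs, so an odd count is suspect.
        straight_odd = para.count('"') % 2 == 1
        # Curly quotes should have as many openers as closers.
        curly_mismatch = para.count("\u201c") != para.count("\u201d")
        if straight_odd or curly_mismatch:
            flagged.append(number)
    return flagged
```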
posted by b33j at 10:58 AM on September 10, 2018 [4 favorites]


Oh, and when I find issues in the Excel data, I immediately apply the solution to my master doc, and note that I've checked it for each portion of the manuscript in my master spreadsheet.
posted by b33j at 11:00 AM on September 10, 2018


So I've been playing with scripts, and I have a pair of shuffle and unshuffle scripts that will successfully round-trip the RTF document embedded in the .scriv project folder you sent me between its original condition and a shuffled version with randomly ordered, numbered sentences with multiple line breaks between each.

Which is all well and good, but the breaking into sentences and subsequent reassembly is done with some fairly fragile Perl-based search and replace code, and when I edit the shuffled version using LibreOffice Writer and save it, it inserts tremendous amounts of formatting cruft that breaks all the assumptions the unshuffle script needs to make to put the document back together again.

I think I'd need to get my hands on an actual instance of Scrivener and/or a wider range of sample texts to construct anything reliable.

Here's the shuffle script, for what it's worth. Open a new TextEdit window, turn off every "smart" option that TextEdit has got, paste this in, then save it as shuffle.command and alter its permissions to let it be executed. You should then be able to use Finder to drag and drop a .scriv project onto it, at which point it should make a -shuffled.scriv version of the same project with all the sentences split apart and randomized.

I wrote this on a Debian box, not on a Mac because I don't have one; I've tried to use only facilities available on both, but this script has not been tested in its target environment so it will probably break in interesting ways. You have been warned.
#!/bin/bash

# Loop over all specified names, processing folders whose
# names end in .scriv and complaining about others.

main() {
	local project
	for project
	do
		case ${project%/} in
		*.scriv)
			if test -d "$project"
			then
				process_scriv "${project%/}"
				continue
			fi
		esac
		echo "$project: not a .scriv project" >&2
	done
}



# Process the specified .scriv folder. Start by making
# an identical copy, then replace all the RTF documents
# inside the copy's Files/Docs subfolder with shuffled
# versions of the originals.

process_scriv() {
	local source=$1
	local target=${1%.scriv}-shuffled.scriv
	local doc
	rm -rf "$target"
	cp -a "$source" "$target"
	for doc in "$source/Files/Docs/"*.rtf
	do
		# Skip the unexpanded pattern when the folder holds no RTF files.
		test -e "$doc" || continue
		shuffle_rtf "$doc" "$target${doc#"$source"}"
	done
}



# Shuffle the sentences in an input RTF document to make
# an output RTF document.

shuffle_rtf() {
	perl -0777 -nE '
		s/\\/\\\\/g;
		s/\n/\\n/g;
		s/((^\{((\{[^}]*\}|\\\\[0-9;A-Za-z]+)(\\n| )*)+)|[^.!?]*[.!?]"?( |\\+n))/$1\n/g;
		s/\}$/\n\}/;
		@lines = split "\n";
		say "-1 $lines[0]";
		# Fixed-point keys: a bare rand can print as 5e-05, which sort -n misreads.
		for ($i=1; $i<@lines-1; ++$i) {say sprintf("%.12f", rand()) . " " . "\\\\n" x 16 . "$i: [$lines[$i]]"};
		say "1 $lines[-1]";
	' "$1" |
	sort -k1n |
	cut -d' ' -f2- |
	perl -pe 'chomp; s/\\n/\n/g;s/\\\\/\\/g' >"$2"
}



main "$@"

posted by flabdablet at 3:58 PM on September 10, 2018 [2 favorites]


I'll be curious to see if this works. I believe Scrivener also has a manifest that keeps track of files, so if you modify them outside the app, it may throw off the package. I go between the iOS and MacOS version with my projects and sometimes they get out of synch and you have to resolve that. Maybe it only uses modification dates or something, but I have a feeling it's a bit more complex, since PC, Mac, and iOS handle mod dates a bit differently.
posted by cjorgensen at 7:30 PM on September 10, 2018


Response by poster: This is fucking dope. I’ll let you know if it works tomorrow!
posted by schadenfrau at 9:04 PM on September 10, 2018


Response by poster: (It’s midnight, is all, so I guess technically tomorrow, but you know what I mean.)
posted by schadenfrau at 9:05 PM on September 10, 2018


I know this is late, but if you haven't settled on a solution to this yet I wanted to suggest AppleScript.

Assuming Scrivener makes functions available to AppleScript, which it might, then you can use AppleScript to automate Scrivener itself.

Here are two examples of using AppleScript to automate MS Word to work with specific sentences.

From the second link:
tell application "Microsoft Word"
delete (sentences 1 thru 4 of active document)
end tell

posted by duoshao at 5:49 AM on December 5, 2018

