Complex conditional text: compare values, spit out sentences
April 4, 2014 1:12 PM

I have a project where I am tasked with presenting many data points, and need to have a comparative sentence or two before each section featuring the highlights of the data. "In this section, X is the highest value!"

So say I have fields N1, N2, N3, N4, and this is my desired output text (the comparative language, italicized in the original, is determined by N3 & N4):

"marijuana","sixth grade" 14%, 6% : "Student use of marijuana in the sixth grade (14%) was significantly higher than the sixth grade rate for the state (6%)."
"alcohol","overall" 30%, 32% : "Student use of alcohol overall (30%) was about the same as the overall rate for the state (32%)."

I'll be outputting in InDesign, but I get the sense that this is a completely different beast from what ID's conditional text feature is designed to address.

I understand there would most likely be scripting involved, and so I think I am also looking for some keywords to help me find out more about this type of thing. (I could kludge something together with an If/Then/Else approach but I'm hoping for a more rigorous/robust method.)

I've seen a million reports generated by psychiatric evaluation reporting modules that are designed to spit out automated patient reports in this sort of natural language, but so far the keywords for understanding more about the subject elude me. Googling "conditional text", "natural language generation," "data-to-text" all get me tons of less-than-relevant results. ("conditional sentences" = judicial drug law reform!) Can anyone give me some insight into what these types of algorithms are even called?
posted by BleachBypass to Computers & Internet (4 answers total) 1 user marked this as a favorite
 
Do you just need the sentences so that you can then paste them into InDesign or something? Because this is trivially accomplished in Excel or anything with a concatenate function.

You're missing an N5 though -- "significantly" has an actual, specific meaning in the context of data that isn't just "a lot" -- make sure you're including whether or not the difference between your two values is significant (a fact which cannot be determined solely from knowing the two values).
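One way the missing "N5" could be computed, as a rough sketch only: a one-sample z-test of the local proportion against the state rate. This assumes the local sample size `n` is available and that the state rate can be treated as a fixed, known value; the function name `is_significant`, the parameter names, and the 0.05 cut-off are all illustrative assumptions, not anything from the original data.

```python
import math

def is_significant(local_rate, state_rate, n, alpha=0.05):
    """Two-sided one-sample z-test of a local proportion against a fixed state rate.

    Assumption: n students were surveyed locally and the state rate is known
    exactly. Rates are fractions between 0 and 1.
    """
    # Standard error under the null hypothesis that the local rate equals the state rate.
    se = math.sqrt(state_rate * (1 - state_rate) / n)
    z = (local_rate - state_rate) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha

# The same two percentages can come out differently depending on sample size.
print(is_significant(0.14, 0.06, n=200))
print(is_significant(0.14, 0.06, n=20))
```

The point of the two calls is the one made above: 14% versus 6% alone does not settle the question, because the answer changes with the number of students behind the 14%.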
posted by brainmouse at 1:16 PM on April 4, 2014 [2 favorites]


I understand there would most likely be scripting involved, and so I think I am also looking for some keywords to help me find out more about this type of thing. (I could kludge something together with an If/Then/Else approach but I'm hoping for a more rigorous/robust method.)

That's pretty much exactly what I would do.

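def describe_difference(n3, n4):
    # Relative difference of the local value (n3) versus the state value (n4).
    # A plain ratio n3/n4 would sit near 1.0 for similar values, so subtract first.
    diff = (n3 - n4) / n4

    if diff > 0.5:
        return "much greater than"
    if diff > 0.3:
        return "somewhat greater than"

    # Remaining cases (similar, lower) still need their own phrases.
    return None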


That said, I'm a programmer by trade so I pretty naturally gravitate toward scripting as a solution to just about any sort of problem like this.
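For what it's worth, the function above can be stretched into a complete sentence builder along these lines. This is a sketch under assumptions: the 0.3 and 0.5 cut-offs are placeholders, `build_sentence` is a made-up name, and the wording says "much higher" rather than "significantly higher" because nothing here tests significance.

```python
def describe_difference(local, state):
    """Return a comparative phrase for a local rate versus a state rate (both in percent)."""
    diff = (local - state) / state  # relative difference; assumes state > 0
    if diff > 0.5:
        return "much higher than"
    if diff > 0.3:
        return "somewhat higher than"
    if diff < -0.5:
        return "much lower than"
    if diff < -0.3:
        return "somewhat lower than"
    return "about the same as"

def build_sentence(substance, group, local, state):
    """Fill the sentence template from the four fields N1..N4."""
    # "overall" reads naturally on its own; other groups need "in the ...".
    where = group if group == "overall" else f"in the {group}"
    phrase = describe_difference(local, state)
    return (f"Student use of {substance} {where} ({local}%) was {phrase} "
            f"the {group} rate for the state ({state}%).")

print(build_sentence("marijuana", "sixth grade", 14, 6))
print(build_sentence("alcohol", "overall", 30, 32))
```

Keeping the phrase selection separate from the sentence template means the cut-offs can be tuned in one place without touching the wording.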
posted by Tomorrowful at 1:20 PM on April 4, 2014


Response by poster: I'm definitely looking to automate; there are 700 or so reports, with conservatively 40 fields per report. But that's a great point about significance.

I've done it in Excel before and it got ugly pretty quickly. But I think I was just making it hard on myself by nesting Excel functions and not separating them out. Regardless, importing from Excel is another one of the weaknesses of the variable data production plugins I am looking at for InDesign. InDesign does an acceptable job, but none of the VDP suites respect ID's native import filters.

I think I was making it extra hard in my head, Tomorrowful. Thanks for the simplicity. I'm self-taught and I always forget simple best practice things like writing a few functions and calling them over and over. (I was picturing all kinds of fancy custom syntax for each field.)
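A sketch of what "a few functions called over and over" might look like across many rows: read a CSV, add a `sentence` column, and write a new CSV that a data-merge step could pick up. The column names `N1`..`N4` follow the question; the cut-off table, the `sentence` column, and the simplified sentence wording are assumptions for illustration.

```python
import csv
import io

# Hypothetical cut-offs on the relative difference, checked from top to bottom.
PHRASES = [
    (0.5, "much higher than"),
    (0.3, "somewhat higher than"),
    (-0.3, "about the same as"),
    (-0.5, "somewhat lower than"),
]
FALLBACK = "much lower than"

def phrase_for(local, state):
    """Pick the first phrase whose cut-off the relative difference exceeds."""
    diff = (local - state) / state
    for cutoff, phrase in PHRASES:
        if diff > cutoff:
            return phrase
    return FALLBACK

def add_sentences(in_file, out_file):
    """Copy every row from in_file to out_file, appending a generated sentence."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + ["sentence"])
    writer.writeheader()
    for row in reader:
        local, state = float(row["N3"]), float(row["N4"])
        row["sentence"] = (f"Student use of {row['N1']} ({local:g}%) was "
                           f"{phrase_for(local, state)} the state rate ({state:g}%).")
        writer.writerow(row)

# In-memory demo; real use would pass files opened with newline="".
source = io.StringIO("N1,N2,N3,N4\nmarijuana,sixth grade,14,6\nalcohol,overall,30,32\n")
result = io.StringIO()
add_sentences(source, result)
print(result.getvalue())
```

Putting the cut-offs in a table rather than a chain of if statements keeps the rules editable as data, which may be closer to the "more robust than If/Then/Else" approach asked about.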

Now I just have to figure out where I can insert this into the automation process. Most of the VDP plugins are weirdly not into letting me trigger my own scripts. (Alternately, "Consider our $3000 scripting module addon!")
posted by BleachBypass at 2:29 PM on April 4, 2014


Are you absolutely married to doing this in InDesign? I write my dynamic reports using IPython Notebook. They don't have to look terribly fancy for my purposes, but you can go whole-hog with LaTeX, etc.

You could do similar things with R and LaTeX or Markdown; that's what my R-focused colleagues use.

The cool thing about this is that you can start with the raw data, and in a series of essentially self-documenting scripts, generate the entire report. If you have a new dataset, you can just plug it in, and there you go. It is pretty cool stuff, and might be worth doing especially since you have a lot of data to keep track of. Also, if none of your coworkers have seen this kind of thing, they will think you are a god. Here is one example
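Separately from the linked example, here is a minimal sketch of the script-generates-the-report idea in plain Python: one function that turns a section's data into Markdown, including the "X is the highest value" highlight from the question. The function name `render_section`, the section title, and the sample numbers are assumptions for illustration.

```python
def render_section(title, values):
    """Render one report section as Markdown: a highlight sentence plus a table.

    `values` maps a label to a percentage.
    """
    # Pick the entry with the largest value for the highlight sentence.
    top_label, top_value = max(values.items(), key=lambda item: item[1])
    lines = [
        f"## {title}",
        "",
        f"In this section, {top_label} is the highest value ({top_value}%).",
        "",
        "| Item | Rate |",
        "| --- | --- |",
    ]
    lines += [f"| {label} | {value}% |" for label, value in values.items()]
    return "\n".join(lines)

print(render_section("Substance use", {"alcohol": 30, "marijuana": 14}))
```

Plugging in a new dataset then only means calling the same function with different values.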
posted by rockindata at 5:33 PM on April 4, 2014

