I'm making a family cookbook, how do I clean up all this data?
June 5, 2021 12:21 PM

I have almost 100 recipes from various sources. How do I make them all neat, standardized, and usable for cooks who prefer American and non-American measurements?

My siblings and I decided to put together a family cookbook as a gift for one of our parents. What was originally meant to be a quick project has rapidly ballooned, but we (well, mostly me) have got most of the data entered from a stack of old recipe cards, newspaper clippings, tweaked recipes from various websites, and more.

But I'm trying to figure out if there's an easy way to clean up all this data: to standardize "teaspoon" vs "t" vs "tsp" and 6oz vs 6 oz, to standardize recipe layouts, and ideally to show everything in both imperial (volume and °F) and metric (weight in g, °C) measurements.

I know I could do this all manually, and at around 100 recipes it's a large task but not a monumental one. But this is the sort of thing that I can't help but think there should be a way of doing automatically.

Notes: not opposed to paying for software, though would prefer not to (especially not SaaS). I could do it all myself using find and replace and Excel databases and whatnot, but I reckon that would probably take more time than doing it by hand.
posted by themadthinker to Food & Drink (7 answers total) 9 users marked this as a favorite
 
Best answer: OpenRefine might help you out here. Doesn't take too much time to learn and is very handy for data cleaning!
posted by thebots at 1:34 PM on June 5, 2021 [5 favorites]


The Recipe Writer's Handbook will answer questions you haven't even thought to ask yet. Highly recommended and quite readable.
posted by aniola at 3:26 PM on June 5, 2021


I was blown away when I finally got my hands on a physical copy of the Modernist Cookbook; their recipe format is the shit. It's so good that I hate every other recipe written out. Completely spoiled me. When I get a "keeper" of a recipe, I bust it into this format and keep it locally. Here's an example of how they format recipes. Here's a longer, sort of generalized recipe list with the same format.

It would be kind of a pain in the ass to convert a hundred recipes to this format, but that would be a worthwhile project to engage in. I find them much more manageable than 'typical' recipe formats, where I'm constantly bouncing back and forth between a line that says "Mix the poison in with the chaos" and the ingredient list, where you have to go double check how much poison you need.
posted by furnace.heart at 3:39 PM on June 5, 2021 [6 favorites]


The Paprika app will standardize fields and will do the conversions; it could be some work to get the recipes in there, and I'm not sure how well the output options will work with your destination publishing software. Here's the manual - if you look at the Recipe Fields table, you can see the fields that you'll be able to tag, and there's a good set of built in shortcuts to tag from a text file or web page if the automatic importer can't work with your source recipes. If the recipes are on the web, they're quickly and automatically extracted if you provide the URL.
posted by zepheria at 4:14 PM on June 5, 2021


Another option is to hire someone to do the editing. Personally I'd just go through all 100 and reformat them; however, I can see this would be annoying for someone who does not like this type of editing.
posted by RoadScholar at 6:23 PM on June 5, 2021


Response by poster: @thebots: That looks like it might be just the sort of thing! Short of a more specific tool for just this use case, I might just have to study up on that.

@aniola: That looks fascinating, thank you.

@furnace.heart: Definitely agree on the usefulness of those layouts, and having them all as volume, weight, and scaling weight is amazing!

@zepheria: I have, and love using, Paprika. But what I've found is that while it's really good at analyzing recipes and figuring out that t, tsp, and teaspoon all mean the same thing, I don't think it's able to standardize all of those as an output—it keeps them in the original format.

@RoadScholar: Likewise, I'm most likely just going to do it all manually, but why take 1 hour to do something when you can take 2 hours to automate it? More realistically, automating it makes it less likely for there to be small errors that sneak in from me making lots and lots of minor tweaks.
posted by themadthinker at 6:50 PM on June 5, 2021


How are these recipes stored now? How are they formatted? OpenRefine seems pretty close to how I'd do it if they were all just "recipe looking text files": dump them all into a single file with a delimiter between each recipe, then do things with random one-off Perl one-liners (or grep/sed).

Find all the '[space]t[space]', check that each line really means "teaspoon", replace '[space]t[space]' with '[space]teaspoon[space]'. Go to the next thing.
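A minimal Python sketch of that find, check, replace loop, in case Perl isn't your thing. The alias table, the function names, and the choice to standardize on "tsp"/"Tbsp"/"oz" (which sidesteps singular vs plural) are all assumptions for illustration. It also assumes lowercase "t" means teaspoon and uppercase "T" means tablespoon, which is a common convention but not a universal one, so those lines are flagged for a human to check.

```python
import re

# Hypothetical alias table: every spelling maps to one standard abbreviation.
# Single letters are looked up case-sensitively ("t" vs "T").
UNIT_ALIASES = {
    "t": "tsp", "tsp": "tsp", "teaspoon": "tsp", "teaspoons": "tsp",
    "T": "Tbsp", "tbsp": "Tbsp", "tablespoon": "Tbsp", "tablespoons": "Tbsp",
    "oz": "oz", "ounce": "oz", "ounces": "oz",
}

# A quantity (integer, decimal, or fraction), optional space, then a word.
QTY_UNIT = re.compile(r"(?P<qty>\d+(?:[./]\d+)?)\s*(?P<unit>[A-Za-z]+)")


def normalize_units(line):
    """Rewrite known units after a quantity; leave everything else alone."""
    def fix(match):
        unit = match.group("unit")
        key = unit if len(unit) == 1 else unit.lower()
        standard = UNIT_ALIASES.get(key)
        if standard is None:
            return match.group(0)  # not a unit we know, e.g. "2 eggs"
        return f"{match.group('qty')} {standard}"
    return QTY_UNIT.sub(fix, line)


def needs_review(line):
    """True if the line uses a bare "t" or "T" that a human should check."""
    return any(m.group("unit") in ("t", "T") for m in QTY_UNIT.finditer(line))


if __name__ == "__main__":
    for raw in ["1 t salt", "2 T butter", "6oz cheddar", "1 1/2 teaspoons vanilla"]:
        flag = "  <-- check this one" if needs_review(raw) else ""
        print(f"{raw!r} -> {normalize_units(raw)!r}{flag}")
```

Only words that directly follow a number are touched, so a stray "t" elsewhere in the instructions is left as it is.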

The metric can be tricky because nobody wants '1 oz' to turn into '29.6 ml'. Fudge to '30 ml'? Dry volume can be worse: a US customary '1 cup' is '236.6 ml', but the US "legal" cup used on nutrition labels is '240 ml' and the metric cup is '250 ml'. Volume to weight is even harder because you need to know the density of the item... how much does '1 teaspoon'/'4.9 ml' weigh in 'g'?
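Here is a rough sketch of the "fudge it" idea: convert exactly, then round to a kitchen-friendly step. The function names and the default step of 5 are assumptions. The grams-per-cup figures are commonly quoted approximations, included only to show the shape of a density lookup; they vary with the ingredient and how it is packed, so check them against a reference you trust before printing anything.

```python
ML_PER_US_FL_OZ = 29.5735  # US fluid ounce in millilitres

# ASSUMPTION: approximate grams per US cup, for illustration only.
GRAMS_PER_CUP = {
    "water": 237,
    "all-purpose flour": 125,
    "granulated sugar": 200,
}


def round_to(value, step):
    """Round to the nearest multiple of step (e.g. nearest 5 ml)."""
    return step * round(value / step)


def fl_oz_to_ml(fl_oz, step=5):
    """US fluid ounces to millilitres, rounded to a friendly number."""
    return round_to(fl_oz * ML_PER_US_FL_OZ, step)


def f_to_c(deg_f, step=5):
    """Oven temperature in °F to °C, rounded to the nearest step."""
    return round_to((deg_f - 32) * 5 / 9, step)


def cups_to_grams(cups, ingredient, step=5):
    """Volume to weight needs a per-ingredient figure; unknown items raise KeyError."""
    return round_to(cups * GRAMS_PER_CUP[ingredient], step)


if __name__ == "__main__":
    print(fl_oz_to_ml(1), "ml")
    print(f_to_c(350), "°C")
    print(cups_to_grams(0.5, "granulated sugar"), "g")
```

Anything missing from the lookup table fails loudly rather than guessing, which is probably what you want for a printed cookbook.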

Anyway, for 100 or so recipes (which is also a good size to start with if you're beginning a cleanup on a larger pile), the mostly manual approach is best.

My first step would be to figure out how to turn those 100 recipes into a single file that was a bit like:

NAME
Serves

### measure thing (instruction)
...

Procedure
...
...
...

---- # end of recipe
# next recipe 
---
# next recipe
---
This is data munging: transform into an easier-to-work-with format, explore and change the easy format, then transform it back into the original format (or a more suitable one).
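A small Python sketch of that round trip, assuming the layout above is followed fairly strictly: name on the first line, servings on the second, ingredient lines, a literal "Procedure" line, then the steps, with any line starting with "---" ending a recipe. The function names and the dictionary keys are made up for the example.

```python
def split_recipes(text):
    """Split the big file into per-recipe lists of lines on '---' delimiters."""
    recipes, current = [], []
    for line in text.splitlines():
        if line.startswith("---"):
            if any(item.strip() for item in current):
                recipes.append(current)
            current = []
        else:
            current.append(line)
    if any(item.strip() for item in current):  # last recipe without a delimiter
        recipes.append(current)
    return recipes


def parse_recipe(lines):
    """Turn one recipe's lines into a dict (assumed layout, see above)."""
    lines = [line.rstrip() for line in lines]
    while lines and not lines[0]:
        lines.pop(0)  # drop leading blank lines
    split_at = lines.index("Procedure")  # raises ValueError if missing
    return {
        "name": lines[0],
        "serves": lines[1],
        "ingredients": [line for line in lines[2:split_at] if line],
        "steps": [line for line in lines[split_at + 1:] if line],
    }


def format_recipe(recipe):
    """Write a recipe dict back out in the same plain-text layout."""
    return "\n".join([
        recipe["name"], recipe["serves"], "",
        *recipe["ingredients"], "",
        "Procedure", *recipe["steps"],
        "----",
    ])


if __name__ == "__main__":
    sample = "Pancakes\nServes 4\n\n1 cup flour (sifted)\n2 eggs\n\nProcedure\nMix.\nFry.\n----\n"
    for chunk in split_recipes(sample):
        print(format_recipe(parse_recipe(chunk)))
```

Once each recipe is a dict, the cleanup passes only have to look at the ingredient lines, and a recipe that doesn't fit the layout shows up as an error instead of silently getting mangled.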

I'm not sure if it isn't better to just take them as they are and add a couple of pages of abbreviation/conversion notes and even maybe a couple of nomograms for conversions. Just a couple of pages of 'US fluid ounce' to 'ml' like things should take care of a lot of issues. A couple of columns of numbers can map 'F' to 'C'. Same-ish for 'dry ounce'/'g'. The recipient of the cookbook can pencil in those conversions themselves. :)
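If you go the reference-page route, the columns of numbers are easy to generate. This is only a sketch: the ranges, the rounding to the nearest 5, and the function names are assumptions, so adjust them to whatever temperatures and amounts actually show up in your recipes.

```python
ML_PER_US_FL_OZ = 29.5735  # US fluid ounce in millilitres


def nearest_five(value):
    """Round to the nearest multiple of 5 for a tidy printed table."""
    return 5 * round(value / 5)


def oven_table(start=250, stop=500, step=25):
    """Pairs of (°F, °C) for common oven settings."""
    return [(f, nearest_five((f - 32) * 5 / 9)) for f in range(start, stop + 1, step)]


def fl_oz_table(amounts=(1, 2, 4, 6, 8)):
    """Pairs of (US fl oz, ml)."""
    return [(oz, nearest_five(oz * ML_PER_US_FL_OZ)) for oz in amounts]


def render(rows, left, right):
    """Plain-text two-column table, ready to paste into the cookbook."""
    lines = [f"{left:>8}  {right:>8}"]
    lines += [f"{a:>8}  {b:>8}" for a, b in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(oven_table(), "°F", "°C"))
    print()
    print(render(fl_oz_table(), "fl oz", "ml"))
```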
posted by zengargoyle at 9:49 PM on June 5, 2021

