Parsing plant orders with Perl
March 30, 2021 9:31 PM

Often I'm looking for x number of just one plant to complete an order. I typically send out a query to several nurseries asking if they have that many plants. In this case I was searching for 45 Chatham's akeake / Olearia chathamica. I insist they leave the plant name in the subject line (and I get near 100% success). There's enough consistency in the grammar of the replies for me to think that semi-automating the process and pulling the answers out into a table (markdown, csv, xls, other table format - all okay) is possible, and there's enough discussion on perlmonks and stackoverflow for it to seem do-able.

But do you think it is actually doable, or is it likely to be a rabbit hole? I probably do this once a week and sometimes there's more than one type of plant. I use Outlook 365 (desktop version) on Exchange.

A recent query gave me responses as per below, and I've put what I see as useful strings for each email in [ ]:

Hi Nigel

The PB 6.5’s appear to have 21 but I may have missed one tucked somewhere. Pic of both attached
[have 21]

Hi Nigel

We can do this many plants for you but only in a 2.5 litre grade @ $6.35 + gst each.
[We can]

Good morning Nigel
Thank you for your enquiry.
We can indeed supply you with 45 x Olearia Traversii.
[We can] [45]

Hi Nigel. We have 45 Olearia traversii. They are about a metre tall in a PB5 bag. Price would be $9 plus gst.
[have 45]

Hi Nigel

Yes we have 45 Olearia traversii but unfortunately only V150 grade ($3.50 + GST each), these are good plants though and are probably 75 cm tall.
[We have 45]

Hi Nigel
Sorry I don’t have any Traversii available I do have O dartonii Pb3 $5.89 plus GST and freight
[don't]

Is there a way of running, say, a Perl script on a selection of Outlook emails?

I'm not fixated on Perl; it just seems, from my limited knowledge, to be the likely tool. Is this the sort of thing that I could/should put on Jobs?
posted by unearthed to Computers & Internet (10 answers total) 1 user marked this as a favorite
 
Can I suggest Python and perhaps the Natural Language Toolkit? I haven't done much with it myself but it feels like it would be a good tool. I don't want to bash Perl, and if you're already good at Perl there's no reason to change, but it's not a "fashionable" language these days, and you'll likely find more modern resources and support with Python. I also think Python is easier for beginning programmers to pick up if you happen to be a beginner.
posted by foxfirefey at 10:54 PM on March 30, 2021 [2 favorites]


As you probably already know, I'm somebody who would usually rather spend a hundred hours tweaking a script than the two hours the script might save me over its useful lifespan, but my approach if I had this issue would not involve wandering into the deep dark forest of machine-parsing human-generated emails. I would expect playing script whack-a-mole for every creative new syntactical variant my correspondents could devise to get old very quickly.

Instead, I'd streamline table data entry using a floating data entry window that would stay on top of my email client when switching from mail to mail. The entry dialog would be carefully designed for rapid entry of quantities using drag and drop from the emails (by ignoring anything but digits in what was dragged and dropped) and supplier and product using drag and drop and/or dropdown menus.
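The "ignore anything but digits" part of that entry window is the easy bit to sketch. This is only a rough Python illustration of the filtering step, not the floating window itself; the function names and the row layout are made up for the example.

```python
import re

def digits_only(dropped_text: str) -> str:
    """Keep only the digit characters from whatever was dropped on the quantity field."""
    return re.sub(r"\D", "", dropped_text)

def make_row(supplier: str, plant: str, dropped_quantity: str) -> dict:
    """Build one table row; the quantity is None when the dropped text had no digits."""
    qty_digits = digits_only(dropped_quantity)
    return {
        "supplier": supplier,
        "plant": plant,
        "quantity": int(qty_digits) if qty_digits else None,
    }

# A sloppy drag selection still yields a clean number.
row = make_row("Nursery A", "Olearia traversii", "We have 45 ")
print(row)  # {'supplier': 'Nursery A', 'plant': 'Olearia traversii', 'quantity': 45}
```

Note the filter is deliberately dumb: dragging a selection that also covers "PB5" would turn "45 ... PB5" into 455, so the selection still has to stop near the number you want.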
posted by flabdablet at 11:51 PM on March 30, 2021 [5 favorites]


Frankly, you'd get MUCH better results if you sent a tiny form with the email instead of trying to "parse" the email.

"Do you have QTY ____ of (plant name) _____ in stock" (YES / NO / PARTIAL _____ )

So they say, yes, no, got some (30 out of 45 you wanted) and you decide how many to order.

That way you only have to interpret the response in the form you're expecting, and if they can't be bothered to reply within the tiny form, do you really want to work with them? :)
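If replies did come back in that shape, reading them would be close to trivial. A minimal Python sketch, assuming the supplier puts YES, NO or PARTIAL (with a number) at the start of a line and that the quoted copy of the original form has already been stripped from the reply; `parse_form_reply` is a hypothetical helper name.

```python
import re

# The answer keyword must start a line; an optional number may follow it.
ANSWER_RE = re.compile(
    r"^\s*(YES|NO|PARTIAL)\b(?:\s*[:\-]?\s*(\d+))?",
    re.IGNORECASE | re.MULTILINE,
)

def parse_form_reply(reply: str, wanted: int):
    """Return (answer, quantity available), or None if no form answer is found."""
    match = ANSWER_RE.search(reply)
    if match is None:
        return None
    answer = match.group(1).upper()
    if answer == "YES":
        return ("YES", wanted)
    if answer == "NO":
        return ("NO", 0)
    quantity = match.group(2)
    # PARTIAL without a number is kept, but flagged with quantity None.
    return ("PARTIAL", int(quantity) if quantity else None)

print(parse_form_reply("Hi Nigel\nPARTIAL 30\n", wanted=45))  # ('PARTIAL', 30)
```

Anything that returns None is a reply that ignored the form and needs reading by hand.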
posted by kschang at 12:25 AM on March 31, 2021 [3 favorites]


Agree with flabdablet that natural language processing is going to be a lot of work, and will probably produce too many errors to be useful.

I'd try to attack the problem in a different way; maybe instead of having them respond via email, you could link to a form where you get them to enter the data (maybe even a Google Form that feeds a Sheet). I also like kschang's idea of leading the recipients to structure their responses to make them easier to parse (if you do need to handle responses via email). Ultimately if you can get the respondents to produce more normal data in some way it'll be easier for you to ingest.

Perl is definitely well-suited to text processing, but Python will be just as good for this kind of task and has a much larger and more active community at this point.
posted by sriracha at 4:30 AM on March 31, 2021 [2 favorites]


Is it possible that some/all of your suppliers would have their stock numbers available online? I've done wget/parsing before for computer component _prices_ and it worked really well, and it's possible to aggregate different components (in your case, different plants) to do things like, in my case, estimate the cost for an order with several different components together.

In your case -- and this is something I try with purchasing other things - it might be possible to find one supplier who has everything you need in stock, so you'd only ship from one supplier -- saving fuel and shipping costs.
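Once stock numbers are in hand (scraped or typed in), the "one supplier with everything" check is a few lines. A rough Python sketch; the nursery names and stock figures below are invented purely for illustration.

```python
def suppliers_covering_order(order, stock_by_supplier):
    """Return the suppliers whose stock meets every line of the order."""
    return [
        supplier
        for supplier, stock in stock_by_supplier.items()
        if all(stock.get(plant, 0) >= quantity for plant, quantity in order.items())
    ]

order = {"Olearia traversii": 45, "Olearia dartonii": 10}

# Made-up stock levels, keyed by supplier and then by plant name.
stock_by_supplier = {
    "Nursery A": {"Olearia traversii": 45},
    "Nursery B": {"Olearia traversii": 60, "Olearia dartonii": 25},
    "Nursery C": {"Olearia dartonii": 40},
}

print(suppliers_covering_order(order, stock_by_supplier))  # ['Nursery B']
```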

Ugh now I want to write this. I've been shopping for individual plants (1-2 items), but I also have wanted to find _one_ local drug store that has _all_ of the snacks, antihistamines, and index cards I need in stock. Or _one_ Instacart store that has the maximum number of items I want.
posted by amtho at 10:00 AM on March 31, 2021


Response by poster: flabdablet well some of your scripts have certainly saved me hundreds of hours. Yes, I take your point about "every creative new syntactical variant" from my suppliers.

"a floating data entry window that would stay on top of my email client" So is that a manual system, just a lot more efficient than what I do now? Is this like a bin I copy a patch of text into, and the system just selects the good bits and saves them into a list/table?

I'm not going to be able to train my suppliers (so simple forms are probably out) - people are great with plants but less so with spelling/grammar/computers, many product lines are very low numbers, no standardisation across businesses (I suspect to make price comparison hard). Few, if any, have reliable stock numbers.

kschang This is southern NZ with few suppliers so "if they can't be bothered to reply within the tiny form, do you really want to work with them?" is not an option, altho' I have more or less dropped one recently.
posted by unearthed at 1:26 PM on March 31, 2021


While I mostly agree with the previous comments, I've done a lot of hanky email stuff in Perl, mostly with more structured content. The only hard part is the human-typed messages; the rest is, if not trivial, very easy (Perl-wise).

For the human message part I'd do something akin to the old-school spam detectors. You have a big list of tuples like: (regex, category, weight, code). You run all the regexes over the body text, and the weights of the matching ones get added up into likely-yes vs likely-no vs likely-partial (or some such categories). If the matching regex has a code, the code gets called with the match results so they can be stored away.


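my $flowername = 'Olearia traversii';  # example value; use the plant from the subject line
my %found;                             # captured quantity and price get stored here
my @rules = (
    # [ regex, category, weight, optional code called with the captured text ]
    [ qr/do have (\d+) \Q$flowername\E/i, 'yes', 9, sub { $found{qty} = $_[0] } ],
    [ qr/have/i,                          'yes', 1 ],
    [ qr/don't/i,                         'no',  2 ],
    [ qr/tall/i,                          'yes', 1 ],  # would size/tall be in a no?
    [ qr/don't have/i,                    'no',  4 ],
    [ qr/do not have/i,                   'no',  4 ],
    [ qr/sorry/i,                         'no',  4 ],
    [ qr/for \$(\d+(?:\.\d\d)?)/i,        'yes', 6, sub { $found{price} = $_[0] } ],
);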

Mostly it's just how early spam detectors worked (remember SpamAssassin?), extended to two or three categories (like maybe yes/no/partial/next week) and to capturing data from the matches that are specific enough to be what a human would consider a match.

You could probably at least sort messages by yes/no likelihood, if you can trust "i have 22 FLOWER" + "for $x.xx" as a yes.
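For anyone leaning towards Python as suggested upthread, here is roughly how that weighted-rule idea could be run end to end. It is a sketch under assumptions, not a tested mail parser: the rules are loosely adapted from the list above (with a "we can" rule added from the sample replies), the weights are guesses, and `build_rules` and `classify` are made-up names.

```python
import re
from collections import defaultdict

def build_rules(plant_name):
    """Return (regex, category, weight, field-to-capture-or-None) tuples."""
    plant = re.escape(plant_name)
    return [
        (re.compile(rf"\bhave (\d+) {plant}", re.I), "yes", 9, "qty"),
        (re.compile(r"\bhave\b", re.I), "yes", 1, None),
        (re.compile(r"\bwe can\b", re.I), "yes", 4, None),
        (re.compile(r"\bdon['’]t\b", re.I), "no", 2, None),
        (re.compile(r"\b(?:don['’]t|do not) have\b", re.I), "no", 4, None),
        (re.compile(r"\bsorry\b", re.I), "no", 4, None),
        (re.compile(r"\$(\d+(?:\.\d\d)?)"), "yes", 6, "price"),
    ]

def classify(body, rules):
    """Add up the weights of matching rules and keep any captured values."""
    scores = defaultdict(int)
    captured = {}
    for pattern, category, weight, field in rules:
        match = pattern.search(body)
        if match is None:
            continue
        scores[category] += weight
        if field is not None:
            captured[field] = match.group(1)
    verdict = max(scores, key=scores.get) if scores else "unknown"
    return verdict, dict(scores), captured

rules = build_rules("Olearia traversii")
reply = ("Hi Nigel. We have 45 Olearia traversii. They are about a metre tall "
         "in a PB5 bag. Price would be $9 plus gst.")
verdict, scores, captured = classify(reply, rules)
print(verdict, captured)  # yes {'qty': '45', 'price': '9'}
```

On the "Sorry I don’t have any Traversii" reply from the question, the same rules come out as a "no", although they still capture the $5.89 quoted for the substitute plant, so captured values are only worth trusting when the verdict is a yes. The apostrophe class covers both the straight and the curly form, since the sample replies use the curly one.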
posted by zengargoyle at 2:01 PM on March 31, 2021


Try this: it reads email, prints CSV, and moves parsed messages to a Parsed folder.
posted by ecco at 7:13 PM on March 31, 2021 [3 favorites]


Response by poster: Thank you ecco! I will try and run this later in the week when I have some time.

Does this actually move files from Outlook or just copy them? I don't want to delete messages.

It looks like (from line 105) I can input a list of emails to further filter, and probably by date range from other lines.
posted by unearthed at 5:25 PM on April 2, 2021


You're welcome, unearthed. I created a test account at outlook.com, tested against it, and have updated the script. You may have to set port 993 to connect to your own Exchange mail server (lines 50ish through 60ish).

If your account has 2FA enabled (as it should in these modern times), then the script will not be of much use, since it won't be able to log in.

The script moves files from "Inbox" to a folder called "Parsed" (which it creates if it doesn't already exist). It will not delete emails which it doesn't parse, and it doesn't read emails unless the plant name is in the subject.

Lines 105-ish, the 'search' subroutine, show how IMAP specifies a search. If you want to change that code, it may help to know that the parentheses in the Perl are just for clarity; technically Perl strips those parentheses and makes one long, flat, comma-separated argument list. Also, in Perl => is really just a different way of writing a comma (kind of), so that may help, since IMAP seems to have unary and binary search terms. Or just stick with the existing --args of the script.
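To make the unary/binary point concrete, here is the same idea sketched with Python's standard imaplib rather than the script itself: a unary term such as UNSEEN stands alone, while a binary term such as SUBJECT or SINCE is followed by its value, and everything goes to the server as one flat list. `build_search` and `find_messages` are hypothetical helper names, and the plant name is assumed not to contain a double quote.

```python
import imaplib
from datetime import date

# IMAP wants English month abbreviations regardless of locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def imap_date(day: date) -> str:
    """Format a date the way IMAP SEARCH expects, e.g. 01-Mar-2021."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"

def build_search(plant_name, since=None, unseen_only=False):
    """Build a flat list of IMAP search terms."""
    criteria = ["SUBJECT", f'"{plant_name}"']      # binary term: key plus value
    if since is not None:
        criteria += ["SINCE", imap_date(since)]    # binary term
    if unseen_only:
        criteria.append("UNSEEN")                  # unary term: key only
    return criteria

def find_messages(connection: imaplib.IMAP4, criteria):
    """Run the search on an already logged-in connection with a mailbox selected."""
    status, data = connection.search(None, *criteria)
    return data[0].split() if status == "OK" else []

print(build_search("Olearia traversii", since=date(2021, 3, 1), unseen_only=True))
# ['SUBJECT', '"Olearia traversii"', 'SINCE', '01-Mar-2021', 'UNSEEN']
```

Searching only reads the mailbox; nothing is moved or deleted by these calls.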
posted by ecco at 8:50 PM on April 3, 2021


This thread is closed to new comments.