.NET Dev seeks AppleScript Guru
November 7, 2013 6:53 AM

I need to export Safari history to a tab delimited text file, based on a date range.

I'd like to have the date, title and URL (kind of like what you see on the history tab!) in a tab delimited text file - I'd like to be able to specify a date range for the information. I've tried the few scripts I've been able to find online and am having an embarrassingly hard time with AppleScript.

The Mac in question is running 10.6.8, in case that makes a difference.

I see that I can open the .plist file in a text editor, but I do NOT want to have to do this by hand (plus the date info is seconds since Jan 1, 2001, from what I've read) - it is something I'm going to need to do on a somewhat regular basis for a while, and seems like it would be very simple to automate - but I could really use a hand from someone who actually knows what they are doing w/ AppleScript. Or Terminal. Or whatever would be a good way to get this info out of the plist. :)

posted by hilaryjade to Computers & Internet (12 answers total)

Considered perl? There's a module to parse plists (Mac::PropertyList), and then it's straight XML (which shouldn't be too hard to massage).

I could probably hack up something super-hacky in an hour or so.
posted by hanov3r at 7:07 AM on November 7, 2013

I have been working as a software engineer around Macs for close to a decade now and have never met an AppleScript expert. There are plenty of other tools that can parse an XML plist file, though.
posted by tylerkaraszewski at 7:39 AM on November 7, 2013

I'm perfectly happy to try Perl and am happy with XML output - I can just apply an XSLT or even open in Excel. I would appreciate even hack-y help.

I looked at something called Plist Pro but couldn't see a way to export just what I needed.
posted by hilaryjade at 9:01 AM on November 7, 2013

A quick Python script... the date may need some munging (it's probably in UTC).

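#!/usr/bin/env python
# Dump Safari's History.plist to History.csv as date, title, URL.

from os.path import expanduser
from os import system, chdir
import xml.etree.ElementTree as ET
from datetime import datetime

# Safari stores dates as seconds since Jan 1, 2001 (the Cocoa reference
# date); this offset shifts them onto the Unix epoch.
adjust = datetime(2001, 1, 1) - datetime(1970, 1, 1)

chdir(expanduser("~/Library/Safari"))
# Convert the binary plist to XML so ElementTree can read it.
system("plutil -convert xml1 -o History.xml History.plist")
hist = ET.parse("History.xml")

with open("History.csv", "w") as out:
    last = None  # previous sibling at the top level
    for node in hist.getroot().find("dict"):
        # The history entries are the array that follows the
        # WebHistoryDates key.
        if (node.tag == "array" and last is not None and
            last.tag == "key" and last.text == "WebHistoryDates"):

            for record in node.findall("dict"):

                entry = ["", "", ""]  # date, title, URL
                prev = None           # previous sibling in this record

                for field in record:

                    if (field.tag == "string" and prev is not None and
                        prev.tag == "key"):
                        # The URL sits under an empty key, which
                        # ElementTree reports as None.
                        if prev.text is None:
                            entry[2] = field.text
                        if prev.text == "title":
                            entry[1] = field.text
                        if prev.text == "lastVisitedDate":
                            entry[0] = (datetime.fromtimestamp(
                                        float(field.text))
                                        + adjust).ctime()

                    prev = field

                # Skip records missing a date, title, or URL.
                if all(entry):
                    out.write(",".join(entry) + "\n")

        last = node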

Usage: place this in a file called histdump.py in your home directory, then in the terminal do:
python histdump.py
Output is in ~/Library/Safari/History.csv
I wrote this on a machine with Python 2.7.2 but I think it should run on 2.6.x as well.
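
On the date munging and the date-range request: here is a minimal sketch, assuming the raw lastVisitedDate value counts seconds from Jan 1, 2001 UTC (the Cocoa reference date). The names REFERENCE_DATE, safari_to_datetime and in_range are made up for illustration, and the result is in UTC rather than local time.

```python
from datetime import datetime, timedelta

# Assumption: Safari timestamps count seconds from this date, in UTC.
REFERENCE_DATE = datetime(2001, 1, 1)

def safari_to_datetime(seconds):
    """Convert a raw lastVisitedDate value (string or number) to a UTC datetime."""
    return REFERENCE_DATE + timedelta(seconds=float(seconds))

def in_range(seconds, start, end):
    """True if the visit falls within start <= visit < end (UTC datetimes)."""
    return start <= safari_to_datetime(seconds) < end

# Example: keep only visits from the first week of November 2013.
start = datetime(2013, 11, 1)
end = datetime(2013, 11, 8)
print(in_range("405500000.0", start, end))
```

A check like in_range could sit next to the all(entry) test so that only records inside the window get written.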
posted by kindall at 9:52 AM on November 7, 2013

BTW, plist files are kind of wonky. There are two formats: binary and text. By default apps use binary. The script uses the plutil command line tool to convert the History.plist file to text format.

The text plist format is technically XML, but not as we know it. Order matters and while there is a dictionary object with key/value pairs, these key-value pairs are siblings, so first you get the key and then you get the value. In the Python script I resorted to just keeping track of the last tag seen so that when I saw a value, I could check to see if it was the right key.

It would be really easy to also have the Python script spit out the CSV file to standard output so you could just pipe it directly into whatever you want; just add:

print ",".join(entry)

... after the out.write line, at the same indentation.
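
As an alternative to tracking the last tag seen, the sibling key/value layout can also be handled by pairing up the children of each dict element. This is a rough sketch, not what the script above does; plist_dict_to_python is a made-up name, the sample record is invented, and it only copes with simple values (nested arrays or dicts would need more work).

```python
import xml.etree.ElementTree as ET

def plist_dict_to_python(dict_elem):
    """Pair each <key> child with the value element that follows it."""
    children = list(dict_elem)
    # Keys sit at even positions and their values at the odd positions after
    # them. An empty <key/> has text None, so map it to "" for consistency.
    pairs = zip(children[0::2], children[1::2])
    return dict((k.text or "", v.text) for k, v in pairs)

# Made-up sample record in the same shape as a history entry.
sample = ET.fromstring(
    "<dict>"
    "<key></key><string>http://example.com/</string>"
    "<key>title</key><string>Example</string>"
    "<key>lastVisitedDate</key><string>405500000.0</string>"
    "</dict>"
)
record = plist_dict_to_python(sample)
print(record["title"])
```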
posted by kindall at 10:01 AM on November 7, 2013

Thanks, kindall, I'll try it this evening!
posted by hilaryjade at 12:34 PM on November 7, 2013

Hi, kindall

- I just gave the script a try - I'm getting an error on this line:
out.write(",".join(entry) + "\n")

The error is:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 48: ordinal not in range(128)

The output that did get sent to the CSV is good - looks like what I need. My preference would be to indicate a date range for the data (just because history lists get big, and I need only a week at a time) but this is a super helpful start and further than I was getting with the various AppleScript examples I was trying to bend to my will.

Comparing the History.csv to the History.xml file that the script created, I can see the URL that the failure occurred at - a particularly gnarly Google redirect.
Even just being able to convert the plist to XML (which your script demonstrates) is super helpful - the XML is kind of weird, like you said, but may be something I can at least write a transform for.

At any rate, if you have any ideas about adjusting for the error, let me know & thank you so much for the script.
posted by hilaryjade at 6:58 PM on November 7, 2013

There's probably a Unicode character in the URL or the title, which isn't unheard of. I can play with it a bit more and see how best to handle that, but probably not until tomorrow. Glad my script was of some use!
posted by kindall at 8:26 PM on November 7, 2013

This isn't a direct answer to your question, but depending on what your goal is here it might be something you want to know. I did some playing around with Safari's history file a while back. The way I remember it, the History.plist file only contains a record of the most recent time that a page was visited – for example, if you visit example.com, then browse to four other pages, then visit example.com again, the earlier entry will be removed from your history.

Now that I think about this, I still don't really understand why it would be this way, so I hope I'm wrong. But I do remember it being a roadblock for my intended use of the History.plist file (understanding which sites I visit most).

(I think I wrote some kind of totally hacked-up parser in Python for the History.plist file. If I can find it and it turns out to be less embarrassing than I think I'll post it.)
posted by aaronbeekay at 9:26 PM on November 7, 2013

I'm pretty sure you can fix the UnicodeEncodeError by changing

entry[2] = field.text

to

entry[2] = field.text.encode("utf8")

And a similar change for the entry[1] assignment.

Your CSV file will then contain text in UTF-8 encoding. You may need to specify that to whatever program is reading it. UTF-8 is the same as ASCII if there are no non-ASCII characters, but if there are accented characters, etc. the reading program will need to understand UTF-8.

If you wish to simply strip the extended characters, then use encode("ascii", "ignore") instead (the positional form also works on Python 2.6, where encode() does not accept keyword arguments).
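
Another option is to let the file object do the encoding, so the fields never need an explicit encode() call. This is a minimal sketch, assuming io.open (available in Python 2.6 and later); write_rows, the temp-file path and the sample row are made up for illustration, and it writes tab-delimited output as originally asked for.

```python
import io
import os
import tempfile

def write_rows(path, rows):
    """Write rows of text fields as tab-delimited UTF-8."""
    # io.open encodes on write, so the fields can stay as Unicode text.
    with io.open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(u"\t".join(row) + u"\n")

# Made-up row containing the non-breaking space (u"\xa0") from the error.
path = os.path.join(tempfile.gettempdir(), "History.tsv")
write_rows(path, [[u"Thu Nov  7 06:53:20 2013",
                   u"Some\xa0title",
                   u"http://example.com/"]])
```

Tabs are less likely than commas to show up inside a page title, which also sidesteps the problem of titles containing the delimiter.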
posted by kindall at 9:20 AM on November 8, 2013

kindall - you're awesome, thanks. The reason I'd wanted to export for a date range was I was concerned about the amount of data that may be in the history - it looks like the history defaults back a year. But this is super fast, so it doesn't matter in the least that it processes the entire file.

I pop the CSV open in Excel, filter for what I need and presto, all done. Thanks so very much for your help. I really didn't even know where to start (uh, which is why my Windows brain assumed AppleScript instead of thinking through other options). I appreciate the help!
posted by hilaryjade at 6:30 AM on November 9, 2013

Glad to be of service. I frankly love Python, and enjoy solving a new problem with it every chance I get. Now I'm kind of hankering to write a generic plist parser library for Python, except I know someone has probably already done one...
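
For what it's worth, the standard library does ship a plistlib module. Here is a minimal sketch, assuming Python 3.4 or later, where plistlib.load reads both binary and XML plists and the plutil step is not needed. load_history and record_fields are made-up helper names, and the sample file is fabricated for the demo using the same keys the script above looks for.

```python
import os
import plistlib
import tempfile

def load_history(path):
    """Return the list of history records from a Safari-style History.plist."""
    with open(path, "rb") as fp:
        # Python 3.4+: plistlib.load detects binary vs. XML on its own.
        plist = plistlib.load(fp)
    return plist.get("WebHistoryDates", [])

def record_fields(record):
    """Return (lastVisitedDate, title, URL); the URL is stored under the empty key."""
    return (record.get("lastVisitedDate", ""),
            record.get("title", ""),
            record.get("", ""))

# Fabricated sample file, written in binary format for the demo.
sample = {"WebHistoryDates": [{"": "http://example.com/",
                               "title": "Example",
                               "lastVisitedDate": "405500000.0"}]}
path = os.path.join(tempfile.gettempdir(), "SampleHistory.plist")
with open(path, "wb") as fp:
    plistlib.dump(sample, fp, fmt=plistlib.FMT_BINARY)

for rec in load_history(path):
    print("\t".join(record_fields(rec)))
```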
posted by kindall at 10:48 AM on November 9, 2013
