How do I script removal of the head and tail of a file up to and after given comments?
January 7, 2006 1:44 PM
Given BBEdit (or sed) and a file of HTML, how can I script the deletion of all text up to a given comment tag, skip over deleting the part of the file I want to keep, then resume deletion of contents after the next occurrence of a given comment tag?
I edit a site that calls for me to periodically repurpose content from other sites in our company.
I've written an Automator workflow that handles grabbing the printer-friendly version of a given article I'm repurposing, runs it through TextSoap to strip smartquotes and other stuff I hate, replaces some absolute links to be appropriate to the article's new home, slaps on a line attributing the original source for the story, and loads it into BBEdit.
At this point, it's a pretty simple operation to manually cut out the gunk I don't want, which is everything in the page source before a comment tag that reads "content_start" and after a comment tag that reads "content_stop", but I'd really like to automate this part, too, for the sheer pleasure of having an end-to-end workflow.
I just don't have the scripting chops to describe "delete all the lines up to this comment and delete all the lines after that comment."
It seems that doing this in AppleScript with BBEdit's scripting dictionary, or with a line or two of sed, would work equally well.
posted by mph to computers & internet (7 comments total)
posted by jjg at 2:18 PM on January 7, 2006
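For the sed route mentioned in the question, here is a minimal sketch rather than a definitive answer. It assumes the two markers are HTML comments containing the text "content_start" and "content_stop", that each sits on its own line, and that each appears only once in the file. The function name extract_content is made up for illustration. Instead of deleting the head and the tail, it prints only the range between the two markers, which comes to the same thing.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print only the lines from the
# first line matching "content_start" through the next line matching
# "content_stop". Everything before and after that range is dropped.
# Assumption: each marker comment is on a line of its own.
extract_content() {
  # -n suppresses sed's default output; p prints only lines inside the range.
  sed -n '/content_start/,/content_stop/p'
}
```

Usage would look something like `extract_content < article.html > trimmed.html`, where both file names are placeholders. The marker lines themselves are kept in the output; since they are comments, they should be harmless in the repurposed page. If a marker shares a line with other markup, that whole line is kept, so a line-based approach like this may leave a little gunk behind.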