How to automate tracking changes to legislation and U.S. Code?
August 6, 2010 8:49 AM

Jobhack/automation filter: Please help me track legislation and its proposed changes to the U.S. Code.

I do policy work in DC. Part of my job requires analyzing bills introduced in the House or Senate to see how they'll affect existing law so that my organization can support, oppose, or suggest changes to the bill. It's fun, but I think there's an easier way to do it, and I hope MeFi can help.

Here is how I handled a recent bill that would affect mine and workplace safety.

The bill is introduced in the House. I look the bill up on THOMAS. The bill is viewable in several different formats: PDF, XML, and plain text. I read the bill, which would make numerous changes to two existing portions of the U.S. Code: the Mine Safety and Health Act (30 U.S.C. 801 et seq.) and the Occupational Safety and Health Act (29 U.S.C. 651 et seq.).

The way this legislation is written, it's necessary to refer to the existing laws to see what's being changed. For example, here is section 701(a) of the bill:
(a) Employee Actions- Section 11(c)(1) of the Occupational Safety and Health Act of 1970 (29 U.S.C. 660(c)(1)) is amended--

(1) by striking `discharge' and all that follows through `because such' and inserting the following: `discharge or cause to be discharged, or in any manner discriminate against or cause to be discriminated against, any employee because--

`(A) such';

(2) by striking `this Act or has' and inserting the following: `this Act;

`(B) such employee has';

(3) by striking `such proceeding or because of the exercise' and inserting the following: `before Congress or in any Federal or State proceeding related to safety or health;

`(C) such employee has refused to violate any provision of this Act; or

`(D) of the exercise'; and

(4) by inserting before the period at the end the following: `, including the reporting of any injury, illness, or unsafe condition to the employer, agent of the employer, safety and health committee involved, or employee safety and health representative involved'.
Because I'm a relatively fast typist and had some free time, I manually inserted all of these changes into a .doc file of the current laws, leaving track changes enabled to show how the proposed legislation would affect the existing law. It was tedious, but I felt like it was a worthwhile thing to do.

Then the bill got reported out of committee with an amendment, meaning there is now a different version of the bill doing different things to the law, and my redlined version is outdated and inaccurate. There will likely be a different version that is finally passed by the House, then a Senate version, then an amended Senate version, then a conference report. Too much typing.

Finally, my questions:
1. Is there a way to take a text file of a bill that gets introduced in Congress and, using the standardized cues that signal changes to current law (such as "section __ is amended by inserting __" or "section __ is amended by striking __"), have the provisions of the new legislation automatically inserted into a text file of the law that it modifies?
1a. Is there a service that already does this (preferably for free)?
1b. If there isn't an easy way to do this, is there at least a good way to compare versions of a bill to show what has changed so that I don't have to make a whole new redlined version of a bill each time there's an amendment (an amendment during a markup is one thing because it's pretty easy to see what is changed, but a manager's amendment that substitutes an entire version of the legislation for the previous version makes it tricky to spot changes)?
2. Assuming that what I'm asking is possible, here are some more complications. A bill is often available only as a PDF for a few days before it goes up on THOMAS as plain text (a committee's or member's website will post the bill before it goes through THOMAS, or staff will email it to interested parties). I know there are PDF-to-text converters such as CometDocs that do a good job, but a PDF of legislation has annoying line numbers on each page, and the converted text comes out with mangled margins and only partly justified, so lots of words are hyphenated across line breaks and need to be rejoined. (Example here). Any suggestions for removing line numbers and fixing the broken words?
3. I usually ignore the XML versions of the bill because I've never worked with it before, but my vague understanding is that it is somehow more powerful or flexible than standard text or HTML. Is this something I should be looking at more?
4. Assuming that there is no easier way to merge or compare documents, is there at least a way to automate the formatting for the subsections of legislation once I've put them into Word? I use a new quarter-inch left tab for each section, and the subsections are standardized as (for example) 825(g)(2)(B)(iii)(II)(bb)(BB)(aaa)(etc.). It would be great if Word would recognize (double lowercase) and indent it six tabs. We use a pre-notebook view version of Word, but I have a more recent version on my personal computer and if notebook view automatically formats this, I can do that.
5. I am generally interested in neat ways to manipulate legislative things like the U.S. Code and THOMAS to allow people to better understand how laws are made, what they would do, and who is supporting them. It would be useful, for instance, to set up an alert for certain bills that emails me or our supporters whenever a new cosponsor is added to the legislation, so we can thank them or stop calling them to ask for support or whatever. I'd appreciate any suggestions of free and/or open-source services that are doing things with government information (I'm already aware of OpenCongress and WikiLeaks).
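On question 2, the cleanup can be sketched in a few lines of Python. This is only a rough sketch under assumptions about what the converter produces: each line starts with a one- or two-digit line number followed by spaces, blank lines separate paragraphs, and a trailing hyphen always means a broken word. The function name clean_pdf_text and the sample text are made up for illustration. It will wrongly join real compounds that happen to break at a line end (such as "self-" followed by "employed"), and it will eat a leading number on any line that has no line number in front of it, so the output still needs a read-through.

```python
import re

# Assumed layout: optional spaces, a 1-2 digit line number, then spaces or end of line.
LINE_NUMBER = re.compile(r"^\s*\d{1,2}(?:\s+|$)")

def clean_pdf_text(raw):
    """Strip leading line numbers and rejoin words hyphenated across line breaks."""
    paragraphs, current = [], ""
    for line in raw.splitlines():
        line = LINE_NUMBER.sub("", line).strip()
        if not line:
            # Blank line: treat it as a paragraph break.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Join the halves of a broken word: "employ-" + "ee" -> "employee".
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

Running it on a fresh PDF conversion and then diffing the result against the THOMAS plain text, once that is posted, would be one way to check how much the heuristics got wrong.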

Sorry this was so long, thanks for reading. I welcome answers to any part of my questions.
posted by jalexc to Law & Government (3 answers total) 2 users marked this as a favorite
The biggest problem is that the standard cues you mention aren't actually standard enough for a simple computer program. A program robust enough to handle all of the edge cases would be fairly complex, and it would ultimately rely on a certain amount of guesswork and heuristics, which could lead to errors.
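
To make that concrete, here is a minimal Python sketch that handles exactly one phrasing, "by striking `X' and inserting `Y'", and refuses to guess when the struck text does not appear exactly once in the law. The regular expression and the function name apply_strike_insert are illustrative assumptions, and the sample sentence is invented, not real statutory text. It would not cope with the multi-line, nested-quote instructions in the section 701(a) excerpt from the question, or with quoted text that itself contains an apostrophe, which is the kind of edge case that makes a robust version hard.

```python
import re

# Hypothetical pattern for one common phrasing only:
#   by striking `OLD' and inserting `NEW'
# THOMAS text opens a quote with a backtick and closes it with an apostrophe.
STRIKE_INSERT = re.compile(
    r"by striking `(?P<old>[^']+)' and inserting (?:the following: )?`(?P<new>[^']+)'"
)

def apply_strike_insert(law_text, instruction):
    """Apply one 'striking X and inserting Y' instruction, or raise ValueError."""
    match = STRIKE_INSERT.search(instruction)
    if match is None:
        raise ValueError("instruction does not match the simple pattern")
    old, new = match.group("old"), match.group("new")
    count = law_text.count(old)
    if count != 1:
        # Zero or several hits means a human has to decide where the change goes.
        raise ValueError(f"expected exactly one occurrence of {old!r}, found {count}")
    return law_text.replace(old, new)

# Invented sample text, for illustration only.
law = "The Secretary shall issue a report every 2 years."
print(apply_strike_insert(law, "(1) by striking `every 2 years' and inserting `annually';"))
# prints: The Secretary shall issue a report annually.
```

Raising an error instead of picking the first match is a deliberate choice here: a silently wrong redline is worse than one that has to be finished by hand.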

is there at least a good way to compare versions of a bill to show what has changed so that I don't have to make a whole new redlined version of a bill each time there's an amendment

Comparing two similar text documents is a well-studied problem. The term you want to search with is 'diff,' as in 'diff windows' or 'diff os x' or the like. There are lots of variations on the theme, and most of them are free.
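
If you would rather script it than use a desktop tool, Python's standard library includes a diff module. The following is a minimal sketch, assuming you have saved two versions of a bill as plain text; the function name redline and the labels "introduced" and "reported" are just illustrative choices.

```python
import difflib

def redline(old_text, new_text, old_label="introduced", new_label="reported"):
    """Return a unified diff of two bill versions, compared line by line."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(diff)
```

Lines starting with "-" exist only in the old version and lines starting with "+" only in the new one. Because the comparison is line by line, two versions with different line wrapping will look more different than they are, so it helps to normalize the line breaks first.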
posted by jedicus at 9:04 AM on August 6, 2010

Anyway, you can use Westlaw to see how a given codified statute has been affected by amendments since the last U.S. Code update. Unfortunately, it only points out that a given law amended the statute; as far as I know, it doesn't produce an updated version for you. It's also not free.

The Sunlight Foundation is a place to look for the general idea of using technology to analyze the activities of the government.
posted by jedicus at 10:09 AM on August 6, 2010

jalexc: "I usually ignore the XML versions of the bill because I've never worked with it before, but my vague understanding is that it is somehow more powerful or flexible than standard text or HTML. Is this something I should be looking at more? "

The advantage of XML is programmatic: it is designed to be read by software rather than by people. HTML has a number of oddities and practices that make it hard to deal with, and XML can be customized for the data at hand. I haven't looked at THOMAS XML, but it could make what you want easy. Or it could be yet another half-assed govt tech project.
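
As a rough illustration of what "programmatic" means here, this Python sketch pulls the number, heading, and text of each section out of a small XML document. The element names (bill, section, enum, header, text) and the sample content are assumptions made up for the example and have not been checked against the actual THOMAS files, so they would need to be adjusted to whatever the real XML uses.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumed, not taken from the real bill XML.
SAMPLE = """<bill>
  <section><enum>1.</enum><header>Short title</header>
    <text>This Act may be cited as the Example Act.</text></section>
  <section><enum>2.</enum><header>Penalties</header>
    <text>Section 9 is amended by striking the period.</text></section>
</bill>"""

def list_sections(xml_text):
    """Return an (enum, header, text) tuple for each section element."""
    root = ET.fromstring(xml_text)
    rows = []
    for section in root.iter("section"):
        rows.append((
            section.findtext("enum", default="").strip(),
            section.findtext("header", default="").strip(),
            section.findtext("text", default="").strip(),
        ))
    return rows
```

Calling list_sections(SAMPLE) gives one tuple per section, which is much easier to filter or compare than text scraped out of a PDF.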

However, if your level of sophistication is uploading files to websites and editing text in Word, processing XML may be out of your reach. XML doesn't quite lend itself to diff and grep the way regular text will. My solution to a similar problem (HTML on HR websites) is to schedule a program to download the latest version and store it in a revision control system, which can be used to browse or generate diffs. This is useful in determining what changed, but is not perfect.
posted by pwnguin at 10:33 AM on August 6, 2010
