TX FOIA: Here's your data. It's encoded. No key for you. Buh bye.
April 3, 2014 5:02 AM

How likely is it that responding to a request for public records with data that has been encoded and explicitly refusing to share or even discuss the key and/or algorithm required to decode it satisfies Texas Freedom of Information Act regulations?

I am trying to access some public records collected and maintained by a Texas county. The relevant county office has told me that they'll provide the data to me, but parts of it (the most important parts, for me) are encoded in a proprietary format. The county actively, openly and explicitly refuses to discuss the algorithms used to encode this data or ways to decode it on the grounds that it's too complex!

This seems like a total farce to me. They've explicitly said (paraphrased):
We're providing the encoded data for the sake of completeness, but the algorithm used to generate the data is very complex. If you are able to decode and make use of the data on your own, feel free to do so, but do NOT contact us for information on decoding it.
What? It seems to me that if this is allowable, any agency could easily make a mockery of FOIA requirements by simply encoding all data into a secret, proprietary format before delivering it to external requesters. What say ye, Mefi?

Also: How would I go about challenging this (apparently, to me) ridiculous restriction? Do I need to lawyer up?
posted by syzygy to Law & Government (15 answers total) 2 users marked this as a favorite
Are you using the term 'proprietary' to mean 'created by a commercial company' or to mean 'obscure but created by the government'? If the former, that MIGHT be grounds for excluding it (although I doubt it). If the latter, well - FOIA the algorithm!
posted by scolbath at 5:43 AM on April 3, 2014 [1 favorite]

Being that it's Texas, I'm going to bet that the powers-that-be have concluded that what you received does indeed meet Texas' FOIA requirements. I would not be surprised if this were a state-wide thing.

Yes, you could lawyer up and challenge this. But that would almost certainly be a long, protracted effort that drags through successive courts and appeals and costs you bundles of cash.

I think your best bet would be to talk with the Texas ACLU, or maybe even the Austin EFF and see if this is something they would be interested in challenging.
posted by Thorzdad at 6:01 AM on April 3, 2014 [1 favorite]

Since you're asking LIKELY - it's possible they are not obligated themselves to provide you the information to decode it if that's not a record they maintain. When I was subject to Virginia FOIA one of the things they told us was that we are obligated to share existing information but we are by no means obligated to create new reports.

So if you were to ask me for a count of the number of different faculty I had mailed in 2009 and the count per faculty I could tell you to pound sand; that would require me to gather the information and process it to determine your answer. That's not what FOIA is about. If you asked me for every email I sent to a faculty member... that qualifies, though we enter a question of whether I know who is and is not faculty.

In that circumstance I can quote a time to determine the appropriate records and you might be subject to payment for the time, or I might just hand you every email I sent in 2009 on the grounds that what you asked for is included. There'd probably be time quoted to redact sensitive information.

So in your case you asked for the records and maybe they don't have to tell you the key. But you could FOIA them for any documents in their possession related to the encoding and decoding of the block. They may be able to refuse to help you understand them or apply them but they probably have to share the specs.
posted by phearlez at 6:02 AM on April 3, 2014

The Freedom of Information Foundation of Texas has a hotline. According to their website "Our FOI Hotline (800) 580-6651 is available to anyone with a question concerning the Texas Public Information and Texas Open Meetings acts." I suggest giving them a call.
posted by Area Man at 6:03 AM on April 3, 2014 [2 favorites]

Encoding is not the same thing as encryption. Do you mean encryption, or just encoding? My first thought was that whoever's in possession of the data now may not be computer-literate enough to have the faintest idea how to open it without their software, but once you have the data the proprietary format may or may not turn out to be accessible by other means.
posted by Sequence at 6:05 AM on April 3, 2014 [1 favorite]

Response by poster: scolbath: By 'proprietary', I mean that it is being kept private or secret. I do not know any details about the encoding algorithm (who designed it, what software can decode it, etc.). What I do know is that I can think of no good (non-commercial) reason to encode this data in a format that is not open and publicized, since there are plenty of those. As a matter of fact, the data is very similar to well-known, open formats, but just different enough to make it difficult to use.

Area Man: I've called and left a message with them - thanks for the tip.

Thorzdad: I'll contact both of those agencies. Thanks!

phearlez: I think I may have to FOI them for records related to the encoding and decoding of the data. This is simple data that is easily represented in a number of open formats. I can't think of a non-commercial reason for someone to store it any other way, and I think a commercial provider should be forced to share the data in an open format, or else publicly share the methods for decoding it if it is being used to decode public data. Otherwise, this public data for which I have paid with my taxes is not really 'available' or 'open' to the public.

Sequence: I mean encoding, not encryption.
posted by syzygy at 7:02 AM on April 3, 2014

Response by poster: Sequence: Sorry, some of my reply to you was cut off when I responded.

I have some examples of the data. I have been able to successfully decode parts of it, but it's very hit and miss to reverse engineer something like this by trial and error, one set of data at a time. Every time I come across a dataset with something new, I have to figure out what the heck is going on.
posted by syzygy at 7:05 AM on April 3, 2014

There might well be a perfectly plausible reason for this, e.g. if the data you are asking for is in a particular database format & said database is no longer online. Writing conversion code to convert the data from its current format to an open standard would be well outside the reasonable budget for an FOI request!

If you can find out what program generated the data, you might be able to get a lot further. Even better if you can convince them to give you access to the source code, although they might not have it any more of course.

Can you upload a sample of the data somewhere? Someone might recognise the format.
posted by pharm at 7:15 AM on April 3, 2014 [1 favorite]

Response by poster: pharm: A new and updated dataset is generated and released every year, so it can't really be an issue of data encoded into a proprietary format generated by antiquated software that is no longer in use.

I've asked about the format and shared samples of it before, here on askme and elsewhere. So far, no one's been able to point me to the source. I've also done a TON of research over the past couple of years without getting much closer.

Also: Just got off the phone with an attorney for the Texas FOI Foundation. He gave me some good tips along the lines of phearlez' recommendations to FOI them for records related to the encoding and decoding of data. Now I have to figure out how to word that request so it's likely to be honored.
posted by syzygy at 7:46 AM on April 3, 2014

It looks like the agency you're requesting the data from is collecting it from other places, so it's at least possible that the agency you FOIA'd doesn't actually know how to decode the info.

Have you tried asking each county how their portion of that information can be interpreted?
posted by toomuchpete at 8:37 AM on April 3, 2014

Yeah it's entirely possible that they're telling you to not ask them because they don't know (combined with the fact that, depending on the language of TX's FOIA law, they may not have to help you with the data spec).

You might post about this over at OpenData StackExchange. It might be an encoding someone else already has dealt with.
posted by phearlez at 9:02 AM on April 3, 2014 [1 favorite]

Asking for documents related to the encoding/decoding of your datasets is a good idea. But be prepared for the response to be that they don't exist. Programmers and IT workers are perpetually behind on their documentation.

I think I may have to FOI them for records related to the encoding and decoding of the data. This is simple data that is easily represented in a number of open formats. I can't think of a non-commercial reason for someone to store it any other way

I think it's a mistake to approach this organization as "You're being purposefully obtuse!" Programmers love to re-invent file formats. I've needlessly done it countless times. Open formats are great when you're starting with an existing dataset, but when you're building your own it's a huge pain in the ass to make it work. I could either:

1. Spend many hours evaluating and auditing whether the various libraries out there are adequate for my project. Some are abandoned. Some don't have a good API. Some are deficient. Some might include only reading the data and not writing. Might not have bindings for the language I'm using. Etc.

2. Spend many hours reading the open format's RFC, essentially becoming an expert in it. This becomes more important the more involved my dataset is.

3. Spend many hours massaging my data into the same paradigm as the open format. I'll admit that occasionally this is time well spent, especially if I don't have a good grasp of the structure of my data. But sometimes it's infuriating. See also: doing anything non-trivial in the damned iCal format.

Or, I could just do an essentially binary dump of my memory structures. Useful to me, quick, lets me move on to other things. But horrible for anyone trying to use my files for other purposes.
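To make that trade-off concrete, here is a minimal sketch. Everything in it is hypothetical: the record layout, the function names (`dump_binary`, `load_binary`, `dump_open`) and the sample outline are invented for illustration and say nothing about the format any appraisal district actually uses. It writes the same four corner points once as a raw binary dump and once as JSON.

```python
import json
import struct

# Hypothetical data: (x, y) corners of a building outline, in feet.
# This is NOT any real appraisal-district format.
points = [(0, 0), (40, 0), (40, 25), (0, 25)]

def dump_binary(pts):
    """Quick 'memory dump': a uint16 point count, then raw int16 x/y pairs."""
    out = struct.pack("<H", len(pts))
    for x, y in pts:
        out += struct.pack("<hh", x, y)
    return out

def load_binary(blob):
    """Reads the dump back. Only works if you already know the layout above."""
    (count,) = struct.unpack_from("<H", blob, 0)
    return [struct.unpack_from("<hh", blob, 2 + 4 * i) for i in range(count)]

def dump_open(pts):
    """The same data in a self-describing, openly documented format (JSON)."""
    return json.dumps({"type": "polygon", "points": [list(p) for p in pts]})

blob = dump_binary(points)  # 18 opaque bytes
text = dump_open(points)    # readable without any side documentation
```

The binary version is a few lines of code and very compact, but nothing in the bytes says that the first two are a count or that the coordinates are little-endian 16-bit integers. Anyone without `load_binary` (or a spec) is left guessing, which is roughly the position the requester is in.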

All I'm saying is, just because you can't think of a good reason for the format doesn't mean there isn't one. You don't want to put the IT folks in a defensive posture by being accusatory in your requests.
posted by sbutler at 4:45 PM on April 3, 2014

Response by poster: toomuchpete: At the moment, I am focusing on the Tarrant Central Appraisal District (I learned from the FOI attorney I spoke with on the phone that these appraisal districts aren't actually part of their county governments, and that the 'C' in 'CAD' stands for 'Central' rather than 'County' in most cases).

I'm focusing on Tarrant right now as sort of a model for how to approach the other ones, mostly because Tarrant has a data dump available for free download online. They also have an accompanying document that explains the various tables in that data, and they make at least four specific statements about these sketch codes in that document (see the tops of p. 14, 15, 16 and, especially, 28), refusing to discuss with members of the public how the sketch data can be decoded. The document also contains very detailed explanations of most of the codes used in the data dump. It does not include any details on the sketch codes, and two appendices have been omitted from it.

The reason Tarrant is interesting for me is that they have provided details on most of the codes they use in this data, but they refuse to do so for the sketches. It could be that they have already been required to publicize documentation of the other codes they use, and that I just need to figure out a way to have the state require them to publicize documentation of the sketch codes. It's sort of the whole issue wrapped up into one nice package.

phearlez: That is possible, although the documentation for so many of the other codes used in Tarrant's data dump leads me to wonder whether they are required to publicize information that makes their internal codes understandable to the public. What I find interesting is that the CAD can view these strings as graphical sketches, but the public can't do that. That means the CAD would most likely have to provide the individual graphic sketches as part of an FOI request, as well as the codes. Easier would be to provide the codes and instructions to decode them, I think.

In principle, I believe the CADs should provide instructions for decoding the sketch strings. They should be forced to openly publicize those instructions, on multiple grounds. One such ground is that there are companies who sell sketches based on this data. If those companies gained access unfairly to the methods required to decode these strings (either from internal CAD personnel or by making an exclusive deal with whoever wrote the software that generates and decodes the strings), this would put a member of the public at a disadvantage, and would allow a private enterprise to profit from what should be public information.

Thanks for the OpenData StackExchange link - I will poke my head in there, as well.

w/r/t programmers being perpetually behind on documentation, I don't think this is a valid excuse for keeping information that should be available to the public under FOIA from the public. I'm not talking about a private project for a private customer. I am talking about work for government agencies who have legal requirements they must fulfill (and who have legal requirements that the private firms they contract out to must fulfill).

w/r/t reinventing your own format - I might have done that when I started out working in software almost two decades ago, but I'd qualify it as an amateur move now, especially in a domain such as GIS or CAD / vector graphics where a number of mature, open and capable standard formats for representing such data exist. And again, we're talking about software written for and used by government agencies who must operate under a more stringent set of legal requirements when it comes to making their internal data available to the public.

It's no problem if you write a proprietary piece of software that saves data in a proprietary format for use solely in a private company or by private individuals, since neither private companies nor private individuals are required to make the data they possess public under FOIA regulations. It's a different matter, altogether, if you write the same software for use by a government agency (or, if a government agency chooses to use such software). The FOI attorney I spoke with on the phone put it well when he said, "government agencies are not allowed to contract out the requirement to make their data public." In other words, the sketches should be publicly available in sketch form (he said this, as well). Delivering them in an encoded format, only, with no instructions on how to decode them doesn't seem to fulfill the agencies' FOI requirements. Perhaps delivering graphic representations of each one would fulfill the FOI requirements. The much easier course, in the long run, would be to have their software developers (or contractors) do what's necessary to either provide instructions on decoding the strings, provide free software that does such, or convert them over into an open format and point the public to the official docs for said format.

To reiterate, please simply keep in mind that if you're developing software for a government agency, you will be working under a different set of requirements than if you were developing for private customers (and rightly so, imho), and that if you are procuring software (be it custom or packaged) for use by a government agency, you will need to make sure that software doesn't compromise your agency's ability to comply with valid FOI requests.

edit: fixed a few typos, added some missing words
posted by syzygy at 3:34 AM on April 4, 2014

Best answer: If those companies gained access unfairly to the methods required to decode these strings (either from internal CAD personnel or by making an exclusive deal with whoever wrote the software that generates and decodes the strings), this would put a member of the public at a disadvantage, and would allow a private enterprise to profit from what should be public information.

Now you're in Carl Malamud's domain, at least when it comes to regulations. But the disadvantaged public is nothing new - one can argue that it's the major basis for Westlaw's business of publishing state law.

(Not to say I disagree with you one iota, but as the basis for an argument this is only beginning to really get much traction with people)

w/r/t programmers being perpetually behind on documentation, I don't think this is a valid excuse for keeping information that should be available to the public under FOIA from the public.

But again this runs up against the interpretation of FOIA as an obligation to share information that already exists. Is it right that a slapdash process results in locked-in information? No, but unfortunately there may be no obligation for them to distill any materials for you, and it may be that proprietary creations by a contract firm are exempted from FOIA, so a request for source code would be denied.

Now, could you perhaps combine those two things in a legal challenge to a denial? Maybe. But I would expect an initial refusal.
posted by phearlez at 3:00 PM on April 5, 2014

Response by poster: phearlez: Now you're in Carl Malamud's domain

I wasn't familiar with Mr. Malamud or his work, so thanks for that pointer. As you can probably tell, I'm interested in the question from a practical, philosophical and legal principles standpoint.

These sketches occupy an interesting legal status in Texas. Used to, many of the appraisal districts made graphic representations of the sketches publicly available on their websites, along with other property tax records. A number of (most? many?) other states still do it this way.

In 2005 in Texas, however, a law was passed forbidding the appraisal districts from making the sketches available publicly online. The law did not remove the sketches from the purview of FOI, however. See Sec. 25.027 of the Texas Tax Code for the exact wording of the restriction, if you're interested.

The justification used for the introduction and passage of this law was one of homeowner security, which sounds sort of ridiculous to me, since one can easily get far more info about a property and its structures from Google Maps than they could from these rudimentary sketches.

The conspiracy-minded part of me wonders whether someone who knew how to turn these codes into drawings (perhaps with insider knowledge that I think should be available to the public) made a few fat campaign contributions in order to get this law passed, thereby guaranteeing a tidy revenue stream by charging the public for what are essentially public records that the appraisal districts used to offer to all comers at no charge.

I know for a fact that at least one company is earning a tidy sum off of these drawings. I'd guess the drawings net them annual sales in at least the high 6 figure range, if not more.

Definitely has a smell of impropriety, or at least disadvantaging the average member of the public.

phearlez: But again this runs up against the interpretation of FOIA as an obligation to share information that already exists.

Yes, I see that distinction. The thing is, it seems that the information exists, somehow and somewhere, because there's at least one private entity who's profiting nicely from their (insider?) knowledge of it. It seems to me that this information (how to interpret the codes) should be made public, on principle.

I'm mostly trying to get my ducks in a row and decide whether it makes sense to pursue this, and, if so, how to best go about it. I appreciate you sharing your detailed knowledge of the FOI landscape with us here.
posted by syzygy at 4:38 AM on April 17, 2014

This thread is closed to new comments.