Help me parse this CSV file
February 22, 2024 3:14 PM

I have a large, mis-formatted CSV file. Can you give me ideas of how to parse it? Preferably in Python.

I've been given a large (10s of gigabytes, millions of records) CSV file. Somehow it was written with inconsistent use of backslashes: In "type 1" records, backslashes escape internal quotation marks. In "type 2" records, single backslashes are part of the field content and are sometimes the last character in the field. If backslashes are treated as escape characters, valid closing quotation marks get escaped and the record doesn't parse.

I can't share the file, but a toy example is:
field1,field2
"content 1 \"with escaped quotes\" within field", "content 2"
"content 1 with lone backslash \", "content 2"
If I parse the file with backslashes as the escape character, type 1 records parse and type 2 records fail. If I parse with no escape character, type 2 records parse and type 1 records fail.

I tried to run through the file with two parsers and if the first one failed fall back to the second one. This didn't work because there are internal line breaks in some fields so the two parsers get out of sync and don't agree on what constitutes a record.

There are too many records of either type to fix errors by hand.

Extra difficulty: there is no reliable unique ID field.

I'm using the python `csv` module and I have a fair bit of experience with it. Open to other tools and other methods. Any advice appreciated.
posted by no regrets, coyote to Computers & Internet (42 answers total) 5 users marked this as a favorite
 
Can you tokenize by the pair end quote and comma, and then for each field, work from the ends inward counting quote pairs? After the outer pair, can’t all quotes be treated as internal?
posted by nickggully at 3:24 PM on February 22


If you have the disk space, I would first just massage the whole file to be consistent.
Run it through a filter that finds the "type 2" records by looking for a backslash followed by either not-a-quote, or a quote and a comma. In those records, add an extra backslash before the backslashes, so they're properly escaped.
Then process it as if they were all "type 1" records.
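A rough, untested sketch of that pre-pass (file names are placeholders, and it treats each physical line as one record, so the embedded newlines mentioned in the question would need extra handling):

import re

# backslash followed by something other than a quote, or by quote-then-comma
type2_marker = re.compile(r'\\(?:[^"]|",)')

with open('dirty.csv', 'r') as src, open('fixed.csv', 'w') as dst:
    for line in src:
        if type2_marker.search(line):
            # looks like a "type 2" record: double every backslash so it
            # survives parsing with backslash as the escape character
            line = line.replace('\\', '\\\\')
        dst.write(line)

The detection pattern would almost certainly need tightening against the real data -- an escaped quote that happens to sit right before a comma inside a "type 1" field would trip it.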
posted by jozxyqk at 3:25 PM on February 22 [11 favorites]


question that may be very naive of me but would be helpful for my clarification (as someone who spends 75% of my job dealing with CSV files): are there any cases where the backslashes are not used as an escape character? in other words, do you need any backslashes at all for informational reasons or no? if this were me, as I currently understand it, I’d be doing a unix sed or something to just remove all backslashes in the whole thing. or find-replace all instances of \” with just plain “ so then the double quotes are all consistent. Maybe do the same with any escaped parentheses and escaped other-stuff. then parse as though no characters are escaped.
posted by crime online at 3:29 PM on February 22


I think I agree that nickggully's approach will work: split each line on commas, then strip the outer quotes from each token. If the records are inconsistent like this I would not use the csv module, just loop over with `for line in f:` (in case you don't know, do not use the readlines() method as it will read the entire file into memory).

But I'd be tempted to see if I could first sanitize the file by fixing those incorrectly escaped quotes in your second example. It's difficult to tell from only the two examples, e.g. is that backslash a proper part of field1 that should be retained? Or is it a wrongly escaped quote? I would try a few quick searches to see how many (false) positives appear with different patterns. For that I'd use ripgrep (outside python), because it will likely be fast.
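If it's easier to stay inside Python than to get ripgrep into the secure environment, a rough census like this (file name is a placeholder) gives the same kind of counts, just more slowly:

import re

# candidate patterns to count before committing to any find-and-replace
patterns = {
    'backslash-quote then comma': re.compile(r'\\",'),
    'backslash-quote at end of line': re.compile(r'\\"\s*$'),
    'backslash not followed by a quote': re.compile(r'\\(?!")'),
}

counts = dict.fromkeys(patterns, 0)
with open('big.csv', 'r') as f:
    for line in f:
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(line))

for name, n in counts.items():
    print(name, n)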
posted by theyexpectresults at 3:31 PM on February 22


Response by poster:
Can you tokenize by the pair end quote and comma, and then for each field, work from the ends inward counting quote pairs? After the outer pair, can’t all quotes be treated as internal?
Unfortunately I tried this. The end quote+comma combination can occur inside records. To edit my toy model:
field1,field2
"content 1 \"with escaped quotes\", within field", "content 2"
"content 1 with lone backslash \", "content 2"

posted by no regrets, coyote at 3:35 PM on February 22


Response by poster:
remove all backslashes in the whole thing. or find-replace all instances of \” with just plain “
I would love to remove all backslashes but then the internal doublequotes will parse as closing the field and cause errors.
posted by no regrets, coyote at 3:39 PM on February 22


While you might not be able to share the original file, would it be possible for you to share, say, the first few thousand lines of it in a version of the file where all the letters are replaced with "a" and all the numbers with "1"?
posted by mhoye at 3:46 PM on February 22 [2 favorites]


A couple of questions. Are there any characters other than double quotes that are escaped? Do backslashes cause a problem in all rows or only ones where it is the last character in the field? What operating system do you use?
posted by phil at 3:49 PM on February 22


Do all "type 2" records have a first field that ends in \", ?

If so, you may be able to do a first pass and replace those \", with \\", so that the backslash escapes a backslash. Then the parser that uses \ as an escape should work.
posted by SegFaultCoreDump at 3:50 PM on February 22


there are internal line breaks in some fields so the two parsers get out of sync and don't agree on what constitutes a record

Is it possible to differentiate those line breaks from ones not inside fields (i.e. the ones separating records, which are probably preceded by a closing double quote and followed by an opening double quote)?
posted by trig at 3:52 PM on February 22


ohh i see, there can be commas within the quoted comma-separated elements, sorry, missed that.

do the fields always start and end with some variant of double quote (escaped or no?)
just spitballing:

find-replace \” to “ (remove all escaping from all double quotes everywhere)
find-replace “,” to , (assuming “,” is a reliable field separator - remove all quotes from either side of the field)
find-replace “ to \” (re-escape remaining double quotes which would all be internal to fields now)
then you have a file to work with where commas within fields only appear inside escaped quotes

or

if fields are reliably enclosed by double quotes - find-replace “,” to #,# to change field enclosure if you can find some sort of character or character string that won’t be in the whole doc.

or “,” to “ILOVEMETAFILTER” or whatever other string won’t be in there (example delimiter provided may not work if you are working on csv parsing for the mefi nonprofit board)

on preview i am mad at myself for not thinking of what SegFaultCoreDump said
posted by crime online at 3:59 PM on February 22


Response by poster: Really appreciate the comments.
would it be possible for you to share, say, the first few thousand lines of it in a version of the file where all the letters are replaced with "a" and all the numbers with "1"?
Would love to but the file is in a secure environment and even heavily redacted files can't be exported without a lot of hassle.
Are there any characters other than double quotes that are escaped? Do backslashes cause a problem in all rows or only ones where it is the last character in the field? What operating system do you use?
I've only encountered escaped double quotes but I'm not 100% sure. The lone backslashes only seem to cause a problem when they're the last character in the field -- I haven't seen them elsewhere. Most records are normal and have no backslashes at all. Running on Debian.
Is it possible to differentiate those line breaks from ones not inside fields (i.e. the ones separating records)?
Only if the record is parsing correctly, which circles back to the original problem.
you may be able to do a first pass and replace those \", with \\",
If I find-replace \", with \\", then the type 1 records will have escaped backslashes which is what I don't want because they will no longer be escaping the internal quotes.
posted by no regrets, coyote at 4:09 PM on February 22


forgot to add you’d need to also replace doublequote-newline-doublequote as line break with whatever new separator. if there are elements that contain doublequote-newline-doublequote within them then what i said wouldn’t work and please ignore
posted by crime online at 4:12 PM on February 22


The end quote+comma combination can occur inside records

It can, but how often does it?

I'd probably try doing the suggested replacements, and then reading it as a CSV and seeing where it fails. This file is messed up enough that I don't think you can avoid some manual or in-debugger repair, and this way might be the minimal amount of manual work. (There are plenty of ways to imagine lines mangled in a way that makes them truly ambiguous...)
posted by Blue Jello Elf at 4:19 PM on February 22


I don't think this can be done in a single pass. My first guess would be to do this in two passes, each with their own parsing logic.

Pass 1 handles type 1 records. It processes the entire input, writing successfully parsed type 1 records to an output file and the remaining type 2 records to an exception file.
Pass 2 handles the type 2 records in the exception file.

Concatenate the output from Pass 1 and 2 to get your final result.

I'm sorry you have to deal with this. Badly formed CSV is the fucking devil.
posted by Sauce Trough at 4:24 PM on February 22 [4 favorites]


I'm also struggling to understand this problem without a more substantive example, but throwing out a few thoughts:

First thing I would try is some of the many less low-level and hopefully more robust real-world CSV parsers than the csv library in stdlib, e.g. pandas.read_csv, polars.read_csv (both with and without use_pyarrow=True), and clevercsv. Maybe one of them will work for you out of the box.
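For the out-of-the-box test, something along these lines (file name is a placeholder; on a file this size you'd probably also pass chunksize= and iterate rather than loading everything at once):

import pandas as pd

# try each escaping convention; on_bad_lines='warn' (pandas >= 1.3) reports
# the rows that don't fit instead of aborting the whole read
df_escaped = pd.read_csv('big.csv', escapechar='\\', doublequote=False,
                         skipinitialspace=True, engine='python', on_bad_lines='warn')
df_plain = pd.read_csv('big.csv', skipinitialspace=True,
                       engine='python', on_bad_lines='warn')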
I tried to run through the file with two parsers and if the first one failed fall back to the second one. This didn't work because there are internal line breaks in some fields so the two parsers get out of sync and don't agree on what constitutes a record.
Can you use some other tool (e.g. grep) to split the source into two files/streams that can be sent to the two parsers affirmatively, rather than relying on the first parser failing to send stuff to the second parser? This is likely fiddly given the presence of internal line breaks, but is certainly an easier problem than writing a parser from scratch.
posted by caek at 4:27 PM on February 22


If all fields are quoted and fields are separated by comma+space as shown, I would replace dquot-comma-space-dquot with dquot-tab-dquot as this string should not appear within Type 1 fields.

If all fields are quoted and some are separated by commas without spaces, I would replace dquot-comma-dquot as well. If there are tabs within the file already, I would use some string like -=-=- as a delimiter instead.

If there are unquoted numeric columns, these can be used as sanity checks to see when Parser A has gone off the rails and we need to use Parser B for the current record. I would write a script that throws error messages if either both parsers fail or both pass but the outputs don't match. Then amend the script to handle edge cases and finally output each record with a delimiter that does not appear within quoted text fields. I would not attempt to remove any dquots from the file until the columns line up in LibreOffice.

I would also check for CR+LF linebreaks as they would need to be standardized and their presence might indicate records that need special handling. This chimera will surely need further refinement even after this particular hurdle is past.
posted by backwoods at 4:53 PM on February 22 [1 favorite]


If you can have newlines and commas in the fields, then this malformed file is just strictly ambiguous. I think you're going to have to make some assumptions about the content of the fields in order to parse it.

If you're willing to assume a "normal" field can't end with ", " (that is a comma followed by a trailing space) or a trailing newline, and you will reliably get spaces between fields, then you could replace all instances of backslash-quote-comma-space-quote with quote-comma-space-quote and all cases of backslash-quote-newline-quote with quote-newline-quote. (turning \", " into ", " if that's easier to read)
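An untested sketch of those substitutions, streamed so the whole file never has to sit in memory (file names are placeholders; the end-of-line test stands in for the backslash-quote-newline-quote case, since reading line by line you can't see the next line's opening quote):

import re

fix_boundary = re.compile(r'\\(", ")')    # \", "  becomes  ", "
fix_line_end = re.compile(r'\\("\s*)$')   # \" at the end of a line becomes "

with open('dirty.csv', 'r') as src, open('clean.csv', 'w') as dst:
    for line in src:
        line = fix_boundary.sub(r'\1', line)
        line = fix_line_end.sub(r'\1', line)
        dst.write(line)

Note this deletes the trailing backslash, which in "type 2" records is real content; if it needs to survive, substitute a doubled backslash for it instead of dropping it. After this pass, parsing with backslash as the escape character should work, given the poster's note above that lone backslashes only seem to show up at the very ends of fields.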
posted by aubilenon at 5:25 PM on February 22 [2 favorites]


This regex matches quoted field values that are followed by (either another quoted field value or a new line), allowing for loose whitespace in between.
(".*?")(?=\s*(,\s*".*?"|\n))

field1,field2,field3
"","content 1 with escaped quotes\", within field", "content 2"
,"content 1 \"with escaped quotes\", within field", "content 2"

This extension of that will also find an unquoted empty value at the beginning of a line or the empty value ,, BUT when it matches the ,, empty value it includes the leading comma.

(^(?=,)|(?<>

field1,field2,field3,field4
,"content 1 with lone backslash \",, "content 2"

However, if you put a comma at the beginning of every line, remove the beginning of line match, and include the leading comma in every match this seems to work:

(,\s*(?=,)|(,\s*".*?")(?=\s*(,\s*?,|,\s*".*?"|\n)))

NOTAFIELD,field1,field2,field3,field4
,"","content 1 with escaped quotes\", within field", "content 2",
,"content 1 \"with escaped quotes\", within field", "content 2",""
,,"content 1 with lone backslash \",, "content 2"

For every value you'd then want to trim the leading ,\s*" and trailing "\s*

Essentially this is trying to match a set of valid values that fill a line. If there's multiple valid sets then this may be wrong in some cases.

Also, I can no longer spell field. Yay for browser spellcheck.

posted by gible at 5:29 PM on February 22


I hate that I’m suggesting this, but given the size of the file I’d give up on using a library to parse it directly at all. Instead I'd go with slow and steady: read line by line, count backslashes, and then send the line to whichever parser based on whether the backslash count is 0, odd, or even.
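One possible reading of that, as an untested sketch (the even/odd-to-dialect mapping is a guess, and embedded newlines are ignored):

import csv
from io import StringIO

def parse_line(line):
    n = line.count('\\')
    if n and n % 2:
        # odd number of backslashes: assume a lone literal one ("type 2"),
        # so don't treat backslash as an escape character
        return next(csv.reader(StringIO(line), skipinitialspace=True))
    # zero or an even number: assume paired \" escapes, parse as "type 1"
    return next(csv.reader(StringIO(line), escapechar='\\',
                           doublequote=False, skipinitialspace=True))

You'd still want a field-count check on the result to catch the lines where the guess goes wrong.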
posted by fedward at 5:33 PM on February 22 [1 favorite]


gah... and MF mulched my second regex :/

(^(?=,)|,(?=,)|(".*?")(?=\s*(,\s*?,|,\s*".*?"|\n)))
posted by gible at 5:35 PM on February 22


With enough R&D effort, this could be attacked using a Hidden Markov Model (HMM) and probabilistic modelling:

We can model the sequence of tokens as data that was generated by a probabilistic data generating process that switches between two hidden states, state 1 and state 2. When in state 1, the data generating process emits records as a sequence of tokens using type 1 encoding. When in state 2, the data generating process emits records as a sequence of tokens using type 2 encoding. The data generating process switches between state 1 and state 2 according to some probability distribution with parameters we don't know, but could estimate.

We probably don't need to be able to distinguish between all kinds of characters, only between tokens that contain significant information about what encoding mode we're in - maybe four tokens could suffice: quote, whitespace, backslash, and other. We need an unambiguous tokenisation that is independent of the unknown encoding. It's possible the problem might also benefit if we include "backslash followed by a quote" as a token as well, where the meaning of that token is quite different depending on if the data generating process is in state 1 or state 2.

We could formalise this data generating process as a Hidden Markov Model (HMM). To do that, the behavior of the data generating process must depend only on the current state. As well as "state 1" and "state 2" to track which encoding mode the process is using, we'd likely need to augment the state space with another variable to track what the previously emitted token was -- some sequences of tokens (e.g. a backslash followed by a space) are impossible if the process is in encoding state 1, so seeing this token sequence gives a large amount of information that the data generating process must have been in state 2 at that position in the token stream.

Instead of trying to parse the data we'd define a probabilistic model that can generate somewhat similar sequences of tokens to what we observe in the real data. We'd want to tune our model so it could dwell in state 1 for a while, before flipping to state 2, emitting some more tokens, then flipping back to state 1, etc. We might define a prior belief that the data generating process starts in state 1 or state 2 with uniform probability.

Once we had all that, we could apply standard HMM algorithms to process the input stream of tokens and get it to output the most likely trajectory of hidden states. This would give, for each token, an estimate of whether the data generating process was in state 1 or state 2 when emitting that token.

We could then use this output sequence of hidden states estimated from the HMM to drive a custom parser. The custom parser would use the sequence of hidden states as a control instruction telling it when to switch between type 1 parsing and type 2 parsing.
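For concreteness, here's a toy Viterbi decoder along those lines. The transition and emission numbers are invented purely for illustration (in practice you'd estimate them, e.g. with Baum-Welch, or hand-tune them against a labelled sample), and it skips the previous-token state augmentation described above:

import math

STATES = ('type1', 'type2')

def tokenize(text):
    # collapse characters into the coarse token classes discussed above
    tokens, i = [], 0
    while i < len(text):
        if text.startswith('\\"', i):
            tokens.append('backslash_quote')
            i += 2
        elif text[i] == '\\':
            tokens.append('backslash')
            i += 1
        elif text[i] == '"':
            tokens.append('quote')
            i += 1
        elif text[i] in ' \t\r\n':
            tokens.append('whitespace')
            i += 1
        else:
            tokens.append('other')
            i += 1
    return tokens

# P(next state | current state): the process tends to stay in one encoding
TRANS = {'type1': {'type1': 0.999, 'type2': 0.001},
         'type2': {'type1': 0.001, 'type2': 0.999}}
# P(token | state): e.g. a lone backslash is far likelier in "type 2" records
EMIT = {'type1': {'quote': 0.08, 'backslash_quote': 0.04, 'backslash': 0.0001,
                  'whitespace': 0.15, 'other': 0.7299},
        'type2': {'quote': 0.08, 'backslash_quote': 0.01, 'backslash': 0.03,
                  'whitespace': 0.15, 'other': 0.73}}
START = {'type1': 0.5, 'type2': 0.5}

def viterbi(tokens):
    # most likely hidden state for each token, computed in log space
    V = [{s: math.log(START[s]) + math.log(EMIT[s][tokens[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(tokens)):
        V.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: V[t - 1][p] + math.log(TRANS[p][s]))
            V[t][s] = V[t - 1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][tokens[t]])
            back[t][s] = prev
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for t in range(len(tokens) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return list(reversed(path))

Calling viterbi(tokenize(raw_text)), where raw_text is the file contents or a chunk of it, then gives one state label per token, which is exactly the control sequence the custom parser would follow.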


I might estimate an initial 2 weeks of full time R&D work to get a rough working prototype (that might need further R&D and tuning to improve the parse quality) if I were the person building this, but I've custom built a couple of HMM-based algorithms before for other problems.

Entry level HMM resources:

- The chapter "probabilistic reasoning over time" in Russell & Norvig's textbook "Artificial intelligence - a modern approach"
- Rabiner's 1989 tutorial: A tutorial on hidden Markov models and selected applications in speech recognition

I am not sure if there is any good existing open source tool implementing some kind of statistical parser robust to data encoding issues that you could install and configure to do this task instead of building a custom HMM algorithm from first principles.

I did a little searching and found "Hidden Markov Models for Data Standardisation" which is apparently part of "Febrl - Freely extensible biomedical record linkage". It appears that there is a release of Febrl from 2011 that can be downloaded from sourceforge. I have never used Febrl and am unsure if it could help, but in the worst case reading through the documentation might give you some other ideas for search terms or alternative ways to attack this parsing problem.

"Hidden Markov Models for Data Standardisation"
see: http://users.cecs.anu.edu.au/~Peter.Christen/Febrl/febrl-0.3/febrldoc-0.3/node24.html

Febrl download from sourceforge: https://sourceforge.net/projects/febrl/
posted by are-coral-made at 6:23 PM on February 22 [2 favorites]


I'd go with a double parsing approach, and define my own escape character to fix it, just in case you ever need to read it again.
posted by The_Vegetables at 6:50 PM on February 22


I think aubilenon's approach is sound. If you want to cover the case where a field ends with \", " then you could simply count the (non \) "s between any \", " and the end of the line and even/odd will tell you if this is a real \", " or is within a field (this is assuming you aren't concerned about \", " appearing twice within fields on one line, but if you are you could search for this possibility and perhaps dump any such lines to a file for manual review).
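That check is cheap to code up -- an untested sketch, assuming one record per physical line and no escaped backslashes:

import re

UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')

def ends_a_field(line, idx):
    # idx is the index of the backslash in a '\", ' occurrence; count the
    # quotes not preceded by a backslash from just after it to the line's end
    rest = line[idx + 4:]
    # even: the rest of the line is whole quoted fields, so this really is a
    # field boundary; odd: we're still inside a field
    return len(UNESCAPED_QUOTE.findall(rest)) % 2 == 0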
posted by ssg at 7:00 PM on February 22


Actually, assuming you aren't dealing with an incredibly large number of lines, it may be just as quick to simply assume all ", " are field boundaries and then dump any of the presumably rare lines with \", " into a file for manual review. In any \" followed by a newline, the " must be a field boundary, so you don't need to worry about that case.
posted by ssg at 7:13 PM on February 22


A couple thoughts off the top of my head (apologies if they've already been suggested!)
* definitely give the pandas csv functions a try. I don't think they use the native csv parser, and for the sake of a 2 min test it's worth it.
* I would split the file into individual lines and try parsing each line separately with different combos of escape configs until it parses, and then reassemble the whole mess. You mentioned sometimes there are newlines in field content though, so you might need some look-aheads to see if the next line starts with a quotation mark (i.e. it's a true new line) or not (and it's therefore part of the previous line). Or check if there's a quote before the newline to indicate the end of the field.

Good luck - i wish i was at my computer so i could try a couple things and figure it out, I love crap like this!
posted by cgg at 8:05 PM on February 22 [1 favorite]


Miller (mlr) is good for this kind of thing.
posted by idb at 5:00 AM on February 23


A true techie would use regex as suggested by gible and theyexpectresults (via ripgrep), but if you don't already know it, you need a more accessible approach. There is good advice in the replies above, especially the suggestions to try to rectify each kind of error, one at a time.

But I'm worried by your comment that some fields have internal line breaks. Are they escaped? Are they errors, or not?
posted by SemiSalt at 5:32 AM on February 23


I'd parse each line assuming it's not broken, and flag the records that don't parse or don't have the correct number of fields and manually fix those up. Possibly during this process I'd discover patterns that I could use to automate fixing up further broken files of this type.

It may be easier to fix the program that generated this broken file than to successfully parse it. You may not have access to that program, or its authors, or not have any authority to get them to fix their buggy CSV file generator; still, carefully consider whether "not possible" just means "really really hard" or "I assume it's impossible but haven't actually tried" and compare it to the difficulty of not just getting the file to parse, but validating that it has parsed correctly.
posted by kindall at 7:00 AM on February 23


I would do multiple passes - in the first pass change all occurrences of backslash-quote-comma to tilde-tilde-comma ( or other very unlikely sequence followed by comma). That should, unless I am misunderstanding or missing some nuance, eliminate the 'type 2' records. Then process all as type 1, then go back and change all tilde-tilde-comma to backslash-quote-comma.

The final step is to go find whatever extract process created it, and see if you can eliminate the problem at the source. In my experience, almost every "one shot" process is asked for, again, in the future.
posted by TimHare at 8:50 AM on February 23


When in state 1, the data generating process emits records as a sequence of tokens using type 1 encoding. When in state 2, the data generating process emits records as a sequence of tokens using type 2 encoding. The data generating process switches between state 1 and state 2 according to some probability distribution with parameters we don't know, but could estimate.

This assumes that the token/state in one line is somehow dependent on the token/state in the second line. A simple mixture model would do.
posted by MisantropicPainforest at 9:17 AM on February 23


What are the characteristics of "good" records aka ones that can be parsed completely and accurately? Is there a minimum character length for a "good" record? Are there patterns that can be used to distinguish between a field's "internal line break" and newlines used to separate records? Is there another dataset that can be used to help validate your efforts?

Nthing the value of attempting to track down the csv's author(s) or at least understanding the data provenance. Can you determine how or which program created the csv file originally? If this file is internal to your organization, any chance there is an earlier version saved somewhere that you can access?
posted by oceano at 9:25 AM on February 23


I have a simple approach to this. Parse the CSV line with backslashes as escape characters. If it fails, parse it without. Do that for each line.

The python csv reader takes a file object and applies a single dialect to the whole thing, but you can trick it into taking one line at a time as a "file". Here is some rough code, since I can't remember all my python foo
from io import StringIO
import csv

EXPECTED_FIELDS = 2  # however many columns the file is supposed to have

csv_parsed_rows = []  # rows we manage to parse end up here

with open('csv_file.csv', 'r', newline='') as csv_file:
    for line in csv_file:
        # first attempt: backslash as the escape character ("type 1" records)
        try:
            row = next(csv.reader(StringIO(line), escapechar='\\',
                                  doublequote=False, skipinitialspace=True))
        except (csv.Error, StopIteration):
            row = None

        if row is None or len(row) != EXPECTED_FIELDS:
            # fall back: no escape character ("type 2" records)
            try:
                row = next(csv.reader(StringIO(line), skipinitialspace=True))
            except (csv.Error, StopIteration):
                continue  # couldn't parse the line either way; skip or log it

        csv_parsed_rows.append(row)
This will let you parse the csv line by line by either dialect option, switching to the other dialect when the parsing fails.
posted by dis_integration at 9:30 AM on February 23 [1 favorite]


Ok so if I’ve read everything here correctly here’s how I’d approach it on a first pass:
  • read the file line by line and output each line to one of two files - those with an even number of escaped quotes (including 0) and those with an odd number
  • use pandas read_csv to read the new csvs with the appropriate escape chars for each (rough sketch below). This should be faster than doing so piecemeal
  • dump parsed data to a more compact storage as soon as possible; probably parquet. If any part of this process chokes then I’d read the files in chunks and dump each parsed chunk as I go
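A rough, untested sketch of the first two bullets (placeholder file names; the header line is copied into both halves so read_csv sees it either way, and the parity test is taken literally from the first bullet, so it may need adjusting):

import pandas as pd

with open('dirty.csv') as src, \
     open('even.csv', 'w') as even_out, open('odd.csv', 'w') as odd_out:
    header = next(src)
    even_out.write(header)
    odd_out.write(header)
    for line in src:
        # route each physical line by the parity of its escaped-quote count
        target = even_out if line.count('\\"') % 2 == 0 else odd_out
        target.write(line)

type1 = pd.read_csv('even.csv', escapechar='\\', doublequote=False, skipinitialspace=True)
type2 = pd.read_csv('odd.csv', skipinitialspace=True)
pd.concat([type1, type2], ignore_index=True).to_parquet('combined.parquet')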

posted by mce at 9:46 AM on February 23


Does every line have the same number of columns? If so, that can be an easy way to sanity check a transformation.

Are the mangled columns always the first column, as in your example? Or can mangled columns be in any column position?

Are any of the columns structured data (such as a specific timestamp format, or "always a number," or whatever) that are consistent and can anchor analysis so that you know the number of columns that should come before it, and the number of columns that should come after?

Even in the mangled columns, does every column end in a ", sequence (with or without an escape), or do some columns start quoted and end without a quote?

I'd recommend processing all lines that are unambiguous, then remove them from the set. For the rest, the answers to the questions above can guide rules that can be applied to the lines.

It may be possible to evaluate each ", sequence as a potential end of column, and check both variations -- end or not end -- with the rest of the line, and come up with only one variant that meets the known constraints (e.g. correct number of columns + anchor column with known format at the correct location(s)).
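An untested sketch of that last idea; it only enforces "every field is wrapped in double quotes" plus a fixed column count, but anchor-column checks could be added inside the same loop:

from itertools import combinations

def unambiguous_split(line, n_fields, sep='", "'):
    # try every choice of which occurrences of sep are real field boundaries
    # and return the fields only if exactly one choice satisfies the checks
    line = line.rstrip()
    positions = [i for i in range(len(line)) if line.startswith(sep, i)]
    valid = []
    for boundaries in combinations(positions, n_fields - 1):
        pieces, prev = [], 0
        for b in boundaries:
            pieces.append(line[prev:b + 1])   # keep this field's closing quote
            prev = b + len(sep) - 1           # next field starts at its opening quote
        pieces.append(line[prev:])
        if all(len(p) >= 2 and p.startswith('"') and p.endswith('"') for p in pieces):
            valid.append([p[1:-1] for p in pieces])
    return valid[0] if len(valid) == 1 else None

A return of None flags the genuinely ambiguous (or unparseable) lines for manual review.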
posted by Number Used Once at 11:12 AM on February 23 [1 favorite]


It may be possible to evaluate each ", sequence as a potential end of column, and check both variations -- end or not end -- with the rest of the line, and come up with only one variant that meets the known constraints (e.g. correct number of columns + anchor column with known format at the correct location(s)).

FWIW this is the sort of thing I was thinking of with the so-crazy-it-just-might-work suggestion of counting backslashes. In years worth of data parsing (uh, I guess it's decades now, isn't it) I've found it's often useful to avoid trying to be too smart at the outset, and let failures guide you to the right solution. In any case you're counting something, although the specifics of what you're counting could vary (backslashes, commas, quotation marks, other) and embedded newlines add a wrinkle as well. I'd probably approach this by writing only half a parser, outputting the lines that can't be parsed to a log or the console, and then I'd iterate on that half parser until it doesn't encounter any errors. Only after I can read the whole file without any errors would I try to output the data to something more regular.

But also, on further consideration of the original problem, I'm concerned that the data that's been written to the file you have has been incorrectly truncated and not just badly formatted. The fields where there's a stray trailing backslash before the closing quotation mark imply that this is the output of a parser that went wrong. That suspicion causes me to second the people above who suggested going back to the original data sources to see if a cleaner file can be generated in the first place.
posted by fedward at 12:12 PM on February 23


I think dis_integration's strategy is the best, assuming that you can reliably detect whether each parser is parsing successfully on a per-line basis (i.e., is it guaranteed that if parser 1 produces N records, then it was a line of type 1?). Performance will not be great but if you only need to do this once you can probably live with that. If for some reason performance is an issue you should be able to implement the same strategy in awk, as a first file-normalization step to bring it into a format that Pandas can then handle. Awk is pretty speedy at this sort of thing I think.
posted by biogeo at 12:36 PM on February 23


Oh sorry I failed to catch that there are internal line breaks, so dis_integration's solution won't work exactly. But I think the basic strategy is sound if you switch from a line-based approach to a stream-based one. Something like:

import csv

class RecordParseError(Exception):
    pass  # what type1parser/type2parser raise when they can't read a record

with open('input.csv', 'r', newline='') as f, open('output.csv', 'w', newline='') as out:  # placeholder file names
    writer = csv.writer(out)
    while True:
        curpos = f.tell()
        try:
            newrec = type1parser(f)
        except RecordParseError:
            f.seek(curpos)
            newrec = type2parser(f)
        if newrec is None:            # assume both parsers return None at end of file
            break
        writer.writerow(newrec)       # write newrec to a proper CSV file
Then type1parser and type2parser are functions that try to read exactly N fields from the stream plus a newline, and raise an exception if they can't.
posted by biogeo at 1:02 PM on February 23


How do type 2 records handle quotes that occur inside field content?

Do fields containing embedded newlines occur in both types of records?

Does this file appear to have been constructed by appending a bunch of smaller files from disparate sources, each of which uses an internally consistent quote-escaping convention, or are type 1 and type 2 records just interleaved arbitrarily?

Are there any handy tells for record type, such as perhaps type 1 and type 2 records using different newline conventions (e.g. CRLF vs CR-only or LF-only)?
posted by flabdablet at 7:04 AM on February 24


Is there a predefined limit to the size of any given field? If you're using biogeo's approach (which absent any shortcuts looks to me like the right one) then it would be useful for either parser to be able to cut itself short instead of gobbling up megabytes of lines waiting for an end-of-field mark that never arrives because that part of the file is actually formatted according to the other parser's convention.
posted by flabdablet at 7:15 AM on February 24


Is the total number of records known in advance, so that you can be sure that your cleaned-up output ends up with the correct number of them? Trying to find misidentified field-embedded newlines by eyeballing multiple gigabytes of CSV text is not really workable.

Taking even a further step back from that, is there any way you can just reject this entire thing on the basis that it's been so badly constructed as to be inherently unsafe to try to machine-parse, and persuade whoever handed it to you to regenerate it properly?
posted by flabdablet at 7:23 AM on February 24


The reason I ask about the possibility of refusing to work on this file as-is is that its existence speaks to a lack of attention to data integrity in organizational processes upstream from you, which even if you do manage to make sense of this specific broken file will undoubtedly bite the organization badly in the bum at some later date.
posted by flabdablet at 7:32 AM on February 24 [1 favorite]

