Another excel question
January 24, 2018 9:37 AM   Subscribe

Excel formatting is driving me crazy.

I have large spreadsheets where one of the columns contains CUSIP numbers. Cusips are a nine-digit alphanumeric code. My problem is that excel reads cusips containing an E as a scientific number and reformats it like this:

22004E208 becomes 2.20E+212 (this has nothing to do with the column widths, it's actually reformatted like this)

Once this formatting is done in the spreadsheet, I can't change it back to the real number.

My current spreadsheet ("Spreadsheet A") has an apostrophe before each number to prevent this. So the above is '22004E208 in the spreadsheet. But I need to do various vlookups and index/match formulas on the cusip and the lookup spreadsheet ("Spreadsheet B") that has the cusips I am trying to match does not use the apostrophe.

My workarounds so far:

1. In Spreadsheet A, I add an extra column and put this formula in every row: (=RIGHT,$A1,9) with the result only showing the 9 rightmost numbers (removing the apostrophe).

2. Then I have to copy that whole column and paste it over itself as a number so that the cell doesn't contain a formula (or vlookup won't be able to read it).

3. Once it is pasted, I change the whole column to text, so the number doesn't auto-format into a scientific number.

4. Then I can write my formula.

Also, I should mention that lots of these cusips will start with one or more zeroes. These zeroes NEED to stay on the front, but excel will often remove them if I try to remove the apostrophe using things like text to columns or find ['] and replace [ ].

My question is this - is there any way to either disable to scientific number formatting? This is causing me a bunch of problems because I have to do so many other little fiddly things in excel just to get spreadsheets formatted correctly to run vlookup and index/match formulas.

I also don't know how Spreadsheet B has all the cusips just sitting there with no apostrophe that I can see, and it doesn't reformat into scientific numbers. Everything is just formatted as General. This spreadsheet is an export from some program, not something someone created manually.

Also, sometimes I need to save my results in .xlsx and sometimes in .csv. This also affects things, and I can't remember what it is right now, but I think it has something to do with the text formatting that messes things up.

Any help is appreciated, thanks.
posted by triggerfinger to Computers & Internet (16 answers total) 3 users marked this as a favorite
 
In my ancient (2007) but still satisfactory version of Excel there is a table of options (Home > Number) offering a dozen different formats, including Accounting, Text, Special, Custom, etc. Depending on your version, you may have to search for this among all the options along the ribbon. Have you tried experimenting with some of these?
posted by aqsakal at 9:54 AM on January 24, 2018 [1 favorite]


Yeah I actually work with security identifiers in Excel spreadsheets all the time; I even wrote our company's add-in.
The way we handle it is to pre-format all of the cells that are going to contain things like CUSIPs and SEDOLs as "Text". *Then* paste them in.
My add-in does it all in VBA but you don't have to get that complicated.
posted by jozxyqk at 10:00 AM on January 24, 2018 [2 favorites]


Have you tried reformatting those cells to be Text? (Right click the column heading, select Format Cells..., then select Text.) It seems to allow me to start fields with zeroes, and never seems to try to reinterpret what I enter (for instance, as calendar dates, which I never want but it always thinks I do).
posted by CheesesOfBrazil at 10:01 AM on January 24, 2018


I think OP's problem is that reformatting those cells as text doesn't stick, and neither does preformatting the whole column as text. I just tried this and the only way I could make it work was by formatting as text and then pasting the number into the formula bar, not into the cell.
posted by clavicle at 10:07 AM on January 24, 2018


I'm trying to get the vlookup to fail, but I'm using '22004E208 (general format) as a lookup_value, and 22004E208 "pasted as value" into a general format field as the looked-for field, but it's working. It also works if I look for 22004E208 formatted as text.

I am using 2010, and I really don't mean to patronize you here, but are you sure you even have to do these workarounds?
posted by turkeybrain at 10:14 AM on January 24, 2018


I assume this wouldn't work, but could you use concatenate to add an apostrophe to all the CUISP numbers in spreadsheet B? Then they're the same and you don't have to worry about formatting continuing to change.
posted by Mrs. Pterodactyl at 10:27 AM on January 24, 2018


Also, under File > Options > Formulas you'll want to uncheck the option to throw an error on numbers stored as text if it isn't already.
posted by clavicle at 10:40 AM on January 24, 2018


Unless I'm misunderstanding, using the apostrophe should work. You could also try writing it as ="22004E208", which then also allows a vlookup. As far as I know, this also prevents the value from getting lost / messed up when converting to / from CSV.
posted by any_name_in_a_storm at 11:12 AM on January 24, 2018


Best answer: Most of the time the pre-format as text should work for this. However, depending on the format of the copied data (e.g. unicode, html), Excel will override that setting. You need to pre-format as text and also paste special -> values only. I hit this all the time dealing with UPCs.
posted by Jacob G at 12:20 PM on January 24, 2018 [1 favorite]


Is there any chance the values in spreadsheet B have a space at the beginning or end? Especially if B was created from a text file or something and someone used the "Text to columns" feature in Excel, this can result in one or more spaces on the end of the CUSIPs which are not immediately obvious. If the value in spreadsheet B were actually "22004E208 " (note space on the end), it would explain a)why Excel sees it as text and not a number, and b) why it doesn't match the value in spreadsheet A.

One way to check would be to manually edit one of the CUSIPs in spreadsheet B to see if there are extra spaces there; another would be to use the LEN() function to find the length of the string in spreadsheet B (for your example, 9 if there are no spaces, but 10 or more if there are).
posted by DevilsAdvocate at 12:25 PM on January 24, 2018


Response by poster: Okay, to address a few things:
- I've trimmed all the extra spaces in the cells (one of the dozens of other little fiddly things I need to do to format the spreadsheet)
- I almost always use INDEX/MATCH instead of VLOOKUP as my spreadsheet has ~500k rows and 105 columns so VLOOKUP takes like 20 minutes to calculate.
- Using INDEX/MATCH with the apostrophes in Spreadsheet A but none in Spreadsheet B will not bring back a match (I just tried again).
- Using concatenate to put apostrophes in Spreadsheet B also doesn't work (weirdly, I can see the apostrophes in the cells in A but if I add them in B they disappear in the cell and only show in the formula bar)

It seems as though when I format cells as text and then save as excel, it works okay (which is what I've been doing). The problem is that I need to save a lot of these files as .csv, in which case the text formatting disappears when I close the file and those cells reformat as scientific when I reopen it. Also, these files come to me in different formats, e.g. spreadsheet A comes to me as a .csv and B comes to me as an .xlsx. That is how I open them and run them. Don't know if that matters.

In summary, I think my problem is I can't find a way to run my lookup formulas with the apostrophe, and if I remove the apostrophe for the formulas, I can't save my results as a .csv without getting the scientific formatting restored when the file is reopened. Such a clusterfuck.

There has to be a way to solve this without adding in a bunch of new and tortured formatting just to get the stupid thing to act right.
posted by triggerfinger at 1:13 PM on January 24, 2018


Oh man. I suppose you're stuck with Excel for a bunch of reasons but, if at all possible, you want to be doing this in some kind of actual database. Some of these problems will just disappear.
posted by paper chromatographologist at 1:21 PM on January 24, 2018 [1 favorite]


If I understand your workaround correctly, I think you can skip steps 1-3 by embedding the "RIGHT(...)" formula in your INDEX/MATCH formula. Namely, if you're doing a MATCH where the lookup values are in column A , do something like "MATCH(RIGHT(A1, 9), ...)"
posted by mhum at 1:45 PM on January 24, 2018


Best answer: To help with your CSV->Excel->CSV headaches: CSV is lossy-- so you're gonna always be hurting if you let Excel do it's automatic data-type detection when your data is 'numberish'. So you just need to interrupt that, so when you want to open an existing CSV file try:

- Create a new empty workbook.
- Go to `Data` and then pick `From Text/CSV`
- Pick your CSV file, import.
- Change `Data Type Detection` to `Do not detect data types`
- Click `Edit`-- The Query editor will load.
- Format your columns to their correct types (Default is now TEXT for all)

Then `Close and Load` when you're happy. You can do a bunch of stuff in that Query Editor to ease that flow if you're doing this a bunch. Doing the above might start cleaning up your other issues too.
posted by Static Vagabond at 1:54 PM on January 24, 2018 [1 favorite]


Best answer: Ah yes... If you're opening a CSV file and you want to preserve formatting, using "Data" instead of "File"-"Open" will help.
posted by jozxyqk at 2:25 PM on January 24, 2018 [1 favorite]


my spreadsheet has ~500k rows and 105 columns so VLOOKUP takes like 20 minutes to calculate.

Convert it to a database, all of these problems will disappear and the calculations will run in seconds even in Microsoft Access. You can still copy the data back to Excel at the end if you need a pivot table.
posted by Lanark at 2:48 PM on January 24, 2018 [1 favorite]


« Older Roads to avoid if you're scared of heights?   |   I seem to be drawn to very controlling, unkind men... Newer »
This thread is closed to new comments.