Excel: complicated concatenation
July 1, 2019 2:30 PM   Subscribe

I have an Excel sheet with 2 columns of data with multiple lines per cell. I need to concatenate these values into another multi-line cell. I can do it manually but is there a formula to do it so I'm less likely to make a mistake?

I can do it manually, but there are 12 sheets for a total of 250 questions so I’m worried I’ll make a hard-to-detect copy/paste error.

Out of the question: VBA (I'd rather do it manually than learn VBA just for this) and changing the input or output format as the formats are determined by an outside system.

Here's what the data looks like. It's a series of questions, possible answers, and a numeric value assigned to each answer (so 3 columns, but I don’t need to do anything to the Question column, just the last 2). As far as I can tell every question has exactly five answers and the numeric values are always 1 through 5. Sadly, there's not really a pattern otherwise: the possible strings vary greatly, and the number associated with each string is not consistent ("Rarely" can be 2 or 4, depending on the prompt).
_________________________
|Question 1 |Never    |1|
|           |Rarely   |2|
|           |Sometimes|3|
|           |Often    |4|
|           |Always   |5|
_________________________
|Question 2 |None     |5|
|           |A little |4|
|           |Somewhat |3|
|           |A lot    |2|
|           |Cannot do|1|
_________________________
|Question 3 |Never    |5|
|           |Rarely   |4|
|           |Sometimes|3|
|           |Often    |2|
|           |Always   |1|
_________________________
Here's what it should look like in the end for the receiving system:
_________________________
|Question 1 |Never=1    |
|           |Rarely=2   |
|           |Sometimes=3|
|           |Often=4    |
|           |Always=5   |
_________________________
|Question 2 |None=5     |
|           |A little=4 |
|           |Somewhat=3 |
|           |A lot=2    |
|           |Cannot do=1|
_________________________
|Question 3 |Never=5    |
|           |Rarely=4   |
|           |Sometimes=3|
|           |Often=2    |
|           |Always=1   |
_________________________
I’m happy to have lots of intermediate cells or sheets as long as I get the desired output in the end. Are there in-cell formulas I can use to accomplish this?
posted by Tehhund to Computers & Internet (9 answers total) 2 users marked this as a favorite
 
So, each question and set of answers is all in one cell? That is not ideal. Can you change the data layout?
posted by soelo at 2:47 PM on July 1, 2019


I believe what you want is something like
=CONCAT(Sheet1!B1, Sheet1!C1)
then do the usual formula drag to have it apply to the rest of the cells in the column.
posted by jraenar at 2:53 PM on July 1, 2019


The multiple lines per cell is what's failing it for you, excel doesn't do that well.

If you can put each response on it's own row, then you can use = CONCATENATE(Cell1,Cell2)
You can also add in non cell values so= CONCATENATE(Cell1," = ",Cell2) gives you exactly the result in your bottom chart.

You could then do a secondary chart where you concatenate multiple rows into one cell...
posted by larthegreat at 2:53 PM on July 1, 2019


If you can split up the data, here is a tutorial on how to use the Text to Columns function to split it up.
posted by soelo at 2:54 PM on July 1, 2019


Just out of curiosity, once you've got the data concatenated, what are you going to do with it next?
posted by adamrice at 3:03 PM on July 1, 2019 [1 favorite]


Response by poster: Just out of curiosity, once you've got the data concatenated, what are you going to do with it next?

There's a third-party Excel macro (which I have used for years on related work) that will turn this into a flat file that can be imported to the esoteric database we're working with (imagine if noSQL were invented in the '70s). I have unkind words for the developer who decided related data should all go into a single cell on different lines but that decision was made over a decade ago so the format is not changing, full stop.

Now, if I can split this related data into separate cells, concatenate, then recombine into a single cell I'm open to that idea.
posted by Tehhund at 4:23 PM on July 1, 2019 [1 favorite]


Best answer: This is going to be hard to explain via text but I'll do my best. The tl;dr version is
1)FIND() the index of each newline character,
2)CONCAT() the text using MID() to split out the string at the newlines
3)Turn on "Wrap Text"

Detail:
Assume A1 is "Question 1", B1 is "Answers", C1 is "Numeric Values".
1. Create several intermediate columns identifying the string location of each of the newline characters in "Answers" and "Numeric Values"... like Column D would find the first newline character in "Answers" like "=FIND(CHAR(10),B1,1)"; Column E finds the next newline character after that "=FIND(CHAR(10),B1, D1+1)", etc...then in column I find the first newline character in "Numeric Values" like "=FIND(CHAR(10),C1,1)", etc.
The last column(s) would just be "=LEN(B1)" and "=LEN(C1)", assuming there's not a trailing newline character
2. Once you have the index for all the newline character locations in both "Answers" and "Numeric Values" columns, construct a concat statement that alternates taking x characters from both "Answers" and "Numeric Values", and intersperse with "=". The length of the text string to take is defined by the location of the newline characters in the intermediate columns. Something like "=CONCAT(MID(B1,1,D1-1),"=",MID(C1,1,I1),MID(B1,D1+1,E1-D1-1),"=",MID(C1,I1+1,J1-I1))"....you might have to +1 or -1 in the MID statements to ensure you grab the newline character as part of the "Numeric Values" string. What you want to do is grab each word from "Answers", add an equals sign, then grab each number and a newline character from "Numeric Values".
3. You have to format the output cell to ensure "Wrap Text" is turned on, otherwise the newline characters won't show and it won't look like it worked.
posted by smokysunday at 4:25 PM on July 1, 2019 [8 favorites]


Having done this a lot, I agree with smokysunday's recommendation.

I definitely do this with multiple iterations and sheets so that I can fix any errors.
posted by Altomentis at 4:30 PM on July 1, 2019 [1 favorite]


Response by poster: smokysunday's method worked - thanks! It's fortunate that there are always 5 answers so I could turn every cell into 5 columns rather than a variable number of rows or columns. To visualize what I was doing I chose to make 4 columns that each held the position of the line breaks using FIND() and CHAR(10), then 5 columns that extracted the responses using LEFT(), MID(), and RIGHT() and the positions of the line breaks. Then I did the same for the numeric values. Finally I used a big honking CONCATENATE() to pull everything together and added the line breaks back in using CHAR(10). I did all of this in an intermediate sheet so now I can reference the value in the sheet where the third-party macro will run and the macro will only see the correct final product.

This is so much better than manual copy and paste and now I can spend my time writing formulas to check for errors instead.
posted by Tehhund at 4:10 AM on July 3, 2019 [1 favorite]


« Older When do I need to show up at the San Francisco DMV...   |   What's the right computer/tablet for a 16-year-old... Newer »
This thread is closed to new comments.