List subtraction in spreadsheet – help
March 2, 2023 6:00 PM   Subscribe

I have two lists of email addresses and need a quick explanation how to interrelate them –

List A has about 1850 entries and was added to a budget Mailchimp-like system recently. Now I've been given List B, with about 2100 entries and told to add them as well.

The limit on this account is 2500 addresses. I have no confidence that this system knows how to find and delete duplicates. Adding List B risks pushing the list into a new pricing tier, which I mustn't do.

I know that many of the addresses on List B were already on List A, so I need to subtract the A addresses from B, and only put in the addresses that are unique to List B.

I have Excel, Numbers and BBEdit and could also use Google Sheets if need be.
posted by zadcat to Computers & Internet (8 answers total) 2 users marked this as a favorite
 
I've used deduplicate feature on excel. It was pretty straight forward from what I remember. It's under Data.
posted by mokeydraws at 6:14 PM on March 2, 2023 [1 favorite]


Response by poster: I can combine both lists in BBEdit and tell it to delete duplicates, but what I really need is for List B to delete everything that's in List A, which is a different problem.

I want to avoid deleting List A from the mailing system then add A+B minus duplicates, but I suppose that's the scorched earth solution.
posted by zadcat at 6:21 PM on March 2, 2023


My very clunky way of doing this would be to copy/paste both lists into excel. To the left, I'd add a column and have a 1 next to List A addresses, and 2 next to List B addresses. I'd sort all by addresses, manually delete the duplicates that had a 2 next to them, and then filter by the 1/2 column to only 2, and add those addresses. It'll also provide a count to make sure you don't go over 2500
posted by Sparky Buttons at 6:26 PM on March 2, 2023


Best answer: This is a little convoluted, but... In BBEdit:
  1. Paste the addresses from both A and B into a single document
  2. Text menu > Process Duplicate Lines...
  3. Select Matching all radio button (important!)
  4. Check Delete duplicate lines checkbox
  5. Can leave all other options unchecked
  6. Click Process
So you should now have a list of all the addresses that were in either A or B but not both. Now:
  1. Paste the items from list B back in to this document.
  2. Text menu > Process Duplicate Lines...
  3. This time, select Leaving one radio button
  4. Check Duplicates to new document checkbox
  5. Click Process
I believe the new document (named something like “Matching lines in untitled text 3”) will have what you want?
posted by staggernation at 6:32 PM on March 2, 2023


In Excel — Export List A.
Add it to the spreadsheet that has List B.
Highlight the full list, and insert a pivot table.
Select your only column (email address) and make that your Row.
Add the email address field to “values” and select Count.
Any email address that has a count of “1” needs to be added.
Anything with a count of “2” is a duplicate.
(you can sort by count but it’s a little funky - best to look it up online.)
posted by Silvery Fish at 6:35 PM on March 2, 2023


You have a good answer, but here's how I do it in Excel:

1. Paste List A into column A of Sheet1.
2. Create Sheet2 and paste List B into column A./
3. In cell B1 of Sheet2, put this formula:

=COUNTIF(Sheet1!A:A, A1) > 0

this means "look in Sheet1/List A and write TRUE if the value exists in List A"

4. Paste that formula down by selecting B1 and double-clicking the little square at bottom right of the selection. Now Sheet2/List B looks like:

Apple FALSE
Banana TRUE
Cherry FALSE
Donut FALSE
Egg TRUE

5. Sort List B by column B and delete all the TRUE rows.
posted by caviar2d2 at 7:03 PM on March 2, 2023 [3 favorites]


This sounds very similar to this post from yesterday.
posted by mezzanayne at 8:41 PM on March 2, 2023 [1 favorite]


If python was installed, this could be accomplished with a short python script.

Save the following code as a plain text file named addr_diff.py in a directory with two files a.csv and b.csv, each containing a single column of addresses, one address per row, no header, saved in CSV format. Open a terminal in that directory, and run python3 addr_diff.py a.csv b.csv . The addresses in b.csv that are not also found in a.csv will be written to new_addresses.csv
import csv
import sys

def read_lines(filename):
	with open(filename, 'r') as f:
		return [row[0] for row in csv.reader(f)]

a_addresses = set(read_lines(sys.argv[1]))
b_addresses = set(read_lines(sys.argv[2]))

new_addresses = b_addresses.difference(a_addresses)

with open('new_addresses.csv', 'w') as f_out:
	writer = csv.writer(f_out)
	writer.writerows([[x] for x in new_addresses])

Brief explanation, not necessary to run the script. The main ingredients are: getting the names of the input files on the command line, reading and writing files, the csv module in the standard library for reading and writing CSV files, and the set data structure, which is efficient at operations such as finding all the common items between two collections or items, or finding the difference.
posted by are-coral-made at 2:48 AM on March 3, 2023


« Older How do you reduce the period of time showering?   |   Electric scooter laws in Berkeley/Albany/El... Newer »
This thread is closed to new comments.