Unpacking the one-star rating
April 14, 2023 1:17 PM

Let's say you're shopping on Amazon and there's a product you're thinking about buying, so you check the reviews. The average is acceptably high, say 4.2 stars, but when you look at the distribution, there are a lot of fives, some fours, not much in the middle, and then a little one-star bump at the end. How do you integrate this information when making decisions?

I started wondering about this a few years ago, when I received a climate survey that asked questions about experiences of racial discrimination. When I did the survey I thought, gee, the target population is overwhelmingly white, so most people are going to say they've never experienced it personally, but any nonzero amount can be a climate problem, right? So just reporting the mean or median response on this five point scale seems like it is not enough. How should these "J-shaped" results be reported?

The same question has since come up for me in a number of other contexts -- I feel like I see J-shaped product rating curves super often on Amazon, hence the example I led with. I'm wondering if there are principled ways to take such outliers into account, and how the context should affect that choice. What I've done in the past on Amazon specifically is read through the actual one-star reviews to look for (1) what proportion of these answers seem disproportionately angry about a toaster and (2) is there any pattern in the types of problems that are reported? But this doesn't scale super well, and I wonder if those experienced with bimodal-ish Likert scale data like this have rules of thumb for data handling, in situations where you have a lot of products to choose between, or too many comments to read them all.
posted by eirias to Grab Bag (32 answers total) 2 users marked this as a favorite
 
Good surveys will account for population bias and will provide multiple crosstabs so that they're not just reporting the mean and the overall results can be analyzed more thoroughly.

That said, Amazon just gives the ratings.

I personally look for specifics in 1-star reviews. Did someone take the time to detail why a product didn't meet their expectations, or are they just saying it sucks because they don't like it? I'll give more credit and weight to solitary 1-star reviews that provide a compelling argument. There will always be people who dislike a product, but what they dislike about the product can provide important insights.
posted by RonButNotStupid at 1:32 PM on April 14, 2023 [14 favorites]


I generally read some 1-star reviews to look for patterns (is there a particular problem I should look for with this product?) or misclicks. Some folks will give a glowing review to a product and rate it 1 star--I never know if they misunderstood the rating system or clicked the wrong star. I also check the date on the 1-star reviews--sometimes there *was* a problem with a product, but it's since been fixed.
posted by epj at 1:37 PM on April 14, 2023 [23 favorites]


The most useful thing I get out of one-star reviews is the information that the product might be difficult or tricky to operate or set up. Reading between the lines on why it sucks, one often gets a notion that perhaps there is some operator error involved.
posted by seanmpuckett at 1:38 PM on April 14, 2023 [5 favorites]


I read the low ratings and try to figure out if their failure in using the product is due to stupidity. A lot of them add up to this. If there's a lot of five-star reviews about reliability, I chalk up "it broke after two weeks" to improper usage.

I also try to identify whether the review is literally about the thing I'm looking at: I believe Amazon will share reviews across the "same" products (like different colors) but there may be more differences; also there are unscrupulous sellers who re-use SKUs for different products to list new items with high ratings, even though the ratings are definitely not for that item; also users are stupid and review the wrong things -- and I frequently see low ratings which do not match the product listed.

I also weigh the 'noise' level of one-star reviews; 10,000 high reviews and 100 1-star reviews is 1%, so while I still look at the 1-star reviews for glaring problems I don't consider them as heavily as if there's 800 reviews and 30 1-star reviews.

(And, sometimes the 3-star reviews are the most useful; people don't want to completely vote down an item but will clearly list their shortfalls)
posted by AzraelBrown at 1:40 PM on April 14, 2023 [12 favorites]


I remember reading somewhere that nearly all 1-5 product ratings are either a 5, 4, or 1. Especially for mundane products, there really aren't many degrees of satisfaction for something like a spatula or a pair of flip flops. You either liked them, in which case you gave it a 5, or you didn't, in which case you probably gave it a 1. Most people don't think closely enough about what they buy to give a 3-star rating. "Well, this spatula had some good features, and some features that I'd like to see improved." It's a spatula. For things that are a little more complicated, like clothing, you can introduce some element of degradation over time, and that's where a lot of 4-star ratings come from. If you liked it at first but it shrank a little in the wash, or a seam tore after three months, take a star off for that.

Most products usually don't have *that* many 1-star reviews, so it's usually pretty easy to read most of them and see what the problem was. A lot of times it's "I ordered blue and it came in red" or something like that, which, yeah, I wouldn't be satisfied with that either, but that doesn't mean much to you, the next purchaser. If there *are* a lot of 1-star reviews, though, that itself should be kind of a red flag. (Especially if, as AzraelBrown says on preview, those 1-star reviews are disproportionately high.)

Generally, though, if the product is something that I'm spending a fair bit of money on, and something I need to work well (like, say, a washing machine), I'm not reading reviews on Amazon. I'll look it up on something like the Wirecutter or Serious Eats or some other review site I trust. Amazon reviews are only for fairly small, insignificant purchases like t-shirts or books or kids' toys.
posted by kevinbelt at 1:44 PM on April 14, 2023 [2 favorites]


You have to read them to determine the reason they exist. Even back when Amazon mostly sold books, you'd get people rating books poorly because of a problem with shipping. Now with all the resellers, you'll see one-star ratings because they got a fake, which may or may not be relevant to your purchase. With apps, I always sort by most recent as well, because that will tell you if a recent update has borked the previously good version.
posted by soelo at 1:45 PM on April 14, 2023 [8 favorites]


Response by poster: Will try not to threadsit -- I do appreciate the qualitative advice, thank you all, but I'm also looking specifically for quantitative advice on how to think about distributions like this. Crosstabs are great (per RonButNotStupid) but hard to apply in some contexts, if e.g. like Amazon you did not measure any relevant stratifying factors, or if you just have too little data (like the workplace climate survey example).
posted by eirias at 1:47 PM on April 14, 2023


In my experience, it's normal to have more 1 star reviews than 2 or 3. Most people who choose to review a product either love it or hate it, so in some sense that's selection bias.

However I think what is concerning is when the number of 1 star reviews is close to or equal to the number of 5 star reviews. I think this is typically a sign of a scam or fake product.
posted by muddgirl at 1:52 PM on April 14, 2023 [3 favorites]


I’ve noticed that very positive reviews are typical when a product is newly introduced, and that the one star reviews tend to come later.

I believe this reflects actual degradation of the product by manufacturers as part of a deliberate strategy and is driven by the fact that Amazon Prime insists upon free delivery and Amazon takes so much of the gross that it’s impossible for manufacturers to maintain profitability and original quality at the same time.
posted by jamjam at 2:01 PM on April 14, 2023


If there is a review as well as the rating, I look at why the person didn't like it. If someone, for example, rates a wood pen poorly because they hate the way that wood feels on a pen, I am going to ignore this because I love wood pens.

But if I see several one-star reviews noting flaws, things not working - and also lots of good reviews, I might wonder about the quality control. (There are some cheap pens I quite like, but the quality control is bad; we've gotten used to the idea that one in every 3 might not work.)
posted by jb at 2:21 PM on April 14, 2023


I always read the 1 star reviews first. You can generally tell when that is a crank review, or something valid.
posted by Windopaene at 2:43 PM on April 14, 2023 [2 favorites]


Reading through your entire question, it sounds like what you want to know is whether you can make an evaluation of a product based on the distribution of reviews, without having to actually dig through the reviews to see why people rated the product the way they did. Like if there's something about a given distribution that is more characteristic of ignorable outliers vs. systemic issues. Correct?

I'm no statistician but I don't think that's an inference that's easy to make. Maybe fewer than a certain percentage of 1-star reviews or something but I have no idea how you'd determine such a cutoff in a useful way so as to be generally applicable across many different products.
posted by Aleyn at 2:49 PM on April 14, 2023 [4 favorites]


I will often look at whether the 4 or 5 star reviews are *actually* reviews, or if they look like bots just choosing a star rating, and whether the 1 star reviews have valuable insight.
posted by itsflyable at 3:09 PM on April 14, 2023


A large number of one-star reviews is not dispositive if there are also a large number of four- and five-star reviews. If a product has mostly one-star reviews, I won't look any further. But if it has a mix of good and bad reviews, I will continue to consider the product.

After that initial take, I don't think it is possible to form a judgement based on the quantity of star ratings alone. You have to read the reviews and also check the dates that they were written. I skip the "Most Helpful" reviews that are shown by default, and look at the most recent reviews. Initial reviews are more likely to be fake (purchased when the product is launched) and recent reviews are more likely to be written by people who purchased the product.

If all or most of the recent reviews are bad, I abandon the product. The pattern indicates that either (a) the product is no longer good, or (b) the product was never good.

If the recent reviews are mostly positive, I read them to see if they are real, to understand what is good and bad about the product, and to see if it matches my use case.

The system has worked well for me. I haven't had many surprises with products I've bought.
posted by Winnie the Proust at 3:15 PM on April 14, 2023


I am sure I read something a while back which talked about a particular distribution of Amazon reviews shaped like a breast (their word, not mine), which was suspicious in terms of getting lots of 4 & 5 star reviews initially and then an unusually high number of 1-star reviews. Basically indicating a crap product that paid for fake reviews on release. I think they called it the Amazon breast, but I can't find anything with that search term.
posted by biffa at 3:57 PM on April 14, 2023 [1 favorite]


The last several things I've bought* from Amazon have come with a little insert begging for a 5-star review. Some of them say they will refund 50% of the purchase price if I leave a 5-star review and email a screenshot to them. I toss those cards, and usually have to toss the product too after a few days when it breaks. So I basically don't trust any reviews on Amazon.

* Turns out you can only leave reviews if you spend at least $50 of non-gift-card money a year. I buy things a few times a year but haven't paid with anything other than a gift card since 2017; I'm not allowed to write reviews. So that's another source of bias.
posted by basalganglia at 4:16 PM on April 14, 2023 [1 favorite]


Amazon (and many other sites that have product reviews) shows you a breakdown of the ratings across the five star levels, with the percentage at each, and that seems to fit the purpose of understanding what the distribution is. I would think any attempt to boil the reviews down into fewer statistics than that is going to lose important information.

If you need to somehow summarize the distribution in fewer statistics--like it's important to measure "how much of a J-shape is this?", then that's outside of my experience or needs. However, I was curious enough to Google "how describe reverse bell curve distribution", which seems to come up with a lot of interesting stuff. It led me to this Wikipedia page on Multimodal distribution. Is this the kind of thing you're looking for?
posted by polecat at 4:42 PM on April 14, 2023


I read a sample of the one star reviews but typically assume they reflect the defect rate or shipping issue rate of the product. Soooo often most of the one star reviews are about items that were poorly packaged and damaged in shipping, took too long, were not delivered as described, and had other issues specific to shipment rather than manufacturing or item quality.

It is fun when you find something like a toaster where most of the reviews are great but there's a small but vocal minority of reviews where the toaster straight exploded spontaneously on the counter while toasting bread, though! Keeps you on your toes
posted by potrzebie at 4:47 PM on April 14, 2023 [2 favorites]


I only pay attention to reviews with specifics. "One star; Crappy product, didn't work" tells me nothing other than that whoever wrote the review was having a bad day. I also ignore "Five stars, Beautiful, I love it, exactly what I wanted." That person is having a good day, and is capable of posting their review immediately after unboxing but before trying the item, or it might be a fake review, done in bulk, at speed.

But, "One star; crappy plastic handle broke the first time I tried to use it," is relevant. Maybe they used it the wrong way, but then so might I.

Or "Five stars, Almost completely silent, easy to replace filters, good size for a small room" is a very useful sort of review.

It's the content of the review for me, not the number of stars. If there are no useful reviews specific to the type of product, I assume that they are all fake. I want to see information about the product, not just votes on it. People will give a jacket five stars solely because of the status conferred by a brand name. I want to know if it's waterproof, or if the lining makes it hard to get in and out of.

I'd be LESS likely to buy a product with only five- and four-star ratings than one with some very low reviews, because I'll assume the seller has found a way to game the rating system. I want to see someone who says the colour wasn't as shown, or who is clearly deranged and not talking about the product in question at all. In the first case they probably had the colour settings on their device set to the wrong contrast. In the second case I know the seller is not capable of blocking bad ratings. Both of those bad ratings would influence me positively.

But if there are a lot of meh reviews I'll consider that the product is probably very cheaply made and probably not buy it. Lots of two star and three star ratings strike me as a bad sign, even if they are not a high percentage. 200 ratings at that level out of 20,000 ratings is a pretty low ratio... but it still means that 200 people appear to have wasted their money.
posted by Jane the Brown at 5:00 PM on April 14, 2023 [5 favorites]


Best answer: For a simple, purely numeric approach, I would just do the average of everything above one star, and then the percentage of one star ratings.

The one star ratings are outliers, both statistically and qualitatively. The cause is presumably [a] the consumer (whiny), [b] their product (an "unlucky" defect), or [c] their interaction (doesn't work for their unusual use case.) You're hoping [a] will average out between similar products, leaving you with [b] and [c] driving unusual spikes in one star ratings.

Then you can use the scores in the range 2-5 to estimate the typical quality of the product and the 1-star ratings to give a signal if it's more likely than others to be a complete failure. You should pay attention to sample size on the one star ratings; this can be done formally (is the difference between two products' 1-star rate statistically significant?), or you can just ask if changing a vote or two would matter.

There's a lot this doesn't account for, like bots or bad-review spam, and maybe you don't care about the non-random case [c] at all. But without sampling individual reviews this is a good quantitative first pass as to the shape of the typical review.
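To make that concrete, here's a minimal Python sketch of that first pass; the product names and star counts below are invented, just to show the mechanics:

```python
import math

def summarize(counts):
    """counts: dict mapping star level (1-5) to number of ratings."""
    total = sum(counts.values())
    ones = counts.get(1, 0)
    # Typical quality: average over the 2-5 star ratings only.
    avg_2_to_5 = sum(s * n for s, n in counts.items() if s >= 2) / (total - ones)
    # Failure signal: fraction of ratings that are one star.
    return avg_2_to_5, ones / total

def one_star_rates_differ(counts_a, counts_b):
    """Two-proportion z-test: is product A's one-star rate really different
    from product B's, or is it just small-sample noise?"""
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    x_a, x_b = counts_a.get(1, 0), counts_b.get(1, 0)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (x_a / n_a - x_b / n_b) / se  # |z| > ~2 suggests a real difference

# Invented J-shaped rating counts for two hypothetical toasters.
toaster_a = {5: 700, 4: 150, 3: 30, 2: 20, 1: 100}
toaster_b = {5: 650, 4: 200, 3: 40, 2: 30, 1: 30}
print(summarize(toaster_a), summarize(toaster_b))
print(one_star_rates_differ(toaster_a, toaster_b))
```

With small counts the z-test is only a rough screen; at that point just asking whether one or two changed votes would flip the comparison is about as informative.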
posted by mark k at 6:20 PM on April 14, 2023 [1 favorite]


Mod note: A couple deleted. OP is looking for info on data handling, not for anecdotes about situations where you think someone left a bad review, or tips that are about reading the reviews. Also, as a heads-up, please try to avoid using "insane" as a term for weird, strange, illogical, etc. (more info; MeFi microaggressions page). Thanks.
posted by taz (staff) at 11:25 PM on April 14, 2023


I use fakespot for this reason. Many of the 5 star reviews on any given Amazon product are fake, but the one star reviews can also be fake (I know people who have sold products on Amazon and had competitors leave fake bad reviews). This isn't a data distribution problem as much as a fake data problem, so a site that can do some analysis and cleanup of that data to give a more representative score is very helpful. But I don't think you can think of this as a pure data analysis problem; it's a data quality problem.
posted by ch1x0r at 1:09 AM on April 15, 2023


Response by poster: Thanks taz. Just to clarify once more, because I think this has gotten lost — this is not actually an Amazon question, Amazon was just a motivating example (and I see now a distracting one). I can assure you the office climate survey I had in mind did not have a data quality problem. Some data distributions genuinely look like this.
posted by eirias at 3:26 AM on April 15, 2023


I think Amazon is a very different case than an office climate survey. In both cases I would assume that the one-star reviews are sincere, and understanding the problem isn’t really about a data distribution as much as it is reading people’s comments.

But 5-star Amazon reviews are largely fake/paid for, whereas in another context they would likely be genuine.

How you use the data also depends on context. In an office if 10% of people give one star, then the organization has some major work to do. If it’s a restaurant, you can just chalk it up to some people not liking their food or getting slow service that day.
posted by mai at 6:48 AM on April 15, 2023


I don't think there's one single way to interpret this, you would need to build up a body of data with different distributions and then investigate to figure out what they mean. Folks have lots of good hypotheses here about why this distribution comes up but if you're looking to interpret a result based on distribution alone you need a baseline. (This is why employee satisfaction surveys are usually benchmarked against previous years, for example).

One alternative way of analyzing a distribution you might be interested in is a Net Promoter Score, where they ask "how likely are you to recommend..." And anything above, say, 8 is "this person is an advocate for our product" and anything below, say, 6 is "this person is bad for our reputation", and that middle 6-8 range is basically "eh it's fine" and doesn't count either way.
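A tiny Python sketch of that bucketing, using roughly the cutoffs I mentioned (the official NPS scheme counts 9-10 as promoters and 0-6 as detractors; the answers below are made up):

```python
def net_promoter_score(ratings, promoter_min=9, detractor_max=6):
    """ratings: 0-10 answers to "how likely are you to recommend...".
    NPS = %promoters - %detractors; the middle band counts for neither."""
    ratings = list(ratings)
    promoters = sum(1 for r in ratings if r >= promoter_min)
    detractors = sum(1 for r in ratings if r <= detractor_max)
    return 100 * (promoters - detractors) / len(ratings)

# Made-up answers: mostly enthusiastic, a couple of very unhappy people.
answers = [10, 9, 9, 10, 8, 7, 7, 2, 1, 10]
print(net_promoter_score(answers))  # 5 promoters, 2 detractors -> 30.0
```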
posted by Lady Li at 9:02 AM on April 15, 2023


Averaging ranks is scarily prevalent in social science research. (obligatory xkcd). Looking at distributions by subgroups of interest ("crosstabs") and covariances can help you gain insight into the data (of course, assuming you know the subgroups of interest). If you have other questions to match on, "bias" detection techniques might be used (e.g. Differential Item Functioning). Netflix used to be famous for this kind of modeling, but I assume everyone now just gets recommended the latest Netflix show...
posted by Dotty at 9:37 AM on April 15, 2023 [1 favorite]


A quantitative analysis could take one-star ratings and build a so-called word cloud using size to show relative frequency of frequently-used words. Then you compare your one-star rating against the biggest hits in this cloud. Is it representative of a pattern common to all one-star ratings, or something unique to the product you're interested in?

You could further do a mutual information analysis to calculate the information in pairs or bigrams of words or phrases. This can help you get a quantitative measure as to combinations of phrases that you would expect to find in one-star ratings, and gauge whether the commenter has a legitimate complaint, is griefing, or has something novel to say about why they are giving a poor rating.

For either of these or similar informational approaches, you'd need to scrape data from Amazon. Not sure how you'd do that, but you'd need a corpus of data to get started on that kind of analysis.
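Given such a corpus, the counting behind both ideas could be sketched in Python like this; `reviews` here stands in for a hypothetical list of one-star review texts you've already collected:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def one_star_text_stats(reviews):
    """Word frequencies (word-cloud sizes) and bigram pointwise mutual
    information over a corpus of one-star review texts."""
    words, bigrams = Counter(), Counter()
    for text in reviews:
        tokens = tokenize(text)
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_w, total_b = sum(words.values()), sum(bigrams.values())
    # PMI: how much more often a word pair occurs together than it would
    # if the two words were independent.
    pmi = {
        pair: math.log2((n / total_b) /
                        ((words[pair[0]] / total_w) * (words[pair[1]] / total_w)))
        for pair, n in bigrams.items()
    }
    return words.most_common(20), sorted(pmi.items(), key=lambda kv: -kv[1])[:20]

# Toy stand-in corpus; a real analysis needs thousands of scraped reviews.
reviews = ["broke after two days", "handle broke immediately", "arrived damaged"]
top_words, top_pairs = one_star_text_stats(reviews)
print(top_words, top_pairs)
```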
posted by They sucked his brains out! at 10:18 AM on April 15, 2023 [2 favorites]


Another factor in the discrepancies between reviews is that Amazon mixes together the "same" product from different sellers, including knockoffs and fakes, so the reviews may not even be of the same item. Personally I put the most weight on one-star reviews. Of course you can discount the ones that seem not to apply or might be irrelevant, but especially if they say similar things I take them seriously. My experience is also that anything with around 10% or more 1-star ratings is risky. Especially with books, I get a pretty good gut instinct from the reviews, and I usually regret it if I ignore that instinct.
posted by blue shadows at 1:50 AM on April 16, 2023


I always start with the one star ratings. They are the most informative. I weight them disproportionately. A handful of compelling 1 star reviews can dissuade me even with hundreds of high reviews.

I also pretty much rate everything either 1, 4, or 5. 1 - don’t buy this. 4 - fine. 5 - buy this!!!
posted by amaire at 5:11 PM on April 16, 2023


I start by reading the reviews from the one-star ratings. About half the time they're from someone you can tell had been careless with their purchase in the first place (my favorite such review was from a guy who bought The Waterboys album FISHERMANS BLUES, but was expecting something like B. B. King and was angry he hadn't received that). But sometimes they point out legitimate problems with the product.

So I check, and if they're all Fishermans-Blues-isn't-blues kinds of reviews I just ignore them.
posted by EmpressCallipygos at 5:35 PM on April 16, 2023


Best answer: How should these "J-shaped" results be reported?

I'm not a statistician, so this is just my personal opinion. These bimodal distributions are probably best thought of as mixture distributions: a draw from a Bernoulli distribution selects whether you get a nice result or a lemon, and then nice and lemon each have their own distribution. Descriptive statistics for the whole sample will likely give the wrong impression; while an average rating of around 4 stars is pretty common, there's basically never a mode at 4 stars! Instead, an honest report should characterize the "outer" process (the percentage of lemon vs. nice) and then give separate descriptives for the "inner" processes.
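The crudest version of that report, as a Python sketch: instead of actually estimating the mixture, just pick a cut (here, two stars or below counts as a lemon) and report the outer percentage plus separate inner descriptives. The cut point and the ratings are invented:

```python
from statistics import mean

def j_shape_report(ratings, lemon_max=2):
    """ratings: list of 1-5 star values. Treats anything <= lemon_max as
    the 'lemon' component and everything else as the 'nice' component."""
    lemons = [r for r in ratings if r <= lemon_max]
    nice = [r for r in ratings if r > lemon_max]
    return {
        "lemon_rate": len(lemons) / len(ratings),        # the Bernoulli part
        "lemon_mean": mean(lemons) if lemons else None,  # inner descriptives
        "nice_mean": mean(nice) if nice else None,
    }

# Invented J-shaped data: lots of 5s, some 4s, a bump of 1s at the end.
ratings = [5] * 60 + [4] * 20 + [3] * 5 + [2] * 3 + [1] * 12
print(j_shape_report(ratings))  # lemon_rate 0.15, lemon_mean 1.2, nice_mean ~4.65
```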

How do you integrate this information when making decisions?

You're receiving a lot of context-specific advice in this thread (...because context is really important). I want to step back and provide a more abstract model or vocabulary into which most of the rest of the thread can be fit. I'm going to focus on the lemon vs. nice distinction because just that by itself is going to explain a ton of the variance.

You've got people, each of whom can be thought of as having some vector of features. And you've got the products, or whatever the objects of study are, each of which also has a vector of features. The people come into contact with these products by buying Amazon products, participating in a culture, etc. The result of those (person, product) pairings is measured with Likert (or, as the case may be, Hatert) items.

Is the lemon percentage we see in those Likert items explained by person-features ("10% of reviewers are grumpypantses prone to giving 1-star reviews"), object-features ("10% of these widgets don't even turn on"), or person-object interactions ("33% of these widgets are randomly chosen to be red, and 30% of the customers have an irrational hatred of red", or more seriously "10% of respondents were Black, and the office culture is anti-Black")?

This is what logistic regression models (or, if you're feeling fancy, neural networks!) were made for... if the data were there (multiple observations per person, multiple kinds-of-thing rated).
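As a toy illustration of that kind of model (Python with statsmodels' formula API; every name, effect size, and the simulated data are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated long-format data: one row per (person, product) rating, with
# lemon = 1 for a one-star. widget_b genuinely fails more often, and the
# raters differ a little in baseline grumpiness.
people = np.repeat(["p1", "p2", "p3", "p4", "p5"], 80)
products = rng.choice(["widget_a", "widget_b"], size=len(people))
grumpiness = {"p1": -0.4, "p2": -0.2, "p3": 0.0, "p4": 0.2, "p5": 0.4}
eta = -1.5 + 1.0 * (products == "widget_b") + np.array([grumpiness[p] for p in people])
lemon = (rng.random(len(people)) < 1 / (1 + np.exp(-eta))).astype(int)
df = pd.DataFrame({"person": people, "product": products, "lemon": lemon})

# Is the one-star rate driven by the product once we account for who rates?
model = smf.logit("lemon ~ C(product) + C(person)", data=df).fit(disp=0)
print(model.summary())
```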

Lacking that data, we're trying to form heuristic models. The context informs those models as it shows us alternative sources of information or suggests to us baseline probabilities. That reviewer has a person-feature that makes them interact poorly with that product-feature, but I'm confident that I have a different value for that person-feature! *Add To Cart*

The context shapes how we use the information: when buying from Amazon, we are trying to determine "if I buy this product, what is the likelihood that I will get something terrible?" In this situation, person-product interactions can be exculpatory for the products. When evaluating a cultural climate, person-product interactions can be damning. Deciding how you're going to interact with an organization may not just be a question of "If I join, will I be treated okay?"; it might be "Does this org deserve my talents?".
posted by a snickering nuthatch at 9:17 PM on April 18, 2023 [2 favorites]


Response by poster: A mixture model is a great idea — thanks. In fact now that I read back I think this is what mark k’s suggestion boils down to also. I’ve worked with these in continuous settings, but never with Likert scales (“Hatert,” lol).

I am fascinated by the range of attitudes people have toward the one star rating! Not what I came here intending to learn, but still instructive. Thanks.
posted by eirias at 3:48 AM on April 19, 2023 [1 favorite]


This thread is closed to new comments.