What’s the advantage to NoSQL?
October 8, 2020 12:29 PM

I am a software developer but not a web developer. I kind of ignored the NoSQL advances since, as far as I understand it, it’s for people whose database is really just a key-value data store and all their queries are going to be simple key lookups for a record. Is there a reason to use NoSQL if the data isn’t quite so simple?

It feels like people are using things like MongoDB for rather complex data but pushing the relationship part off into the client reading part via Python Pandas or similar. My SQL usage isn’t super complex, but I tend to have multiple tables with different fields, and I wonder how I would know if NoSQL would be an advantage?
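To make "pushing the relationship part off into the client" concrete, here is a minimal sketch, not anyone's production pattern. The `authors` and `posts` records, their field names, and both function names are made up for illustration. The first function joins two document collections in application code; the second lets a relational database (SQLite, from Python's standard library) do the same join.

```python
import sqlite3

# Hypothetical data: two "collections" as a document store might return them.
authors = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
posts = [
    {"_id": 10, "author_id": 1, "title": "On engines"},
    {"_id": 11, "author_id": 2, "title": "On compilers"},
    {"_id": 12, "author_id": 1, "title": "On notes"},
]


def join_in_client(authors, posts):
    """Document-store style: fetch both collections, join in application code."""
    names = {a["_id"]: a["name"] for a in authors}
    return sorted((names[p["author_id"]], p["title"]) for p in posts)


def join_in_database(authors, posts):
    """Relational style: load the same data into tables and let SQL do the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
        "author_id INTEGER REFERENCES authors(id), title TEXT)"
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(a["_id"], a["name"]) for a in authors],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(p["_id"], p["author_id"], p["title"]) for p in posts],
    )
    rows = conn.execute(
        "SELECT a.name, p.title FROM posts p "
        "JOIN authors a ON a.id = p.author_id "
        "ORDER BY a.name, p.title"
    ).fetchall()
    conn.close()
    return rows
```

Both return the same (name, title) pairs. The difference is who owns the join logic: with the document approach, every client that reads the data has to repeat it.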

I would also be interested in articles that describe the current state of the SQL vs NoSQL world. I feel like I have seen the initial hype and the backlash articles but it’s been over ten years so best practices must have been worked out by now. Google is just showing me snake oil marketing.
posted by DoveBrown to Computers & Internet (9 answers total) 11 users marked this as a favorite
 
The NoSQL section of this article covers it very well. You'll probably need to read the rest of the article to place that section in context, but the good news is the whole article is great.
posted by caek at 1:10 PM on October 8, 2020 [3 favorites]


There is a lot of work being done comparing NoSQL and SQL in the realm of MongoDB vs. PostgreSQL. Álvaro Hernández has written some extremely well-researched white papers on the topic, using many thousands of dollars' worth of AWS computing time. Here is his Twitter:

https://mobile.twitter.com/ahachete

I argue that NoSQL is good when you are not familiar with the incoming data and need to figure out what the relationships are before you design your database. Where are the one-to-one and one-to-many relationships? Is there a natural primary key? What is all this data going to be used for? How much of it needs to be imported/kept?
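A rough sketch of that kind of exploration, assuming the incoming data has already landed somewhere schemaless and can be read back as a list of dicts. The `records` sample and all three helper names are hypothetical.

```python
from collections import Counter

# Hypothetical incoming records with inconsistent fields.
records = [
    {"sku": "A1", "name": "widget", "tags": ["x", "y"]},
    {"sku": "B2", "name": "gadget", "price": 9.5},
    {"sku": "C3", "name": "widget"},
]


def field_counts(records):
    """Count how many records carry each field (hints at required vs optional)."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


def multi_valued_fields(records):
    """Fields holding lists somewhere: candidates for a one-to-many child table."""
    return sorted(
        {k for r in records for k, v in r.items() if isinstance(v, list)}
    )


def is_candidate_key(records, field):
    """True if the field is present and unique in every record."""
    values = [r.get(field) for r in records]
    if any(v is None or isinstance(v, (list, dict)) for v in values):
        return False
    return len(set(values)) == len(values)
```

On this sample, `sku` looks like a natural primary key, `name` does not, and `tags` would probably become its own table in a relational design.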
posted by bilabial at 1:12 PM on October 8, 2020 [1 favorite]


As a development manager/sometimes developer for a mega-corp, I can say we still mostly focus on SQL, but we have done quite a few NoSQL proofs of concept, mostly as push-back against Oracle in support negotiations. Our queries are pretty simple but do involve many tables. It's an internal app, so the front-end developers are the same as the back-end developers and can optimize front-end functions.

The advantages are a huge increase in speed and way less processing power, at a small loss of flexibility and a huge loss of support, since most of the noSQL platforms we used were open source vs fully supported Oracle.

We have also done POCs for key-store types of massive data stores that don't get much active editing but hold mostly display-oriented data. For anything that involved more than a little active editing of data in the tables, Oracle made more sense. We also had more than a few corrupted files in massive key-stores with editing, so we had to make sure we had good backups and had to manage file sizes for the keystore files. More files = harder to audit, so there were downsides there.


The other problem with noSQL is that it was not well-supported across the org, so we were 'rolling our own', which is not good in terms of developer skills or corporate best practices. There are so many good database developers, vs the small supply of people who are really good at noSQL. So if your app is going NoSQL, go all the way.

You didn't ask, but I personally find that distributed applications with many smaller relational databases beat noSQL in terms of scalability and ease of support.
posted by The_Vegetables at 1:20 PM on October 8, 2020


I think there's also a trickle-down effect from the fact that at Google scale you don't really have a choice but to use something that looks a lot like NoSQL. Take something like the index of URLs. It's a straightforward table, you might use the URL or a hash as the key, and store the PageRank, retrieval timestamp, etc. This all maps nicely onto an RDBMS! But you could never use one to store Google's index because the concurrency issues of deleting and inserting URLs would kill you. Or imagine if you changed the format of a field and had to update a billion records to ensure an invariant, while thousands of clients stopped being able to serve requests. Instead you're forced to use something that can be easily "sharded" across many, many servers. You're right, it's primitive compared to SQL, and requires the client to do a bunch of extra work to re-implement stuff that the database "should" be able to handle.
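For anyone who hasn't seen it, "sharding" by key can be sketched in a few lines. This is an illustration only, not how Google's systems work. `shard_for` and `ShardedStore` are made-up names, and each "server" is just an in-memory dict.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Key-value store split across several independent dicts ("servers")."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)
```

Because every key maps to exactly one shard, reads and writes for different keys never have to coordinate. The price is that anything spanning keys (joins, cross-record invariants) becomes the client's problem. Note also that plain modulo hashing moves most keys when the shard count changes, which is why real systems tend to use consistent hashing or range partitioning instead.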

If you want an idea of the motivating factors, look at the design of the Google File System (the original paper is quite readable). They replaced the structured tree-based file systems that have been used since the time directories were invented with something that's just a key-value store (sound familiar?). Not because they wanted to, but because the metadata servers of a networked file system at that scale could never handle all the reads and writes consistently. Primitive but scalable is going to beat full-featured but slow every single time if you're operating at the scale of the whole web.

Anyway, just like how "big data" became a thing even for companies that didn't have big or even medium data, you see a lot of places using NoSQL because it's a state-of-the-art technology, in a sense.
posted by wnissen at 1:53 PM on October 8, 2020


My impression was that there was a big cost and performance advantage a few years ago, when online startups were simpler but did not have access to AWS or other powerhouse resources. Also, the state of db software is somewhere in a transition: MSQL and Postgres are increasing in performance and the NoSQLs are increasing in complexity, so at some point in the future there will be little difference between the two types of db's.
posted by sammyo at 1:56 PM on October 8, 2020


I tried to sort this out for my own purposes some years ago. My short answer is that unless I have a real problem I can't handle with a well-indexed relational DB, I won't be moving. I didn't see any savings in switching over things I had working well.

For context I was dealing primarily with scientific data generated in house, so at least theoretically pretty well structured and defined especially for recent data. Tens of millions of rows in the largest DBs, one or two orders of magnitude smaller in most other cases.

To be blunt, one thing I learned is a lot of groups weren't good at understanding databases or identifying bottlenecks, so "we need Tech X" became a way to shift the blame to tech and bring in consultants. There is some successful corporate usage at this point but unsurprisingly the big multi-million dollar initiatives fell totally flat. (Technically I use some products that contain a NoSQL DB and it's fine, without offering me any functionality I wouldn't get from the 50-year-old tech.)
posted by mark k at 2:14 PM on October 8, 2020 [2 favorites]


The data structures a database uses give it certain performance characteristics. Simple relational databases can use B-trees, for instance. Inserting and retrieving a single record is generally slower with this structure than with the "dictionary", "key-value", or hash table structure that NoSQL databases tend to use.
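A small sketch of the trade-off, with two loud assumptions: a sorted list searched with `bisect` stands in for a B-tree (a real one is a disk-friendly tree, not a list), and the sample `rows` and function names are invented. Point lookups in the hash structure take constant time on average, while the sorted structure needs a logarithmic search. In return, the sorted structure can answer range queries, which a plain hash table cannot do without scanning every key.

```python
import bisect

# Hypothetical rows keyed by name.
rows = {"apple": 3, "banana": 7, "cherry": 1, "date": 9}

hash_index = dict(rows)    # hash table: average O(1) point lookup
sorted_keys = sorted(rows)  # ordered index: O(log n) lookup, supports ranges


def point_lookup_hash(key):
    """Look up one key directly in the hash table."""
    return hash_index.get(key)


def point_lookup_sorted(key):
    """Binary-search the ordered index, then fetch the row."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return rows[key]
    return None


def range_scan_sorted(lo, hi):
    """Return all keys with lo <= key <= hi, in order."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]
```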

This feature makes NoSQL useful in scenarios where accessing disk storage is expensive in time and memory is cheap and fast, and you want to store and retrieve short-lived data quickly — like caches. Caching is great for web apps where many visitors tend to ask for the same thing over and over. A database can quickly look for something and give it to you out of memory, if it is available. NoSQL databases can include caching strategies that push older or lesser-accessed data out of memory.
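One common strategy for pushing lesser-accessed data out of memory is least-recently-used (LRU) eviction. The sketch below is a toy version built on `collections.OrderedDict`. The class name and capacity are arbitrary, and real caches add expiry times, memory accounting, and thread safety on top of this.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
```

With a capacity of 2, storing `a` and `b`, reading `a`, then storing `c` evicts `b`, since `a` was touched more recently.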
posted by They sucked his brains out! at 2:45 PM on October 8, 2020


I think of a SQL database as enforcing ACID (Atomicity, Consistency, Isolation, Durability) and if your application can relax some of those properties, NoSQL might be an option.
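As a small illustration of the "A" in ACID, here is a sketch using SQLite from Python's standard library. The `accounts` table, the names in it, and the helper functions are all made up. The transfer credits one account and then debits another. If the debit would violate the `CHECK` constraint, the whole transaction rolls back, including the credit that had already been applied.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically; return False if the transfer was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

In a store without multi-record transactions, the application would have to detect the failed debit and undo the credit itself, and also cope with crashing halfway through.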

Different flavors of SQL databases tend to be more alike than different, which is why they're often the default choice. Choosing a NoSQL solution takes a little more analysis -- look at Google Firebase which has two different flavors of NoSQL, each with its own distinct feature set. See also this post on Amazon's DynamoDB with a snarky little flowchart at the end.
posted by RobotVoodooPower at 3:48 PM on October 8, 2020 [1 favorite]


One thing I don't see in these discussions: sometimes, NoSQL is simpler. If you're building on top of cloud infrastructure like AWS, it is massively cheaper and easier to deploy and manage DynamoDB (their NoSQL) than it is RDS (their SQL). (Of course, cost can vary in ways that are pretty opaque.)

The APIs themselves can be more complicated, especially coming from a traditional SQL background, but if you can make the majority of your data access key value lookups you're golden.
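A rough sketch of what "make your data access key-value lookups" can look like in practice, loosely modeled on the partition key plus sort key idea that DynamoDB uses. This is plain Python and not the DynamoDB API. `KeyValueTable`, its methods, and the `customer#42` style keys are all invented for illustration.

```python
class KeyValueTable:
    """In-memory stand-in for a table addressed by (partition key, sort key)."""

    def __init__(self):
        self._partitions = {}  # partition key -> {sort key: item}

    def put(self, pk: str, sk: str, item: dict) -> None:
        self._partitions.setdefault(pk, {})[sk] = item

    def get(self, pk: str, sk: str):
        """Fetch exactly one item by its full key."""
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk: str, sk_prefix: str = ""):
        """Return items in one partition whose sort key starts with a prefix."""
        partition = self._partitions.get(pk, {})
        return [
            partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)
        ]


# Hypothetical usage: one customer's profile and orders share a partition,
# so "all orders for customer 42" is a single-partition query, not a join.
table = KeyValueTable()
table.put("customer#42", "profile", {"name": "Ada"})
table.put("customer#42", "order#2020-10-01", {"total": 30})
table.put("customer#42", "order#2020-10-08", {"total": 12})
table.put("customer#7", "order#2020-09-30", {"total": 5})
```

The design work moves up front: you pick keys to match the questions you already know you will ask. Questions you did not plan for (say, "all orders over 20 across all customers") are where SQL tends to be easier.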

On preview, much of that last flowchart is wrong, which makes sense given that it's from 2017. Backup support might have been missing in 2017, but things like VPC startup cost in Lambdas would have been big reasons not to follow some of these recommendations back then. It's much easier to recommend RDS for the serverless use case now because of the Lambda networking improvements.
posted by Anonymous Function at 3:57 PM on October 8, 2020

