For the storage nerds in the audience...
January 29, 2018 7:40 PM
I’m struggling to find a decently technical deep dive on how storage hardware (hard disks, SSDs, flash) works. Recommend me some books and papers!
I work on large computing clusters, for a company you’ve probably heard of. I’m considering moving to a storage team, but while I’m pretty familiar with the distributed systems side of things, I don’t have a good handle on the physical storage hardware side. Like, how do disks actually work?
Detailed and technical is good: I did materials physics before computers, so I’m not scared of explanations that include the microscopic details. But Google and Amazon searches are failing me: I don’t want to buy a hard drive and have no interest in a book that will help me pick one. I want one that tells me how they work!
Either books or papers are good, but survey papers are probably better as I’m not looking for specific research. Happy to shell out for a textbook if it’s known to be really good.
IBM has a very recent Redbook, published December 2017: Introduction to Storage Area Networks. Redbooks are definitely technical and should give you a good jumping-off point for additional research. Best of all, they're free. Enjoy.
posted by michswiss at 8:08 PM on January 29, 2018
You're wanting something that gives an overview of how bits are stored on disks of spinning rust, correct?
posted by Dr. Twist at 8:25 PM on January 29, 2018 [1 favorite]
Best answer: There is Hard Disk Drive Mechatronics and Control by Al-Mamun, Guo, Bi. And Inside Solid State Drives, ed. Micheloni, Marelli, Eshghi.
posted by solitary dancer at 8:32 PM on January 29, 2018 [2 favorites]
Best answer: So, "I work on large computing clusters, for a company you’ve probably heard of"... (ok, I've done that too), "but while I’m pretty familiar with the distributed systems side of things, I don’t have a good handle on the physical storage hardware side. Like, how do disks actually work?"
I'm going to guess (might be wrong) that you do not actually care about what's physically going on on the media, or that if you do, it's out of curiosity, not out of necessity. What you really care about is how the software interacts with the firmware of the disk controller, because you're probably trying to get the utmost performance out of the thing, and the only tool you have is the software you write. (You're probably not designing your own disk hardware -- or at least *you* are probably not designing your own disk hardware, though someone at your company *might* be... probably not, but I can't rule it out.) That means the place where the rubber meets the road is most probably the storage controller driver within the OS. I've spent a number of years working on such drivers.
First off, high-performance flash storage drivers (e.g. NVMe) are considerably different from traditional storage drivers. Traditionally, for decades, storage was *SLOW* compared to CPU and RAM. This meant that for performance it almost didn't matter what the driver did, so long as what it did was ultimately correct, because such a tiny fraction of each i/o was spent within the driver compared to the eternity spent waiting for the disk. With high-performance flash, this changes.
As I understand it (it's been about 3 years since I looked at this), to keep up with NVMe-style flash devices you typically need to aim multiple CPUs at the device, and you need to be very careful about your use of locks and synchronization mechanisms to avoid creating bottlenecks. You have a stream of i/o requests coming into the storage driver from the OS, and you want concurrent requests from different CPUs to be able to enter the driver concurrently. Typically there is a storage area per CPU reserved for a ring buffer of requests to be sent to the device, and another ring buffer per CPU for tags coming back from the device. The tags are how the driver knows which requests the device is completing. They are per CPU so that there is no need to synchronize access to a common ring buffer between CPUs. Typically there is also some way for the driver to tell the device, for each command, which CPU the completion interrupt should be delivered to (MSI, MSI-X), which makes the CPU cache more effective on command completions: you want the completion to occur on the same CPU the command originated on.
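To make that concrete, here's a rough sketch of the per-CPU queue-pair idea in C. This is not real NVMe driver code; the struct names, fields, and sizes are invented for illustration, and a real driver has doorbell registers, DMA mapping, error handling, and much more.

```c
/* Toy model of the per-CPU queue-pair layout described above.
 * NOT real NVMe driver code: names, fields, and sizes are invented
 * purely to illustrate the idea. */
#include <stdint.h>

#define QUEUE_DEPTH 256                /* entries per ring (hypothetical) */

struct sq_entry {                      /* one submission: "do this i/o" */
    uint16_t tag;                      /* identifies the command at completion time */
    uint8_t  opcode;                   /* e.g. read or write */
    uint64_t lba;                      /* starting block address */
    uint32_t nblocks;                  /* transfer length in blocks */
    uint64_t data_addr;                /* DMA address of the data buffer */
};

struct cq_entry {                      /* one completion: "that i/o finished" */
    uint16_t tag;                      /* matches the submission's tag */
    uint16_t status;                   /* success / error code */
};

struct queue_pair {                    /* one of these per CPU, so CPUs never
                                          contend for the same ring or lock */
    struct sq_entry sq[QUEUE_DEPTH];   /* requests going to the device */
    struct cq_entry cq[QUEUE_DEPTH];   /* completions coming back */
    uint16_t sq_tail;                  /* driver advances after posting a command */
    uint16_t cq_head;                  /* driver advances after consuming a completion */
    int      irq_vector;               /* MSI-X vector steered to this CPU, so the
                                          completion interrupt lands on the CPU
                                          that issued the command */
};

/* Post one command on this CPU's ring (sketch; the doorbell write and
 * DMA setup a real driver would do are omitted). */
static void submit(struct queue_pair *qp, struct sq_entry cmd)
{
    qp->sq[qp->sq_tail] = cmd;
    qp->sq_tail = (qp->sq_tail + 1) % QUEUE_DEPTH;
    /* A real driver would now write sq_tail to the device's doorbell
     * register; the completion later shows up as a cq_entry carrying
     * cmd.tag, delivered via this queue's MSI-X vector. */
}
```

Because each CPU owns its own queue_pair, submissions and completions on different CPUs never need to take a shared lock, which is the whole point of the design described above.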
Hmm, this is a deeper topic than really fits in a metafilter comment.
I once wrote this thing that traces the life of an i/o within the Linux kernel, which may be of interest. It does not involve a high-performance flash device; it's more along the lines of traditional i/o.
posted by smcameron at 12:14 AM on January 30, 2018 [4 favorites]
I'll make the opposite guess from @smcameron. Hard disks are, as @Dr. Twist notes, spinning rust. The brown color is iron oxide, same as audio cassettes, 8-tracks, VHS tapes, reel-to-reel tapes, etc. They store info by passing a magnetic field over the rust, which magnetizes it. Power the write head's coil in one direction and it's a zero bit; power it in the other direction and it's a one bit.
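For the curious, here's a toy model of that idea in C. Real drives actually detect flux transitions and layer coding schemes (RLL, PRML) on top rather than reading raw polarity, but the basic notion that the write current's direction records the bit is the same.

```c
/* Toy model: encode bits as a magnetization direction (+1 or -1) and
 * read them back.  Real drives sense flux *transitions* and use coding
 * (RLL, PRML) on top of this, but the gist -- write-current direction
 * records the bit -- is the same. */
#include <stdio.h>

int main(void)
{
    const int bits[8] = {1, 0, 1, 1, 0, 0, 1, 0};  /* data to record */
    int media[8];                                  /* magnetization of 8 spots of rust */

    /* "Write": drive the head one way for a 1, the other way for a 0. */
    for (int i = 0; i < 8; i++)
        media[i] = bits[i] ? +1 : -1;

    /* "Read": sense the stored field direction and recover each bit. */
    for (int i = 0; i < 8; i++)
        printf("%d", media[i] > 0 ? 1 : 0);
    printf("\n");                                  /* prints 10110010 */
    return 0;
}
```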
Here's a vid of sprinkling rust on some cellophane tape and recording onto it, from an episode of Tim Hunkin's The Secret Life of Machines.
posted by at at 12:36 AM on January 30, 2018 [1 favorite]
Best answer: These probably aren't deep enough, but they could be useful starting points: a set of slides from a talk by Seagate's CTO which would give you some specific terms to look up (like SMR, TDMR, and HAMR). White papers from Research at Google about Disks for Data Centers and Flash Reliability in the Field. SMR disks are commonly in use, but solid state storage seems to be a pretty rapidly changing field. I don't know what the state of the art would be for data center usage, where cost and reliability are maybe more of a factor than raw speed.
posted by fedward at 8:13 AM on January 30, 2018
I'll admit I'm really confused by this question. Unless you're designing the actual drive hardware, there's no real benefit in getting into the deep inner technical workings beyond what you can find on Wikipedia, even if you're going to be integrating these drives into your overall solutions. Not to mention, the deep inner technical workings are quite likely to be a trade secret for the company making any individual drive, which is why you'll probably only find general "this is how this type of thing works in broad strokes" info.
As long as you know the basic workings and why you might choose one technology over another, it'd be more important to know how to benchmark and test out various storage configurations/vendors for whatever application you'd be using them for so that you can optimize your ROI on it.
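To put a concrete (if crude) shape on that, here's a minimal hand-rolled random-read measurement in C. It assumes Linux with O_DIRECT, the path argument is whatever device or pre-created test file (at least 1 GiB) you point it at, and in practice you'd reach for a purpose-built tool like fio rather than anything this simple.

```c
/* Crude 4 KiB random-read measurement (Linux, O_DIRECT).  Illustration
 * only: a real evaluation would use a dedicated tool such as fio with
 * varied queue depths, block sizes, and worker counts. */
#define _GNU_SOURCE                    /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLOCK  4096                    /* read size */
#define NREADS 10000                   /* number of reads to time */
#define SPAN   (1ULL << 30)            /* offsets within the first 1 GiB */

int main(int argc, char **argv)
{
    if (argc < 2) {                    /* target must exist and be >= 1 GiB */
        fprintf(stderr, "usage: %s <device-or-file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK)) return 1;   /* O_DIRECT wants alignment */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NREADS; i++) {
        off_t off = ((off_t)(rand() % (SPAN / BLOCK))) * BLOCK;  /* aligned offset */
        if (pread(fd, buf, BLOCK, off) < 0) { perror("pread"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d x %d-byte random reads: %.3f s, %.0f IOPS, %.1f us avg\n",
           NREADS, BLOCK, secs, NREADS / secs, 1e6 * secs / NREADS);
    free(buf);
    close(fd);
    return 0;
}
```

Running it against a spinning disk, a SATA SSD, and an NVMe drive makes the gap between them obvious, which is usually the decision-relevant information anyway.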
posted by Aleyn at 12:59 PM on January 30, 2018
(BTW, all of that said, doing a Wiki-walk of the articles I linked and their sources and related links is a pretty good way to get up to speed on modern storage tech IMO; they go into a pretty fair amount of detail all things considered.)
posted by Aleyn at 1:08 PM on January 30, 2018
Response by poster: Apologies for the long silence, I was sick today and didn't look at the computer much.
I appreciate the answers to the (poorly defined) question! smcameron is correct that the software side of things is what I need functionally, and the physics of the device itself is interesting mostly out of curiosity and a longing for my days doing physics.
Thanks!
posted by fencerjimmy at 7:56 PM on January 30, 2018
This thread is closed to new comments.