Help me become a SAN expert
February 1, 2011 6:56 AM Subscribe
I need to quickly become an expert in SAN technology.
I'm interviewing for a job shortly where a large majority of the position will be consulting on SAN (storage area network) design. I'm a quick learner but I've only briefly dealt with SANs in past roles.
I'm now aware of SNIA and have combed that site.
I've ordered this book: Designing Storage Area Networks
and this book: Storage Virtualization
If you have any good recommendations for where to find white papers, case studies, etc., I'd love them. I feel a bevy of real-world implementation examples would be great: SAN vs. NAS, VTLs, and so on.
Response by poster: I've built a home NAS using FreeNAS and various Linux distros, and I've built a ton of RAID arrays, but I'm looking for more enterprise-level information.
posted by patrad at 7:03 AM on February 1, 2011
A lot of books miss the basics. Knowing them makes building on the concepts much easier.
NAS = Just like a normal fileserver, except that it is single purpose and usually designed for large capacity versus small physical size.
SAN = A machine that serves block-based storage to fileservers. It is inherently a back-end sort of connection: users do not connect to the SAN. They connect to the fileservers, which connect to the SAN.
The purpose is to separate the physical storage (hard drives, RAID) from the fileserver. SANs started out as a replacement for those external RAID cages that were connected to fileservers via SCSI cables. But you can only cram so many SCSI cards into a server, and the cages had to be physically close to the fileserver because SCSI cables can only be so long.
So protocols were designed, Fibre Channel first and later iSCSI, that allow the controller-to-drive communication to run over links other than a plain SCSI cable. Now your storage can be in another rack, another room, or even another building. Although iSCSI is routable, you aren't going to get great performance across a WAN, but it can have its uses. By "virtualizing the SCSI cable", you can run block storage over whatever medium works at the moment: Fibre Channel, Ethernet, or whatever 10Gb-over-magic comes around the bend.
Further, because those connections are one-to-many or many-to-one (unlike SCSI cables which are one-to-one), you can have one SAN box serving up storage to multiple servers. This allows you to optimize your capacity. Say you have 10 servers, all with three drives. In RAID5, you lose one of those drives to redundancy. So in your 10 servers, you've got 10 drives of "lost" capacity. So, you buy a SAN box and rebuild. You cram those 30 drives into the SAN box and (simplifying) you can decide that maybe you only need 3 drives worth of redundancy. So instead of 20 drives' worth of storage you get 27 drives' worth. And your techs only have to look after one box to replace failed drives instead of 10.
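The capacity arithmetic in that example can be checked in a few lines. This is just a toy sketch of the numbers above; `raid5_usable` is a made-up helper, not any vendor's tool:

```python
# Toy arithmetic for the consolidation example above (hypothetical numbers).
def raid5_usable(drives_per_group: int, groups: int) -> int:
    """Each RAID5 group loses one drive's worth of capacity to parity."""
    return groups * (drives_per_group - 1)

# 10 servers, each with its own 3-drive RAID5 array:
per_server = raid5_usable(drives_per_group=3, groups=10)  # 20 drives usable

# Same 30 drives pooled in one SAN box, keeping 3 drives of redundancy total:
pooled = 30 - 3  # 27 drives usable

print(per_server, pooled)  # 20 27
```

Same 30 physical drives either way; pooling them just lets you choose how much redundancy you actually want.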
Further, further, a SAN lets you virtualize the storage volumes. You can carve up one 1,000 GB RAID volume into smaller pieces. You can do it the old-fashioned way, and just give each server a range of blocks on the drive, like partitioning a hard drive in a PC. But with modern LVM layers, you can abstract that away. To increase capacity, you don't have to back up, shut down, add drives, rebuild the RAID and then repartition and restore. You just plug in drives, the SAN box can reshape itself online, and then you can tell the LVM to just increase the number of blocks available to the various logical volumes.
In the end, it starts looking just like a transactional or key-value database, where each record is a storage block. The fileservers know that they have X number of blocks of storage, and the SAN figures out how to keep track of them.
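That "database of blocks" idea can be sketched very compactly: the SAN keeps a table mapping (volume, logical block) to a physical block, like keys in a key-value store. Everything here (`BlockPool`, the method names) is invented for illustration, not a real API:

```python
# Minimal sketch of the block-map idea: a table from
# (volume, logical block address) -> physical block, plus a free list.
class BlockPool:
    def __init__(self, total_blocks: int):
        self.free = list(range(total_blocks))  # physical blocks not yet used
        self.table = {}                        # (volume, lba) -> physical block
        self.data = {}                         # physical block -> bytes

    def write(self, volume: str, lba: int, payload: bytes):
        key = (volume, lba)
        if key not in self.table:              # allocate on first write
            self.table[key] = self.free.pop(0)
        self.data[self.table[key]] = payload

    def read(self, volume: str, lba: int) -> bytes:
        return self.data[self.table[(volume, lba)]]

pool = BlockPool(total_blocks=1000)
pool.write("server-a", 0, b"hello")
pool.write("server-b", 0, b"world")  # same LBA, different volume
print(pool.read("server-a", 0))      # b'hello'
```

Growing a logical volume in this model is just handing more entries from the free list to one volume, which is why the LVM can do it online.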
It also simplifies backups and helps with redundancy.
Your task is to figure out all the specifics and how to implement all of this, of course.
The key to remember: it is virtual hard drives. Just like RAID takes many drives and makes one virtual hard drive, a SAN takes many hard drives and combines them into one or more giant hard drives, which can THEN be carved up into smaller virtual drives. Unless you are dealing with fancy filesystems, only one machine can connect to one hard drive at a time. A SAN can't give clients files. It can only give its clients blocks of storage, and it is up to the client to deal with the filesystem.
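That division of labor, blocks on one side, files on the other, can be shown in a toy sketch. The "SAN" below only knows about numbered blocks; the client decides how a file maps onto them. All names and the 4-byte block size are invented for illustration:

```python
# The "SAN" side: nothing but numbered blocks of bytes.
BLOCK_SIZE = 4
san = {}  # block number -> bytes; this is all the SAN knows about

# The client side: a trivial "filesystem" that splits a file across blocks.
def client_write_file(data: bytes, start_block: int) -> int:
    n = 0
    for i in range(0, len(data), BLOCK_SIZE):
        san[start_block + n] = data[i:i + BLOCK_SIZE]
        n += 1
    return n  # number of blocks used

def client_read_file(start_block: int, n_blocks: int) -> bytes:
    return b"".join(san[start_block + i] for i in range(n_blocks))

used = client_write_file(b"hello SAN", start_block=0)
print(client_read_file(0, used))  # b'hello SAN'
```

Notice the SAN never sees a filename or a directory; if two clients wrote into the same block range without coordinating, they would corrupt each other, which is why sharing one virtual drive needs a cluster-aware filesystem.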
posted by gjc at 8:04 AM on February 1, 2011 [11 favorites]
Response by poster: gjc, great info, thanks. You had me until one of the last sentences: "Unless you are dealing with fancy filesystems, only one machine can connect to one hard drive at a time."
Do you mean to say: only one machine can connect to one virtual hard drive at a time?
posted by patrad at 10:18 AM on February 1, 2011
This thread is closed to new comments.
posted by majortom1981 at 6:59 AM on February 1, 2011