Data storage systems. Types of storage system connections

A data storage system (storage system) is a hardware and software solution for securely storing data and providing quick, reliable access to it.

The hardware of a data storage system is similar in architecture to that of a personal computer. Why, then, use a dedicated storage system in an organization's local network at all, rather than building the same functionality on a regular PC?

Storage nodes built on a personal computer or even a powerful server have existed on local networks for a long time.

The simplest way to provide access to data is via the FTP (file transfer) and SMB (remote access to network resources) protocols, which are supported by all modern operating systems.

Why, then, did dedicated storage systems appear at all?

It is simple: the appearance of storage systems is associated with permanent storage devices (hard disk drives) lagging behind the central processor and RAM in development and speed. The HDD is still considered the biggest bottleneck in PC architecture: even with the development of SATA (a serial interface) up to a transfer speed of 600 MB/s (SATA 3), the drive is physically a set of platters whose data must be reached by moving read heads, which is very slow. These shortcomings have largely been addressed by SSDs (non-mechanical drives built on memory chips), but besides their high price, SSDs, in my opinion, currently lack reliability. Storage system engineers proposed moving the drives out into a separate element and using that device's RAM, managed by special caching algorithms, to hold frequently changing data, which required a software component in the product. As a result, storage systems work faster than the hard drives inside servers, and moving the disk subsystem into a separate element improved the reliability and centralization of the system as a whole.
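
To make the caching idea concrete, here is a minimal, purely illustrative sketch of a write-back cache in Python (not any vendor's actual algorithm): writes are acknowledged from RAM immediately and flushed to the slower disk later.

```python
# Illustrative write-back cache: the host sees RAM-speed acknowledgements,
# while the slow disk is updated later in batches. Real storage controllers
# add battery-backed cache, write ordering and controller-to-controller mirroring.
class WriteBackCache:
    def __init__(self, backend, flush_threshold=64):
        self.backend = backend          # any object with read(block) / write(block, data)
        self.dirty = {}                 # block number -> data not yet on disk
        self.flush_threshold = flush_threshold

    def write(self, block, data):
        self.dirty[block] = data        # acknowledged immediately (held in RAM)
        if len(self.dirty) >= self.flush_threshold:
            self.flush()

    def read(self, block):
        if block in self.dirty:         # newest copy may still be in cache
            return self.dirty[block]
        return self.backend.read(block)

    def flush(self):
        for block, data in sorted(self.dirty.items()):
            self.backend.write(block, data)   # slow path: real disk I/O
        self.dirty.clear()
```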

Reliability is ensured by the fact that the disk subsystem is implemented as a separate device which, working together with its software component, performs a single function: I/O operations and data storage.

Beyond the simple principle of one device, one function, all the main components of a storage system (power supplies, controllers) are duplicated, which further increases reliability but also affects the price of the final product.

Moving the disk subsystem to a separate node also allows storage to be centralized. Without dedicated network storage, users' home folders, mail, and databases are typically scattered across separate nodes (usually servers on the network), which is inconvenient and unreliable: backups and duplication of data to a backup server consume part of the network bandwidth, on top of the costs of support, equipment, and software.

This is what it looks like:

With separate storage system:

Depending on the method and technology used to connect them to the information network, storage systems are divided into DAS, NAS, and SAN.

DAS (Direct Attached Storage) is a connection method that is no different from the standard way of connecting a hard disk or disk array (RAID) to a server or PC. The SAS interface is typically used for the connection.

SAS is essentially the protocol designed to replace SCSI; unlike SCSI it uses a serial interface, but the command set is the same as in SCSI. SAS achieves high throughput by bundling several channels into one interface.

NAS (Network Attached Storage): the disk system is connected to the common LAN, TCP is used as the transport, and the SMB and NFS protocols (remote access to files and printers) run on top of it.

SAN (Storage Area Network) is a dedicated network connecting storage devices with servers. It works over the Fiber Channel or iSCSI protocol.

With Fiber Channel everything is clear: optics. With iSCSI, encapsulating SCSI packets in the IP protocol makes it possible to build storage networks on Ethernet infrastructure at transmission speeds of 1 Gb and 10 Gb. According to its developers, iSCSI speed should be sufficient for almost all business applications. To connect a server to a storage system via iSCSI, adapters that support iSCSI are required. When using iSCSI, at least two routes are laid to each device using VLANs, and each device and LUN (a LUN defines a virtual partition in the array and is used for addressing) is assigned an address (World Wide Name).
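
As an illustration of attaching a server to an iSCSI storage system, the sketch below assumes a Linux host with the open-iscsi tools installed; the portal address is a hypothetical placeholder, and for MPIO the login would be repeated over a second portal on a separate VLAN.

```python
# Sketch: discover and log in to an iSCSI target from a Linux host using
# the open-iscsi command-line tool (iscsiadm). The portal address below is
# a hypothetical placeholder, not an address from this article.
import subprocess

PORTAL = "192.168.10.20"  # storage system portal on the storage VLAN (placeholder)

def discover_targets(portal):
    """Return target IQNs advertised by the portal (SendTargets discovery)."""
    out = subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        capture_output=True, text=True, check=True,
    ).stdout
    # Lines look like: "192.168.10.20:3260,1 iqn.2004-04.example:storage.lun0"
    return [line.split()[1] for line in out.splitlines() if line.strip()]

def login(target_iqn, portal):
    """Log in to one target; repeat over a second portal for multipath (MPIO)."""
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        check=True,
    )

if __name__ == "__main__":
    for iqn in discover_targets(PORTAL):
        print("found target:", iqn)
```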

The difference between NAS and SAN is that in a SAN, data is read and written in blocks during I/O operations; the storage system knows nothing about the structure of the file systems on it.

Among the best-known vendors on the storage market are NetApp, IBM, HP, DELL, HITACHI, and EMC.

Our project requires a storage system with the following characteristics:

  • Capacity: 1 TB for files, 1 TB for server operating systems and databases, 300-500 GB for server backups, plus a reserve. In total, a minimum of 3 TB of disk space
  • Support for the SMB and NFS protocols so that shared files can be served to users without dedicated servers
  • If we want to boot the hypervisor from the storage system, at least the iSCSI protocol is needed
  • In theory, you also need to take into account an important parameter: the input/output (IO) speed the storage system can provide. You can estimate it by measuring IO on the hardware currently in operation, for example with the IOMeter program (a rough measurement sketch follows this list).
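
IOMeter itself is a separate benchmarking tool; as a very rough stand-in, the sketch below just times sequential reads of a large existing file and reports throughput. The file path is a hypothetical placeholder, and the OS page cache will inflate the result unless the file is much larger than RAM.

```python
# Crude sequential-read throughput estimate (a rough stand-in for a real
# benchmark such as IOMeter). TEST_FILE is a placeholder path to a large file
# that already exists on the disk subsystem being measured.
import time

TEST_FILE = "/data/testfile.bin"   # hypothetical path; should be several GB
BLOCK_SIZE = 1024 * 1024           # read in 1 MiB blocks

def sequential_read_mb_s(path):
    total_bytes = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.monotonic() - start
    return (total_bytes / (1024 * 1024)) / elapsed

if __name__ == "__main__":
    print(f"sequential read: {sequential_read_mb_s(TEST_FILE):.1f} MB/s")
```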

It should be taken into account that Microsoft clustering only works over Fiber Channel.

Here is a list of companies and hardware to choose from:

Asustor

Asustor AS 606T, AS 608T, AS 609RD (in addition to the ability to install up to 8 disks of 4 TB each, support for VMware, Citrix, and Hyper-V is stated).

Hardware component

CPU Intel Atom 2.13 GHz

RAM 1GB (3GB) DDR3

Hard 2.5, 3.5, SATA 3 or SSD

Lan Gigabit Ethernet – 2

LCD Screen, HDMI

Network

Network protocols

File system

For built-in hard drives: EXT4, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS+

Storage

Supports multiple volumes with spare disks

Volume type: Single disk, JBOD, RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Supports online migration of RAID levels

Maximum number of targets: 256

Maximum number of LUNs: 256

Target masking

LUN mapping

Mounting ISO Images

MPIO and MCS support

Persistent Reservations (SCSI-3)

Disk management

Search for bad blocks according to schedule

Scheduled S.M.A.R.T scanning

Supported OS

Windows XP, Vista, 7, 8, Server 2003, Server 2008, Server 2012

Mac OS X 10.6 Onwards

UNIX, Linux, and BSD

Backup

Rsync (remote synchronization) mode support

Backup to the cloud

Backup via FTP

Backup to external media

One-touch backup

System administration

Log type: system log, connection log, file access log

Real-time user activity recorder

Real-time system monitor

Network Recycle Bin

User disk quota

Virtual disk (mount ISO images, max. 16)

UPS support

Access Control

Maximum number of users: 4096

Maximum number of groups: 512

Maximum number of shared folders: 512

Maximum number of simultaneous connections: 512

Windows Active Directory support

Safety

Firewall: Preventing Unauthorized Access

Network filter: preventing network attacks

Threat notifications: E-mail, SMS

Secure connections: HTTPS, FTP over SSL/TLS, SSH, SFTP, Rsync over SSH

Operating system: ADM, with the ability to add modules via App Central

The AS 604RD and AS 609RD models, unlike the AS 606T and AS 608T, do not include an LCD display, are designed for rack installation, have a redundant power supply, and declare support for virtualization platforms.

Netgear

Ready Nas 2100, Ready Nas 3100, Ready Nas Pro 6

Hardware component

CPU Intel SOC 1GHz

Hard 2.5, 3.5, SATA 2 or SSD

Lan Gigabit Ethernet – 2

Network

Network protocols

CIFS/SMB, AFP, NFS, FTP, WebDAV, Rsync, SSH, SFTP, iSCSI, HTTP, HTTPS

File system

For built-in hard drives: BTRFS, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS+

Storage

Supports online RAID capacity expansion

Maximum number of targets: 256

Maximum number of LUNs: 256

Target masking

LUN mapping

Disk management

Disk capacity, performance, load monitoring

Scan to find bad blocks on disks

Support HDD S.M.A.R.T.

Online correction of data on disks

Disk Scrubbing mode support

Defragmentation support

Notifications (e-mail via SMTP, SNMP, syslog, local log)

Automatic shutdown (HDD, fans, UPS)

Restoring performance when power is restored

Supported OS

Microsoft Windows Vista (32/64-bit), 7 (32/64-bit), 8 (32/64-bit), Microsoft Windows Server 2008 R2/2012, Apple OS X, Linux/Unix, Solaris, Apple iOS, Google Android

Backup

Unlimited number of snapshots for continuous protection.

Restore snapshots at any time via the GUI (admin console), ReadyCLOUD, or Windows Explorer

Ability to create snapshot manually or through a scheduler

File synchronization via rsync

Cloud-managed remote replication (ReadyNAS to ReadyNAS); no licenses are required for devices running ReadyNAS OS 6

Hot backup

eSATA support

Backup to external drives (USB/eSATA)

Apple Time Machine backup and restore support (remotely via ReadyNAS Remote)

ReadyNAS Vault Cloud support (optional)

ReadyDROP sync support (Mac/Windows file sync to ReadyNAS)

Support for the Dropbox file synchronization service (a Dropbox account is required)

System administration

ReadyCLOUD for device discovery and management

RAIDar – agent for discovering devices on a network (Windows/Mac)

Saving and restoring a configuration file

The event log

syslog server message support

Message support for SMB

Graphical user interface in Russian and English

Genie+ marketplace. Built-in application store to enhance device functionality

Unicode character support

Disk Manager

Thin provisioning support for shares and LUNs

Instant provisioning

Access Control

Maximum number of users: 8192

Maximum number of groups: 8192

Maximum number of folders provided for network access: 1024

Maximum number of connections: 1024

ACL-based access to folders and files

Advanced folder and subfolder permissions based on ACL for CIFS/SMB, AFP, FTP, Microsoft Active Directory (AD) Domain Controller Authentication

Custom access lists

ACL-based ReadyCLOUD access lists

operating system

ReadyNAS OS 6 is based on Linux 3.x

The ReadyNAS 3100 differs from the ReadyNAS 2100 in having 2 GB of ECC memory.

ReadyNAS Pro 6: six-bay storage with an Intel Atom D510 processor and 1 GB of DDR2 memory.

Qnap

TS-869U-RP, TS-869 PRO

Hardware component

CPU Intel Atom 2.13GHz

Hard 2.5, 3.5, SATA 3 or SSD

Lan Gigabit Ethernet – 2

Network

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and/or Network Failover, Vlan

Network protocols

CIFS/SMB, AFP, NFS, FTP, WebDAV, Rsync, SSH, SFTP, iSCSI, HTTP, HTTPS

File system

For built-in hard drives: EXT3, EXT4, For external hard drives: FAT32, NTFS, EXT3, EXT4, HFS+

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Supports online RAID capacity expansion

Maximum number of targets: 256

Maximum number of LUNs: 256

Target masking

LUN mapping

iSCSI Initiator (Virtual Disk)

Stack Chaining Master

Up to 8 virtual disks

Disk management

Increasing the disk space capacity of a RAID array without data loss

Scan for bad blocks

RAID recovery function

Bitmap support

Supported OS

Backup

Real-time replication (RTRR)

Works both as an RTRR server and client

Supports real-time and scheduled backups

File filtering, compression and encryption possible

Button for copying data from/to an external device

Apple Time Machine support with backup management

Block-level resource replication (Rsync)

Works both as a server and client

Secure replication between QNAP servers

Backup to external media

Backup to cloud storage systems

NetBak Replicator Application for Windows

Apple Time Machine support

System administration

Web interface using AJAX technology

Connection via HTTP/HTTPS

Instant notifications via E-mail and SMS

Cooling system control

DynDNS and specialized service MyCloudNAS

UPS support with SNMP management (USB)

Network UPS support

Resource Monitor

Network recycle bin for CIFS/SMB and AFP

Detailed event and connection logs

List of active users

Syslog client

Firmware update

Saving and restoring system settings

Restoring factory settings

Access Control

Up to 4096 user accounts

Up to 512 user groups

Up to 512 network resources

Batch adding users

Import/export users

Setting quota parameters

Managing access rights to subfolders

operating system

TS-869 Pro: model without a redundant power supply, 1 GB of memory

Synology

RS 2212, DS1813

Hardware component

CPU Intel Core 2.13GHz

Hard 2.5, 3.5, SATA 2 or SSD

Lan Gigabit Ethernet – 2

Network

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and/or Network Failover

Network protocols

CIFS/SMB, AFP, NFS, FTP, WebDAV, SSH

File system

For built-in hard drives: EXT3, EXT4, For external hard drives: NTFS, EXT3, EXT4

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10

Maximum number of targets: 512

Maximum number of LUNs: 256

Disk management

Changing the RAID level without stopping the system

Supported OS

Windows 2000 and later, Mac OS X 10.3 and later, Ubuntu 9.04 and later

Backup

Network backup

Local backup

Shared folder synchronization

Desktop backup

System administration

Notification of system events via SMS, E-mail

User quota

Resource monitoring

Access Control

Up to 2048 user accounts

Up to 256 user groups

Up to 256 network resources

operating system

DS1813: 2 GB RAM, 4 Gigabit Ethernet ports, HASP 1C support, 4 TB disk support

Thecus

N8800PRO v2, N7700PRO v2, N8900

Hardware component

CPU Intel Core 2 1.66GHz

Lan Gigabit Ethernet – 2

LAN capability 10Gb

Network

IPv4, IPv6, Supports 802.3ad and Six Other Modes for Load Balancing and/or Network Failover

Network protocols

CIFS/SMB, NFS, FTP

File system

For built-in hard drives: EXT3, EXT4, For external hard drives: EXT3, EXT4, XFS

Storage

Volume type: RAID 0, RAID 1, RAID 5, RAID 6, RAID 10, RAID 50, RAID 60

Supports online RAID capacity expansion

Target masking

LUN mapping

Disk management

Disk health monitoring (S.M.A.R.T)

Scan for bad blocks

Ability to mount ISO images

Supported OS

Microsoft Windows 2000, XP, Vista (32/ 64 bit), Windows 7 (32/ 64 bit), Server 2003/ 2008

Backup

Acronis True Image

Thecus Backup Utility

Reading from optical disk on Nas

System administration

Server web administration interface

Access Control

ADS support

operating system

N7700PRO v2: model without a redundant power supply

N8900: a newer model with support for SATA 3 and SAS

Based on the data above, at least 3 TB is needed at the moment, and OS and application updates could double that figure, so disk storage of at least 6 TB is required, with room for growth. Therefore, allowing for the future and the organization of a RAID 5 array, the final figure comes to 12 TB. To reach that capacity with 4 TB hard drives, the disk system needs at least six drive bays.
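
The sizing arithmetic above can be checked in a few lines; this is only a back-of-the-envelope estimate assuming 4 TB drives and RAID 5 (one drive's worth of capacity lost to parity), not a vendor sizing tool.

```python
# Back-of-the-envelope RAID 5 sizing check for the 12 TB estimate above.
import math

def drives_for_raid5(required_tb, drive_tb, hot_spare=False):
    """Smallest drive count whose RAID 5 usable capacity covers required_tb."""
    data_drives = math.ceil(required_tb / drive_tb)
    return data_drives + 1 + (1 if hot_spare else 0)   # +1 drive for parity

if __name__ == "__main__":
    need_tb, drive_tb = 12.0, 4.0
    n = drives_for_raid5(need_tb, drive_tb)
    print(f"RAID 5: {n} x {drive_tb:.0f} TB drives -> {(n - 1) * drive_tb:.0f} TB usable")
    # A hot spare and headroom for growth push the count further, which is
    # why a chassis with at least six drive bays is specified in the text.
    print(f"with a hot spare: {drives_for_raid5(need_tb, drive_tb, True)} bays minimum")
```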

The selection narrows to the following models: AS 609RD, ReadyNAS 3200, TS-869U-RP, RS-1212RP+, and N8900. All of them include a redundant power supply and declare support for the well-known virtualization platforms. The NetGear ReadyNAS 3200 seemed the most interesting, since it was the only model that, beyond S.M.A.R.T., supported additional disk-handling technologies as well as ECC memory, but its price exceeded 100,000 rubles, and there were also doubts about whether it could work with 4 TB SATA 3 disks. The RS-1212RP+ also came in above 100,000. As for the AS 609RD, its maker is a very new player on the storage market, so it is not known how this system will behave.

That left only two systems to choose from: the TS-869U-RP and the N8900.

The TS-869U-RP currently costs about 88,000 rubles.

The N8900, priced at 95,400 rubles, has many advantages over the TS-869U-RP: it supports both SATA and SAS drives, allows installation of an additional 10 Gb adapter, has a more powerful dual-core processor, and supports 4 TB SATA 3 drives. In addition, its firmware is duplicated on a backup chip, which gives it a reliability edge over the other systems.


With the daily increase in the complexity of networked computer systems and global enterprise solutions, the world began to demand technologies that would give impetus to a renaissance of corporate information storage systems. Now a single technology brings unprecedented performance, enormous scalability, and exceptional total cost of ownership benefits to the world's treasure trove of storage advancements. The circumstances created by the advent of the FC-AL (Fibre Channel Arbitrated Loop) standard and the SAN (Storage Area Network) that develops on its basis promise a revolution in data-oriented computing technologies.

“The most significant development in storage we've seen in 15 years.”

Data Communications International, March 21, 1998

Formal definition of SAN as interpreted by the Storage Network Industry Association (SNIA):

“A network whose main task is to transfer data between computer systems and data storage devices, as well as between the storage systems themselves. A SAN consists of a communications infrastructure that provides physical connection, and is also responsible for the management layer, which combines communications, storage and computer systems, transmitting data safely and securely.”

SNIA Technical Dictionary, copyright Storage Network Industry Association, 2000

Options for organizing access to storage systems

There are three main options for organizing access to storage systems:

  • SAS (Server Attached Storage), storage attached to the server;
  • NAS (Network Attached Storage), storage connected to the network;
  • SAN (Storage Area Network), data storage network.

Let us consider the topologies of the corresponding storage systems and their features.

SAS

A storage system connected to the server. A familiar, traditional way of connecting a storage system to a high-speed interface in a server, usually a parallel SCSI interface.

Figure 1. Server Attached Storage

The use of a separate enclosure for the storage system within the SAS topology is not mandatory.

The main advantage of a storage connected to a server compared to other options is its low price and high performance based on one storage for one server. This topology is the most optimal in the case of using one server through which access to the data array is organized. But it still has a number of problems that prompted designers to look for other options for organizing access to data storage systems.

Features of SAS include:

  • Access to data depends on the OS and file system (in general);
  • The complexity of organizing systems with high availability;
  • Low cost;
  • High performance within one node;
  • Reduced response speed when the server serving the storage is under load.

NAS

Storage system connected to the network. This option for organizing access appeared relatively recently. Its main advantage is the ease of integrating an additional storage system into existing networks, but by itself it does not bring any radical improvements to the storage architecture. In fact, NAS is a pure file server, and today you can find many new storage-type NAS implementations based on Thin Server technology.


Figure 2. Network Attached Storage.

NAS Features:

  • Dedicated file server;
  • Access to data is independent of OS and platform;
  • Ease of administration;
  • Maximum ease of installation;
  • Low scalability;
  • Conflict with LAN/WAN traffic.

Storage built using NAS technology is an ideal option for cheap servers with a minimal set of functions.

SAN

Data storage networks began to develop intensively and be implemented only in 1999. The basis of a SAN is a network separate from the LAN/WAN, which serves to organize access to data from servers and workstations that directly process it. Such a network is created based on the Fiber Channel standard, which gives storage systems the advantages of LAN/WAN technologies and the ability to organize standard platforms for systems with high availability and high demand intensity. Almost the only drawback of SAN today is the relatively high price of components, but the total cost of ownership for corporate systems built using storage area network technology is quite low.


Figure 3. Storage Area Network.

The main advantages of SAN include almost all of its features:

  • Independence of the SAN topology from storage systems and servers;
  • Convenient centralized management;
  • No conflict with LAN/WAN traffic;
  • Convenient data backup without loading the local network and servers;
  • High performance;
  • High scalability;
  • High flexibility;
  • High availability and fault tolerance.

It should also be noted that this technology is still quite young and in the near future it should undergo many improvements in the field of standardization of management and methods of interaction of SAN subnets. But one can hope that this only threatens the pioneers with additional prospects for championship.

FC as the basis for building a SAN

Like a LAN, a SAN can be created using a variety of topologies and media. When building a SAN, both a parallel SCSI interface and Fiber Channel or, say, SCI (Scalable Coherent Interface) can be used, but SAN owes its increasing popularity to Fiber Channel. The design of this interface involved experts with significant experience in the development of both channel and network interfaces, and they managed to combine all the important positive features of both technologies in order to get something truly revolutionary new. What exactly?

Key features of channel interfaces:

  • Low latency
  • High speeds
  • High reliability
  • Point-to-point topology
  • Small distances between nodes
  • Platform dependency
and network interfaces:
  • Multipoint topologies
  • Long distances
  • High scalability
  • Low speeds
  • Long delays
merged into Fiber Channel:
  • High speeds
  • Protocol independence (levels 0-3)
  • Long distances
  • Low latency
  • High reliability
  • High scalability
  • Multipoint topologies

Traditionally, storage interfaces (that is, what is located between the host and storage devices) have been an obstacle to increased performance and increased capacity of storage systems. At the same time, application tasks require a significant increase in hardware capacity, which, in turn, entails the need to increase the throughput of interfaces for communication with storage systems. It is precisely the problems of building flexible high-speed data access that Fiber Channel helps solve.

The Fiber Channel standard was finalized over several years (from 1997 to 1999), during which a tremendous amount of work was done to harmonize the interaction of various component manufacturers and to move Fiber Channel from a purely conceptual technology to a real one, supported by installations in laboratories and computer centers. In 1997 the first commercial samples of the cornerstone components for building FC-based SANs, such as adapters, hubs, switches and bridges, were produced. Thus, since 1998, FC has been used commercially in business, manufacturing and large-scale projects implementing failure-critical systems.

Fiber Channel is an open industry standard for a high-speed serial interface. It connects servers and storage systems at distances of up to 10 km (with standard equipment) at 100 MB/s (products using the newer 200 MB/s-per-ring standard are already appearing, and implementations running at 400 MB/s, i.e. 800 MB/s on a double ring, are already operating in laboratory conditions). (At the time this article was published, a number of manufacturers had already begun shipping 200 MB/s FC network cards and switches.) Fiber Channel simultaneously supports a number of standard protocols (including TCP/IP and SCSI-3) over a single physical medium, which potentially simplifies building the network infrastructure and also reduces installation and maintenance costs. However, using separate subnets for LAN/WAN and SAN traffic has a number of advantages and is recommended by default.

One of the most important advantages of Fiber Channel, along with its speed parameters (which, incidentally, are not always the main ones for SAN users and can be achieved with other technologies), is the ability to work over long distances and the flexibility of topology, which came to the new standard from network technologies. Thus, the concept of building a storage network topology rests on the same principles as traditional networks, usually based on hubs and switches, which help prevent speed drops as the number of nodes grows and make it possible to conveniently organize systems without a single point of failure.

To better understand the advantages and features of this interface, we present a comparative description of FC and Parallel SCSI in the form of a table.

Table 1. Comparison of Fiber Channel and Parallel SCSI technologies

The Fiber Channel standard allows for various topologies: point-to-point, arbitrated loop or hub (Loop or Hub FC-AL), and switched fabric (Fabric/Switch).

Point-to-point topology is used to connect a single storage system to a server.

Loop or Hub FC-AL - for connecting multiple storage devices to multiple hosts. By organizing a double ring, the speed and fault tolerance of the system increases.

Switches are used to provide maximum performance and fault tolerance for complex, large and extensive systems.

Thanks to network flexibility, SAN has an extremely important feature - the convenient ability to build fault-tolerant systems.

By offering alternative solutions for storage systems and the ability to combine multiple storage systems for hardware redundancy, SAN helps protect hardware and software systems from hardware failures. To demonstrate, we will give an example of creating a two-node system without points of failure.


Figure 4. No Single Point of Failure.

Systems of three or more nodes are built by simply adding additional servers to the FC network and connecting them to both hubs/switches.

When using FC, building disaster-tolerant systems becomes transparent. Network channels for both storage and local networks can be laid on the basis of optical fiber (up to 10 km or more using signal amplifiers) as a physical carrier for FC, while standard equipment is used, which makes it possible to significantly reduce the cost of such systems.

By being able to access all SAN components from anywhere, we get an extremely flexible, manageable data network. It should be noted that the SAN provides visibility of all components down to the individual disks in the storage systems. This has prompted component manufacturers to leverage their significant experience in building LAN/WAN management systems and build rich monitoring and management capabilities into all SAN components. These capabilities include monitoring and management of individual nodes, storage components, enclosures, network devices and network substructures.

The SAN management and monitoring system uses open standards such as:

  • SCSI command set
  • SCSI Enclosure Services (SES)
  • SCSI Self Monitoring Analysis and Reporting Technology (S.M.A.R.T.)
  • SAF-TE (SCSI Accessed Fault-Tolerant Enclosures)
  • Simple Network Management Protocol (SNMP)
  • Web-Based Enterprise Management (WBEM)

Systems built using SAN technologies not only provide the administrator with the ability to monitor the development and status of storage resources, but also open up opportunities for monitoring and controlling traffic. Thanks to these resources, SAN management software implements the most effective storage capacity planning schemes and load balancing on system components.

Storage area networks integrate well into existing information infrastructures. Their implementation does not require any changes to the existing LAN and WAN networks; it only expands the capabilities of existing systems, relieving them of tasks oriented toward transferring large amounts of data. Moreover, when integrating and administering a SAN, it is very important that key network elements support hot replacement and installation with dynamic configuration capabilities, so the administrator can add or replace a component without shutting down the system. And this entire integration process can be visually displayed in the SAN management system's graphical interface.

Having considered the above advantages, we can highlight a number of key points that directly affect one of the main benefits of a Storage Area Network: the total cost of ownership (Total Cost of Ownership).

Incredible scalability allows an enterprise using a SAN to invest in servers and storage as needed. And also to preserve your investments in already installed equipment when changing technological generations. Each new server will have high-speed access to storage and each additional gigabyte of storage will be available to all servers on the subnet at the administrator’s command.

Excellent capabilities for building fault-tolerant systems can bring direct commercial benefits by minimizing downtime and saving the system in the event of a natural disaster or some other disaster.

The controllability of components and the transparency of the system provide the opportunity to centrally administer all storage resources, and this, in turn, significantly reduces the cost of their support, the cost of which, as a rule, is more than 50% of the cost of equipment.

Impact of SAN on applications

In order for our readers to understand more clearly how practically useful the technologies discussed in this article are, we will give several examples of applied problems that, without the use of storage networks, would be solved ineffectively, would require enormous financial investments, or would not be solved at all by standard methods.

Data Backup and Recovery

Using a traditional SCSI interface, when building data backup and recovery systems, the user is faced with a number of complex problems that can be very easily solved using SAN and FC technologies.

Thus, the use of storage networks takes the solution of the problem of backup and recovery to a new level and provides the opportunity to perform backups several times faster than before, without loading the local network and servers with data backup work.

Server Clustering

One of the typical tasks for which a SAN is effectively used is server clustering. Since one of the key points in organizing high-speed cluster systems that work with data is access to storage, with the advent of SAN, the construction of multi-node clusters at the hardware level can be solved by simply adding a server connected to the SAN (this can be done without even turning off the system, since FC switches support hot-plug). When using a parallel SCSI interface, the connectivity and scalability of which is much worse than that of FC, it would be difficult to create data-processing-oriented clusters with more than two nodes. Parallel SCSI switches are very complex and expensive devices, but for FC this is a standard component. To create a cluster that will not have a single point of failure, it is enough to integrate a mirrored SAN (DUAL Path technology) into the system.

Within the framework of clustering, one of the RAIS (Redundant Array of Inexpensive Servers) technologies seems particularly attractive for building powerful, scalable Internet commerce systems and other types of tasks with increased power requirements. According to Alistair A. Croll, co-founder of Networkshop Inc, using RAIS is quite effective: “For example, for $12,000-$15,000 you can buy about six inexpensive single- or dual-processor (Pentium III) Linux/Apache servers. The power, scalability and fault tolerance of such a system will be significantly higher than, for example, a single four-processor server based on Xeon processors, and the cost will be the same.”

Concurrent video streaming, data sharing

Imagine a task where you need to edit video at several (say, >5) stations or simply work on huge amounts of data. Transferring a 100GB file over a local network will take you a few minutes, and overall working on it will be a very difficult task. With a SAN, each workstation and server on the network accesses the file at speeds equivalent to a local high-speed disk. If you need another station/server for data processing, you can add it to the SAN without turning off the network, simply by connecting the station to the SAN switch and granting it access rights to the storage. If you are no longer satisfied with the performance of the data subsystem, you can simply add another storage and, using data distribution technology (for example, RAID 0), get twice the performance.
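
To put rough numbers on this scenario, the following small calculation assumes ideal link utilization and ignores protocol overhead (real transfers will be somewhat slower):

```python
# Time to move a 100 GB file at different nominal link speeds,
# ignoring protocol overhead and contention.
def transfer_minutes(size_gb, rate_mb_s):
    return (size_gb * 1024) / rate_mb_s / 60

links = {
    "Fast Ethernet (~12.5 MB/s)": 12.5,
    "Gigabit Ethernet (~125 MB/s)": 125,
    "Fiber Channel 100 MB/s": 100,
    "Fiber Channel 200 MB/s": 200,
}

for name, rate in links.items():
    print(f"{name}: ~{transfer_minutes(100, rate):.0f} min for a 100 GB file")
```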

Basic SAN Components

Media

Copper and optical cables are used to connect components within the Fiber Channel standard. Both types of cables can be used simultaneously when building a SAN. Interface conversion is carried out using GBIC (Gigabit Interface Converter) and MIA (Media Interface Adapter). Both types of cable today provide the same data transfer speed. Copper cable is used for short distances (up to 30 meters), optical cable - both for short and for distances up to 10 km and more. Multimode and single-mode optical cables are used. Multimode cable is used for short distances (up to 2 km). The internal diameter of the optical fiber of a multimode cable is 62.5 or 50 microns. To achieve transfer speeds of 100 MB/s (200 MB/s full duplex) when using multimode fiber, the cable length should not exceed 200 meters. Single mode cable is used for long distances. The length of such a cable is limited by the power of the laser used in the signal transmitter. The internal diameter of the optical fiber of a single-mode cable is 7 or 9 microns, it allows the passage of a single beam.

Connectors, adapters

To connect copper cables, DB-9 or HSSD type connectors are used. HSSD is considered more reliable, but DB-9 is used just as often because it is simpler and cheaper. The standard (most common) connector for optical cables is the SC connector; it provides a high-quality, clear connection. For regular connections, multimode SC connectors are used, and for remote connections, single-mode connectors are used. Multiport adapters use microconnectors.

The most common FC adapters are for the 64-bit PCI bus. Many FC adapters are also produced for the S-BUS; adapters for MCA, EISA, GIO, HIO, PMC and Compact PCI are produced for specialized use. Single-port cards are the most popular, though two- and four-port cards exist. PCI adapters, as a rule, use DB-9, HSSD or SC connectors. GBIC-based adapters are also common and come with or without GBIC modules. Fiber Channel adapters differ in the classes they support and the various features they offer. To understand the differences, here is a comparison of adapters produced by QLogic.

Fiber Channel Host Bus Adapter Family Chart
(QLogic SANblade HBA family: the chart compares the 2100, 2200 and 2300 Series adapters, in 33/66 MHz PCI, 133 MHz PCI-X and 25 MHz Sbus variants, across features such as 64-bit bus support, FC-AL public/private loop, FL Port Class 3, F Port Class 2, point-to-point, IP/SCSI, full duplex, FC tape, PCI 1.0 hot plug, Solaris dynamic reconfiguration, VI and 2 Gb operation; the feature set grows from the 2100 to the 2300 Series.)

Hubs

Fiber Channel HUBs (hubs) are used to connect nodes to an FC ring (FC Loop) and have a structure similar to Token Ring hubs. Since a broken ring can lead to the cessation of network functioning, modern FC hubs use ring bypass ports (PBC-port bypass circuit), which allow automatic opening/closing of the ring (connecting/disconnecting systems connected to the hub). Typically FC HUBs support up to 10 connections and can stack up to 127 ports per ring. All devices connected to the HUB receive a common bandwidth that they can share among themselves.

Switches

Fiber Channel switches have the same functions as the LAN switches familiar to the reader. They provide full-speed, non-blocking connections between nodes. Any node connected to an FC switch receives full (and scalable) bandwidth. As the number of ports in a switched network increases, its throughput increases. Switches can be used in conjunction with hubs (for areas that do not require dedicated bandwidth for each node) to achieve the optimal price/performance ratio. Thanks to cascading, switches can potentially be used to create FC networks with 2^24 addresses (over 16 million).

Bridges

FC Bridges (bridges or multiplexers) are used to connect parallel SCSI devices to an FC-based network. They provide translation of SCSI packets between Fiber Channel and Parallel SCSI devices, examples of which are Solid State Disk (SSD) or tape libraries. It should be noted that recently, almost all devices that can be utilized within a SAN are being produced by manufacturers with a built-in FC interface for direct connection to storage networks.

Servers and Storage

Despite the fact that servers and storage are far from the least important components of a SAN, we will not dwell on their description, since we are sure that all our readers are well familiar with them.

In closing, I would like to add that this article is only a first step towards storage networks. To fully understand the topic, the reader should pay close attention to how SAN components and management software are actually implemented, since without them a Storage Area Network is just a set of elements for switching storage systems and will not bring you the full benefits of implementing a storage network.

Conclusion

Today, Storage Area Network is a fairly new technology that may soon become widespread among corporate customers. In Europe and the USA, enterprises that have a fairly large fleet of installed storage systems are already beginning to switch to storage networks to organize storage with the best total cost of ownership.

According to analysts, in 2005, a significant number of mid- and high-end servers will come with a pre-installed Fiber Channel interface (this trend can already be seen today), and only the parallel SCSI interface will be used for internal disk connections in servers. Even today, when building storage systems and purchasing mid- and high-end servers, you should pay attention to this promising technology, especially since today it makes it possible to implement a number of tasks much cheaper than using specialized solutions. Plus, if you invest in SAN technology today, you won't lose your investment tomorrow because Fiber Channel's features create great opportunities to leverage your investment today into the future.

P.S.

The previous version of the article was written in June 2000, but due to the lack of mass interest in storage network technology, publication was postponed to the future. This future has arrived today, and I hope that this article will encourage the reader to realize the need to move to storage area network technology as an advanced technology for building storage systems and organizing data access.

In this article we will look at the types of data storage systems that exist today and consider their main components: the external connection interfaces (interaction protocols) and the drives on which data is stored. We will also make a general comparison of the capabilities they provide. For examples, we will refer to the DELL storage system line.

  • Examples of DAS models
  • Examples of NAS models
  • Examples of SAN models
  • Types of storage media and protocol for interaction with storage systems Fiber Channel Protocol
  • iSCSI protocol
  • SAS protocol
  • Comparison of storage system connection protocols

Existing types of storage systems

In the case of an individual PC, the storage system can be understood as the internal hard drive or disk subsystem (RAID array). When it comes to data storage at different enterprise scales, three technologies for organizing data storage are traditionally distinguished:

  • Direct Attached Storage (DAS);
  • Network Attach Storage (NAS);
  • Storage Area Network (SAN).

DAS (Direct Attached Storage) devices are a solution when a data storage device is connected directly to a server or workstation, usually via an interface using the SAS protocol.

NAS (Network Attached Storage) devices are a self-contained integrated disk system, essentially a NAS server, with its own specialized OS and a set of functions for quickly launching the system and providing access to files. The system connects to a regular computer network (LAN) and is a quick solution to the shortage of free disk space available to the users of that network.

A Storage Area Network (SAN) is a special dedicated network that connects storage devices with application servers, usually based on the Fiber Channel protocol or the iSCSI protocol.

Now let's take a closer look at each of the above types of storage systems, their positive and negative sides.

DAS (Direct Attached Storage) storage system architecture

The main advantages of DAS systems include their low cost (compared to other storage solutions), ease of deployment and administration, as well as high speed of data exchange between the storage system and the server. In fact, it is precisely because of this that they have gained great popularity in the segment of small offices, hosting providers and small corporate networks. At the same time, DAS systems also have their drawbacks, which include non-optimal utilization of resources, since each DAS system requires the connection of a dedicated server and allows you to connect a maximum of 2 servers to a disk shelf in a certain configuration.

Figure 1: Direct Attached Storage Architecture

  • Fairly low cost. Essentially, this storage system is a disk basket with hard drives located outside the server.
  • Easy to deploy and administer.
  • High speed of exchange between the disk array and the server.
  • Low reliability. If the server to which the storage is connected fails, the data is no longer available.
  • Low degree of resource consolidation - all capacity is available to one or two servers, which reduces the flexibility of data distribution between servers. As a result, it is necessary to purchase either more internal hard drives or install additional disk shelves for other server systems
  • Low resource utilization.

Examples of DAS models

Of the interesting models of devices of this type, I would like to note the DELL PowerVault MD series. The initial models of disk shelves (JBOD) MD1000 and MD1120 allow you to create disk arrays with up to 144 disks. This is achieved due to the modularity of the architecture; up to 6 devices can be connected to the array, three disk shelves per RAID controller channel. For example, if we use a rack of 6 DELL PowerVault MD1120, then we will implement an array with an effective data volume of 43.2 TB. Such disk enclosures are connected by one or two SAS cables to external ports of RAID controllers installed in Dell PowerEdge servers and are managed by the management console of the server itself.
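
The 43.2 TB figure can be reproduced from the numbers in that paragraph (6 enclosures, 144 drives in total); the 300 GB per-drive size is inferred from the total rather than stated explicitly, so treat it as an assumption.

```python
# Sanity check of the 43.2 TB figure: six MD1120 enclosures, 24 drive bays each
# (144 / 6). The 300 GB per-drive capacity is an inferred assumption.
ENCLOSURES = 6
BAYS_PER_ENCLOSURE = 24
DRIVE_TB = 0.3                 # assumed 300 GB SAS drives

total_drives = ENCLOSURES * BAYS_PER_ENCLOSURE
print(f"{total_drives} drives -> {total_drives * DRIVE_TB:.1f} TB of raw capacity")
```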

If there is a need to create an architecture with high fault tolerance, for example, to create a failover cluster of MS Exchange or a SQL server, then the DELL PowerVault MD3000 model is suitable for these purposes. This system already has active logic inside the disk enclosure and is completely redundant due to the use of two built-in active-active RAID controllers that have a mirrored copy of the data buffered in cache memory.

Both controllers process read and write streams in parallel, and if one of them fails, the second picks up the data from the neighboring controller. The connection to the low-level SAS controllers inside the two servers (a cluster) can be made via several interfaces (MPIO), which provides redundancy and load balancing in Microsoft environments. To expand disk space, two additional MD1000 disk shelves can be connected to the PowerVault MD3000.

NAS (Network Attached Storage) storage system architecture

NAS technology (Network Attached Storage, networked storage subsystems) developed as an alternative to universal servers that carry many functions (printing, applications, fax server, e-mail, and so on). In contrast, NAS devices perform only one function: file serving. And they try to do it as well, as simply, and as quickly as possible.

NAS devices connect to a LAN and provide data access to an unlimited number of heterogeneous clients (clients with different OSes) or other servers. Currently, almost all NAS devices are designed for use in Ethernet networks (Fast Ethernet, Gigabit Ethernet) based on TCP/IP. NAS devices are accessed using special file access protocols; the most common are CIFS, NFS and DAFS. Such servers run specialized operating systems such as MS Windows Storage Server.

Figure 2: Network Attached Storage Architecture

  • The cheapness and availability of its resources not only for individual servers, but also for any computers in the organization.
  • Ease of sharing resources.
  • Ease of deployment and administration
  • Versatility for clients (one server can serve MS, Novell, Mac, Unix clients)
  • Accessing information through “network file system” protocols is often slower than accessing a local disk.
  • Most inexpensive NAS servers do not provide the fast, flexible block-level access to data (as opposed to file-level access) that is inherent in SAN systems.

Examples of NAS models

Classic NAS solutions at the moment include the PowerVault NF100/500/600. These are systems based on mass-market one- and two-processor Dell servers, optimized for rapid deployment of NAS services. They allow you to create file storage of up to 10 TB (PowerVault NF600) using SATA or SAS drives and connect this server to the LAN. There are also higher-performance integrated solutions, such as the PowerVault NX1950, which accommodates 15 drives and can be expanded to 45 by connecting additional MD1000 disk enclosures.

A major advantage of the NX1950 is the ability to work not only with files, but also with data blocks at the iSCSI protocol level. Also, the NX1950 variety can work as a “gateway”, allowing file access to iSCSI-based storage systems (with block access method), for example MD3000i or Dell EqualLogic PS5x00.

SAN (Storage Area Network) storage system architecture

A Storage Area Network (SAN) is a special dedicated network that connects storage devices with application servers, usually based on the Fiber Channel protocol, or on the increasingly popular iSCSI protocol. Unlike NAS, SAN has no concept of files: file operations are performed on servers connected to the SAN. SAN operates in blocks, like a large hard drive. The ideal result of a SAN is the ability of any server running any operating system to access any part of the disk capacity located in the SAN. SAN end elements are application servers and storage systems (disk arrays, tape libraries, etc.). And between them, as in a regular network, there are adapters, switches, bridges, and hubs. ISCSI is a more “friendly” protocol because it is based on the use of standard Ethernet infrastructure - network cards, switches, cables. Moreover, iSCSI-based storage systems are the most popular for virtualized servers due to the ease of setting up the protocol.

Figure 3: Storage Area Network Architecture

  • High reliability of access to data located on external storage systems. Independence of the SAN topology from the storage systems and servers used.
  • Centralized data storage (reliability, security).
  • Convenient centralized switching and data management.
  • Moves heavy I/O traffic to a separate network, offloading the LAN.
  • High performance and low latency.
  • Scalability and flexibility of the SAN logical fabric
  • The ability to organize backup, remote storage systems and a remote backup and data recovery system.
  • The ability to build fault-tolerant cluster solutions without additional costs based on an existing SAN.
  • Higher cost
  • Difficulty in setting up FC systems
  • The need for certification of specialists in FC networks (iSCSI is a simpler protocol)
  • More stringent requirements for component compatibility and validation.
  • The appearance of isolated DAS “islands” alongside FC-based networks: because of the high cost and budget constraints, enterprises end up with standalone servers with internal disk space, NAS servers or DAS systems.

Examples of SAN models

At the moment there is a fairly large selection of disk arrays for building SANs, from models for small and medium-sized enterprises such as the DELL AX series, which allow storage capacities of up to 60 TB, to disk arrays for large corporations such as the DELL/EMC CX4 series, which allow capacities of up to 950 TB. There is also an inexpensive iSCSI-based solution, the PowerVault MD3000i: it allows you to connect 16-32 servers, install up to 15 disks in one device, and expand the system with two MD1000 shelves, creating a 45 TB array.

The Dell EqualLogic system based on the iSCSI protocol deserves special mention. It is positioned as an enterprise-scale storage system and is comparable in price to the Dell|EMC CX4 systems, with a modular port architecture that supports both the FC and iSCSI protocols. The EqualLogic system is peer-to-peer: each disk enclosure has active RAID controllers. When these arrays are joined into a unified system, the performance of the disk pool grows smoothly along with the available storage volume. The system allows you to create arrays of more than 500 TB, can be configured in less than an hour, and does not require specialized administrator knowledge.

The licensing model also differs from the rest: the initial price already includes all snapshot options, replication, and integration tools for various OSes and applications. This system is considered one of the fastest in MS Exchange tests (ESRP).

Types of storage media and protocol for interaction with storage systems

Having decided on the type of storage system that is most suitable for you to solve certain problems, you need to move on to choosing a protocol for interacting with the storage system and selecting the drives that will be used in the storage system.

Currently, SATA and SAS drives are used to store data in disk arrays. Which disks to choose for storage depends on specific tasks. Several facts are worth noting.

SATA II drives:

  • Single disk sizes up to 1 TB available
  • Rotation speed 5400-7200 RPM
  • I/O speed up to 2.4 Gbps
  • The time between failures is approximately two times less than that of SAS drives.
  • Less reliable than SAS drives.
  • About 1.5 times cheaper than SAS disks.
SAS drives:

  • Single disk sizes up to 450 GB available
  • Rotation speed 7200 (NearLine), 10000 and 15000 RPM
  • I/O speed up to 3.0 Gbps
  • MTBF is twice as long as SATA II drives.
  • More reliable drives.

Important! Last year, industrial production began of SAS disks with a reduced rotation speed of 7,200 rpm (Near-line SAS drives). This made it possible to increase the amount of data stored on one disk to 1 TB and reduce the power consumption of drives with a high-speed interface. The cost of such drives is comparable to that of SATA II drives, while reliability and I/O speed remain at the level of SAS drives.

Thus, at this moment it is worth really thinking seriously about the data storage protocols that you are going to use within the framework of enterprise storage.

Until recently, the main protocols for interacting with storage systems were Fiber Channel and SCSI. SCSI has now given way to the iSCSI and SAS protocols, which extend its functionality. Below we look at the pros and cons of each protocol and the corresponding interfaces for connecting to storage systems.

Fiber Channel Protocol

In practice, modern Fiber Channel (FC) has speeds of 2 Gbit/Sec (Fiber Channel 2 Gb), 4 Gbit/Sec (Fiber Channel 4 Gb) full-duplex or 8 Gbit/Sec, that is, this speed is provided simultaneously in both directions. At such speeds, connection distances are practically unlimited - from the standard 300 meters on the most “ordinary” equipment to several hundred or even thousands of kilometers when using specialized equipment. The main advantage of the FC protocol is the ability to combine many storage devices and hosts (servers) into a single storage area network (SAN). At the same time, there is no problem of distributing devices over long distances, the possibility of channel aggregation, the possibility of redundant access paths, “hot plugging” of equipment, and greater noise immunity. But on the other hand, we have a high cost and high labor intensity for installing and maintaining disk arrays using FC.

Important! The two terms Fiber Channel protocol and Fiber Channel interface should be distinguished. The Fiber Channel protocol can operate on different interfaces - both on a fiber-optic connection with different modulations, and on copper connections.

  • Flexible storage scalability;
  • Allows you to create storage systems over significant distances (though shorter than with the iSCSI protocol, where, in theory, the entire global IP network can act as a carrier);
  • Extensive redundancy options.
  • High cost of the solution;
  • Even higher costs when organizing an FC network over hundreds or thousands of kilometers
  • High labor intensity during implementation and maintenance.

Important! In addition to the emergence of the FC8 Gb/s protocol, the emergence of the FCoE (Fibre Channel over Ethernet) protocol is expected, which will allow the use of standard IP networks to organize the exchange of FC packets.

iSCSI protocol

iSCSI (SCSI encapsulated in IP) allows users to create storage networks over IP using Ethernet infrastructure and RJ45 ports. In this way, iSCSI overcomes the limitations of directly attached storage, including the inability to share resources across servers and the inability to expand capacity without shutting down applications. The transfer rate is currently limited to 1 Gb/s (Gigabit Ethernet), but this speed is sufficient for most business applications of medium-sized enterprises, as numerous tests confirm. Interestingly, what matters is not so much the transfer speed of a single channel as the algorithms of the RAID controllers and the ability to aggregate arrays into a single pool, as in DELL EqualLogic, where three 1 Gb ports are used on each array and load is balanced among the arrays of one group.
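
To see why 1 Gb/s is usually enough, here is a quick estimate of the ceiling of a single Gigabit iSCSI link; the 8 KB request size is an assumed value typical of transactional workloads, and TCP/iSCSI overhead is ignored.

```python
# Rough ceiling of a single 1 Gb/s iSCSI link, ignoring TCP/iSCSI overhead.
# The 8 KB request size is an assumption (typical for transactional I/O).
LINK_GBPS = 1.0
REQUEST_KB = 8

link_mb_s = LINK_GBPS * 1000 / 8            # ~125 MB/s at best
max_iops = link_mb_s * 1024 / REQUEST_KB    # requests per second at that size

print(f"~{link_mb_s:.0f} MB/s -> about {max_iops:,.0f} IOPS at {REQUEST_KB} KB per request")
```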

It is important to note that SANs based on the iSCSI protocol provide the same benefits as SANs using the Fiber Channel protocol, but at the same time, the procedures for deploying and managing the network are simplified, and the cost of this storage system is significantly reduced.

  • High availability;
  • Scalability;
  • Ease of administration, as Ethernet technology is used;
  • Lower price for organizing a SAN using the iSCSI protocol than using FC.
  • Easy integration into virtualization environments
  • There are certain restrictions on using iSCSI storage systems with some OLAP and OLTP applications, with real-time systems, and when working with a large number of HD video streams
  • High-end iSCSI storage systems, like FC storage systems, require fast and expensive Ethernet switches
  • It is recommended to use either dedicated Ethernet switches or VLANs to separate the data streams. Network design is no less important a part of the project than it is when developing FC networks.

Important! Manufacturers promise to soon mass produce SANs based on the iSCSI protocol with support for data transfer rates of up to 10 Gb/s. The final version of the DCE (Data Center Ethernet) protocol is also being prepared; the mass appearance of devices that support the DCE protocol is expected by 2011.

As for the interfaces used, the iSCSI protocol runs over 1 Gbit/s Ethernet interfaces, which can be either copper or fiber-optic when operating over long distances.

SAS protocol

The SAS protocol and interface of the same name are designed to replace parallel SCSI and achieve higher throughput than SCSI. Although SAS uses a serial interface as opposed to the parallel interface used by traditional SCSI, SCSI commands are still used to control SAS devices. SAS allows you to provide a physical connection between a data array and several servers over short distances.

Advantages of SAS:

  • Acceptable price;
  • Ease of storage consolidation - although SAS-based storage cannot connect as many hosts (servers) as SAN configurations based on FC or iSCSI, with SAS no additional equipment is needed to organize shared storage for several servers;
  • The SAS protocol allows for higher throughput using 4-channel connections within a single interface. Each channel provides 3 Gb/s, which gives a data transfer rate of 12 Gb/s (currently the highest data transfer rate for storage systems).

Disadvantages of SAS:

  • Limited reach - the cable length cannot exceed 8 meters. Thus, storage connected via SAS is optimal only when the servers and arrays are located in the same rack or in the same server room;
  • The number of connected hosts (servers) is usually limited to a few nodes.

Important! In 2009, SAS technology is expected to appear with a data transfer speed over one channel of 6 Gbit/s, which will significantly increase the attractiveness of using this protocol.

Comparison of storage connection protocols

Below is a summary table comparing the capabilities of various protocols for interaction with storage systems.

Parameter | iSCSI | SAS | Fiber Channel
Architecture | SCSI commands are encapsulated in IP packets and transmitted over Ethernet; serial transmission | Serial transmission of SCSI commands | Serial transmission of SCSI commands over a switched network
Distance between the disk array and the node (server or switch) | Limited only by the reach of IP networks | No more than 8 meters between devices | 50,000 meters without the use of specialized repeaters
Scalability | Millions of devices when working over the IPv6 protocol | 32 devices | 256 devices; 16 million devices when using the FC-SW (fabric switches) architecture
Performance | 1 Gb/s (planned growth to 10 Gb/s) | 3 Gb/s per channel; up to 12 Gb/s when using 4 channels (in 2009 up to 6 Gb/s per channel) | Up to 8 Gb/s
Investment level (implementation costs) | Minor - Ethernet is used | Average | Significant

Thus, at first glance, the presented solutions are quite clearly divided according to how well they meet customer requirements. In practice, however, everything is not so simple: additional factors come into play, such as budget restrictions, the dynamics of the organization's development (and the growth of the volume of stored information), industry specifics, and so on.

The dependence of enterprise business processes on the IT sector is constantly growing. Today, not only large companies but also medium-sized and often small businesses pay attention to the continuity of IT services.

One of the central elements of ensuring fault tolerance is a data storage system (DSS) - a device on which all information is stored centrally. A storage system is characterized by high scalability, fault tolerance, and the ability to perform all service operations without stopping the device (including replacing components). But the cost of even a basic model is measured in tens of thousands of dollars. For example, a Fujitsu ETERNUS DX100 with 12 Nearline SAS 1Tb SFF disks (RAID10, 6TB) costs about 21,000 USD, which is very expensive for a small company.

In this article we consider options for organizing budget storage that is not inferior in performance and reliability to classical systems. To implement it, we suggest using CEPH.

What is CEPH and how does it work?

CEPH is an open-source storage system that combines the disk space of several servers (in practice, the number of servers is measured in tens and hundreds). CEPH allows you to create easily scalable storage with high performance and resource redundancy. CEPH can be used both as object storage (to store files) and as a block device (to serve virtual hard disks).
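As a quick illustration of the object-storage side, here is a minimal sketch that writes and reads one object through the librados Python binding (the python3-rados package shipped with CEPH). It assumes a running cluster, the standard /etc/ceph/ceph.conf, and an already created pool; the pool name "data" is used purely as an example.

    import rados  # provided by the python3-rados package shipped with CEPH

    # Connect to the cluster using the standard configuration and keyring.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    print("Cluster FSID:", cluster.get_fsid())

    # Open an I/O context for an existing pool ("data" is an example name).
    ioctx = cluster.open_ioctx("data")

    # Store a small object and read it back; CEPH replicates it across nodes
    # according to the pool's replication factor.
    ioctx.write_full("hello.txt", b"CEPH object storage test")
    print(ioctx.read("hello.txt"))

    ioctx.close()
    cluster.shutdown()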

Fault tolerance of the storage is ensured by replicating each data block to several servers. The number of simultaneously stored copies of each block is called the replication factor; by default its value is 2. The storage scheme is shown in Figure 1: the information is divided into blocks, each of which is distributed to two different nodes.

Figure 1 - Distribution of data blocks


If the servers do not use fault-tolerant disk arrays, it is recommended to use a higher replication factor for reliable data storage. If one of the servers fails, CEPH registers the unavailability of the data blocks located on it (Figure 2), waits for a certain time (the parameter is configurable; the default is 300 seconds), and then begins to rebuild the missing blocks of information elsewhere (Figure 3).

Figure 2 - Failure of one node


Figure 3 - Restoring redundancy


Similarly, if a new server is added to the cluster, the storage is rebalanced in order to evenly fill the disks on all nodes. The mechanism that controls the processes of distributing blocks of information in the CEPH cluster is called CRUSH.
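To give a feel for how deterministic placement works, the following toy Python sketch mimics the idea behind CRUSH: the nodes holding a block's copies are computed from the block's name itself, so any client can locate data without consulting a central table. This is a deliberately simplified model for illustration only, not the real CRUSH algorithm.

    import hashlib

    NODES = ["node1", "node2", "node3"]   # storage servers in the toy cluster
    REPLICATION_FACTOR = 2                # copies kept of each block

    def place_block(block_id, nodes=NODES, copies=REPLICATION_FACTOR):
        """Toy stand-in for CRUSH: derive a deterministic, pseudo-random starting
        node from the block name, then put the remaining copies on the following
        nodes so that no node holds two copies of the same block."""
        digest = int(hashlib.sha256(block_id.encode()).hexdigest(), 16)
        start = digest % len(nodes)
        return [nodes[(start + i) % len(nodes)] for i in range(copies)]

    for block in ["file1_block0", "file1_block1", "file2_block0"]:
        print(block, "->", place_block(block))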

To obtain high-performance disk space in CEPH clusters, it is recommended to use the cache tiering functionality (multi-level caching). The idea is to create a separate high-performance pool and use it for caching, while the bulk of the information is placed on cheaper disks (Figure 4).

Figure 4 - Logical view of disk pools


Multi-level caching works as follows: client write requests are written to the fastest pool and then migrated to the storage tier. Similarly for read requests: the accessed information is promoted to the cache tier and served from there. Data remains in the cache tier until it becomes inactive or irrelevant (Figure 5). It is worth noting that caching can also be configured as read-only, in which case write requests go directly to the storage pool.

Figure 5 - Operating principle of cache tiering
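A cache tier of this kind is attached with a few ceph CLI commands; the sketch below wraps them in Python for readability. The pool names are examples, the size limits are illustrative, and depending on the CEPH version additional parameters (for example hit_set_type) may also be required.

    import subprocess

    def ceph(*args):
        """Run the ceph CLI (assumes the script is executed on a cluster/admin node)."""
        cmd = ["ceph", *args]
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    COLD = "storage-pool"   # pool on the large SATA disks (example name)
    HOT = "cache-pool"      # pool on the SSDs (example name)

    # Attach the SSD pool as a cache tier in front of the SATA pool and route
    # client I/O through it; "writeback" caches both reads and writes,
    # "readonly" would cache reads only, as mentioned above.
    ceph("osd", "tier", "add", COLD, HOT)
    ceph("osd", "tier", "cache-mode", HOT, "writeback")
    ceph("osd", "tier", "set-overlay", COLD, HOT)

    # Illustrative sizing limits so the cache is flushed and evicted in time.
    ceph("osd", "pool", "set", HOT, "target_max_bytes", str(200 * 1024**3))
    ceph("osd", "pool", "set", HOT, "cache_target_dirty_ratio", "0.4")
    ceph("osd", "pool", "set", HOT, "cache_target_full_ratio", "0.8")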


Let's look at real-life scenarios for using CEPH in an organization to build data storage. Small and medium-sized businesses, where this technology will be most in demand, are considered as potential users. We calculated 3 scenarios for using the described solution:

  1. A manufacturing or trading enterprise with a requirement for the availability of an internal ERP system and file storage of 99.98% per year, 24/7.
  2. An organization that needs to deploy an on-premises private cloud for its business needs.
  3. A very low-budget solution for organizing fault-tolerant block data storage that is completely hardware-independent, with 99.98% availability per year and inexpensive scaling.

Use case 1: CEPH-based data storage

Let's consider a real example of using CEPH in an organization. Suppose we need fault-tolerant, high-performance storage with a capacity of 6 TB, while even a basic storage system model with disks costs about $21,000.

We assemble the storage based on CEPH. As servers, we suggest using the Supermicro Twin solution (Figure 6). The product consists of 4 server platforms in a single 2U case; all main components of the device are duplicated, which ensures its continuous operation. For our task it is enough to use 3 nodes; the 4th remains in reserve for the future.




Figure 6 - Supermicro Twin


We configure each of the nodes as follows: 32 GB of RAM, a 4-core 2.5 GHz processor, 4 SATA disks of 2 TB each for the storage pool, combined into 2 RAID1 arrays, and 2 SSD drives for the caching pool, also combined into RAID1. The cost of the entire project is shown in Table 1.
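A quick back-of-the-envelope check (our own arithmetic, shown here only for transparency) confirms that this configuration yields the required 6 TB of usable space:

    NODES = 3                  # Supermicro Twin nodes used for storage
    SATA_DISKS_PER_NODE = 4    # 2 TB each, combined into two RAID1 arrays
    DISK_TB = 2
    REPLICATION_FACTOR = 2     # CEPH replication factor used in this scenario

    per_node_after_raid1 = SATA_DISKS_PER_NODE * DISK_TB / 2   # 4 TB per node
    cluster_after_raid1 = per_node_after_raid1 * NODES         # 12 TB in total
    usable = cluster_after_raid1 / REPLICATION_FACTOR          # 6 TB usable

    print(f"Usable CEPH capacity: {usable:.0f} TB")            # -> 6 TB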

Table 1. Components for CEPH-based storage

Accessories | Price, USD | Qty | Cost, USD
Supermicro Twin 2027PR-HTR: 4 hot-pluggable systems (nodes) in a 2U form factor | 4 999,28 | 1 | 4 999,28
Samsung DDR3 16GB Registered ECC 1866Mhz memory module | 139,28 | 6 | 835,68
Processor Ivy Bridge-EP 4-Core 2.5GHz (LGA2011, 10MB, 80W, 22nm) Tray | 366,00 | 3 | 1 098,00
Hard drive SATA 2TB 2.5" Enterprise Capacity SATA 6Gb/s 7200rpm | 416,00 | 12 | 4 992,00
Solid state drive SSD 2.5" 400GB DC S3710 Series | 641,00 | 6 | 3 846,00
TOTAL | | | 15 770,96

Conclusion: As a result of building this storage, we get a 6 TB disk array at a cost of about $16,000, which is 25% less than purchasing a minimal storage system; moreover, the spare capacity of the nodes allows you to run virtual machines that work with the storage, thereby saving on the purchase of additional servers. In essence, this is a complete solution.

The servers from which the storage is built can be used not only to house the hard drives but also as hosts for virtual machines or as application servers.

Use case 2: Building a private cloud

The challenge is to deploy the infrastructure to build a private cloud at minimal cost.

Building even a small cloud of, say, 3 host servers costs approximately $36,000: $21,000 for the storage system plus $5,000 for each server filled to 50% capacity.

Using CEPH as storage allows you to combine computing and disk resources on one hardware. That is, there is no need to purchase storage systems separately - disks installed directly into the servers will be used to host virtual machines.

Brief information:
The classic cloud structure is a cluster of virtual machines, the functioning of which is provided by 2 main hardware components:

  1. Computing part (compute) - servers filled with RAM and processors, whose resources are used by virtual machines for computation;
  2. Data storage system (storage) is a device filled with hard drives on which all data is stored.

We use the same Supermicro servers as equipment, but install more powerful processors - 8-core with a frequency of 2.6 GHz, as well as 96 GB of RAM per node, since the system will be used not only for storing information, but also for running virtual machines. We take a set of disks similar to the first scenario.

Table 2. CEPH-based private cloud components

Accessories | Price, USD | Qty | Cost, USD
Supermicro Twin 2027PR-HTR: 4 hot-pluggable systems (nodes) in a 2U form factor. Dual socket R (LGA 2011), up to 512GB ECC RDIMM, integrated IPMI 2.0 with KVM and dedicated LAN, 6x 2.5" hot-swap SATA HDD bays, 2000W redundant power supplies | 4 999,28 | 1 | 4 999,28
Samsung DDR3 16GB Registered ECC 1866Mhz 1.5V, dual rank memory module | 139,28 | 18 | 2 507,04
CPU Intel Xeon E5-2650V2 Ivy Bridge-EP 8-Core 2.6GHz (LGA2011, 20MB, 95W, 32nm) Tray | 1 416,18 | 3 | 4 248,54
Hard drive SATA 2TB 2.5" Enterprise Capacity SATA 6Gb/s 7200rpm 128Mb 512E | 416,00 | 12 | 4 992,00
Solid state drive SSD 2.5" 400GB DC S3710 Series | 641,00 | 6 | 3 846,00
TOTAL | | | 20 592,86

The assembled cloud will have the following resources, taking into account that stability is maintained if one node fails:

  • RAM: 120 GB
  • Disk space: 6000 GB
  • Physical processor cores: 16 pcs.

The assembled cluster will be able to support about 10 medium-sized virtual machines with the following characteristics: 12 GB RAM / 4 processor cores / 400 GB disk space.
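The estimate of roughly 10 virtual machines can be sanity-checked with a few lines of arithmetic; the 2.5:1 vCPU-to-physical-core overcommit ratio below is our own assumption, not a measured figure:

    pool = {"ram_gb": 120, "cores": 16, "disk_gb": 6000}  # resources with one node in reserve
    vm = {"ram_gb": 12, "vcpus": 4, "disk_gb": 400}       # a "medium" virtual machine
    cpu_overcommit = 2.5                                  # assumed vCPUs per physical core

    fit_by_ram = pool["ram_gb"] // vm["ram_gb"]                      # 10
    fit_by_cpu = int(pool["cores"] * cpu_overcommit // vm["vcpus"])  # 10
    fit_by_disk = pool["disk_gb"] // vm["disk_gb"]                   # 15

    print("VMs that fit:", min(fit_by_ram, fit_by_cpu, fit_by_disk))  # -> 10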

It is also worth considering that all 3 servers are only 50% populated and, if necessary, can be expanded, doubling the pool of resources available to the cloud.

Conclusion: As you can see, we have received both a full-fledged fault-tolerant cluster of virtual machines and redundant data storage - the failure of any of the servers is not critical - the system will continue to function without stopping, while the cost of the solution is approximately 1.5 times lower than buying storage systems and individual servers.

Use case 3: Building ultra-low-cost data storage

If the budget is completely limited and there is no money to purchase the equipment described above, you can purchase used servers, but you should not save on disks - it is strongly recommended to buy new ones.

We suggest the following structure: 4 server nodes are purchased, each with 1 SSD drive for caching and 3 SATA drives. Supermicro servers with 48 GB of RAM and 5600-series processors can now be purchased for approximately $800.

The disks will not be assembled into fault-tolerant arrays on each server; instead, each disk will be presented as a separate device. To increase the reliability of the storage, we will therefore use a replication factor of 3, that is, each block will have 3 copies. With this architecture, mirroring the SSD cache disks is not required, since the information is automatically duplicated to other nodes.
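For reference, the replication factor is set per pool; a minimal sketch is shown below, where the pool name is an example and the 2 TB disk size is assumed by analogy with the earlier scenarios:

    import subprocess

    POOL = "data"   # example pool name
    REPLICAS = 3    # three copies of every block, as described above

    # Raise the pool's replication factor (run on any node with the ceph CLI).
    subprocess.run(["ceph", "osd", "pool", "set", POOL, "size", str(REPLICAS)], check=True)
    # Keep serving I/O as long as at least two copies remain available.
    subprocess.run(["ceph", "osd", "pool", "set", POOL, "min_size", "2"], check=True)

    # Usable capacity, assuming 2 TB SATA disks as in the previous scenarios.
    nodes, disks_per_node, disk_tb = 4, 3, 2
    print("Usable capacity, TB:", nodes * disks_per_node * disk_tb / REPLICAS)  # -> 8.0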

Table 3. Storage components

Conclusion: If necessary, this solution can use larger disks, or the disks can be replaced with SAS if maximum performance is needed for a DBMS. In this example, the result is 8 TB of storage at a very low cost and with very high fault tolerance. The price of one terabyte turned out to be 3.8 times lower than with the industrial storage system for $21,000.

Final table, conclusions

Configuration | Fujitsu ETERNUS DX100 storage system + 12 Nearline SAS 1Tb SFF (RAID10) | Fujitsu ETERNUS DX100 storage system + 12 Nearline SAS 1Tb SFF (RAID10) + Supermicro Twin | Our scenario 1: CEPH-based storage | Our scenario 2: building a private cloud | Our scenario 3: building ultra-low-cost storage
Useful volume, GB | 6 000 | 6 000 | 6 000 | 6 000 | 8 000
Price, USD | 21 000 | 36 000 | 15 770 | 20 592 | 7 324
Cost of 1 GB, USD | 3,5 | 6 | 2,63 | 3,43 | 0,92
Number of IOPS* (70% read / 30% write, 4K block size) | 760 | 760 | 700 | 700 | 675
Purpose | Storage | Storage + Compute | Storage + Compute | Storage + Compute | Storage + Compute

*The IOPS figures were calculated for arrays built from NL SAS disks on the storage systems and SATA disks on the CEPH storage; caching was disabled to keep the obtained values clean. When caching is used, IOPS will be significantly higher until the cache fills up.
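For readers who want a rough analytical estimate rather than a measurement, the usual rule-of-thumb formula with a write penalty looks like this; the per-disk IOPS value is a typical assumption, and the result is only indicative, not the method used for the table above:

    def array_iops(disks, iops_per_disk, read_share, write_penalty):
        """Effective random IOPS of an array for a mixed read/write workload."""
        raw = disks * iops_per_disk
        write_share = 1 - read_share
        return raw / (read_share + write_share * write_penalty)

    # 12 x 7200 rpm disks, 70/30 read/write mix, mirroring (write penalty 2);
    # 80 IOPS per disk is a typical assumption for NL SAS / SATA drives.
    print(round(array_iops(12, 80, 0.70, 2)))   # ~ 738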

As a result, we can say that reliable and cheap data storage can be built on the basis of a CEPH cluster. As the calculations show, using cluster nodes only for storage is not very effective - the solution is cheaper than purchasing a storage system, but not by much: in our example, the cost of CEPH storage was about 25% less than the Fujitsu DX100. The real savings come from combining the computing part and the storage on the same equipment - in this case, the cost of the solution is 1.8 times less than when building a classic structure with a dedicated storage system and separate host machines.

EFSOL implements this solution according to individual requirements. We can use your existing equipment, which will further reduce the capital costs of implementing the system. Contact us and we will examine your equipment for its use in creating storage systems.

In the simplest case, a SAN consists of storage systems, switches and servers connected by optical communication channels. In addition to direct disk storage systems, you can connect disk libraries, tape libraries (streamers), devices for storing data on optical disks (CD/DVD and others), etc. to the SAN.

An example of a highly reliable infrastructure is one in which servers are connected simultaneously to the local area network (left) and to a storage area network (right). This scheme preserves access to the data located on the storage system in the event of failure of any processor module, switch or access path.

Using SAN allows you to provide:

  • centralized resource management of servers and data storage systems;
  • connecting new disk arrays and servers without stopping the entire storage system;
  • using previously purchased equipment in conjunction with new data storage devices;
  • prompt and reliable access to data storage devices located at great distances from servers, without significant performance losses;
  • speeding up the process of data backup and recovery - BURA.

History

The development of network technologies has led to the emergence of two network solutions for storage: Storage Area Networks (SAN) for block-level data exchange supported by client file systems, and Network Attached Storage (NAS) servers for file-level data storage. To distinguish traditional storage from networked storage, another retronym was proposed - Direct Attached Storage (DAS).

The DAS, SAN and NAS solutions that appeared on the market in succession reflect the evolving chain of links between the applications that use data and the bytes on the media containing that data. Once upon a time, application programs themselves read and wrote blocks; then drivers appeared as part of the operating system. In modern DAS, SAN and NAS, the chain consists of three links: the first link is the creation of RAID arrays, the second is the processing of metadata that allows binary data to be interpreted as files and records, and the third is the services that provide data to the application. They differ in where and how these links are implemented. In the case of DAS, the storage system is "bare": it only provides the ability to store and access data, and everything else is done on the server side, starting with the interfaces and drivers. With the advent of SAN, RAID provisioning moves to the storage system side; everything else remains the same as with DAS. NAS differs in that metadata processing is also transferred to the storage system to provide file access; the client only has to support the data services.

The emergence of SAN became possible after the Fiber Channel (FC) protocol was developed in 1988 and approved by ANSI as a standard in 1994. The term Storage Area Network dates back to 1999. Over time, FC gave way to Ethernet, and IP-SAN networks with iSCSI connections became widespread.

The idea of a network-attached storage server (NAS) belongs to Brian Randall of Newcastle University and was implemented in machines running a UNIX server in 1983. The idea was so successful that it was picked up by many companies, including Novell, IBM and Sun, but in the end the market leaders became NetApp and EMC.

In 1995, Garth Gibson developed the principles of NAS and created object storage systems (OBS). He began by dividing all disk operations into two groups, one that included those that were performed more frequently, such as reading and writing, and the other that were performed less frequently, such as operations with names. He then proposed another container in addition to blocks and files, which he called an object.

OBS features a new type of interface, called object-based. Client data services interact with metadata using the object API. OBS not only stores data but also supports RAID, stores metadata related to objects, and supports the object interface. DAS, SAN, NAS and OBS coexist over time, but each access type is better suited to a particular type of data and application.

SAN architecture

Network topology

SAN is a high-speed data network designed to connect servers to storage devices. A variety of SAN topologies (point-to-point, Arbitrated Loop, and switching) replace traditional server-to-storage bus connections and provide greater flexibility, performance, and reliability over them. The SAN concept is based on the ability to connect any of the servers to any data storage device running using the Fiber Channel protocol. The principle of interaction of nodes in a SAN with point-to-point topologies or switching is shown in the figures. In an Arbitrated Loop SAN, data transfer occurs sequentially from node to node. In order to begin data transmission, the transmitting device initiates arbitration for the right to use the data transmission medium (hence the name of the topology - Arbitrated Loop).

The transport basis of SAN is the Fiber Channel protocol, which uses both copper and fiber-optic device connections.

SAN components

SAN components are classified as follows:

  • Data storage resources;
  • Devices implementing the SAN infrastructure;
  • Host Bus Adapters.

Storage Resources

Storage resources include disk arrays, tape drives, and Fiber Channel libraries. Storage resources realize many of their capabilities only when included in a SAN. For example, high-end disk arrays can replicate data between arrays over Fiber Channel networks, and tape libraries can move data to tape directly from disk arrays with a Fiber Channel interface, bypassing the network and servers (serverless backup). The most popular disk arrays on the market are those from EMC, Hitachi, IBM and Compaq (the Storage Works family, which Compaq inherited from Digital), and among tape library manufacturers StorageTek, Quantum/ATL and IBM should be mentioned.

Devices implementing SAN infrastructure

The devices that implement the SAN infrastructure are Fiber Channel switches (FC switches), hubs (Fiber Channel hubs) and routers (Fiber Channel-SCSI routers). Hubs are used to combine devices operating in Fiber Channel Arbitrated Loop (FC_AL) mode. The use of hubs allows devices to be connected and disconnected in the loop without stopping the system, since the hub automatically closes the loop if a device is disconnected and automatically opens the loop if a new device is connected to it. Each change of the loop is accompanied by a complex initialization process. The initialization process is multi-stage, and until it is completed, data exchange in the loop is impossible.

All modern SANs are built on switches, allowing for a full-fledged network connection. Switches can not only connect Fiber Channel devices, but also limit access between devices, for which so-called zones are created on switches. Devices placed in different zones cannot communicate with each other. The number of ports in a SAN can be increased by connecting switches to each other. A group of interconnected switches is called a Fiber Channel Fabric or simply Fabric. The connections between switches are called Interswitch Links, or ISL for short.

Software

The software allows you to implement redundancy of server access paths to disk arrays and dynamic load distribution between paths. For most disk arrays, there is a simple way to determine that ports accessible through different controllers belong to the same disk. Specialized software maintains a table of access paths to devices and ensures that paths are disconnected in the event of a disaster, dynamically connecting new paths and distributing the load between them. As a rule, disk array manufacturers offer specialized software of this type for their arrays. VERITAS Software produces VERITAS Volume Manager software, designed to organize logical disk volumes from physical disks and provide redundancy of disk access paths, as well as load distribution between them for most known disk arrays.

Protocols used

Low-level protocols are used in storage networks:

  • Fiber Channel Protocol (FCP), SCSI transport over Fiber Channel. The most commonly used protocol at the moment. Available in 1 Gbit/s, 2 Gbit/s, 4 Gbit/s, 8 Gbit/s and 10 Gbit/s options.
  • iSCSI, SCSI transport over TCP/IP.
  • FCoE, FCP/SCSI transport over pure Ethernet.
  • FCIP and iFCP, encapsulation and transmission of FCP/SCSI in IP packets.
  • HyperSCSI, SCSI transport over Ethernet.
  • FICON transport over Fiber Channel (used only by mainframes).
  • ATA over Ethernet, ATA transport over Ethernet.
  • SCSI and/or TCP/IP transport over InfiniBand (IB).

Advantages

  • High reliability of access to data located on external storage systems. Independence of the SAN topology from the storage systems and servers used.
  • Centralized data storage (reliability, security).
  • Convenient centralized switching and data management.
  • Moving heavy I/O traffic to a separate network – offloading the LAN.
  • High performance and low latency.
  • Scalability and flexibility of the SAN logical fabric
  • The geographic size of a SAN, unlike classic DAS, is practically unlimited.
  • The ability to quickly distribute resources between servers.
  • The ability to build fault-tolerant cluster solutions without additional costs based on an existing SAN.
  • A simple backup scheme - all data is in one place.
  • Availability of additional features and services (snapshots, remote replication).
  • High degree of SAN security.

Sharing storage systems typically simplifies administration and adds a fair amount of flexibility, since cables and disk arrays do not need to be physically transported and reconnected from one server to another.

Another advantage is the ability to boot servers directly from the storage network. With this configuration, you can quickly and easily replace a faulty