Parallel File Systems

In cluster environments and massively parallel processors (MPPs), some file systems have been optimized to exploit the processor and memory resources of the nodes that form the cluster or MPP.

IBM’s General Parallel File System (GPFS) is an example of a parallel file system; it can be used on AIX clusters (HACMP), MPPs (IBM SP) or Linux clusters. Our description of GPFS is based on [HEG01]. GPFS’s major characteristics are:

» A cluster-wide file management system allowing transparent file access (that is, a program running on any node accesses files transparently, even if they are stored on another node)

» Scalability: GPFS can use the processor, memory (used as a disk cache), and I/O resources of the nodes

» Fault tolerance: GPFS provides journaling of metadata changes and data replication

In the SP implementation, GPFS is built on a software layer called Virtual Shared Disk (VSD). VSD allows disk blocks to be routed over a network, either an IP network or the interconnect network of an MPP. In this respect, VSD can be seen as a form of SAN (Storage Area Network), which we will discuss later.
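The idea of routing block requests to the node that owns a disk can be illustrated with a small sketch. This is a toy in-memory model, not the VSD interface: the class names `Node` and `VirtualSharedDisk`, the `read` method, and the 4096-byte block size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # illustrative block size, not a VSD parameter


@dataclass
class Node:
    """A cluster node that may own some disks (hypothetical model)."""
    name: str
    disks: dict = field(default_factory=dict)  # disk name -> {block number: bytes}

    def read_local(self, disk, block_no):
        # Blocks that were never written read back as zeros.
        return self.disks[disk].get(block_no, bytes(BLOCK_SIZE))


class VirtualSharedDisk:
    """Maps each disk to its owning node and routes block reads to it."""

    def __init__(self, nodes):
        self.owner = {disk: node for node in nodes for disk in node.disks}
        self.remote_reads = 0  # counts reads served by another node

    def read(self, requester, disk, block_no):
        owner = self.owner[disk]
        if owner is not requester:
            # In a real system the request would cross the network here.
            self.remote_reads += 1
        return owner.read_local(disk, block_no)


node_a = Node("a", {"disk0": {0: b"x" * BLOCK_SIZE}})
node_b = Node("b", {"disk1": {}})
vsd = VirtualSharedDisk([node_a, node_b])

local = vsd.read(node_a, "disk0", 0)   # served by the owning node itself
remote = vsd.read(node_b, "disk0", 0)  # same block, requested from another node
```

Both calls return the same block; the caller does not need to know which node holds the disk, which is the property the text attributes to VSD.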

GPFS is installed on the system nodes; some nodes can be configured as dedicated storage nodes. Data is shared by the applications running on the nodes that have GPFS instances, and it is cached on the client nodes.

GPFS distributes data over the available disks, providing an effect similar to data striping, which we will discuss later in the section on RAID systems.
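A minimal sketch of the striping effect, assuming simple round-robin placement of fixed-size blocks. The actual GPFS placement policy is not described here, and the function names `stripe` and `unstripe` are hypothetical.

```python
def stripe(data: bytes, num_disks: int, block_size: int):
    """Split data into blocks and place them round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_no = offset // block_size
        disks[block_no % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks):
    """Reassemble the data: block k lives on disk k % n at position k // n."""
    n = len(disks)
    total_blocks = sum(len(d) for d in disks)
    return b"".join(disks[k % n][k // n] for k in range(total_blocks))


data = bytes(range(10))
layout = stripe(data, num_disks=3, block_size=2)
restored = unstripe(layout)
```

Because consecutive blocks land on different disks, a large sequential read can be served by several disks at once, which is where the performance benefit of striping comes from.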

Apart from AIX, the major software components of this architecture are the components of PSSP (Parallel System Support Programs), which are specific to the SP environment. The major elements are:

» VSD: Virtual Shared Disk, which provides the ability to access logical volumes as if they were local to the accessing node.

» GS: Group Services, which provides notification when a node or process fails, along with recovery, on surviving nodes, of the programs that were executing on the failed nodes. These services also initialize information that VSD needs in order to operate.

» RVSD: Recoverable Virtual Shared Disk, which makes it possible to block a node’s access to certain disks while that node is recovering.

AIX also includes a component called VFS (Virtual File System), which transparently directs applications’ file access requests to the appropriate file system, such as JFS (AIX’s journaled file system) or GPFS, depending on the type of the file.
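The dispatching role of a VFS layer can be sketched as follows. This is not the AIX VFS interface: the classes `FileSystem` and `VirtualFileSystem` are hypothetical, and the sketch assumes the target file system is chosen by the longest matching mount point, which is a simplification.

```python
class FileSystem:
    """Stand-in for a concrete file system such as JFS or GPFS."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class VirtualFileSystem:
    """Routes each request to the file system mounted at the path."""

    def __init__(self):
        self.mounts = {}  # mount point -> FileSystem

    def mount(self, mount_point, fs):
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def resolve(self, path):
        best = None
        for mp in self.mounts:
            prefix = mp if mp.endswith("/") else mp + "/"
            if path == mp or path.startswith(prefix):
                # The longest matching mount point wins.
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise FileNotFoundError(path)
        return self.mounts[best]

    def write(self, path, data):
        self.resolve(path).write(path, data)

    def read(self, path):
        return self.resolve(path).read(path)


vfs = VirtualFileSystem()
vfs.mount("/", FileSystem("jfs"))
vfs.mount("/gpfs", FileSystem("gpfs"))
vfs.write("/gpfs/data/a", b"shared")
vfs.write("/etc/hosts", b"local")
```

The application calls the same `read` and `write` regardless of which file system ends up serving the request, which is the transparency described above.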

Source of Information : Elsevier Server Architectures
