As noted earlier, the first attempts at standardizing a systems-level benchmark took place in the second half of the 1980s, with the introduction of the debit/credit benchmark, or TP1.
To mitigate deficiencies in the definition of this benchmark and to establish a rigorous basis for systems comparison, a group of computer systems vendors and DBMS vendors formed the Transaction Processing Performance Council (TPC). The TPC aimed both to define a benchmark for transaction-processing systems and to specify rigorous publication rules for the results. Each TPC member is committed to obeying these rules, which require results to be accompanied by publication of a detailed report. Reports are subject to audit.
TPC then published a new benchmark to characterize decision support applications, and then, later, one for Web-based transactional applications. The history of TPC benchmarks can be summarized by the following list (the benchmarks current at the beginning of 2002 were TPC-C, TPC-H, TPC-R, and TPC-W):
» TPC-A (1989): a simple transaction which does one update to a bank account
» TPC-B (1990): the “database portion” of TPC-A
» TPC-C (1992): a benchmark involving multiple, complex transactions
» TPC-D (1995): decision support
» TPC-H (1999): decision support
» TPC-R (1999): decision support
» TPC-W (2000): Web-based transaction processing (e.g., electronic commerce, B2B, etc.)
The TPC benchmarks allow systems to be compared in two ways:
» Performance (for example, number of transactions per unit of time)
» Price/performance (for example, total cost of ownership over a three-year period per transaction per minute)
The benchmark is not specified by source code; it is therefore the responsibility of each TPC member who wishes to characterize a system to implement the benchmark on that system.
The TPC does not measure systems itself.
As far as the benchmarks are concerned, system cost comprises the cost of acquiring the system from the vendor (hardware and software) along with the costs of maintenance for three years. The transactions called for must be implemented properly, respecting the ACID properties of atomicity, consistency, isolation, and durability.
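As a rough illustration of the two comparison metrics listed above, the following Python sketch computes price/performance from invented figures; the costs and the throughput are illustrative only, not published TPC results.

# Minimal sketch (illustrative figures, not published TPC results) of the
# price/performance metric: total cost of ownership over three years,
# divided by the measured throughput in transactions per minute.
hardware_and_software_cost = 1_200_000   # $, acquisition from the vendor
maintenance_per_year = 100_000           # $, annual maintenance
throughput_tpm = 50_000                  # measured transactions per minute

total_cost_of_ownership = hardware_and_software_cost + 3 * maintenance_per_year
price_performance = total_cost_of_ownership / throughput_tpm
print(f"{throughput_tpm} tpm at ${price_performance:.2f} per tpm")
# -> 50000 tpm at $30.00 per tpm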
Information on the activities and the standards issued by the TPC, as well as the published results of measurements, is available on their Web site at http://www.tpc.org.
Source of Information : Elsevier Server Architectures
Various RAID Levels
» For RAID 0 (Data Striping), the cost of storage is higher than for a single disk (assuming that a single disk has sufficient capacity), since using several disks (regardless of their ability to provide more storage capacity than a single disk) increases costs for items such as racks, cables, controllers, and power. Data availability is lower than for a single disk, because the MTBF of the array is the MTBF of a single disk divided by the number of disks used; that is, a RAID 0 array of N disks has an MTBF N times smaller than that of a single disk (see the sketch following this list). Reading and writing large blocks on a RAID 0 array of N disks takes less time than on a single disk (at best N times less, limited by the fact that the disks are not in general rotationally synchronized). This reduces the busy time of the disks and allows higher bandwidths. The same is true for random reads and writes.
» For RAID 1 (Mirroring), the storage cost is proportional to the number of copies of the data kept (the factor M in the table). Most often, mirroring is simple replication (M = 2). As to availability, RAID 1 clearly has higher availability than RAID 3 or RAID 5, since it relies on complete data replication rather than on one parity disk per N physical disks. Reads, whether large transfers or random accesses, have higher performance because the data can be read concurrently from multiple disks. Concurrency is less effective for writes, whether large or random, because completion cannot be signaled until the write to the last disk has completed.
» RAID 0 + 1 (Striped Mirror) has more or less the same properties as RAID 1, with just one further comment on write operations: the time for write operations for large transfers can be lower than for a single disk, if the time saved as a result of distributing the data across N parallel disks is greater than the extra cost of synchronizing completion across M groups of disks.
» RAID 3 (Parity Disk) availability is ensured through the use of parity information. Large block reads offer performance similar to RAID 0, with any differences attributable to the need to compute parity for the information read, along with any required correction. Large block writes are slower, because such transfers involve both the calculation of parity and the writing of the parity values to the parity disk, whose busy time can be greater than that of the collection of data disks, since there is just one parity disk. Random reads require a parity disk access, calculation of data parity, parity comparison, and any necessary correction. A write operation implies calculation of parity and its writing to the parity disk. Performance compared with a single disk depends on the performance advantage obtained by distributing the data across multiple disks.
» RAID 5 (Spiral Parity) offers essentially the same availability as RAID 3. Again, large transfer performance is impacted by the need to calculate parity and apply correction as required. Random reads and writes are generally better than for RAID 3 because of the distribution of parity information over multiple disks, reducing contention on parity updates.
» RAID 6 (Double Spiral Parity) provides higher availability than RAID 5, since it can survive two concurrent independent failures. RAID 6 has slightly higher read performance than RAID 5, since double parity reduces contention and thus wait time for parity writes (only slightly higher performance, since the number of disks grows only from N + 1 to N + 2). Write operations, on the other hand, are slower, suffering from the increased burden of double parity computation and writing.
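To make the parity mechanism and the RAID 0 availability rule above more concrete, here is a minimal Python sketch; the block contents, disk count, and MTBF figure are invented for illustration and are not taken from the text.

# Minimal sketch (equal-size blocks, single failure) of the XOR parity used by
# RAID 3/5/6, plus the RAID 0 rule that array MTBF is roughly the single-disk
# MTBF divided by the number of disks.
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equally sized data blocks -> parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving, parity):
    """Rebuild the single missing block from the survivors plus parity."""
    return xor_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]          # three data disks (one stripe each)
parity = xor_parity(data)                    # written to the parity disk
lost = data.pop(1)                           # simulate loss of one data disk
assert reconstruct(data, parity) == lost     # parity restores the lost block

single_disk_mtbf_hours = 1_000_000           # illustrative figure
n_disks = 8
print(single_disk_mtbf_hours / n_disks)      # ~125,000 hours for the stripe set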
Source of Information : Elsevier Server Architectures
Common Internet File System
We would be remiss in our descriptions of remote access file systems were we to omit mention of CIFS, which is used in Windows systems for remote file access.
CIFS is an improved version of Microsoft’s SMB (Server Message Block); proposed by Microsoft, CIFS was offered to the IETF (Internet Engineering Task Force) for adoption as a standard.
CIFS, installed on a PC, allows that PC access to data held on UNIX systems.
There is an important difference between NFS and CIFS. NFS is stateless, while CIFS is stateful.
This means that an NFS server does not need to maintain any state information on its clients, but a CIFS server must. Thus, in the event of a failure in either the network or the server, recovery is much more complex for a CIFS server than for an NFS server. NLM (Network Lock Manager) was provided to implement lock operations in NFS, but its use is not widespread. Version 4 of NFS supports locking.
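The following Python sketch contrasts the two models; the class and method names are purely illustrative and do not correspond to the actual NFS or CIFS wire protocols.

# Minimal sketch (illustrative names, not the real protocols) contrasting a
# stateless read service with a stateful one that must track per-client open
# handles -- the state that complicates recovery for a CIFS server.
class StatelessFileService:
    """NFS-style: every request carries everything needed to serve it."""
    def read(self, path, offset, length):
        with open(path, "rb") as f:          # no per-client state survives the call
            f.seek(offset)
            return f.read(length)

class StatefulFileService:
    """CIFS-style: the server keeps open handles (and, in reality, locks,
    share modes, notifications...) on behalf of each client."""
    def __init__(self):
        self._handles = {}                   # state lost if the server crashes
        self._next_id = 0

    def open(self, path):
        self._next_id += 1
        self._handles[self._next_id] = open(path, "rb")
        return self._next_id                 # client refers to this handle later

    def read(self, handle, length):
        return self._handles[handle].read(length)

    def close(self, handle):
        self._handles.pop(handle).close()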
Examples of products implementing CIFS include:
» Samba (free software)
» ASU (Advanced Server UNIX) from AT&T
» TAS (TotalNET Advanced Server) from Syntax
UNIX file systems need extensions to support Windows file semantics; for example, the “creation date” information needed by Windows and CIFS must be kept in a UNIX file system in a complementary file.
This diagram follows our practice of omitting some components for simplicity. We do not show the TLI (Transport Layer Interface) nor the NDIS (Network Driver Interface Layer), for example, nor do we show local accesses on the server. NTFS (NT File System) is the Windows 2000 native file system.
The I/O manager determines whether an access is local or remote; the request is either directed to the local file system or handed to the CIFS redirector, which checks whether the requested data is available in the local cache and, if not, passes the request on to the network layers for forwarding to the server holding the file involved.
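A minimal sketch of that dispatch logic, with hypothetical helper functions standing in for the local file system, the client-side cache, and the network path:

# Minimal sketch (hypothetical helpers) of the flow described above: local
# requests go to the local file system; remote ones go through the redirector,
# which serves from its cache when possible and otherwise asks the file server.
def handle_read(path, cache, is_local, read_local, read_remote):
    if is_local(path):
        return read_local(path)      # e.g., served by the native file system
    if path in cache:                # redirector: satisfy from the client cache
        return cache[path]
    data = read_remote(path)         # forward to the server holding the file
    cache[path] = data               # keep a copy for subsequent reads
    return data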
Source of Information : Elsevier Server Architectures
Parallel File Systems
In cluster environments and MPPs, some file systems have been optimized to take advantage of the processor and memory resources represented by the nodes forming the cluster or MPP.
IBM’s General Parallel File System (GPFS) is an example of a parallel file system; it can be used on AIX clusters (HACMP), MPPs (IBM SP) or Linux clusters. Our description of GPFS is based on [HEG01]. GPFS’s major characteristics are:
» A clusterized file management system allowing transparent cluster-wide file access (that is, a program running on any node can transparently access files, even if they are stored on another node)
» Scalability: GPFS has the ability to make use of processor, memory (used as a disk cache), and I/O resources of the nodes
» Failure tolerance: GPFS provides journaling of metadata changes and data replication
In the SP implementation, GPFS is built on a software layer called Virtual Shared Disk (VSD). VSD allows disk blocks to be routed over a network, either an IP network or the interconnect network of an MPP. To this extent, VSD can be looked upon as a form of SAN (Storage Area Network), which we will discuss later.
GPFS is installed on the system nodes; it is possible to configure some nodes as specialist storage nodes. Data is shared by the applications running on the nodes provided with GPFS instances. Data is cached on the client nodes.
GPFS distributes data over the available disks, providing an effect similar to data striping, which we will discuss later in the section on RAID systems.
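The effect can be pictured with a small Python sketch; the block size and node names are invented for illustration and say nothing about GPFS's actual allocation policy.

# Minimal sketch (illustrative block size and node names, not GPFS internals)
# of striping: consecutive blocks of a file are spread round-robin over the
# available disks / storage nodes.
BLOCK_SIZE = 256 * 1024                      # illustrative stripe unit

def block_placement(file_size, storage_nodes):
    """Map each block index of a file to the node/disk that stores it."""
    n_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return {i: storage_nodes[i % len(storage_nodes)] for i in range(n_blocks)}

placement = block_placement(5 * BLOCK_SIZE + 1, ["node1", "node2", "node3"])
# -> {0: 'node1', 1: 'node2', 2: 'node3', 3: 'node1', 4: 'node2', 5: 'node3'}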
Apart from AIX, the major software components of this architecture are the components of PSSP (Parallel System Support Programs), which are specific to the SP environment. The major elements are:
» VSD: Virtual Shared Disk, which provides the ability to access logical volumes as if they were local to the accessing node.
» GS: Group Services, which provides notification of node or process failures, along with recovery, on surviving nodes, of programs that were executing on the failed nodes. These services also initialize information necessary for VSD's operation.
» RVSD: Recoverable Virtual Shared Disk, which makes it possible to prevent access by a node to certain disks during recovery phases of the node.
AIX also includes a component called VFS (Virtual File System), which allows applications’ file access requests to be directed to the appropriate file system (e.g., JFS (AIX’s journaled file system) or GPFS) transparently, depending on the type of the file.
Source of Information : Elsevier Server Architectures
Why were parallel databases a limited success?
As with any new technology, time is needed for development and experience in tuning for good performance and acceptable stability. Recall, again, that Teradata has been providing massively parallel systems running a proprietary DBMS since 1984. The major DBMS vendors embarked on their versions towards the end of the 80s.
When used for decision support, parallel databases provide excellent results. On the other hand, their use for transaction processing has been less satisfactory.
The difficulty of obtaining suitable scalability in transaction processing (a much larger portion of the market than decision support) explains their limited success. This is a real-world example of the difficulties faced in writing software for massively parallel architectures.
Source of Information : Elsevier Server Architectures 2005
What should we think of manufacturers who claim 99.999% availability for their hardware?
Hardware availability is just one, albeit important, factor in server availability. Over the past few years, hardware availability has increased because of technology improvements. Key factors in this are the increasing level of integration (which reduces the component count necessary to implement a system, as well as the number of connectors needed), improvements in data integrity, and the use of formal methods during the design of hardware components.
Systems based around standard technologies increasingly integrate functionality, such as system partitioning or the ability to swap subsystems “online” (i.e., without requiring that the system be brought down), that until very recently was the prerogative of mainframe systems and of fault-tolerant systems. As a result of such advances, it is possible to reach very high levels of hardware availability without entering the domain of specialized, expensive machinery.
On the other hand, it must be realized that software failures are more frequent than hardware failures, and the gap is widening: hardware reliability keeps improving, but the amount of software in a system keeps increasing while its quality (as measured, for example, by the number of defects per thousand lines of code) shows little sign of improving. What matters for a system is total availability: a combination of hardware quality, the quality of software written or provided by the manufacturer, the quality of third-party software, and finally the quality of any applications and operating procedures developed within the company. This last factor requires special attention; all too often, underestimating the importance of the quality of in-house applications and operating procedures takes its toll on the failure rate of the system.
It is appropriate to be very precise in any calculation concerning availability. To illustrate this point, consider a system that works 24 hours a day, 7 days a week, with an availability of 99.999%. This figure implies that the system is down no more than five minutes a year. Whether planned downtime is included in this budget makes a big difference in the difficulty of achieving this objective. Finally, we must not forget that any use of redundancy to improve availability tends to have an effect on performance.
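The five-minute figure is easy to verify with a short Python check; the only assumption is a 365.25-day year of continuous operation.

# Quick check of the figure quoted above: 99.999% availability over a
# 24x7 year leaves roughly five minutes of downtime.
minutes_per_year = 365.25 * 24 * 60
availability = 0.99999
downtime_minutes = (1 - availability) * minutes_per_year
print(f"{downtime_minutes:.2f} minutes of downtime per year")   # ~5.26 minutes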
Are RISC processors dead, killed by Intel?
Source of Information : Elsevier Server Architectures 2005