

June 2001

Hewlett-Packard Does Datacenter


Sidebar: Scaling Up vs. Scaling Out

Real-world implementation and high-availability design guidelines

Today, systems administrators face the challenge of making Windows 2000 available more than 99.9 percent of the time. To address this challenge, Microsoft has partnered with several top-tier OEMs to deliver and support Win2K Datacenter Server. The result of this collaboration is the Windows Datacenter Program. The program provides customers with a list of certified configurations that Microsoft has thoroughly tested for reliability.

Hewlett-Packard (HP), an OEM participating in the Windows Datacenter Program, has been working through the challenges and pitfalls of Datacenter implementations. Drawing on these experiences, HP engineers and consultants have developed a list of best practices to share with Datacenter customers around the world. With these best practices in mind, you can more easily decide whether Datacenter makes sense for you and see what you must do to create your own high-availability infrastructure.

For more information about the Windows Datacenter Program, see Greg Todd, "Win2K Datacenter Server," December 2000, and the Microsoft article "The Datacenter Program and Windows 2000 Datacenter Server Product" (http://support.microsoft.com/support/kb/articles/q265/1/73.asp). You can also visit Microsoft's Datacenter Web page at http://www.microsoft.com/windows2000/datacenter.

High Availability 101
Does your environment need a high-availability solution? To determine which high-availability technologies are relevant to your environment, you need to understand your availability requirements. Only then can you begin to design an infrastructure that meets your needs.

You also need to understand the difference between fault resilience and fault tolerance. Fault-resilient systems consist of clusters that achieve high availability through failover. Microsoft Cluster service is a clustering solution that makes Datacenter and Win2K Advanced Server fault-resilient. Cluster nodes have independent system images, and failover can take from a few seconds to several minutes. (A system image, which completely describes the point-in-time status of a particular system, is unique to each computer system and changes rapidly. This image includes such information as memory, CPU registers, disk and memory buffers, and message queues.)

Applications on fault-resilient systems use checkpoint files to recover application data. A checkpoint file is a log file, such as a database transaction log. It lets an application recover its state (the processing stage of the application at a certain point in time) after a power failure or hardware failure.

Following a failure, the application first reads the checkpoint files stored on disk. It then rolls forward transactions that had completed and rolls back transactions that were incomplete at the time of failure. Fault-resilient systems recover only to the most recent checkpoint. Information not saved to some form of checkpoint file (i.e., information residing only in memory) is lost on failover.
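To make the roll-forward/roll-back step concrete, here is a minimal Python sketch of log-based recovery. Several pieces are hypothetical illustrations: the record layout (`LogRecord` with "update" and "commit" entries), the `recover` function and the crash scenario. None of them is the format used by any real database or by Cluster service. The sketch also assumes that locking prevents two in-flight transactions from updating the same key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    # Hypothetical log format, for illustration only.
    txid: int
    kind: str                   # "update" or "commit"
    key: Optional[str] = None
    old: Optional[int] = None   # value before the update (used to roll back)
    new: Optional[int] = None   # value after the update (used to roll forward)


def recover(disk: Dict[str, int], log: List[LogRecord]) -> Dict[str, int]:
    """Rebuild a consistent state from on-disk data plus the checkpoint log.

    Assumes locking kept concurrent transactions off the same keys.
    """
    state = dict(disk)  # work on a copy; the caller's dict is left untouched
    committed = {r.txid for r in log if r.kind == "commit"}
    # Roll forward: reapply updates from committed transactions in log order,
    # in case they were still in memory when the failure occurred.
    for r in log:
        if r.kind == "update" and r.txid in committed:
            state[r.key] = r.new
    # Roll back: undo updates from transactions that never committed,
    # newest first, in case they had already reached the disk.
    for r in reversed(log):
        if r.kind == "update" and r.txid not in committed:
            state[r.key] = r.old
    return state


# Hypothetical crash: tx 1 committed, but its write to A never reached disk;
# tx 2 wrote B to disk but never committed.
disk_at_crash = {"A": 100, "B": 70}
log = [
    LogRecord(1, "update", "A", old=100, new=80),
    LogRecord(1, "commit"),
    LogRecord(2, "update", "B", old=50, new=70),
]
print(recover(disk_at_crash, log))  # {'A': 80, 'B': 50}
```

Any change that existed only in memory and was never written to the log cannot be reconstructed this way. That is why a fault-resilient system recovers only to its most recent checkpoint.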

Fault-tolerant systems, which have tighter coupling of resources, keep applications available by protecting a single system image. When constituent components fail, redundant components take over so that the system image runs uninterrupted. Applications that run on a fault-tolerant system don't require checkpoint files; they simply depend on the underlying fault-tolerant platform to keep the system running.

Fault-tolerant systems are characterized by proprietary and highly customized hardware and software, so they are typically more expensive than their fault-resilient counterparts. Most high-availability computing uses fault-resilient systems, which don't require the same level of expensive custom hardware or software. However, fault-tolerant systems more commonly achieve 99.999 percent planned availability.

In terms of high availability, a key difference between fault-tolerant and fault-resilient systems is recovery time. Fault-tolerant systems boast recovery times that approach zero. Fault-resilient systems (i.e., Cluster service clusters) have recovery times that range from a few seconds to several minutes because of the time necessary for failover.

By the Numbers
Availability is the ratio of the amount of time that a system is available to the amount of time the system should be available. Industry convention is to express availability as a percentage. The mythical perfect system would be available 100 percent of the time. Real systems, of course, post lower percentages.

You can use the simple calculation

A = MTBF/(MTBF+MTTR)

where A is availability, MTBF is mean time between failures, and MTTR is mean time to repair (or recover), to find a system's availability. "Three nines" conveys that availability is 99.9 percent, "four nines" conveys that availability is 99.99 percent, and so on. If you use 20 minutes as the MTTR value (Microsoft claims 20 minutes is the average time necessary to restore a Win2K or Windows NT system) and .999 as the A value, you get an MTBF value of approximately 14 days. (Not coincidentally, 14 days is the duration of the Microsoft stress test for Datacenter hardware and kernel-mode drivers.) The primary high-availability design goal is to increase A by increasing MTBF and decreasing MTTR.
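The arithmetic in the previous paragraph can be checked with a short Python sketch. Solving A = MTBF/(MTBF+MTTR) for MTBF gives MTBF = A * MTTR / (1 - A). The 20-minute MTTR and the .999 availability come from the text; the function names are illustrative.

```python
MINUTES_PER_DAY = 60 * 24


def availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR); both inputs must use the same time unit."""
    return mtbf / (mtbf + mttr)


def required_mtbf(a: float, mttr: float) -> float:
    """Solve A = MTBF / (MTBF + MTTR) for MTBF, giving A * MTTR / (1 - A)."""
    if not 0 < a < 1:
        raise ValueError("availability must be strictly between 0 and 1")
    return a * mttr / (1 - a)


mttr = 20  # minutes to restore a Win2K or NT system, per the Microsoft figure
mtbf = required_mtbf(0.999, mttr)
print(f"MTBF = {mtbf:.0f} minutes = {mtbf / MINUTES_PER_DAY:.1f} days")
print(f"check: A = {availability(mtbf, mttr):.4f}")
```

The result is 19,980 minutes, or about 13.9 days, which the text rounds to 14 days. Halving MTTR to 10 minutes halves the MTBF needed for the same three nines. This is why the design goal works on both terms.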

Table 1 gives an overview of availability in terms of nines. The table's downtime numbers are measurements of unplanned downtime. (In today's world of high availability, techniques such as online backup and rolling upgrades for system maintenance or hardware updates keep planned downtime close to zero.)

Do you need three or more nines? Costs can increase 10-fold for each nine that you add, so take a close look at your business and ask what downtime costs you. To justify a high-availability solution, start by calculating the cost of an unavailable system. Table 2 shows sample hourly downtime costs for various industries. Table 3 shows that causes of downtime are evenly divided among planned outages, software, and physical factors (i.e., people, hardware, and environment).
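The unplanned downtime that each level of nines allows follows directly from the availability figure: yearly downtime = (1 - A) * 8,760 hours. At three nines, that is about 8.8 hours a year; at five nines, it is about 5 minutes. The Python sketch below computes these figures and multiplies them by an hourly downtime cost to estimate the yearly cost of each level. The $10,000-per-hour figure is a hypothetical placeholder, not a value from Table 2; substitute your own business's number.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(a: float) -> float:
    """Unplanned downtime per year allowed by availability a (0 < a <= 1)."""
    return (1 - a) * HOURS_PER_YEAR


def annual_downtime_cost(a: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at the given hourly cost."""
    return annual_downtime_hours(a) * cost_per_hour


COST_PER_HOUR = 10_000  # hypothetical placeholder; use your own figure

for nines in range(2, 6):
    a = 1 - 10 ** -nines  # 2 nines -> 0.99, 3 nines -> 0.999, ...
    minutes = annual_downtime_hours(a) * 60
    cost = annual_downtime_cost(a, COST_PER_HOUR)
    print(f"{a:.3%}  {minutes:8.1f} min/year  ${cost:,.0f}/year")
```

Each additional nine cuts the yearly downtime cost by a factor of 10. Comparing that saving with the roughly 10-fold increase in cost per nine is one way to frame the justification the article describes.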

Glancing at this data, you can easily understand the importance of people and processes to achieving high availability. In a recent white paper, "Increasing System Reliability and Availability with Windows 2000," Microsoft refers to industry studies showing that 80 percent of system failures are the result of human error or flawed processes.

Windows IT Pro is a Division of Penton Media Inc.
Copyright © 2008 Penton Media, Inc., All rights reserved.