  San Diego Professional Chapter Association for Computing Machinery

 

Past Meeting - December 19, 2002

Philip presents

Barry introduces

Learn About Building Computing Clusters at the San Diego Supercomputer Center

Philip Papadopoulos

Attendee Review:

From Joe Crawford's blog. He is the founder of websandiego.

"On Thursday I went to the ACM San Diego meeting. I even got a tour of San Diego Supercomputer Center. They have lots of computers and computing power. I got to see the gigantic (1.7 teraflops (a teraflop is a trillion floating-point operations per second)) Blue Horizon machine. When I think about Moore's law, and think that the machine is getting obsolete like with every minute -- words fail me.

The talk/demo on the NPACI Rocks toolkit for building supercomputer clusters was great. I didn't grok it all, but I like the use of XML to config redhat installs, and the autodetection of new machines on the cluster was quite clever. As a web developer, I appreciated the use of Apache and MySQL to keep track of the cluster, and to have the machines report their status and structures. The variability of commodity hardware makes simple hard disk mirroring - which I believe is the way things like Beowulf work - chancy. The differences between hardware can make installations unstable or break. Their philosophy is to let Red Hat be Red Hat and install itself on variable hardware with proper drivers. They use the de-facto standard of RPM files to do updates to all the nodes on the cluster. The problem Rocks solves is the problem of administering lots of linux boxes. By using smart administration the whole system, and all the nodes, can be updated intelligently."

Summary:

Thursday, December 19
6:30 P.M. - 8:00 P.M.

Join your colleagues at the next meeting of the San Diego Chapter of the Association for Computing Machinery (ACM), featuring Philip Papadopoulos of the San Diego Supercomputer Center (SDSC) talking about building and managing computing clusters with Kickstart and XML using the NPACI Rocks Clustering Toolkit.

We will meet at 6:30 P.M. at the San Diego Supercomputer Center on the UC San Diego campus. The meeting costs $3 (free to members) and is open to the public. For more information, call (858) 452-8701. We are extremely pleased to have SDSC's assistance for this event, and very excited to have Dr. Papadopoulos speaking.

Bring your colleagues and friends to this don't-miss event -- we hope to see you there!

RSVP:

To RSVP, e-mail us or call (858) 452-8701. Please RSVP by Dec 18.

Abstract:

High-performance compute clusters are becoming commonplace in the world of scientific computing. These distributed-memory machines (coupled by a high-performance, gigabit-class network) are built from commodity components, with a copy of the operating system on every node in the cluster. One of the clear problems with this construction is version and configuration skew of the installed OS software across a large cluster. Without effective techniques and software, small differences in node configuration translate into overall system instability. Furthermore, as clusters increase in size, nodes often need to take on specialized functions such as pure compute, file serving, web serving, and compilation/job launch. Ideally, one would like a methodology that captures both the basic configuration common to all nodes and the specialized extensions.
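To make "version skew" concrete, here is a minimal Python sketch that reports packages installed at more than one version across a cluster. It is an illustration only and not part of Rocks: the function name `find_version_skew`, the node names, and the version strings are all made up, and the inventory is assumed to have been collected already as a plain dictionary.

```python
from collections import defaultdict


def find_version_skew(inventory):
    """Return {package: {version: [nodes]}} for every package that is
    installed at more than one version across the given nodes.

    `inventory` maps a node name to a {package: version} dictionary.
    """
    versions = defaultdict(lambda: defaultdict(list))
    for node, packages in inventory.items():
        for package, version in packages.items():
            versions[package][version].append(node)
    return {
        package: {version: sorted(nodes) for version, nodes in by_version.items()}
        for package, by_version in versions.items()
        if len(by_version) > 1  # more than one distinct version means skew
    }


# Hypothetical inventory: node names and versions are illustrative only.
inventory = {
    "compute-0": {"glibc": "2.2.4", "openssh": "3.1"},
    "compute-1": {"glibc": "2.2.4", "openssh": "3.4"},
    "compute-2": {"glibc": "2.2.4", "openssh": "3.1"},
}

skew = find_version_skew(inventory)
for package, by_version in skew.items():
    print(package, by_version)
```

In this made-up inventory only `openssh` is reported, because `compute-1` carries a different version from its peers. The sketch does not flag packages that are missing entirely from some nodes; a fuller check would compare the package sets as well.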

Standard practice in the commodity cluster world is to build a model compute node that is essentially hand-crafted by a skilled administrator. Once it is built, a bit-image of the installation is taken and replicated across nodes. Norton Ghost and PowerQuest DriveImage are commercial examples of this methodology. SystemImager represents the same design in the open-source Linux space. These methods require substantial hardware similarity among the imaged nodes to work properly, which is a major drawback of the approach.

In the NPACI Rocks Toolkit, we take a descriptive approach to defining the configuration and functionality of a node. Instead of building a model node, we build text-based descriptions of the functionality needed. In this way, we are able to replicate complete configurations without resorting to monolithic bit images. The configurations can also be agnostic to specific hardware. Finally, our methods allow different node types to share configuration information, allowing the user to define an inheritable common software core.

The Rocks Toolkit uses Red Hat's Kickstart facility as the descriptive mechanism. On top of Kickstart, we provide a programmatic, graph-based method for describing the OS contents of a node. We start by using XML descriptions to create generic blocks of functionality (software packages + configuration). These descriptions (there are 80+ descriptive files in the standard Rocks distribution) are then assembled into different complete nodes (appliances) using a directed acyclic graph. When two node types (e.g., file server and compute node) need to share configuration information (compilers, ssh, ...), both entries point to the same set of graph nodes that describe the shared functionality. The complete description of a node (XML files + graph) is used to dynamically generate a Kickstart description for that particular node.
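The graph idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Rocks implementation: the module names, package lists, and function names (`collect_modules`, `kickstart_packages`) are hypothetical, plain dictionaries stand in for the XML files, and only the `%packages` section of a Kickstart file is generated.

```python
# Hypothetical functionality blocks: each maps to the packages it contributes.
# In Rocks these would be XML description files; dictionaries stand in here.
MODULES = {
    "base": ["kernel", "glibc"],
    "ssh": ["openssh"],
    "compilers": ["gcc"],
    "nfs-server": ["nfs-utils"],
    "mpi": ["mpich"],
}

# Directed acyclic graph: edges point from an appliance (or a grouping node)
# to the blocks it includes. Both appliances share the "common" subgraph.
GRAPH = {
    "compute": ["common", "mpi"],
    "fileserver": ["common", "nfs-server"],
    "common": ["base", "ssh", "compilers"],
}


def collect_modules(graph, root):
    """Depth-first walk from `root`, returning each reachable node once,
    in the order it is first visited."""
    seen = []

    def visit(name):
        for child in graph.get(name, []):
            if child not in seen:
                seen.append(child)
                visit(child)

    visit(root)
    return seen


def kickstart_packages(graph, modules, appliance):
    """Build the text of a Kickstart %packages section for one appliance."""
    lines = ["%packages"]
    for module in collect_modules(graph, appliance):
        # Grouping nodes such as "common" contribute no packages themselves.
        lines.extend(modules.get(module, []))
    return "\n".join(lines)


print(kickstart_packages(GRAPH, MODULES, "compute"))
print(kickstart_packages(GRAPH, MODULES, "fileserver"))
```

Because both appliances point at the same `common` node, a change to the shared blocks (for example, adding a package to `ssh`) shows up in every appliance the next time its description is generated, with no image to rebuild.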

In this talk, we will give an overview of the Rocks toolkit and describe this configuration mechanism in detail. A critical message of Rocks is that an OS image is disposable, and that reconfiguration or complete reinstallation should be extremely efficient. Rocks has been successfully deployed on approximately 100 clusters, the largest of which has 128 nodes.

Presenter Bio:

Philip Papadopoulos received his PhD in electrical and computer engineering from the University of California, Santa Barbara. His focus was on scalable numerical methods for matrix-valued equations in control. In 1993, Dr. Papadopoulos moved to Oak Ridge National Laboratory and was a member of the Parallel Virtual Machine (PVM) design and implementation team. In 1998, Philip joined the computer science department at UC San Diego working with Prof. Andrew Chien on high-performance clusters based on the Windows NT operating system. In 1999, Dr. Papadopoulos joined the San Diego Supercomputer Center to lead their Linux cluster development group. Dr. Papadopoulos is currently the Program Director for Grid and Cluster Computing at the San Diego Supercomputer Center.