Past Meeting - December 19, 2002
Learn About Building Computing Clusters at the San Diego Supercomputer Center
Philip Papadopoulos
[Photos: Philip presents; Barry introduces]
Attendee Review:
From the blog of Joe Crawford, founder of websandiego.
"On Thursday I went to the ACM San Diego meeting.
I even got a tour of San Diego Supercomputer Center. They have lots
of computers and computing power. I got to see the gigantic (1.7 teraflops;
a teraflop is a trillion floating-point operations per second) Blue
Horizon machine. When I think about Moore's law, and think that the
machine is getting obsolete like with every minute -- words fail me.
The talk/demo on the NPACI Rocks toolkit for building supercomputer
clusters was great. I didn't grok it all, but I like the use of XML
to config redhat installs, and the autodetection of new machines on
the cluster was quite clever. As a web developer, I appreciated the
use of Apache and MySQL to keep track of the cluster, and to have the
machines report their status and structures. The variability of commodity
hardware makes simple hard disk mirroring - which I believe is the way
things like Beowulf work - chancy. The differences between hardware
can make installations unstable or break. Their philosophy is to let
Red Hat be Red Hat and install itself on variable hardware with proper
drivers. They use the de-facto standard of RPM files to do updates to
all the nodes on the cluster. The problem Rocks solves is the problem
of administering lots of linux boxes. By using smart administration
the whole system, and all the nodes, can be updated intelligently."
Summary:
Thursday, December 19
6:30 P.M. - 8:00 P.M.
Join your colleagues at the next meeting of The San Diego Chapter of
the Association for Computing Machinery (ACM), featuring Philip Papadopoulos
of the San Diego Supercomputer Center (SDSC), talking about building and
managing computing clusters with Kickstart
and XML using the NPACI Rocks Clustering Toolkit.
We will meet at 6:30PM at the San Diego Supercomputer Center on the
UC San Diego campus. The meeting cost is $3 (free to members) and is
open to the public. For more information, call (858) 452-8701. We are
extremely pleased to have SDSC's assistance for this event, and very
excited to have Dr. Papadopoulos speaking.
Bring your colleagues and friends to this don't-miss event -- we hope
to see you there!
RSVP:
To RSVP, e-mail us or call (858) 452-8701. Please RSVP by Dec 18.
Abstract:
High-performance compute clusters are becoming commonplace in the world
of scientific computing. These distributed memory machines (coupled
by a high-performance/gigabit-class network) are built from commodity
components with a copy of the operating system on every node in a cluster.
One of the clear problems with this construction is version and configuration
skew of installed OS software across a large cluster. Without effective
techniques and software, small differences in node configuration translate
to overall system instability. Furthermore, as clusters increase in
size, nodes often need to take on specialized functions like pure compute,
file serving, web serving, and compilation/job launch. Ideally, one
would like to have a methodology that captures both the basic configuration
common to all nodes as well as the specialized extensions.
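To make the skew problem concrete, here is a minimal Python sketch that compares per-node package inventories and reports any package installed at more than one version. The node names, package versions, and the find_skew helper are all invented for illustration; they are not part of Rocks or of the talk.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up inventory: node name -> {package name: installed version}.
INVENTORY: Dict[str, Dict[str, str]] = {
    "node-0": {"glibc": "2.2.5", "openssh": "3.1"},
    "node-1": {"glibc": "2.2.5", "openssh": "3.4"},
    "node-2": {"glibc": "2.2.5", "openssh": "3.1"},
}


def find_skew(inventory: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Return {package: {version: [nodes]}} for packages seen at >1 version."""
    by_package = defaultdict(lambda: defaultdict(list))
    # Visit nodes in sorted order so each node list comes out sorted.
    for node in sorted(inventory):
        for package, version in inventory[node].items():
            by_package[package][version].append(node)
    # Keep only packages whose installed versions disagree across nodes.
    return {
        package: dict(versions)
        for package, versions in by_package.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for package, versions in find_skew(INVENTORY).items():
        print(package, versions)
```

This sketch only catches version disagreements; a package that is missing entirely from some nodes would need an extra check.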
Standard practice in the commodity cluster world is to build a model
compute node that is essentially hand-crafted by a skilled administrator.
Once built, a bit-image of the installation is taken and then this image
is replicated across nodes. Norton Ghost and PowerQuest DriveImage are
commercial examples of this methodology. SystemImager represents the
same design in the open-source Linux space. These methods require substantial
hardware similarity among imaged nodes to work properly, and this represents
a major drawback to this structure.
In the NPACI Rocks Toolkit, we take a descriptive approach to defining
the configuration and functionality of a node. Instead of building a model
node, we build text-based descriptions of the functionality needed.
In this way, we are able to replicate complete configurations without
resorting to monolithic bit images. Also, configurations can be agnostic
to specific hardware configurations. Finally, our methods allow different
node types to share configuration information, allowing the user to
define an inheritable common software core.
The Rocks Toolkit uses Red Hat's Kickstart facility as the descriptive
mechanism. On top of Kickstart, we provide a programmatic, graph-based
method for describing the OS contents of a node. We start by using XML
descriptions to create generic blocks of functionality (software packages
+ configuration). These descriptions (there are 80+ descriptive files in
the standard Rocks distribution) are then assembled into different complete
nodes (appliances) using a directed acyclic graph. When two node types
(e.g., file server and compute node) need to share configuration information
(compilers, ssh, ...), both appliances point to the same set of graph nodes
that describe the shared functionality. The complete description of
a node (XML files + graph) is used to dynamically generate a Kickstart
description for a particular node.
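The graph idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Rocks implementation: the module names, package lists, and helper functions are hypothetical, plain dictionaries stand in for the XML description files, and the output is just a Kickstart-style %packages fragment.

```python
from typing import Dict, List

# Hypothetical stand-ins for the XML description files: each graph node
# names a block of functionality and the packages it contributes.
MODULES: Dict[str, List[str]] = {
    "base": ["openssh", "rpm"],
    "compilers": ["gcc", "make"],
    "compute": ["mpich"],
    "nfs-server": ["nfs-utils"],
}

# Directed acyclic graph: an edge A -> B means "A includes B".
# Both appliances point at the same "common" node to share configuration.
EDGES: Dict[str, List[str]] = {
    "compute-appliance": ["common", "compute"],
    "fileserver-appliance": ["common", "nfs-server"],
    "common": ["base", "compilers"],
}


def collect_modules(root: str) -> List[str]:
    """Depth-first walk from an appliance; each graph node is visited once."""
    seen = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for child in EDGES.get(name, []):
            visit(child)
        order.append(name)  # dependencies are listed before the includer

    visit(root)
    return order


def collect_packages(appliance: str) -> List[str]:
    """Flatten the reachable modules into a duplicate-free package list."""
    packages: List[str] = []
    for module in collect_modules(appliance):
        for package in MODULES.get(module, []):
            if package not in packages:
                packages.append(package)
    return packages


def generate_kickstart(appliance: str) -> str:
    """Emit a Kickstart-style %packages fragment for one appliance."""
    return "%packages\n" + "\n".join(collect_packages(appliance)) + "\n"


if __name__ == "__main__":
    print(generate_kickstart("compute-appliance"))
    print(generate_kickstart("fileserver-appliance"))
```

Because both appliances reach the shared "common" node, editing that one entry changes the generated description for every node type that includes it.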
In this talk, we will give an overview of the Rocks toolkit and specifically detail
this configuration mechanism. A critical message of Rocks is that an
OS image is disposable and reconfiguration or complete reinstallation
should be extremely efficient. Rocks has been successfully deployed
on approximately 100 clusters with the largest instance being 128 nodes.
Presenter Bio:
Philip Papadopoulos received his PhD in electrical and computer engineering
from the University of California, Santa Barbara. His focus was on scalable
numerical methods for matrix-valued equations in control. In 1993, Dr.
Papadopoulos moved to Oak Ridge National Laboratory and was a member
of the Parallel Virtual Machine (PVM) design and implementation team.
In 1998, Philip joined the computer science department at UC San Diego
working with Prof. Andrew Chien on high-performance clusters based on
the Windows NT operating system. In 1999, Dr. Papadopoulos joined the
San Diego Supercomputer Center to lead their Linux cluster development
group. Dr. Papadopoulos is currently
the Program Director for Grid and Cluster Computing at the San Diego
Supercomputer Center.