Baton Rouge, Louisiana

Mardi Gras Conference 2008 Tutorials

The 15th Mardi Gras Conference will have three tutorials, open to all attendees:
  Introduction to the Condor High Throughput Computing System and the Metronome Build and Test System
  Scientific Workflows: The Pegasus Workflow Management System Example
  Swift: Scripting for fast and easy parallel computing with loosely-coupled tasks

Introduction to the Condor High Throughput Computing System and the Metronome Build and Test System

Presenter: Becky Gietzel
University of Wisconsin, Madison, WI

Abstract:

I. An introduction to the Condor High Throughput Computing System

Condor provides features commonly found in batch scheduling systems, such as job queuing, scheduling policies, priorities, resource monitoring, and resource management. Condor places submitted jobs into a queue, decides where and when to run them based on flexible policy expressions, monitors the jobs as they run, and returns the results to the user.
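As a rough illustration of the queue-and-policy idea, the toy matchmaker below pairs each queued job with an idle machine. Condor itself expresses such policies as ClassAd `Requirements` and `Rank` expressions. Here, as an assumption, they are modeled by plain Python callables. The machine attributes, job names, and the `match` function are illustrative only. This is a sketch of the concept, not Condor's matchmaking algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A machine is modeled as a plain dict of attributes (illustrative only).
Machine = Dict[str, object]


@dataclass
class Job:
    name: str
    # Stand-ins for Condor's ClassAd Requirements and Rank expressions.
    requirements: Callable[[Machine], bool]
    rank: Callable[[Machine], float] = lambda m: 0.0


def match(queue: List[Job], machines: List[Machine]) -> Dict[str, Optional[str]]:
    """Walk the queue in order and give each job the highest-ranked idle machine
    that satisfies its requirements. Unmatched jobs map to None (still queued)."""
    idle = list(machines)
    assignment: Dict[str, Optional[str]] = {}
    for job in queue:
        candidates = [m for m in idle if job.requirements(m)]
        if not candidates:
            assignment[job.name] = None
            continue
        best = max(candidates, key=job.rank)  # ties go to the first candidate
        idle.remove(best)                      # a machine runs one job at a time here
        assignment[job.name] = str(best["name"])
    return assignment


if __name__ == "__main__":
    machines = [
        {"name": "node1", "OpSys": "LINUX", "Memory": 2048},
        {"name": "node2", "OpSys": "LINUX", "Memory": 8192},
        {"name": "desk1", "OpSys": "WINDOWS", "Memory": 4096},
    ]
    queue = [
        Job("big", lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 4096,
            rank=lambda m: m["Memory"]),
        Job("small", lambda m: m["OpSys"] == "LINUX"),
        Job("mac", lambda m: m["OpSys"] == "OSX"),
    ]
    print(match(queue, machines))
    # {'big': 'node2', 'small': 'node1', 'mac': None}
```

The "mac" job finds no machine satisfying its requirements, so it stays in the queue rather than failing. That mirrors the idea of a job waiting until a suitable resource appears.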

Condor can manage dedicated clusters of compute nodes as well as harvest unused compute cycles from idle workstations. It can also be used to build Grid-style computing environments that cross administrative boundaries: Condor's flocking technology allows multiple Condor installations to work together. Condor also incorporates many of the emerging Grid-based computing methodologies and protocols, including Condor-G, which uses Globus to submit jobs to remote grid resources.

II. An introduction to Metronome: reliable, automated building and testing of software

Metronome is a distributed, multi-platform framework designed to provide automated software building and testing capabilities to a variety of grid computing projects. We believe that software is not reliable unless it is regularly built and tested. Doing so requires not only a significant number of CPU cycles but often a variety of unusual, difficult-to-maintain platforms, as well as a framework for automating, tracking, and monitoring the entire process. Metronome retains the parameters of each run, making builds reproducible.

The Metronome framework is not specific to any application or programming language, so most builds and tests are candidates for automation in this system. Our goal is to provide an implementation of this framework built on proven grid computing tools, and to support the growing number of Metronome facilities internationally, including our own NMI Lab at the University of Wisconsin-Madison. Metronome leverages various Condor features, including scheduling, file transfer, resource management, and failover capabilities.


Scientific Workflows: The Pegasus Workflow Management System Example

Presenters: Ewa Deelman (1), Karan Vahi (1), Kent Wenger (2)
(1) USC Information Sciences Institute, Marina del Rey, CA
(2) University of Wisconsin, Madison, WI

Abstract:

Scientific workflows are becoming an important part of the scientific discovery process. They capture the individual data transformations and analysis steps, as well as the mechanisms needed to carry them out in a distributed environment. Each step in a workflow specifies a process or computation to be executed (e.g., a software program to run or a web service to invoke), and the steps are linked according to the data flow and dependencies among them. The representation of these computational workflows contains many of the details required to carry out each analysis step, including the use of specific execution and storage resources in distributed environments. Workflow systems can exploit these explicit representations of complex computational processes to manage their life cycle and automate their execution. Workflows can capture complex analysis processes at various levels of abstraction, and they also provide the provenance information necessary for scientific reproducibility, result publication, and result sharing among collaborators.
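To make "linked according to the data flow" concrete, the sketch below derives step-to-step dependencies purely from the files each step reads and writes. The four-step workflow, its file names, and the `dependencies` function are hypothetical. The sketch also assumes each file has at most one producing step. It illustrates the general idea and is not Pegasus's workflow format.

```python
from typing import Dict, List, Set

# Hypothetical four-step workflow: each step lists the files it reads and writes.
STEPS: Dict[str, Dict[str, List[str]]] = {
    "extract": {"inputs": ["raw.dat"], "outputs": ["clean.dat"]},
    "analyze": {"inputs": ["clean.dat"], "outputs": ["stats.txt"]},
    "plot":    {"inputs": ["clean.dat"], "outputs": ["fig.png"]},
    "report":  {"inputs": ["stats.txt", "fig.png"], "outputs": ["report.pdf"]},
}


def dependencies(steps: Dict[str, Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """For each step, return the steps that produce one of its inputs.
    Assumes every file has at most one producer; files with no producer
    (like raw.dat) are treated as external inputs."""
    producer: Dict[str, str] = {}
    for name, io in steps.items():
        for f in io["outputs"]:
            producer[f] = name
    return {
        name: {producer[f] for f in io["inputs"] if f in producer}
        for name, io in steps.items()
    }


if __name__ == "__main__":
    for step, parents in dependencies(STEPS).items():
        print(step, "<-", sorted(parents))
```

Running it prints that `analyze` and `plot` both depend on `extract`, and `report` depends on both of them. No dependency is declared by hand; each one falls out of the data flow.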

In this tutorial we will examine the opportunities and challenges of designing and running scientific workflows in distributed environments. In addition to a high-level overview of the issues, we will provide hands-on experience with the Pegasus Workflow Management System (Pegasus-WMS). The system is composed of the Pegasus workflow mapper and the DAGMan workflow execution engine. Pegasus allows users to design workflows at a high level of abstraction and then automatically maps them onto distributed resources. The tutorial will cover workflow composition (how to design a workflow in a portable way) and workflow execution (how to run the workflow in a variety of execution environments: a workstation, a campus cluster, a Condor pool, or grid resources such as the Open Science Grid or TeraGrid). The tutorial will also cover performance and disk space optimization capabilities.
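The toy mapper below sketches what "design abstractly, then map to resources" can mean. Each abstract task names only a logical executable. The mapper binds every task to a site that provides that executable and then lists the data transfers needed wherever a parent task ran at a different site. Several pieces here are hypothetical: the catalog of executables per site, the site names, the "stay on a parent's site" policy, and the `map_workflow` function. Pegasus's real planner is considerably more sophisticated.

```python
from typing import Dict, List, Set, Tuple


def map_workflow(
    tasks: Dict[str, str],            # task -> logical executable, in dependency order
    parents: Dict[str, Set[str]],     # task -> tasks whose outputs it reads
    installed: Dict[str, List[str]],  # executable -> sites providing it (hypothetical catalog)
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Bind each abstract task to a site, then list the (parent, child, site)
    data transfers needed where a parent ran somewhere else."""
    site_of: Dict[str, str] = {}
    for task, exe in tasks.items():
        sites = installed.get(exe, [])
        if not sites:
            raise ValueError(f"no site provides {exe!r} for task {task!r}")
        # Naive policy: stay on a parent's site when possible, else take the first site.
        near = [site_of[p] for p in sorted(parents.get(task, set())) if site_of[p] in sites]
        site_of[task] = near[0] if near else sites[0]
    transfers = [
        (p, task, site_of[task])
        for task in tasks
        for p in sorted(parents.get(task, set()))
        if site_of[p] != site_of[task]
    ]
    return site_of, transfers


if __name__ == "__main__":
    tasks = {"extract": "preprocess", "analyze": "stats", "plot": "plotter", "report": "typeset"}
    parents = {"analyze": {"extract"}, "plot": {"extract"}, "report": {"analyze", "plot"}}
    installed = {"preprocess": ["campus", "osg"], "stats": ["osg"],
                 "plotter": ["campus", "osg"], "typeset": ["campus"]}
    print(map_workflow(tasks, parents, installed))
```

The same abstract `tasks` and `parents` can be re-mapped to a different set of sites by changing only `installed`. That separation is the portability the abstract refers to.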

Pegasus-WMS has been in development for more than six years and is used in production by several scientific applications, including those of the Southern California Earthquake Center (SCEC), Montage (an astronomy application), and the Laser Interferometer Gravitational Wave Observatory (LIGO), running on the TeraGrid as well as the Open Science Grid (OSG). DAGMan, the Pegasus-WMS workflow executor, is production-quality software developed as part of the Condor project. It executes workflows by performing dependency analysis and releasing each workflow job to the execution environment as soon as it is ready to run.
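DAGMan reads job dependencies from its input file. A diamond-shaped workflow, for example, is declared with lines such as `PARENT A CHILD B C` and `PARENT B C CHILD D`. The sketch below simulates the "release when ready" rule on such a graph. It assumes, for simplicity, that every released job finishes before the next round, whereas a real executor releases each job the moment its own parents complete. The `release_rounds` function is illustrative and is not DAGMan code.

```python
from typing import Dict, List, Set


def release_rounds(parents: Dict[str, Set[str]]) -> List[List[str]]:
    """Simulate dependency-driven release: a job is released once all of its
    parents are done. Every job must appear as a key (use an empty set for
    jobs with no parents). Assumes each released job finishes before the
    next round, which a real executor does not require."""
    waiting = {job: set(ps) for job, ps in parents.items()}
    done: Set[str] = set()
    rounds: List[List[str]] = []
    while waiting:
        ready = sorted(job for job, ps in waiting.items() if ps <= done)
        if not ready:
            raise ValueError("dependency cycle: no job can be released")
        rounds.append(ready)
        for job in ready:
            del waiting[job]
        done.update(ready)
    return rounds


if __name__ == "__main__":
    # Diamond: A feeds B and C, which both feed D.
    print(release_rounds({"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}))
    # [['A'], ['B', 'C'], ['D']]
```

B and C are released together because both depend only on A. That independent middle layer is where a workflow engine gains parallelism.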


Swift: Scripting for fast and easy parallel computing with loosely-coupled tasks

Presenters: Ben Clifford (1), Michael Wilde (2)
(1) Computation Institute, University of Chicago
(2) Argonne National Laboratory

Abstract:

Swift is a scripting language and system that makes it easy to create applications that execute large numbers of tasks coupled by disk-resident datasets. It meets common needs in science and engineering for analyzing vast quantities of data, performing parameter studies, and executing ensemble simulations.

The open source Swift system combines a simple scripting language for the concise, high-level specification of parallel computations, mappers for conveniently accessing diverse disk-based data structures, and an execution engine that efficiently manages the dispatch of tasks to distributed processors on parallel clusters, campus networks, or multi-site grids.
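Swift's own syntax is not reproduced here. As a rough Python analogue of two ideas in this paragraph, the sketch below does the following:

- a hypothetical mapper ties each element of a logical array to a file on disk;
- one independent task per element runs in parallel with `concurrent.futures`.

A thread pool on a single machine stands in for Swift's dispatch to clusters and grids. The names `mapper`, `simulate`, and `sweep` and the `sim_NNN.txt` naming scheme are illustrative assumptions, not part of Swift.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def mapper(directory: Path, prefix: str, n: int) -> List[Path]:
    # Hypothetical mapper: logical array element i <-> file <prefix>_<i>.txt
    return [directory / f"{prefix}_{i:03d}.txt" for i in range(n)]


def simulate(param: float, out: Path) -> Path:
    # Stand-in for one independent application run; writes its result to disk.
    out.write_text(f"{param * param}\n")
    return out


def sweep(params: List[float], directory: Path, workers: int = 4) -> List[Path]:
    outs = mapper(directory, "sim", len(params))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task depends only on its own parameter, so all may run at once;
        # pool.map returns results in input order.
        return list(pool.map(simulate, params, outs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for path in sweep([1.0, 2.0, 3.0], Path(d)):
            print(path.name, path.read_text().strip())
```

Because the tasks share no data, this parameter study needs no explicit synchronization. The only coupling is through the files that the mapper names.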

This introductory tutorial will provide a hands-on taste of Swift. Participants will learn, via a series of examples, how to orchestrate the execution of multiple independent programs; how to use mappers to access data in various file-based structures; and how to do parallel distributed computing with simple but powerful scripts.
