Specialized HPC Clusters in the Cloud
A new frontier for life sciences and beyond

There are hundreds of life science labs in the U.S. using next-generation sequencing, bioinformatics, proteomics, and molecular modeling to identify the genes behind, and potential drug targets to cure, many diseases including diabetes, cancer and Alzheimer's disease.

As modern scientific instruments produce more data, the demand for compute power to analyze it is rising dramatically. Today, life science researchers in bioinformatics, next-generation sequencing, and molecular modeling must spend tens to hundreds of thousands of dollars on server clusters to run their scientific calculations.

High performance computing (HPC) has come a long way for life sciences. Twenty years ago, expensive parallel supercomputers were required to render proteins in three dimensions and run software that helped researchers understand their shapes. Now 3D rendering can be done on graphics cards in workstations, laptops and even phones.

It is important to note that there are two types of HPC. The sprinter type runs a highly parallel application, where latency is of key importance and performance must be optimized at every level to get results. The marathon-runner type runs applications that are pleasantly parallel, meaning the work splits into many independent jobs. Currently, sprinter applications are best run on a single multi-core server in the cloud, although infrastructure from various providers may make it possible to run them across many servers. For marathon applications, also called high throughput computing, many commodity servers can finish jobs faster by taking advantage of the parallel nature of the work.

In either of these applications, compute clusters using many commodity servers have replaced expensive parallel supercomputers, but the data and problems being solved have grown to demand increased compute capacity. This leaves companies with large capital investments in fixed-size clusters that have all the traditional challenges of maximizing utilization, minimizing operational costs and shortening time-to-result for users.

Rise of Cloud HPC Clusters-as-a-Service
Cloud computing promises to help solve these issues. It makes provisioning servers easier and more cost-efficient. The cloud delivers virtualized servers and storage over the Internet at large scale, and you are billed only for what you use. However, getting started in the cloud is not easy. For example, Amazon EC2 requires programming against its Application Programming Interfaces (APIs) to provision nodes, as well as administrators and security staff to manage the servers.

This provisioning challenge has led to cloud HPC clusters, built upon infrastructure providers like Amazon EC2. Instead of building out a datacenter, procuring servers, network equipment and racks, and hiring IT personnel, companies can tap into these compute clusters as a service, which are provisioned automatically.

Cloud HPC cluster users can start up clusters without having to worry about putting in place various applications, operating systems, security, encryption and other software. Scientists can create clusters that automatically add servers when work is added and turn the servers off when the work is completed. This enables life science researchers to run calculations only when they need compute power.
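
The add-servers-when-busy, stop-when-idle behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the logic of any particular cloud service: the function name, the jobs-per-server figure and the server cap are all hypothetical.

```python
import math


def target_node_count(queued_jobs: int, jobs_per_node: int, max_nodes: int) -> int:
    """Return how many servers an elastic cluster should run for the current queue.

    Hypothetical policy: one server per `jobs_per_node` queued jobs,
    capped at `max_nodes`, and zero servers when the queue is empty.
    """
    if queued_jobs <= 0:
        return 0  # no work left, so every server is turned off
    needed = math.ceil(queued_jobs / jobs_per_node)
    return min(needed, max_nodes)


# Simulate a queue that fills up as work arrives and then drains.
for queued in [0, 10, 250, 40, 0]:
    nodes = target_node_count(queued, jobs_per_node=8, max_nodes=25)
    print(f"{queued} queued jobs -> {nodes} servers")
```

A real service would also have to account for server start-up time and billing granularity, which this sketch leaves out.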

HPC Before Cloud Computing
To understand these costs, let's look at HPC clusters before cloud computing. When buying a cluster, end users would size it to complete their largest set of calculations in a desired time. For example, if a 20,000 compute-hour calculation for a quarterly process needed to finish in a day, the cluster might be sized to 1,000 cores. On the storage side, the same sizing would occur to ensure that enough space existed to hold the working data and final results of the calculations.
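
The sizing arithmetic in this example can be written out as a short sketch. It assumes the work divides perfectly across cores, which real applications only approximate, and the function names are illustrative rather than taken from any tool.

```python
import math


def cores_needed(compute_hours: float, deadline_hours: float) -> int:
    """Smallest core count that finishes the work within the deadline,
    assuming perfect parallel scaling (an idealization)."""
    return math.ceil(compute_hours / deadline_hours)


def hours_to_finish(compute_hours: float, cores: int) -> float:
    """Wall-clock time on a fixed number of cores, same assumption."""
    return compute_hours / cores


# The quarterly job from the text: 20,000 compute hours, due within a day.
print(cores_needed(20_000, deadline_hours=24))
print(hours_to_finish(20_000, cores=1_000))
```

Under these assumptions the bare minimum is 834 cores, and the 1,000-core cluster from the example finishes in 20 hours, leaving some margin inside the one-day deadline.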

Purchasing a pre-cloud cluster required large up-front capital expenditures for the machines that do the calculations and the storage that holds the results, as well as lengthy procurement and provisioning processes. In addition, IT staff were required to maintain the cluster and keep its operating systems and applications up to date. When a cluster first became operational, it was not used to full capacity, and from then on researchers had only a fixed-size cluster for their calculations.

After the cluster is in production, researchers have a fixed number of cores to run their research. If they have a 4,000 compute-hour calculation to run on a 40-core cluster, it will always take 100 hours at best to get the result.

Once these clusters are purchased, they are typically only used about 30 percent of the time. For example, they could run during the day or when an instrument produces data. The larger the cluster, the faster the calculations run, but the more money and manpower are wasted when the cluster is 70 percent idle. Renting servers from the cloud could solve these problems, but requires programming, needs IT experience to maintain, and comes with severe security concerns.

Increasing Data, Computation and Time-to-Results
Modern scientific instruments, like mass spectrometers for proteomics or next-generation genomic sequencers, are compounding this problem. They require large quantities of compute power. The data generated when sequencing a human genome is more than one trillion bytes, or a terabyte, in size.

This scale of data puts the project in a unique place: it is large enough to be unwieldy to analyze for the labs that generate it, but small enough that with some data scheduling it can be moved over the Internet. As this data comes off an instrument, it needs to be processed on varying numbers of computers, with results returned as quickly as possible.

This bursty availability of data from instruments poses a problem for traditional, fixed clusters that cannot grow or shrink to run the calculations efficiently. It also increases costs. A traditional, in-house compute cluster with 30-percent utilization costs roughly three times as much per calculation as a fully utilized cluster. However, fully utilized clusters are up to 10 times slower to complete the calculations because the cluster is not large enough to run them as fast as possible. For drug discovery processes, clinical trial design or bioinformatics, this tenfold delay in results translates to slower time-to-market, which also costs money.
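
The utilization penalty can be made concrete with a small sketch. It assumes a fixed cluster costs the same to own and operate whether it is busy or idle, so the cost of each useful core-hour is the raw cost divided by utilization. The normalized cost of 1.0 is a placeholder, not a real price.

```python
def cost_per_useful_core_hour(cost_per_core_hour: float, utilization: float) -> float:
    """Effective cost of a core-hour that actually runs a calculation.

    Assumes idle hours cost as much as busy ones, so the whole bill is
    spread over the utilized fraction only.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return cost_per_core_hour / utilization


fully_used = cost_per_useful_core_hour(1.0, utilization=1.0)
mostly_idle = cost_per_useful_core_hour(1.0, utilization=0.30)
print(f"Cost multiplier at 30% utilization: {mostly_idle / fully_used:.2f}x")
```

Since 1 / 0.30 is about 3.3, this simple model gives the threefold cost figure quoted above.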

Changing the Math for Compute and Storage Costs
For compute clusters as a service, the math is different: having 40 processors work for 100 hours costs the same as having 1,000 processors run for 4 hours. Yet with 1,000 processors, the results of most life science calculations would come back the same day rather than four days later, 25 times quicker at no additional cost. This kind of disruptive decrease in time to result can lead to shorter times to develop products, discover drugs or isolate important genes in a genome.
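
A minimal sketch of this pay-per-use math follows. It assumes billing is strictly proportional to processor-hours and that the work scales perfectly across processors. The price of $0.10 per processor-hour is a made-up placeholder, not a quoted rate from any provider.

```python
def rent_cluster(compute_hours: float, processors: int, price_per_processor_hour: float):
    """Return (wall-clock hours, total cost) for a rented cluster.

    Assumes perfect parallel scaling and billing by the processor-hour.
    """
    wall_clock_hours = compute_hours / processors
    total_cost = processors * wall_clock_hours * price_per_processor_hour
    return wall_clock_hours, total_cost


PRICE = 0.10  # hypothetical dollars per processor-hour

small_hours, small_cost = rent_cluster(4_000, processors=40, price_per_processor_hour=PRICE)
large_hours, large_cost = rent_cluster(4_000, processors=1_000, price_per_processor_hour=PRICE)

print(f"40 processors:    {small_hours:.0f} hours, ${small_cost:.2f}")
print(f"1,000 processors: {large_hours:.0f} hours, ${large_cost:.2f}")
print(f"Speedup at equal cost: {small_hours / large_hours:.0f}x")
```

Whatever the actual rate, the total depends only on the processor-hours consumed, so the larger cluster buys a shorter wait without a larger bill.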

This key shift in high-performance calculations also applies to storage. A hard disk capable of storing a terabyte can be bought for $150 at a local office store. However, filers with redundancy, de-duplication and hundreds of terabytes of storage can cost $12,000 or more per terabyte. For large capacities and reliability, traditional filers cost at least 10 times more per terabyte than hard drives bought off the shelf. In the cloud, all storage is redundant and highly available, and the cost per terabyte goes down at large scales.

Improving Time-to-Market
These advantages create strong incentives by improving time-to-market and reducing costs. As an example, Varian Inc. is a producer of scientific instruments and ion traps. Researchers at the company run calculation-intensive Monte Carlo simulations to help develop better future products. In one instance, a simulation for a mass spectrometer was scheduled to take several thousand compute hours and nearly six calendar weeks on an internal pool of processors. With product design and conference deadlines looming, the company needed to get results faster.

Rather than purchasing a traditional cluster, Varian Inc. ran this calculation using a cloud HPC cluster service on Amazon EC2 that helps companies run calculations easily and securely. The elastic cluster added nodes to run the calculations and stopped the servers when no more work was left to compute. Using a service to automate provisioning, security, encryption, administration and support made the cluster cost-effective and easy to use. With the cloud HPC cluster, this six-week calculation ran in less than one day.

Applications for Life Sciences
For researchers in life sciences, including bioinformatics, proteomics and computational chemistry, these clusters can support all the applications that users expect on internal clusters, with minimal installation effort. Both open source and proprietary software applications can be run in the cloud, giving domain scientists access to a full range of pre-installed, domain-appropriate tools. Pipelines for standard applications such as Gromacs, Bowtie, Velvet, OMSSA, Tandem, HMMER and BLAST are some examples of what can be run in the cloud. BLAST, for instance, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. These applications must generally be tuned to work optimally in the cloud, which has a more flexible architecture than fixed internal clusters.

As an example, Schrödinger, a leading supplier of molecular-simulation and computational-chemistry software to the pharmaceutical industry, made its Glide docking program available on the cloud. Glide is used for virtual screening, a process that determines potential drug candidates from a large database of compounds based upon their fit with a given target site.

Shortening Product Pipelines: 1.5 Years of Drug Target Screens in 1.5 Days
Molecular modeling and simulation are central to drug discovery, although they are often rate-limiting. Local computational resources are often insufficient to perform the massive burst-mode computations needed to bring a drug project forward in a timely fashion.

Recently Schrödinger decided to show how on-demand availability of large, secure and trouble-free cloud computational resources can fill this gap. As test data, it screened 1.8 million candidate compounds against a target site to find potential matches. Using a 600-processor cloud HPC cluster, 18 months' worth of screening was completed in 36 hours.

Other Benefits: Audit, Disaster Recovery and Security
Cloud HPC clusters enable scientists to consume calculations only when they need them and to pay only for what they use. The scalability of the cloud allows them to size HPC clusters to their jobs to minimize the time to result. Storage in the cloud can get cheaper with economies of scale, rather than more expensive, as with traditional filers.

Another benefit is that cloud HPC clusters are virtualized. This means that it is possible to provision repeatable clusters that have standardized images for qualification purposes and reliable application environments every time they are provisioned. Disaster recovery scenarios are easier to manage because the entire cluster environment is repeatable through virtualization.

Cloud HPC clusters are the same every time they are provisioned from their virtual machine images. Security can be handled in a consistent way, with guaranteed encryption and encryption-key management for sensitive data and applications, whether at rest on disk or in transit over the network between cluster servers. As an example, hard disks containing user data can be encrypted using the Advanced Encryption Standard (AES) with 128-bit, 192-bit or 256-bit keys. Data communicated to the cluster via Web services uses the same SSL encryption that protects credit-card information for holiday purchases.

Is the Future of HPC Cloudy?
HPC's future is clear: it will be in the clouds. Calculations play an important role in helping researchers efficiently design better drugs, run more efficient clinical trials and develop better crops.

Scientific instruments are generating increasing amounts of large but transferable data that require analysis. There is increased pressure to lower costs and speed up product development timelines for crops, clinical trials and cures for diseases. These factors have led to the creation of cloud HPC clusters that have helped pharmaceutical companies perform calculations that lead to better scientific results.

About Jason Stowe
Jason Stowe is a seasoned entrepreneur, and the founder and CEO of Cycle Computing, the leader in Condor Grid and Cloud Computing Solutions. In 2005, Jason started Cycle, an employee-owned company, to help clients easily use open-source Condor to provide more innovative grid functionality and reduce costs. Not having investors lets Jason and the Cycle team focus on customers' needs and execution, rather than hype.

Starting with three initial Fortune 100 clients in Insurance, Financial Services, and Defense, Cycle has grown to deploy production grids at Fortune 500s, SMBs, government research, and academic institutions alike, for a wide variety of industries and applications.

For over a year, the company's CycleCloud service has provided the same production-quality grids on demand in the cloud, and is used for computations in bioinformatics, statistics, product and hardware simulation, and financial risk analysis, among others.

Jason attended Carnegie Mellon and Cornell Universities, and volunteered/guest lectured for the Entrepreneurship program at Cornell's Johnson Business School.
