Tag Archives: hpc

Event: The Future of Scale Out Storage – by Ken Claffey VP&GM @ Seagate – 2 March

Seagate Technology, in conjunction with PuneTech and the Software Exporters Association of Pune (SEAP), presents a talk on The Future of Scale Out Storage by Ken Claffey, VP & GM of the Storage Systems Group at Seagate Technology, on Monday, 2nd March, at 6pm, in the Sumant Moolgaokar Auditorium, ICC Trade Center, SB Road.

Abstract of the Talk

Big Data is changing the nature of storage infrastructure: traditional SAN and NAS systems are becoming obsolete. This disruption is creating opportunities for next-generation scale-out storage systems and converged infrastructure. Seagate, as the world’s preeminent supplier of disk drives and storage enclosures, has a unique viewpoint on this transition and the technology underpinnings that will be the foundation of a new data infrastructure designed to meet the challenges of big data. Seagate will present its view on this transition from the storage I/O device level all the way up the I/O stack to the application layer.

About the Speaker – Ken Claffey

Ken Claffey is Vice President and General Manager of Seagate’s Clusterstor™ HPC & Big Data business. Mr. Claffey led Seagate’s HPC initiative that started in 2009 and has led the successful execution of this strategy ever since. Mr. Claffey has also held senior management roles in Business Management, System Architecture, Business Development and Product Management functions at Xyratex, a storage and HPC technology company that was later acquired by Seagate. Prior to that, he held management positions at Adaptec and Eurologic Systems where he established and grew new businesses.

Venue

The event is from 6pm to 7pm on Monday, 2nd March, at the Sumant Moolgaokar Auditorium, Ground Floor, Wing A, ICC Trade Center, Senapati Bapat Road.

Fees and registration

The event is free and open to anybody. Please register here.

SEAP/CSI Event: Big Data and High Performance Computing, by Paul Kent – Nov 19

What: Big data and High Performance Computing with Paul Kent of SAS
When: Friday, 19 Nov, 10am
Where: YASHADA, Baner Road
Registration and Fees: This event is free for all. Register by sending a mail to deshpande@synygy.com.

Abstract of the Talk – Big Data and HPC

The emergence of commodity multi-core blade servers is changing the landscape of high-performance computing quickly and profoundly. It coincides with the exponential increase in data – digital information streaming in from the intelligent/smart devices that pervade our lives today, stored on ever larger, faster and cheaper data storage arrays.

Processing the data now requires larger computing resources. The two computing disciplines (Big Data and HPC) are merging on a largely common platform, and this presents good opportunities for improving the state of the art, both in speed and in the volume of data processed.

A new generation of software technologies for analytics operating on enormous data stores is the outcome. This talk describes the evolution of Big Data and High Performance Computing, details approaches taken, reflects on lessons learned, and discusses some of the current challenges in this area.

About the Speaker – Paul Kent

Paul Kent is the Vice President of Platform Research and Development at SAS, and leads the teams responsible for many SAS foundation technologies – Base SAS and related data access, management and presentation software. Paul joined SAS in 1984 as a Technical Marketing Representative and eventually moved into the Research and Development division. He has used SAS for more than 20 years and has contributed to the development of SAS software components including PROC SQL, the WHERE clause, TCP/IP connectivity, and portions of the Output Delivery System (ODS). A strong customer advocate, Paul is widely recognized within the SAS community for his active participation in local and international user conferences.

Live-Blog: Overview of High Performance Computing by Dr. Vipin Chaudhary

(This is a live-blog of Dr. Vipin Chaudhary’s talk on Trends in High Performance Computing, organized by the IEEE Pune sub-section. Since this is being typed while the talk is going on, it might not be as well organized, or as coherent, as other PuneTech articles. Also, links will usually be missing.)

(Photo caption: Dr. Vipin Chaudhary, CEO of CRL, speaking at the Institute of Engineers, Pune. CRL are the makers of Eka, one of the world's fastest privately funded supercomputers.)
Myths about High Performance Computing:

  • Commonly associated with scientific computing
  • Only used for large problems
  • Expensive
  • Applicable to niche areas
  • Understood by only a few people
  • Lots of servers and storage
  • Difficult to use
  • Not scalable and reliable

This is not the reality. HPC is:

  • Backbone for national development
  • Will enable economic growth. Everything from toilets to potato chips is designed using HPC
  • Lots of supercomputing is throughput computing – i.e. used to solve lots of small problems
  • “Mainstream” businesses like Walmart and entertainment companies like DreamWorks Studios use HPC.
  • (and a bunch of other reasons that I did not catch)

China is really catching up in the area of HPC, and Vipin correlates China’s GDP with the development of supercomputers in China. Point: technology is a driver for economic growth. We need to invest in this too.

Problems solved using HPC:

  • Movie making (like Avatar)
  • Real time data analysis
    • weather forecasting
    • oil spill impact analysis
    • forest fire tracking and monitoring
    • biological contamination prediction
  • Drug discovery
    • reduce experimental costs through simulations
  • Terrain modeling for wind-farms
    • e.g. optimized site selection, maintenance scheduling
    • and other alternate energy sources
  • Geophysical imaging
    • oil industry
    • earthquake analysis
  • Designing airplanes (Virtual wind tunnel)

Trends in HPC

The Manycore trend

Putting many CPUs inside a single chip. Multi-core is when you have a few cores; manycore is when you have many, many cores. This has challenges. Programming manycore processors is very cumbersome, and debugging is much harder. For example, to get good performance out of these chips you need to do parallel, assembly-level programming. Parallel programming is hard. Assembly programming is hard. Both together will kill you.

This will be one of the biggest challenges in computer science in the near future. A typical laptop might have 8 to 10 processes running concurrently, so there is automatic parallelism as long as the number of cores is less than 10. But as chips get to 30, 40 cores or more, individual processes will need to be parallel. This will be very challenging.
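To make this concrete, here is a minimal sketch (my own illustration, not from the talk) of what it means for a single process to become explicitly parallel – the programmer suddenly has to decide how to split the work across cores and how to combine the results:

```python
# Minimal illustration (not from the talk): explicitly parallelizing one
# computation across cores with Python's standard multiprocessing module.
from multiprocessing import Pool


def heavy_kernel(x):
    # Stand-in for a CPU-intensive piece of work.
    return x * x


if __name__ == "__main__":
    numbers = range(1_000_000)
    # Decisions that never arise in sequential code: how many workers,
    # how to chunk the work, and how to combine the partial results.
    with Pool(processes=8) as pool:
        results = pool.map(heavy_kernel, numbers, chunksize=10_000)
    print(sum(results))
```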

Oceans of Data but the Pipes are Skinny

Data is growing fast – in the sciences, humanities, commerce, medicine, and entertainment. The amount of information being created in the world is huge: emails, photos, audio, documents, etc. Genomic (bio-informatics) data is also huge.

Note: data is growing way, way faster than Moore’s law!

Storing things is not a problem – we have lots of disk space. Fetching and finding stuff is a pain.

Challenges in data-intensive systems:

  • Amount of data to be accessed by the application is huge
  • This requires huge amounts of disk, and very fat interconnects
  • And fast processors to process that data

Conventional supercomputing was CPU bound. Now, we are in the age of data-intensive supercomputing. Difference: old supercomputing had storage elsewhere (away from the processor farm). Now the disks have to be much closer.

Conventional supercomputing was batch processed. Now, we want everything in real-time. Need interactive access. To be able to run analytic and ad hoc queries. This is a new, and difficult challenge.

While Vipin was a faculty member at SUNY Buffalo, they started a data-intensive discovery initiative (Di2), in which CRL is now participating. It deals with large, ever-changing data sets. Collecting and maintaining the data is of course a major problem, but the primary focus of Di2 is searching this data, e.g. for security (finding patterns in huge logs of user actions). This requires a new, different architecture from traditional supercomputing, and the resulting Di2 system significantly outperforms the traditional system.

This also has applications in marketing analysis, financial services, web analytics, genetics, aerospace, and healthcare.

High Performance Cloud Services at CRL

Cloud computing makes sense. It is here to stay. But energy consumption of clouds is a problem.

Hence, CRL is focusing on a green cloud. What does that mean?

Data center optimization:

  • Power consumption optimization on hardware
  • Optimization of the power system itself
  • Optimized cooling subsystem
  • CFD modeling of the power consumption
  • Power dashboards

Workflow optimization (reduce computing resource consumption via efficiencies):

  • Cloud offerings
  • Virtualizations
  • Workload based power management
  • Temperature aware distribution
  • Compute cycle optimization

Green applications being run in CRL

  • Terrain modeling
  • Wind farm design and simulation
  • Geophysical imaging
  • Virtual wind tunnel

Summary of talk

  • Manycore processors are here to stay
    • Programmability has to improve
    • Must match application requirements to processor architecture (one size does not fit all)
  • Computation has to move to where the data is, and not vice versa
  • Data scale is the biggest issue
    • must co-locate data with computing
  • Cloud computing will continue to grow rapidly
    • Bandwidth is an issue
    • Security is an issue
    • These issues need to be solved

Session on High Performance Computing – Dr. Vipin Chaudhary, CEO CRL

What: IEEE Pune presents a session on High Performance Computing, by Dr. Vipin Chaudhary, CEO of Computational Research Laboratories (CRL), the makers of the Eka supercomputer
When: Saturday, 14 August, 5pm-7pm
Where: Institution of Engineers, Shivajinagar, JM Road, Opposite Modern Cafe
Registration and Fees: This event is free for all. Register by sending mail to IEEE125.Pune.Symposium@gmail.com.
Details: Contact Amey Asodekar 020-6606-8494

(Photo: Dr. Vipin Chaudhary, CEO of CRL)

Computational Research Laboratories (CRL) is the Pune-based company from the Tatas which built the Eka supercomputer. Eka was the 4th fastest in the world when it launched a few years back, but has now dropped to 33rd; nevertheless, it remains one of the fastest private (i.e. not funded by any government) supercomputers in the world.

Earlier this year, Dr. Vipin Chaudhary took over as the CEO of CRL. I assume this marks a change in direction for CRL. Earlier, the focus was on building Eka, which required lots of cutting-edge research in hardware, software, and facilities, among other things. During that phase it was developed and run by academics (CRL was started with the help of Dr. Narendra Karmarkar, and most of the senior executives in CRL were ex-IIT-Bombay professors). Now, however, it is likely that they’re looking for a return on the investment, and would like to start marketing high performance computing services using Eka. They have a team working on high performance infrastructure and applications using the Eka hardware, and, being a purely private company, they are in a unique position to offer their hardware, software and services to companies who might be interested in supercomputing applications (think airplane design and modeling, e.g. somebody like Boeing, or car design, e.g. for in-house use by Tata Motors). Dr. Chaudhary, who relocated from the US for this role, was earlier involved with two startups – Corio (acquired by IBM in 2005) and Cradle Technologies (2000-2003) – in addition to being an associate professor at SUNY Buffalo. The credentials he brings, specifically his strong technical and business background in this area, are impressive.

CRL is working on some of the most complex technologies in Pune. For that reason alone, any techie in Pune should be interested in this talk.

Commercial work pilots begin on Eka, Tata’s supercomputer

The Financial Express reports that Eka, the 4th fastest supercomputer in the world, built in Pune by the Tatas’ Computational Research Laboratories (CRL), has begun running pilot projects for various commercial entities. Excerpts:

According to sources close to the development, the main application areas are in aerospace and aerodynamics, automotive design and engineering, academics, animation, weather forecasting and so on

and

Although the company would use some of these applications in house, be it for Tata Motors or Tata Elxsi, much of the revenues would flow in from outside the Tata Group, mostly from abroad.

These would include aircraft design companies like Boeing, Lockheed and Airbus.

See also:
Building the world’s 4th fastest supercomputer

Building EKA – The world’s fastest privately funded supercomputer

Eka, built by CRL, Pune, is the world’s 4th fastest supercomputer, and the fastest one that didn’t use government funding. This is the same supercomputer referenced in Yahoo!’s recent announcement about cloud computing research at the Hadoop Summit. This article describes some of the technical details of Eka’s design and implementation. It is based on a presentation by the Eka architects, organized by CSI Pune and MCCIA Pune.

Interconnect architecture

The most important decision in building a massively parallel supercomputer is the design of how the different nodes (i.e. processors) of the system are connected together. If all nodes are connected to each other, parallel applications scale really well (linear speedup), because communication between nodes is direct and has no bottlenecks. But unfortunately, building larger and larger such systems (i.e. ones with more and more nodes) becomes increasingly difficult and expensive because the complexity of the interconnect increases as n². To avoid this, supercomputers have typically used sparse interconnect topologies like Star, Ring, Torus (e.g. IBM’s Blue Gene/L), or hypercube (Cray). These are more scalable as far as building the interconnect for really large numbers of nodes is concerned. However, the downside is that nodes are not directly connected to each other and messages have to go through multiple hops before reaching the destination. Here, unless the applications are designed very carefully to reduce message exchanges between different nodes (especially those that are not directly connected to each other), the interconnect becomes a bottleneck for application scaling.
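To get a feel for that n² growth, here is a small back-of-the-envelope sketch (my own, not from the presentation) comparing the number of links in a fully connected cluster with a 2-D torus of the same size:

```python
# Back-of-the-envelope comparison (illustrative only): number of links
# needed to wire up n nodes in a full mesh vs. a 2-D torus.
import math


def full_mesh_links(n):
    # Every node connected to every other node: n*(n-1)/2 links.
    return n * (n - 1) // 2


def torus_2d_links(n):
    # Treat n nodes as a sqrt(n) x sqrt(n) grid with wrap-around links;
    # each node has 4 links and each link is shared by 2 nodes => 2n links.
    side = math.isqrt(n)
    return 2 * side * side


for n in (64, 256, 1024, 4096):
    print(f"{n:5d} nodes: full mesh {full_mesh_links(n):9,d} links, "
          f"2-D torus {torus_2d_links(n):6,d} links")
```

For 4096 nodes the full mesh needs over 8 million links while the torus needs about 8 thousand – which is exactly why the sparse topologies win on cost but lose on hop count.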

In contrast to those systems, Eka uses an interconnect designed using concepts from projective geometry. The details of the interconnect are beyond the scope of this article. (Translation: I did not understand the really complex mathematics that goes on in those papers. Suffice it to say that before they are done, fairly obscure branches of mathematics get involved. However, one of these days, I am hoping to write a fun little article on how a cute little mathematical concept called Perfect Difference Sets (first described in 1938) plays an important role in designing supercomputer interconnects over 50 years later. Motivated readers are encouraged to try and see the connection.)
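For the motivated reader, here is a tiny sketch (mine, not from the presentation) that checks the textbook example of a perfect difference set: the set {1, 2, 4} modulo 7, whose pairwise differences hit every nonzero residue exactly once:

```python
# Illustrative check (not from the presentation): {1, 2, 4} is a perfect
# difference set mod 7 -- every nonzero residue 1..6 appears exactly once
# among the pairwise differences of distinct elements.
from collections import Counter


def is_perfect_difference_set(elements, modulus):
    diffs = Counter((a - b) % modulus
                    for a in elements for b in elements if a != b)
    return all(diffs[r] == 1 for r in range(1, modulus))


print(is_perfect_difference_set({1, 2, 4}, 7))  # True
print(is_perfect_difference_set({1, 2, 5}, 7))  # False
```

How such sets translate into an interconnect wiring is exactly the connection the paragraph above leaves as an exercise.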

To simplify – Eka uses an interconnect based on Projective Geometry concepts. This interconnect gives linear speedup for applications but the complexity of building the interconnect increases only near-linearly.

The upshot of this is that to achieve a given application speed (i.e. number of Teraflops), Eka ends up using fewer nodes than its compatriots. This means that it costs less and uses less power, both of which are major problems that need to be tackled in designing a supercomputer.

Handling Failures

A computer that includes 1000s of processors, 1000s of disks, and 1000s of network elements soon finds itself on the wrong side of the law of probability as far as failures are concerned. If one component of a system has an MTBF (mean time between failures) of 10,000 hours, and the system has 3,000 such components, then you can start expecting something to fail roughly once every 3 hours. (I know that the math in that sentence is simplified – it assumes independent, identical components – but ease of understanding trumps accuracy in most cases.)
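The arithmetic behind that estimate, written out (a simplification that assumes independent, identically reliable components):

```python
# Rough system-level failure interval, assuming independent components
# that are all equally reliable (a simplification, as noted above).
component_mtbf_hours = 10_000
component_count = 3_000

# Failure rates add up across components, so the system-level MTBF is
# the component MTBF divided by the number of components.
system_mtbf_hours = component_mtbf_hours / component_count
print(f"Expect a failure roughly every {system_mtbf_hours:.1f} hours")
# => roughly every 3.3 hours
```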

If an application is running on 500 nodes, and has been running for the last 20 hours, and one of the nodes fails, the entire application has to be restarted from scratch. And this happens often, especially before an important deadline.

A simple solution is to save the state of the entire application every 15 minutes. This is called checkpointing. When there is a failure, the system is restarted from the last checkpoint and hence ends up losing only 15 minutes of work. While this works well enough, it can get prohibitively expensive. If you spend 5 minutes out of every 15 minutes in checkpointing your application, then you’ve effectively reduced the capacity of your supercomputer by 33%. (Another way of saying the same thing is that you’ve increased your budget by 50%.)
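A minimal checkpoint/restart skeleton (my own sketch, not CRL’s code) shows the mechanics – and why the time spent writing checkpoints is pure overhead that has to be traded off against the work lost on a failure:

```python
# Illustrative checkpoint/restart skeleton (not CRL's code): the job
# periodically saves its state so that a failure only loses the work
# done since the last checkpoint.
import os
import pickle
import time

CHECKPOINT_FILE = "state.pkl"     # hypothetical path, for illustration
CHECKPOINT_INTERVAL = 15 * 60     # seconds of work between checkpoints


def load_state():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE, "rb") as f:
            return pickle.load(f)     # resume from the last checkpoint
    return {"step": 0}                # fresh start


def save_state(state):
    with open(CHECKPOINT_FILE, "wb") as f:
        pickle.dump(state, f)         # time spent here is pure overhead


def run(total_steps):
    state = load_state()
    last_checkpoint = time.time()
    while state["step"] < total_steps:
        state["step"] += 1            # stand-in for real computation
        if time.time() - last_checkpoint >= CHECKPOINT_INTERVAL:
            save_state(state)
            last_checkpoint = time.time()
    save_state(state)


if __name__ == "__main__":
    run(total_steps=100)
```

If each checkpoint costs 5 minutes out of every 15, only 10 of those 15 minutes do useful work – the 33% capacity loss mentioned above.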

The projective geometry architecture also allows for a way to partition the compute nodes in such a way that checkpointing and status saving can be done only for a subset of the nodes involved. The whole system need not be reset in case of a failure – only the related subset. In fact, with the projective geometry architecture, this can be done in a provably optimal manner. This results in improved efficiency. Checkpoints are much cheaper/faster, and hence can be taken more frequently. This means that the system can handle failures much better.

Again, I don’t understand the details of how projective geometry helps in this – if someone can explain that easily in a paragraph or two, please drop me a note.

The infrastructure

The actual supercomputer was built in just 6 weeks. However, other aspects took much longer. It took a year of convincing to get the project funded, and another year to build the physical building and the rest of the infrastructure. Eka uses:

  • 2.5MW of electricity
  • 400ton cooling capacity
  • 10km of electrical cabling
  • 10km of ethernet cabling
  • 15km of infiniband cabling

The computing infrastructure itself consists of:

  • 1800 blades, each with two quad-core processors running at 3GHz (14,400 cores in total)
  • HP SFS clusters
  • 28TB memory
  • 80TB storage. Simple SATA disks. 5.2Gbps throughput.
  • Lustre distributed file-system
  • 20Gbps infiniband DDR. Eka was on the cutting edge of Infiniband technology. They sourced their infiniband hardware from an Israeli company and were amongst the first users of their releases – including beta, and even alpha quality stuff.
  • Multiple Gigabit ethernets
  • Linux is the underlying OS. Any Linux will work – RedHat, SuSe, your favorite distribution.

It’s the software, stupid!

One of the principles of the Eka project is to be the one-stop shop for tackling problems that require huge amounts of computational power. Their tagline for this project has been: from atoms to applications. They want to ensure that the project takes care of everything for their target users, from the hardware all the way up to the application. This meant that they had to work on:

  • High speed low latency interconnect research
  • System architecture research
  • System software research – compilers etc.
  • Mathematical library development
  • Large scientific problem solving.
  • Application porting, optimization and development.

Each of the bullet items above is a non-trivial bit of work. Take, for example, “Mathematical library development.” Since they came up with a novel architecture for the interconnect for Eka, all parallel algorithms that run on Eka also have to be adapted to work well with that architecture. To get the maximum performance out of your supercomputer, you have to rewrite all your algorithms to take advantage of the strengths of your interconnect design while avoiding its weaknesses. Requiring users to understand and code for such things has always been the bane of supercomputing research. Instead, the Eka team has gone about providing mathematical libraries of the important functions needed by applications, specifically tailored to the Eka architecture. This means that people who have existing applications can run them on Eka without major modifications.
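CRL’s Eka-specific libraries are not public, but the general idea will be familiar to anyone who has used MPI or a tuned BLAS: the application calls a standard routine, and the library implementation underneath is free to exploit the interconnect however it likes. A generic sketch using mpi4py (my own illustration, nothing Eka-specific):

```python
# Generic illustration with mpi4py (nothing Eka-specific): the caller
# asks for a global sum across all ranks; how the messages actually
# travel over the interconnect is entirely the library's business.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_value = rank * rank                        # each rank's partial result
total = comm.allreduce(local_value, op=MPI.SUM)  # library handles the communication

if rank == 0:
    print("Global sum across", comm.Get_size(), "ranks:", total)
```

Run with something like `mpirun -n 8 python allreduce_demo.py`; the same program, unchanged, simply goes faster on a machine whose library knows how to exploit a better interconnect.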

Applications

Of the top 10 supercomputers in the world, Eka is the only system that was fully privately funded. All other systems used government money, so all of them are for captive use. This means that Eka is the only system in the top 10 that is available for commercial use without strings attached.

There are various traditional applications of HPC (high-performance computing), which is what Eka is mainly targeted towards:

  • Aerodynamics (Aircraft design). Crash testing (Automobile design)
  • Biology – drug design, genomics
  • Environment – global climate, ground water
  • Applied physics – radiation transport, supernovae, simulate exploding galaxies.
  • Lasers and Energy – combustion, ICF
  • Neurobiology – simulating the brain

But as businesses go global and start dealing with huge quantities of data, it is believed that Eka-like capabilities will soon be needed to tackle these business needs:

  • Integrated worldwide supply chain management
  • Large scale data mining – business intelligence
  • Various recognition problems – speech recognition, machine vision
  • Video surveillance, e-mail scanning
  • Digital media creation – rendering; cartoons, animation

But that is not the only output the Tatas expect from their investment (of $30 million). They are also hoping to tap the expertise gained during this process for consulting and services:

  • Consultancy: Need & Gap Analysis and Proposal Creation
  • Technology: Architecture & Design & Planning of high performance systems
  • Execution: Implementation, Testing and Commissioning of high performance systems
  • Post sales: HPC managed services, Operations Optimization, Migration Services
  • Storage: Large scale data management (including archives, backups and tapes), Security and Business Continuity
  • Visualization: Scalable visualization of large amounts of data

and more…

This article is based on a presentation given by Dr. Sunil Sherlekar, Dr. Rajendra Lagu, and N. Seetha Rama Krishna, of CRL, Pune, who built Eka. For details of their background, see here. However, note that I’ve filled in gaps in my notes with my own conjectures, so errors in the article, if any, should be attributed to me.

Yahoo and CRL (EKA) to collaborate on cloud computing research

(Link courtesy Amit Paranjape via e-mail)

Yahoo and CRL have just announced that they will collaborate on research into cloud computing. I believe this announcement is essentially about Yahoo’s use of Hadoop on CRL’s EKA supercomputer.

This Yahoo!/CRL announcement comes on the eve of the first ever Hadoop Summit. Apache Hadoop is a free, Java-based software framework that supports data-intensive distributed applications running on large clusters of commodity computers. It enables applications to easily scale out to thousands of nodes and petabytes of data. In Feb 2008, Yahoo announced that its Search Webmap application had been converted to a Hadoop application that runs on a Linux cluster with more than 10,000 cores and produces data that is now used in every Yahoo! Web search query.
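For readers who haven’t seen the programming model, the canonical word-count example, written here as Hadoop Streaming scripts in Python (a standard textbook illustration of MapReduce, not anything specific to the Webmap application), gives the flavour:

```python
# mapper.py -- Hadoop Streaming mapper: emit "word<TAB>1" for every word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop Streaming reducer: the framework delivers the
# mapper output sorted by key, so counts for each word arrive together
# and can be summed in a single streaming pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The same pair can be sanity-checked locally without a cluster: `cat some_text.txt | python mapper.py | sort | python reducer.py`.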

CRL’s EKA supercomputer is a Hewlett-Packard based system with 14,400 processors, 28 terabytes of memory, 140 terabytes of disks, a peak performance of 180 trillion calculations per second (180 teraflops), and sustained computation capacity of 120 teraflops for the LINPACK benchmark. Of the top ten supercomputers in the world, EKA is the only supercomputer funded by the private sector that is available for use on commercial terms.

This announcement should also increase the interest in attending the presentation on EKA to be held here (i.e. in Pune) in a few days (Thursday, 27th March). The talk promises to be very interesting, especially given the background of the speakers, Dr. Lagu and Dr. Sherlekar (who used to be professors at IIT-B, amongst other things). Details.

Upcoming Event: Seminar on CRL’s Supercomputer (4th fastest in the world)

A presentation on CRL’s EKA supercomputer will be held next Thursday evening. Considering that it is one of the fastest supercomputers in the world, and considering the qualifications of the speakers (see below), it promises to be an interesting talk.

Details

When: 27th March 2008, 6.30 to 8 pm.
Where: MCCIA, Crossword Building 5th Floor, Hall 4 & 5, Senapati Bapat Road
Price: No entry fee.

CRL’s EKA Supercomputer – Architecture and Applications

Computational Research Laboratories (CRL) has recently designed and built a high performance computing (HPC) system called EKA (the Sanskrit word for “one”), which has been rated the fourth fastest in the world and the fastest in Asia. This ranking was announced at the recent Supercomputing 2007 conference held in Reno, Nevada, USA. Of the top ten supercomputers in the world, this is the only one funded by the private sector and available for usage on commercial terms. CRL is a fully owned subsidiary of Tata Sons Limited. The Tata group has always contributed to scientific research in India, and EKA will strengthen this cause with appropriate public-private partnerships.

The senior team from CRL will cover salient aspects of EKA — architecture and applications.

About the Speaker – Dr. Rajendra Lagu

Education
B.Tech, Electrical Engineering, IIT Bombay, 1978; M.Tech, Electrical Engineering, IIT Bombay, 1981; Ph.D., University of Florida, Gainesville, USA, 1985

Experience

  • Member of Technical Staff, Computational Research Labs, Pune, June 2006-present
  • Project Director, Society for Innovation and Entrepreneurship (SINE), IIT Bombay, June 2000-May 2006
  • Adjunct Professor, School of Management, IIT Bombay, June 2000-May 2006
  • Group Software Manager, Mastek Limited, January 1995-May 2000
  • Assistant Professor, IIT Bombay, June 1986-May 1990

About the Speaker – Dr. Sunil Dattatraya Sherlekar

Education

  • B.Tech (Elect. Engg.), IIT Bombay, April 1978
  • M.Tech (Computer Sc. & Engg.), IIT Bombay, July 1982
  • Ph.D., IIT Bombay, Sept 1987

Experience

  • Larsen & Toubro Ltd., Graduate Engineer Trainee, June 1978 – Nov 1978
  • CSRE, IIT Bombay, Sr. Research Asst., Dec 1978 – July 1979
  • Computer Centre, IIT Bombay, Sep 1979 – Oct 1982
  • Tata Consultancy Services Ltd., Head, Embedded Systems (R&D), June 2002 – date

Awards

  • Fellowship of the Indian National Academy of Engineering, 2007
  • IEEE Test Technology Committee award for contributions to the Asian activities of the committee.
  • UNESCO / ROSTSCA award for young scientists for contribution to the field of Computer Sc. & Informatics, 1989.

About the Speaker – Mr. N. Seetha Rama Krishna

Education
B.E. (ECE), 1986; MBA (Dual), Indira Institute, Pune

Experience
21+ years handling complex IT infrastructure & Data Centre solutions for verticals like:

  • Space, Defense, Research, Academic
  • Energy, Industry, Public Sector, Finance.
  • E-Governance (Govt.), Medical Informatics.
  • Executed several turnkey & consultancy projects in all above mentioned areas.
  • Currently working with Computational Research Laboratories Ltd (wholly owned subsidiary of Tata Sons) since May 2007.
  • Earlier worked with C-DAC.

Other Upcoming Events: Startup Lunch, OpenCoffee Club and VC Circle.