liveblogging | punetech.com

(This is a live-blog of Dr. Vipin Chaudhary's talk on Trends in High Performance Computing, organized by the IEEE Pune sub-section. Since this is being typed while the talk is going on, it might not be as well organized, or as coherent, as other PuneTech articles. Also, links will usually be missing.)

Live-blog of a talk by Dr. Vipin Chaudhary, CEO of CRL, on High Performance Computing at the Institute of Engineers, Pune. CRL are the makers of Eka, one of the world's fastest privately funded supercomputers.

Myths about High Performance Computing:

Commonly associated with scientific computing
Only used for large problems
Expensive
Applicable to niche areas
Understood by only a few people
Lots of servers and storage
Difficult to use
Not scalable and reliable

This is not the reality. HPC is:

Backbone for national development
Will enable economic growth. Everything from toilets to potato chips is designed using HPC
Lots of supercomputing is throughput computing – i.e. used to solve lots of small problems
“Mainstream” businesses like Walmart, and entertainment companies like DreamWorks Studios, use HPC.
(and a bunch of other reasons that I did not catch)

China is really catching up in the area of HPC. And Vipin correlates China’s GDP with the development of supercomputers in China. Point: technology is a driver for economic growth. We need to also invest in this.

Problems solved using HPC:

Movie making (like Avatar)
Real time data analysis
- weather forecasting
- oil spill impact analysis
- forest fire tracking and monitoring
- biological contamination prediction
Drug discovery
- reduce experimental costs through simulations
Terrain modeling for wind-farms
- e.g. optimized site selection, maintenance scheduling
- and other alternate energy sources
Geophysical imaging
- oil industry
- earthquake analysis
Designing airplanes (Virtual wind tunnel)

Trends in HPC.

The Manycore trend.

Putting many CPUs inside a single chip. Multi-core is when you have a few cores, manycore is when you have many, many cores. This has challenges. Programming manycore processors is very cumbersome, and debugging is much harder. For example, if you need to get good performance out of these chips, then you need to do parallel assembly programming. Parallel programming is hard. Assembly programming is hard. Both together will kill you.

This will be one of the biggest challenges in computer science in the near future. A typical laptop might have 8 to 10 processes running concurrently. So there is automatic parallelism, as long as the number of cores is less than 10. But as chips get 30, 40 cores or more, individual processes will need to be parallel. This will be very challenging.
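A minimal Python sketch of what making an individual process parallel can look like: the work is split into chunks, each chunk runs in its own worker process, and the partial results are combined. This is only an illustration and was not shown in the talk; the function names (split_range, sum_squares, parallel_sum_squares) and the sum-of-squares workload are made up for the example.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_squares(bounds):
    """CPU-bound work for one chunk: sum of i*i for start <= i < stop."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers=4, executor_cls=ProcessPoolExecutor):
    """Fan the chunks out to `workers` workers and add up the partial sums."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))
```

Calling parallel_sum_squares(1_000_000, workers=4) from under a script's `if __name__ == "__main__":` guard spreads the work over four processes. Even in this toy case the programmer has to decide how to split the work and how to combine the results, which is the extra burden the talk is pointing at.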

Oceans of Data but the Pipes are Skinny

Data is growing fast, in the sciences, humanities, commerce, medicine, and entertainment. The amount of information being created in the world is huge: emails, photos, audio, documents, etc. Genomic (bio-informatics) data is also huge.

Note: data is growing way, way faster than Moore’s law!

Storing things is not a problem – we have lots of disk space. Fetching and finding stuff is a pain.

Challenges in data-intensive systems:

Amount of data to be accessed by the application is huge
This requires huge amounts of disk, and very fat interconnects
And fast processors to process that data

Conventional supercomputing was CPU bound. Now, we are in the age of data-intensive supercomputing. Difference: old supercomputing had storage elsewhere (away from the processor farm). Now the disks have to be much closer.
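A toy Python sketch of why it helps to keep the computation close to the data. Each "node" here is just an in-memory list standing in for the records stored on one machine; the names ship_data and ship_computation are hypothetical and not from the talk. Both functions count the matching records, and also report how many items had to cross the (imaginary) network to get the answer.

```python
def ship_data(nodes, predicate):
    """Pull every record to one central place, then filter there."""
    # Every record on every node crosses the network.
    moved = [rec for node in nodes for rec in node]
    matches = sum(1 for rec in moved if predicate(rec))
    return matches, len(moved)


def ship_computation(nodes, predicate):
    """Run the filter on each node; only one count per node is sent back."""
    partial_counts = [sum(1 for rec in node if predicate(rec)) for node in nodes]
    return sum(partial_counts), len(partial_counts)


def is_even(x):
    return x % 2 == 0


nodes = [[1, 2, 3], [4, 5, 6, 7]]
print(ship_data(nodes, is_even))         # (matches, items transferred)
print(ship_computation(nodes, is_even))  # same matches, far fewer items moved
```

With real data sets the first approach moves the entire data set through the "skinny pipes", while the second moves only one small result per node.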

Conventional supercomputing was batch processed. Now, we want everything in real-time. Need interactive access. To be able to run analytic and ad hoc queries. This is a new, and difficult challenge.

While Vipin was on the faculty at SUNY Buffalo, they started a data-intensive discovery initiative (Di2). Now, CRL is participating. The data sets are large and ever-changing. Collecting and maintaining data is of course a major problem, but the primary focus of Di2 is searching this data, e.g. for security (finding patterns in huge logs of user actions). This requires a new architecture, different from traditional supercomputing, and the resulting Di2 system significantly outperforms the traditional system.

This also has applications in marketing analysis, financial services, web analytics, genetics, aerospace, and healthcare.

High Performance Cloud Services at CRL

Cloud computing makes sense. It is here to stay. But energy consumption of clouds is a problem.

Hence, CRL is focusing on a green cloud. What does that mean?

Data center optimization:

Power consumption optimization on hardware
Optimization of the power system itself
Optimized cooling subsystem
CFD modeling of the power consumption
Power dashboards

Workflow optimization (reduce computing resource consumption via efficiencies):

Cloud offerings
Virtualization
Workload based power management
Temperature aware distribution
Compute cycle optimization

Green applications being run at CRL

Terrain modeling
Wind farm design and simulation
Geophysical imaging
Virtual wind tunnel

Summary of talk

Manycore processors are here to stay
- Programmability has to improve
- Must match application requirements to processor architecture (one size does not fit all)
Computation has to move to where the data is, and not vice versa
Data scale is the biggest issue
- must co-locate data with computing
Cloud computing will continue to grow rapidly
- Bandwidth is an issue
- Security is an issue
- These issues need to be solved

This is a live-blog of the Pune Open Coffee Club session on use of cloud apps for your business. Since this is being typed as the session is in progress, it might be a bit incoherent and not completely well-structured, and there are no links.

Pune Open Coffee Club is an informal group for all those interested in Pune's startup ecosystem. As of this writing, it has more than 2700 members.

This session is being run as a panel discussion. Santosh Dawara is the moderator. Panelists are:

Dhananjay Nene, Independent Software Architect/Consultant
Markus Hegi, CEO of CoLayer
Nitin Bhide, Co-founder of BootstrapToday, a cloud apps provider
Basant Rajan, CEO of Coriolis, which makes the Colama virtual machine management software
Anthony Hsiao, Founder of Sapna Solutions

The session started with an argument over the definition of cloud, SaaS, etc., which I found very boring and will not capture here.

Later, Anthony gave a list of cloud apps used by Sapna Solutions:

Google apps for email, calendaring, documents
GitHub for code
Basecamp for project management
JobScore for recruitment (handles job listings on your website, and the database of applicants, etc.)
GreyTip (Indian software for HR management)

Question: Should cloud providers be in the same country?
Answer: you don’t really have a choice. There are no really good cloud providers in India. So it will be outside.

Question: Are customers ready to put their sensitive data on the cloud?
Audience comment: Ashish Belagali has a startup that provides recruitment software. They can provide it as installable software, and also as a hosted cloud app. However, they’ve found that most customers are not interested in the cloud app. They are worried about two things: a) the software will be unavailable if the internet is not available, and b) the data is outside the company premises.

Point by Nitin Bhide of BootstrapToday: Any cloud provider will take security of your data very seriously. Because, if they screw this up even once, they’ll go out of business right away. Also, as far as theft of data is concerned, it can happen even within your own premises, by your own employees.

Comment 1: Yes, the above argument makes logical sense. But most human beings are not logical; they can have an irrational fear, and will defend their choice.

Comment 2: This fear is not irrational. There are valid reasons to be unhappy about having your sensitive data in the cloud.

Comment 3: Another reason why this fear is not irrational is to do with CYA: cover-your-ass. If you put data in the cloud and something goes wrong, you will be blamed. If you put the data locally and something goes wrong, you can claim that you did everything that was expected of you. As long as CYA exists (especially in enterprises), this will be a major argument against the cloud.

Question: Does anybody use accounting packages in the cloud?
Answer: No. Most people prefer to stick to Tally, because of its compliance with Indian laws (or at least its compliance with Indian CAs). There doesn’t seem to be any online alternative that’s good enough.

At this point there was a longish discussion about the availability and uptime of the cloud services. Points made:

Cloud app providers have lots of redundancies and lots of backups to ensure that there is no downtime
However, there are enough instances of even world-class providers having downtime
Also, most of them claim redundancies, but give no guarantees or SLAs, and even if they do give an SLA, you’re too small a player to enforce the SLA.
Also remember, that in the Indian context, downtime of the last mile of your internet will result in downtime of your app
The point to remember is that an app going down is not the real problem. The real problem is recovery time. How long does it take before it comes back up? Look at that before choosing your app.
It would be great if there was a reputation service for all cloud apps, which gives statistics on availability, downtime, performance etc. There isn’t right now, and that is a problem.
Remember, there is an economic cost of cloud apps that you will incur due to downtime, but also remember that there are definite economic savings too. For many startups the savings outweigh the potential costs. But you need to look into this for yourself.
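The point about recovery time can be made concrete with the standard availability formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to recover. The Python sketch below compares two providers; both providers and all the numbers are hypothetical, chosen only to illustrate the effect, and nothing like this was presented in the session.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the service is up.

    mtbf_hours: mean time between failures (average uptime between outages)
    mttr_hours: mean time to recover (average length of one outage)
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours, mttr_hours):
    """Expected hours of downtime in one year."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical providers: both fail about once every 30 days (720 hours),
# but one recovers in 30 minutes and the other takes 12 hours.
fast_recovery = downtime_hours_per_year(mtbf_hours=720, mttr_hours=0.5)
slow_recovery = downtime_hours_per_year(mtbf_hours=720, mttr_hours=12)
print(f"fast recovery: {fast_recovery:.1f} hours down per year")
print(f"slow recovery: {slow_recovery:.1f} hours down per year")
```

Both providers go down equally often, yet the yearly downtime differs by a large factor purely because of how long each outage lasts.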

Question: What kind of cost savings can a startup get by going to the cloud?

Nobody had concrete answers, but general points made:

Can you really afford to pay a system administrator who is competent, and who can administer a mail server, a file server, a this, and a that? There were some people who said that while admins are expensive in the US, they are not that expensive in India. However, more people felt that this would be expensive.
All significant large cloud services cost a very tiny fraction of what it would cost to do it yourself.
It is not a question of cost. As a startup, with my limited team, I wouldn’t have time to do this.

Basant Rajan points out that so far the discussion has been about either something that is in the cloud, or it is something that you do entirely yourself. These are not the only options. There is a third option – called managed services, or captive clouds. He points out that there is a Pune company called Mithi software that offers a whole bunch of useful services that they manage, on their machines, in your premises.

Question: What about compatibility between your apps? If the recruitment app needs to talk to your HR app are you in trouble?
Answer: The good ones already talk to each other. But yes, if you are not careful, you could run into trouble.

Some Pune startups who are providing cloud based apps:

Pune startup BootstrapToday provides an all-in-one solution in the cloud for development:

Source code control (using SVN). All the rest of these services are home grown.
Wiki pages
Bug tracking
Project management
Time Tracking (coming soon)
Project Tracking (coming soon)

Pune startup Acism has developed an in-house tool for collaboration and project communication which they are making available to others.

Pune startup CoLayer has been around for a long time, and has a product for better collaboration within an enterprise. It is like Google Wave, but has been around for longer, and is still around (while Wave is not).

Pune startup Colama offers private clouds based on virtualization technology. They are currently focusing on software labs in educational institutions as customers. But this technology can also be used to create grids and private clouds for development, testing and training.

Recommendations for cloud apps:

General recommendation: if you’re not using Google Apps, you must. Mail, Documents (i.e. Office equivalent functionality), Calendar.

Bug Tracking: Jira (very good app, but expensive), Pivotal Tracker (only for those familiar with agile, suggested by @dnene), Lighthouse App (suggested by: @anthonyhsiao), Mantis.

Project Management: ActiveCollab (self hostable), DeskAway, SugarCRM on Google Apps (very good CRM, very good integration with Google Apps, has a learning curve).

For hosting your own cloud (i.e. bunch of servers with load balancing etc.): Rackspace Cloud is good but expensive. Amazon Web Services is cost effective, but has a learning curve.

Unfortunately, due to time constraints, this part of the session got truncated. Hopefully we’ll have some more time in the end to pick this up again.

IndicThreads conference pass giveaway

IndicThreads will give a free pass to their Cloud Computing conference that is scheduled for 20/21 August to the best blog or tweet either about this POCC event, or about Cloud Computing in general. The pass is normally worth Rs. 8500. To enter, tweets and blogs should be brought to the attention of @indicthreads on twitter, or conf@rightrix.com. This PuneTech blog is not eligible for the free pass (because I already have a pass), so the field is still open 🙂
