
Post-mortem of the Amazon Cloud Disruption

May 16, 2011 | Featured, Technology | Navin Kabra

Last month, Amazon Web Services had a major outage which resulted in downtime for a number of companies who are using AWS as their infrastructure provider. This has given rise to a host of concerns for everybody interested in cloud computing, and it is important to understand the reasons for the outage, the long-term implications, if any, of this outage, and most important of all, what changes users of cloud infrastructure should make in their architecture and processes so that they’re less affected by such problems.

Suhas Kelkar, who is the Director of the Innovation Team at BMC Software India, has done this post-mortem of the incident.

Suhas created this video+slides presentation using kPoint, another Pune-based cloud software product.

A couple of years back, Suhas wrote an article for PuneTech titled “Musings on Why Cloud Computing will Prevail”, which is also interesting reading in this context.

How to prevent such outages from affecting your own infrastructure? A few days after the outage, Dhananjay Nene, Chief Architect at Vayana, and also a consulting software architect, wrote an article arguing that the cloud just got stronger as a result of the AWS outage.

Here are his recommendations:

AWS has multiple availability zones. An application should ideally leverage at least two. If you read the Netflix presentation I referred to, Netflix apparently uses three. Do not assume the servers will not go down. Assume it is possible that at least one availability zone could go down. Make sure you have the systems to quickly activate, systems in the alternative availability zone. For that you will need to find ways to keep data current across availability zones. Also find ways to ensure you have the ability to quickly switch to and fro between availability zones. More advanced options could include concurrently active systems across availability zones or those spread across AWS regions or even between AWS and other vendors.
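The failover idea in the recommendation above can be sketched in a few lines of Ruby. This is only an illustration: the zone names and the `healthy` flag are made up, and a real deployment would ask a monitoring system about each zone instead of reading a field.

```ruby
# Hypothetical model of an availability zone; `healthy` would normally
# come from a health probe, not from a hand-set flag.
Zone = Struct.new(:name, :healthy)

# Stay on the current zone while it is healthy; otherwise move to the
# first healthy alternative, and fail loudly if there is none.
def pick_active_zone(zones, current)
  return current if current && current.healthy
  zones.find(&:healthy) || raise("no healthy availability zone")
end

zones = [Zone.new("zone-a", true), Zone.new("zone-b", true), Zone.new("zone-c", true)]
active = pick_active_zone(zones, zones[0])
puts "active: #{active.name}"

zones[0].healthy = false              # simulate an outage in the active zone
active = pick_active_zone(zones, active)
puts "after failover: #{active.name}"
```

The sketch only covers choosing a zone. Keeping data current across zones, which the recommendation also calls for, is the harder part and is not shown.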

Read the whole article, and also check out Dhananjay’s blog.

What I like about Rails3 by Gautam Rege

March 31, 2011 | Event Report, Technology | rails, ruby, techweekend, tw8 | Navin Kabra

(This article by Gautam Rege is based on a talk he gave at Techweekend #8. It was first published on the Josh Software blog, and is reproduced here with permission.)

This is NOT a post about the differences between Rails 2.x and Rails 3; these are some musings about what I love in Rails. A lot of goodies come ‘in the box’ (I hate saying out-of-the-box) with Rails3. Some of them have been there since early versions of Rails but are somehow less frequently used or talked about. I spoke about this at Techweekend #8 and the presentation is here.

Bundler

Ever had a production system crash one day, without any code deployment or even anyone logging in? After the initial ‘it’s not me’ excuses, one system administrator says ‘Hey, I had updated the system libraries’. Having been burnt before (hopefully), you check the system and find that some system library and some gems have been upgraded, and that this is causing incompatibilities with other gems. We had a case where rack (1.0.1) was upgraded to rack (1.1), causing an incompatibility with the Rails gem we were running! The fix is simple: upgrade or downgrade your gems or libraries and you’re on your way. A few days later, another developer needs to deploy a simple Sinatra application. He takes the latest version, which requires rack > 1.1, and it automatically upgrades the gem. Boom! Your Rails app crashes again.

Did I hear you say ‘freeze the gems’? Nah – not a good approach, as it makes your application deployment bundle huge and ‘frozen’. Every application you deploy would need to freeze its gems, and this does not really solve your problem.

Yehuda and Carl built Bundler, an awesome gem which is now the de-facto standard for any Rails application. In fact, it is so cool that it’s now Rails 2.x compatible as well, and very highly recommended. You can specify your dependencies in a Gemfile and prevent clashes with other gem versions and their dependencies. Since the gems are installed in the system default location (not frozen in your app), they are re-usable and version friendly!

source "http://rubygems.org"

gem "haml"                              # the latest version
gem "rack", "~> 1.1"                    # v1.1 or greater, but below 2.0
gem "nokogiri", :git => "git://github.com/tenderlove/nokogiri.git"
gem "splat", :path => "~/Code/splat"    # local code base

group :test do                          # only in the test environment
  gem "rspec", :require => "spec"
end
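The `~>` constraint used for rack in the Gemfile is worth a closer look, because it is stricter than a plain "or greater". Here is a small sketch using the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems; the list of versions checked is just an example.

```ruby
require "rubygems"

# "~> 1.1" is the pessimistic constraint: at least 1.1, but below 2.0.
pessimistic = Gem::Requirement.new("~> 1.1")

results = %w[1.0.1 1.1 1.4.2 2.0].map do |v|
  [v, pessimistic.satisfied_by?(Gem::Version.new(v))]
end

results.each do |version, ok|
  puts "rack #{version} satisfies '~> 1.1'? #{ok}"
end
```

Versions 1.1 and 1.4.2 are accepted, while 1.0.1 and 2.0 are rejected. That is exactly the protection you want against the silent rack upgrade described above.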

UJS

Unobtrusive JavaScript has been around for ages now, but Rails lingered with prototype.js. Now, with the awesome features of jQuery, we can easily use UJS to solve some common and nagging problems:

Avoid submit if button clicked twice!
Make non-get requests from Hyperlinks!
Submit form data via Ajax

Add :remote => true to hyperlinks, forms and other elements. This adds the data-remote="true" attribute to the generated HTML. The ‘live’ jQuery function binding kicks in and sets up the events for the various elements. Simple and awesome – this is available here.

XSS

Cross-site scripting, and related attacks such as cross-site request forgery and SQL injection, have been a pain to handle for a long time. Rails does this under the covers – you don’t even need to know too many details:

protect_from_forgery is automatically added during basic Rails project creation. This ensures that every form created by the application has an authenticity_token as a hidden data field. During a post request, this token is verified, which ensures that the form was created by the same server. This prevents cross-site request forgery, where a malicious form is posted to your server using an authenticated user’s session!

While using UJS, you need to add csrf_meta_tag in your layout to avoid silent Ajax errors.
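To make the authenticity token idea concrete, here is a rough sketch of the mechanism in plain Ruby. It is not how Rails implements it internally: the `session` hash and the helper names are assumptions made for illustration only.

```ruby
require "securerandom"

# Hypothetical session store: a plain hash standing in for the user's session.
def issue_token(session)
  session[:csrf_token] ||= SecureRandom.hex(32)
end

# Compare in constant time so that response timing does not reveal
# how many leading characters of a guess were right.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# A post request is accepted only if it carries the token stored in the session.
def valid_request?(session, submitted_token)
  expected = session[:csrf_token]
  !expected.nil? && !submitted_token.nil? && secure_equal?(expected, submitted_token)
end

session = {}
token = issue_token(session)                   # embedded in the form as a hidden field
puts valid_request?(session, token)            # form rendered by our own server
puts valid_request?(session, "forged-token")   # form crafted on another site
```

A form built on another site cannot read the token from the user's session, so its post fails the check even though the browser sends the user's cookies along.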

SQL injection is cleanly avoided with new where syntax:
# Wrong
where("user_name = '#{user_name}' AND password = '#{password}'").first

# Correct
where("user_name = ? AND password = ?", user_name, password).first

# Correct and clean
where(:user_name => user_name, :password => password).first
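To see why the interpolated form is dangerous, it helps to look at the SQL text that gets built. The sketch below needs no database. `bound_sql` is a toy stand-in for placeholder binding, written only for this illustration; real drivers and Rails do this for you, so never roll your own escaping in production.

```ruby
# Naive interpolation: user input becomes part of the SQL text itself.
def interpolated_sql(user_name)
  "SELECT * FROM users WHERE user_name = '#{user_name}'"
end

# Toy placeholder binding (illustration only): each value is quoted and
# any embedded single quote is doubled, so it stays inside the string literal.
def bound_sql(template, *values)
  remaining = values.dup
  template.gsub("?") { "'" + remaining.shift.to_s.gsub("'", "''") + "'" }
end

attack = "x' OR '1'='1"

unsafe = interpolated_sql(attack)
safe   = bound_sql("SELECT * FROM users WHERE user_name = ?", attack)

puts unsafe   # the OR clause is now part of the query
puts safe     # the whole input is a single quoted value
```

In the first query the attacker's `OR '1'='1'` is parsed as SQL and matches every row. In the second it is just an odd-looking user name.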

In Rails3, all strings rendered in a view are HTML-escaped by default, so you cannot leave gaps for non-HTML-safe content, even by mistake! If you do trust the source, you can use the ‘raw’ method to output the HTML as is.
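Outside of Rails you can see what this escaping does with `ERB::Util.html_escape` from Ruby's standard library. The comment text below is an invented example of hostile user input.

```ruby
require "erb"

# Imagine this came from a comment form.
comment = %q{<script>alert("stolen cookie")</script>}

escaped = ERB::Util.html_escape(comment)
puts escaped
```

The angle brackets and quotes come out as entities (`&lt;`, `&gt;`, `&quot;`), so the browser shows the text instead of running the script.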

Rails Eager Loading

The N+1 query problem is fairly common and largely ignored until you hit serious performance issues. Straight out of the Rails guide, consider this case:

clients = Client.limit(10)

clients.each do |client|
  puts client.address.postcode
end

There are 11 queries fired here: one for the clients, and one per client for its address. Using the includes method, Rails does eager loading like this:

clients = Client.includes(:address).limit(10)

clients.each do |client|
  puts client.address.postcode
end

Here only 2 queries are fired, as Rails loads the address relationship too while fetching the client objects.
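If you want to convince yourself of the 11 versus 2 arithmetic without a database, here is a small simulation. `FakeDB` is a made-up in-memory class that does nothing except count how many "queries" it is asked to run.

```ruby
# Hypothetical in-memory stand-in for a database; it only counts queries.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @clients = (1..10).map { |id| { id: id, name: "client-#{id}" } }
    @addresses = (1..10).map { |id| { client_id: id, postcode: format("411%03d", id) } }
  end

  def clients(limit)
    @query_count += 1
    @clients.first(limit)
  end

  # One query per client: this is the "N" in N+1.
  def address_for(client_id)
    @query_count += 1
    @addresses.find { |a| a[:client_id] == client_id }
  end

  # One query for all the addresses at once: what eager loading does.
  def addresses_for(client_ids)
    @query_count += 1
    @addresses.select { |a| client_ids.include?(a[:client_id]) }
  end
end

lazy = FakeDB.new
lazy.clients(10).each { |c| lazy.address_for(c[:id])[:postcode] }
puts "lazy loading:  #{lazy.query_count} queries"

eager = FakeDB.new
clients = eager.clients(10)
by_client = eager.addresses_for(clients.map { |c| c[:id] }).group_by { |a| a[:client_id] }
clients.each { |c| by_client[c[:id]].first[:postcode] }
puts "eager loading: #{eager.query_count} queries"
```

The lazy version scales with the number of clients, while the eager version stays at two queries no matter how many rows you fetch.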

Transliteration / Babosa

What happens to your permalinks if a user enters the information in Arabic? We faced exactly this issue and were asked by our client to prevent input which is not English. Whoa! ActiveSupport in Rails3 addresses a lot of these transliteration issues, and with the Babosa gem you can write:

"Jürgen Müller".to_slug.transliterate.to_s #=> "Jurgen Muller"

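For readers without Rails or Babosa at hand, here is a rough plain-Ruby approximation of the same idea. It decomposes accented characters and drops the accent marks. It is not what ActiveSupport or Babosa do internally, and it only handles Latin text with accents: Arabic or other scripts would need real transliteration tables.

```ruby
# Split accented letters into base letter + combining mark (NFD),
# then remove the combining marks.
def ascii_fold(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Hypothetical permalink helper built on top of ascii_fold.
def to_permalink(text)
  ascii_fold(text).downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts ascii_fold("Jürgen Müller")
puts to_permalink("Jürgen Müller")
```
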
Performance using Identity Map

The awesomeness of Rails’ progression: the Identity Map pattern has now been included in the Rails 3 caching mechanism. An identity map is a design pattern used to improve performance by providing an in-memory cache that prevents duplicate retrieval of the same object data from the database, in the context of the same request or thread.
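The pattern itself is tiny. Below is a minimal sketch in plain Ruby, not the Rails implementation: the class name and the loader block are invented for illustration, and the loader stands in for a database fetch.

```ruby
# Minimal identity map: one in-memory object per (class, id) pair.
class IdentityMap
  attr_reader :loads

  # The block plays the role of the database fetch.
  def initialize(&loader)
    @loader = loader
    @objects = {}
    @loads = 0
  end

  def find(klass, id)
    @objects.fetch([klass, id]) do |key|
      @loads += 1                          # only reached on a cache miss
      @objects[key] = @loader.call(klass, id)
    end
  end
end

map = IdentityMap.new { |klass, id| { type: klass, id: id } }

first  = map.find(:client, 1)
second = map.find(:client, 1)

puts first.equal?(second)   # same object, not just equal data
puts map.loads              # the loader ran once
```

In a web application such a map would typically live for the duration of one request, so that stale objects do not leak between requests.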

Optimistic Locking

A really old concept which has been there since REALLY early versions of Rails. It is commonly overlooked but is critically important when it comes to concurrent request processing. By adding a ‘lock_version’ field to the table, Rails automatically kicks into optimistic locking mode and rejects a write when the data it is based on is stale. A StaleObjectError is raised in case the lock_version is not the same as when the record was read.
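Here is a sketch of the mechanism with an in-memory table, so you can run it without Rails. The `Table` class and the local `StaleObjectError` are stand-ins written for this example. In a real application ActiveRecord does the version check inside the UPDATE statement.

```ruby
# Local stand-in for ActiveRecord::StaleObjectError.
class StaleObjectError < StandardError; end

# Hypothetical in-memory table that tracks a lock_version per row.
class Table
  def initialize
    @rows = {}
  end

  def insert(id, attrs)
    @rows[id] = attrs.merge(lock_version: 0)
  end

  def read(id)
    @rows.fetch(id).dup
  end

  # Write only if nobody has bumped lock_version since `record` was read.
  def update(id, record, changes)
    current = @rows.fetch(id)
    unless current[:lock_version] == record[:lock_version]
      raise StaleObjectError, "row #{id} changed since it was read"
    end
    @rows[id] = current.merge(changes).merge(lock_version: current[:lock_version] + 1)
  end
end

accounts = Table.new
accounts.insert(1, balance: 100)

alice = accounts.read(1)     # both requests read version 0
bob   = accounts.read(1)

accounts.update(1, alice, balance: 150)   # succeeds, version becomes 1

begin
  accounts.update(1, bob, balance: 50)    # bob's copy is stale
rescue StaleObjectError => e
  puts "rejected: #{e.message}"
end
```

Nothing is locked while the two requests are working, which is why this is called optimistic: the conflict is only detected at write time.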

Named Scopes

This is almost cliched now 🙂 Named scopes were added in Rails 2.1. It’s one of the things I love about Rails. The scopes are chained together and the query is fired only when the data is really needed. This is excellent for report filters! Adding a new filter is a breeze, as it’s only one more scope to be chained. Remember that scopes do not return an Array but a relation object, much like a has_many association. That is how they can be chained to other scopes.
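The "chain now, query later" behaviour can be imitated in plain Ruby. The `LazyScope` class below is a made-up miniature, not ActiveRecord: each `where` returns a new scope, and the data source is only hit when the results are actually enumerated.

```ruby
# Miniature lazy, chainable scope (illustration only).
class LazyScope
  include Enumerable

  # `source` is a callable standing in for the database query.
  def initialize(source, filters = [])
    @source = source
    @filters = filters
  end

  # Chaining builds a new scope and fires nothing.
  def where(&predicate)
    LazyScope.new(@source, @filters + [predicate])
  end

  # The source is only called when someone iterates.
  def each(&block)
    @source.call.select { |row| @filters.all? { |f| f.call(row) } }.each(&block)
  end
end

queries = 0
orders = LazyScope.new(lambda {
  queries += 1
  [{ status: "paid", total: 40 }, { status: "paid", total: 900 }, { status: "open", total: 700 }]
})

paid  = orders.where { |o| o[:status] == "paid" }
large = paid.where { |o| o[:total] > 500 }
queries_before = queries      # nothing has been fetched yet

result = large.to_a           # now the "query" runs
puts "queries before: #{queries_before}, after: #{queries}"
puts result.inspect
```

Because every `where` returns another scope, adding one more report filter is just one more link in the chain.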

I’m pretty sure I have missed some things here. Do comment on what features you like best about Rails3! 😉

Top 5 things to worry about when designing a Cloud Based SaaS

December 13, 2010 | Featured, Technology | availability, cloud, cloud computing, pubmatic, SaaS, scalability, Technology, web services | Navin Kabra

(This article on things you need to be careful about when designing the architecture of a cloud based Software-as-a-Service offering is a guest post by Mukul Kumar, who, as SVP of Engineering at Pubmatic, has a lot of hands-on experience with designing, building and maintaining a very high performance, high scalability cloud-based service.)

Designing a SaaS software stack poses challenges that are very different from the considerations for host-based software design. Designing for the performance, scalability and reliability of SaaS, with lots of servers and lots of data, is very different from (and more interesting than) designing software that is installed on a host and used only by that host.

Here I list the top 5 design elements for Cloud Based SaaS.

High availability

A SaaS software stack is built on top of several disparate elements. Most of the time these elements are hosted by different vendors, such as Rackspace, Amazon, Akamai, etc. The software stack consists of several layers, such as application server, database server, data-mining server, DNS, CDN, ISP, load-balancer, firewall, router, etc. High availability of SaaS actually means thinking about the high availability of all or most of these components. Designing high availability for each of these components is a non-trivial exercise, and the cost shoots up as you keep on adding layers of HA. Such design requires thinking deeply about the software architecture and each component of the architecture. Two years back I wrote an article on Cloud High Availability, where I described some of these issues; you can read it here.

Centralized Manageability

As you keep on adding more and more servers to your application cluster the manageability gets hugely complex. This means:

you have to employ more people to do the management,
human errors would increase, and
the rate at which you can deploy more servers goes down.

And, don’t just think of managing the OS on these servers, or these virtual machines. You have to manage the entire application and all the services that the application depends on. The only way to get around this problem is to have centralized management of your cluster. Centralized management is not an easy thing to do: since every application is different, generalized management software oversimplifies the problem and is not a full solution.

Online Upgradability

This is probably the most complex problem after high availability. When you have a cluster of thousands of hosts, live upgradability is a key requirement. When you release a new software revision, you need to be able to upgrade it across the servers in a controlled way, with the ability to roll it back whenever you want – at the instant that you want, across the exact number of servers that you want. You would also need to control database and cache coherency and invalidation across the cluster in a controlled way. Again, this cannot be solved in a very generic way; every software stack has its own specifics, which need to be solved in their own ways.
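As a rough illustration of a controlled, batch-wise upgrade with rollback, consider the following sketch. Everything in it is hypothetical: the `Server` struct, the version strings and the `health_check` callable are placeholders, and a real rollout would also have to deal with the database and cache coherency issues mentioned above.

```ruby
Server = Struct.new(:name, :version)

# Upgrade `batch_size` servers at a time. If any server in a batch fails
# its health check, restore every server touched so far and stop.
def rolling_upgrade(servers, new_version, batch_size:, health_check:)
  touched = []
  servers.each_slice(batch_size) do |batch|
    batch.each do |server|
      touched << [server, server.version]   # remember the old version
      server.version = new_version
    end
    next if batch.all? { |server| health_check.call(server) }

    touched.each { |server, old_version| server.version = old_version }
    return false
  end
  true
end

fleet = (1..6).map { |i| Server.new("web-#{i}", "1.0") }

ok = rolling_upgrade(fleet, "1.1", batch_size: 2, health_check: ->(_s) { true })
puts "upgrade to 1.1 succeeded: #{ok}"

# Pretend web-4 misbehaves on the next release.
bad = rolling_upgrade(fleet, "1.2", batch_size: 2, health_check: ->(s) { s.name != "web-4" })
puts "upgrade to 1.2 succeeded: #{bad}"
puts fleet.map(&:version).uniq.inspect
```

The batch size is the knob that decides how many servers are exposed to a bad release before it is caught.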

Live testability

Testing your application in a controlled way with real traffic and data is another key aspect of SaaS design. You should be able to sample real traffic and use it for testing your application without compromising on user experience or data integrity. Lab testing has severe limitations, especially when you are testing performance and scalability of your application. Real traffic patterns and seasonality of data can only be tested with real traffic. Don’t start your beta until you have tested on real traffic.
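One common way to do this is to mirror a small, deterministic sample of live requests to a candidate build while users are still served by production. The sketch below is only an assumption about how that could look: the handler lambdas and the 5% sample rate are invented for the example.

```ruby
require "zlib"

# Deterministically pick roughly `percent`% of requests by hashing the id,
# so a given request is always either in or out of the sample.
def sampled?(request_id, percent)
  Zlib.crc32(request_id.to_s) % 100 < percent
end

# Every request is answered by production. Sampled requests are also
# replayed against the candidate, whose response is never returned.
def handle(request_id, production:, candidate:, percent:)
  response = production.call(request_id)
  candidate.call(request_id) if sampled?(request_id, percent)
  response
end

shadowed = []
production = ->(id) { "ok #{id}" }
candidate  = ->(id) { shadowed << id }

responses = (1..1000).map do |id|
  handle(id, production: production, candidate: candidate, percent: 5)
end

puts "served #{responses.size} requests, shadow-tested #{shadowed.size}"
```

Since the candidate's output is thrown away, user experience is untouched. The candidate must still avoid writing to live data stores, which this sketch does not address.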

Monitor-ability

The more servers and applications you add to your cluster, the more things can fail, and in very different ways: for example, network (NIC), memory, disk and many other things. It is extremely important to monitor each of these, and many more, constantly, with alarms sent over different communication channels (email, SMS, etc.). There are many online services that can be used for monitoring, and they provide a host of different services and have widely varying pricing. Amazon too recently introduced CloudWatch, which can monitor various aspects of a host such as CPU utilization, disk I/O, network I/O, etc.
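At its core, this kind of monitoring is a threshold check plus a fan-out to notification channels. Here is a toy version: the metric names, limits and channels are all assumptions for illustration, and the "channels" just append to an array instead of sending email or SMS.

```ruby
# Hypothetical limits, in percent; tune these per service.
def default_thresholds
  { cpu: 90, memory: 95, disk: 85 }
end

# Return one alarm message per metric that is over its limit or missing.
def alarms_for(host, metrics, thresholds = default_thresholds)
  thresholds.each_with_object([]) do |(metric, limit), alarms|
    value = metrics[metric]
    if value.nil?
      alarms << "#{host}: #{metric} is not being reported"
    elsif value > limit
      alarms << "#{host}: #{metric} at #{value}% exceeds #{limit}%"
    end
  end
end

# Stand-in for email/SMS delivery: each channel is just a callable.
def notify(alarms, channels)
  alarms.each { |alarm| channels.each { |channel| channel.call(alarm) } }
end

sent = []
channels = [->(msg) { sent << "email: #{msg}" }, ->(msg) { sent << "sms: #{msg}" }]

alarms = alarms_for("web-1", { cpu: 97, memory: 60 })
notify(alarms, channels)
puts sent
```

Note that a metric that stops reporting is treated as an alarm too: silence from a host is often the first sign that something has failed.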

As you grow your cluster of servers you will need to think of these design aspects and keep on tuning your system. And, like the guys at YouTube said:

Recipe for handling rapid growth

    while (true)
    {
        identify_and_fix_bottlenecks();
        drink();
        sleep();
        notice_new_bottleneck();
    }

About the Author – Mukul Kumar

Mukul Kumar is the Co-Founder & Senior Vice President Engineering at PubMatic. PubMatic, an online advertising company that helps premium publishers maximize their revenue and protect their brands online, has its Research & Development center in Pune.

Mukul is responsible for PubMatic’s Engineering team and resides in Pune, India. Mukul was previously the Director of Engineering at PANTA Systems, a high-performance computing startup. Before that he was at VERITAS India, where he joined as the 13th employee and helped it grow to over 2,000 individuals. Mukul has filed for 14 patents in systems software, storage software, and application software. Mukul is a graduate of IIT Kharagpur with a degree in Electrical Engineering.

Mukul is very passionate about technology, and building world-class teams. His interests include architecting scalable and high-performance web-applications, handling and mining massive amounts of data and system & storage architecture.

Mukul’s email address is mukul at pubmatic.com.

Android/iPhone/BlackBerry/Nokia – Which platform(s) should developers target

November 20, 2010 | Event Report, Technology | android, Events, google, open source, Technology | Navin Kabra

(I attended the IndicThreads Conference on Mobile Application Development today. This article is based on presentations made there and conversations I had with some of the presenters.)

The smartphone market is very fragmented.

In 3Q2010, Symbian had 37% of the smartphone market, Android was second with 25% (it was at 2% 18 months earlier), and iOS was in third place with 16%. RIM (BlackBerry) was next. Windows was losing.

So, what should a developer do? Which to target?

I talked to Romin Irani of Xoriant about this problem, and whether HTML5 is the answer to these issues. My key takeaways from this conversation were:

HTML5 is here already. I was under the impression that HTML5 is something that will arrive sometime in the near future. Romin pointed out that HTML5 support is pretty good even today, especially if you’re thinking of mobile phone browsers.
But HTML5 is not the answer to all your problems. If you need access to device sensors, you’re probably better off with a native app. If you want access to the appstore/marketplace, then you need a native app. HTML5 doesn’t qualify!
If you’re a new startup, and you want to build a mobile app, what should you do? These are the guidelines:
- If you don’t need device sensors, and don’t need to be in the appstore/marketplace, strongly consider an HTML5+CSS+JavaScript app
- If you want to go after the US market, you must have an iPhone native app. (Maybe followed by Android)
- If you want to go after the European market, then you will need to have a Nokia based native app, just for the sheer numbers they have

Rohit Nayak of Talentica had talked about the use of cross-platform app development frameworks like Titanium and PhoneGap. Both allow you to write apps in JavaScript. Titanium cross-compiles them to native apps on each platform. PhoneGap uses a modified version of the browser so that your app is HTML+CSS+JavaScript, but there are modifications that allow you to access native phone features (like sensors).

There are some limitations, and such apps aren’t as good as native apps.

So, would he really recommend the use of PhoneGap/Titanium for developing apps? Rohit had this to say:

Titanium and PhoneGap are rapidly getting better and better. More and more apps built using them are showing up on the Android marketplace.
If you already know JavaScript, and need to get to the market quickly, you should definitely consider using one of these tools
If you don’t really need advanced native features of any specific platform, then it makes a lot of sense to go this route
If you are a software outsourcing company that’s building apps for third parties, you should seriously consider building a team that uses Titanium. For most of your customers, you’ll be able to quickly complete an app that satisfies them. Otherwise, you’re faced with a nightmare – you’ll need to build teams with expertise in each of the major platforms, and this is almost impossible to do with today’s attrition.

The last few points seem very similar to the advantages of HTML5, so I asked Rohit whether PhoneGap/Titanium had any advantages over HTML5. Answer:

PhoneGap/Titanium generally support more native features than HTML5 is planning on supporting
An app built with Titanium/PhoneGap can go on the appstore/marketplace.
An HTML5 app necessarily requires you to have a “cloud” presence – a web server and an API, and supporting all the online connections. A PhoneGap/Titanium application does not require any of that.

Pune’s KQInfoTech announces beta availability of ZFS file-system for Linux

September 17, 2010 | Featured, Technology | file-systems, kqinfotech, linux, storage, sun, systems | Navin Kabra

About a year ago, we reported that Pune-based KQInfoTech was working on porting Sun’s ZFS file system to Linux. They have now announced that a “Technology Preview” of the port is complete, and the ported ZFS for Linux is now available in beta. They are looking for interested folks to try out the beta and help them find bugs and other issues.

But first some background, taken from previous PuneTech articles about KQInfoTech.

What is KQInfotech?

KQInfoTech is a Pune company that's trying to combine mentorship programmes for technology students with technology services to the industry and open source projects.

Pune-based KQInfoTech is an organization started by Anurag Agarwal and Anand Mitra, both of whom chucked high-paying jobs in the industry because they felt that there was a desperate need to work on the quality of students being churned out by our colleges. For the last 2 years or so, they have been trying various experiments in education, at the engineering college level. All their experiments are based on one basic premise: students’ ability to pay should not be a deterrent – in other words, the offerings should be free for the students; KQInfoTech focuses on finding alternative ways to pay for the costs of running the course. As a part of this initiative, they provide services to industry, and take on open source projects, and the students in their mentorship program actually do the work under their guidance.

What is ZFS?

ZFS – the Zettabyte File System – is an enormous advance in capability on existing file systems. It provides greater space for files, hugely improved administration and greatly improved data security. Wikipedia has this to say:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include support for high storage capacities, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

Why ZFS on Linux by KQInfotech?

ZFS is arguably one of the best file-systems available today, and Linux is one of the most widely used operating systems for servers by new startups. So, having ZFS available on Linux would be great. And, with many years of experience at Veritas building VxFS, another one of the best file-systems in the world, the founders of KQInfoTech do have the technical background to be able to do a good job of this.

At this point, ZFS is not available on Linux. See the Linux section of ZFS entry on Wikipedia for more details.

So what does this port contain?

This port of ZFS is an extension to the port of the DMU layer by Brian Behlendorf. We have added the missing ZPL layer to Brian’s port. With this addition it becomes possible to mount the ZFS filesystem on Linux and leverage ZFS’s features on Linux.

What next?

If you’re interested in participating in the beta and helping out, or you’re one of the people whose business would really be helped by having ZFS available on Linux, apply for the beta, or get in touch with KQInfoTech: zfs-query@kqinfotech.com.

Also, check out the FAQ.

Choices in Cloud Computing and What’s Right for You

August 20, 2010 | Event Report, Featured, Technology | cloud, cloud computing, google, indicthreads | Navin Kabra

(This is a live-blog of a talk given by Kalpak Shah, at the Indic Threads Conference on Cloud Computing, held in Pune on 20/21 Aug 2010. Since it’s being typed in a hurry, it is not necessarily as coherent and complete as we would like it to be, and also links might be missing.)

Kalpak Shah is the founder and CEO of Clogeny, a company that does consulting & services in cloud computing. His talk is about the various choices available in cloud computing today, and how to go about picking the one that’s right for you.

The slides Kalpak used for this talk are titled “Cloud computing: making the right choices” (Clogeny Technologies).

Kalpak’s definition of a cloud:

If you cannot add a new machine yourself (i.e. without making a phone call or email), then it’s just hosting, not cloud computing
If you cannot pay as you go (i.e. pay per use) it is not cloud computing
If you don’t have APIs which allow integration with the rest of your infrastructure/environment, then it is not a cloud

Kalpak separates out cloud infrastructure into three parts, and gives suggestions on how to choose each:

Infrastructure as a service

Basically allows you to move your local server stuff into the cloud. Examples: Amazon EC2, Terremark vCloud, GoGrid Cloud, Rackspace Cloud

You should check:

Support and Helpdesk. Is it 24×7? Email? Phone?
Hardware and Performance. Not all of them are the same. Amazon EC2 is not as good as Terremark.
OS support. Which OS and distributions are supported? Is imaging of the server allowed? Is distribution and re-selling of images allowed? Not everybody allows you to save the current state of the server, and restart it later on a different instance.
Software availability and partner network. For example, Symantec has put up their anti-virus software for Windows on EC2. How many such partners are available with the provider you’re interested in? (EC2 is far ahead of everybody else in this area.)
APIs and Ecosystem. What APIs are available, and in what languages? Some providers don’t do a good job of providing backward compatibility. Others might not be available in the language of your choice. EC2 and Rackspace are the best in this area.
Licensing is a big pain. Open source software is not a problem, but if you want to put licensed applications on the cloud, that is a problem. e.g. IBM Websphere clustering is not available on EC2. Or Windows licenses cannot be migrated from a local data center to the cloud.
Other services – How much database storage are you allocated? What backup software/services are available? What monitoring tools? Auto-scaling, load-balancing, messaging.

Kalpak has put up a nice comparison of Amazon AWS, Rackspace, GoGrid and Terremark on the above parameters. You can look at it when the PPT is put up on the IndicThreads conference website in a few days.

Platform as a Service

This gives you a full platform, not just the hardware. You get the development environment, and a server to upload the applications to. Scalability and availability are managed by the vendor. But there is much less flexibility than with infrastructure-as-a-service. You are stuck with the programming language that the PaaS supports, and the tools.

For example, Google AppEngine, which is available only for Python and Java. Or Heroku for Ruby + Rails.

PaaS is targeted towards developers.

Software as a Service

This gives you a consumer facing software that sits in the cloud. You can start using the software directly, or you can extend it a bit. A business layer is provided, so you can customize the processes to suit your business. Good if what is provided fits what you already want. Not good if your needs are rather different from what they have envisioned.

Examples: Salesforce, Google Apps, Box.net, Zoho

Storage as a Service

Instead of storing data on your local disks, store it in the cloud. Lots of consumer adoption, and now enterprise usage is also growing. No management overhead, backups, or disaster recovery to worry about. And pay either flat fees per month, or by the gigabyte.

Examples: Mozy from EMC. Amazon S3. Rackspace CloudFiles. Carbonite. DropBox.

Comparing PaaS and SaaS

Some choices automatically made for you based on development language and available skill sets. Python + Java? Use Google AppEngine. Ruby on Rails? Use Heroku. Microsoft shop? Use Azure.

Other ways to compare are the standard ones: size of vendor and ecosystem maturity. Tools, monitoring, connectors, etc. e.g. AppEngine has an Eclipse plugin, so if your developers are used to Eclipse (and they should be!) then this is very good. Another question to ask is this – will the vendor allow integration with your private cloud? Can you sync your online hosted database with your local database? If yes, that’s great. If not, that can be very painful and complicated for you.

Interesting Private Cloud Platforms

These are some interesting private cloud platforms

Eucalyptus: open source IaaS cloud computing platform.
VMWare Cloud: Partnered with Terremark. Expensive but worth it.
Appistry: Allows installing of the platform on Amazon EC2, or in your private data center. Allows application deployment and mgmt, various services across the stack IaaS, PaaS, SaaS. Integration with SQL Azure, SharePoint, Dynamics CRM. Visual Studio development and testing. Supports multiple development languages.

Database in the cloud

You can either do regular relational databases (easy to use, everybody knows them, scaling and performance needs to be managed by you). Or do NoSQL – non-relational databases like SimpleDB (Amazon), Hadoop (Yahoo), BigTable (Google). They’re supported and managed by cloud vendor in some cases. Inherent flexibility and scale. But querying is more difficult and less flexible.

Business Considerations

Licensing is a pain, and can make the cloud unattractive if you’re not careful. So figure this one out before you start. SLAs are around 99.9% for most vendors, but lots of fine print. Still evolving and might not meet your standards, especially if you’re an enterprise. Also, if SLA is not being met, vendor will not tell you. You have to complain and only then they might fix it. Overall, this is a grey area.
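It helps to translate an SLA percentage into actual minutes before reading the fine print. The small calculation below does this. The 30-day month is an assumption, and real SLAs often define the measurement window and exclusions differently.

```ruby
# Minutes of downtime that an uptime SLA still permits in a given period.
# Rational arithmetic avoids floating-point noise in the percentage.
def allowed_downtime_minutes(sla_percent, days: 30)
  uptime = Rational(sla_percent.to_s)
  (days * 24 * 60 * (100 - uptime) / 100).to_f
end

monthly = allowed_downtime_minutes(99.9)
yearly  = allowed_downtime_minutes(99.9, days: 365)

puts "99.9% over 30 days allows #{monthly} minutes of downtime"
puts "99.9% over a year allows #{(yearly / 60).round(2)} hours of downtime"
```

So a 99.9% SLA still allows about 43 minutes of outage a month, or close to nine hours a year, without the vendor being in breach.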

Pricing is a problem – it keeps changing (e.g. in case of Amazon). So you can have problems estimating it. Or the pricing is at a level that you might not understand. e.g. pricing of 10 cents per million I/O requests. Do you know how many I/Os your app makes? Maybe not.
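If you can measure or guess your request rate, the per-million pricing is easy to turn into a monthly figure. In the sketch below, the 10 cents per million comes from the example above, while the 500 requests per second and the 30-day month are made-up inputs.

```ruby
# Monthly cost in dollars for I/O billed per million requests.
def monthly_io_cost_dollars(requests_per_second, cents_per_million: 10, days: 30)
  requests = requests_per_second * 60 * 60 * 24 * days
  requests * cents_per_million / 1_000_000.0 / 100
end

cost = monthly_io_cost_dollars(500)
puts "500 I/O requests per second costs about $#{cost} per month"
```

At a sustained 500 requests per second that is roughly 1.3 billion requests, or $129.60 a month. The hard part is not the arithmetic but knowing your real I/O rate.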

Compliance might be a problem – your government might not allow your app to be in a different country. Or, for banking industry, there might be security certification required (for the vendor) before the cloud can be reached.

Consider all of these before deciding whether to go to a cloud or not.

Summary

IaaS gives you the infrastructure in the cloud. PaaS adds the application framework. SaaS adds a business layer on the top.

Each of these is available as a public cloud (somewhere out there on the world wide web), or as a private cloud installed in your data-center. Private is more expensive and more difficult to deploy, but your data is on your premises, you have better (local) connectivity, and you have more flexibility. You could also have a hybrid cloud, where some stuff is in-house and some stuff is in the public cloud. And if your cloud infrastructure is good enough, you can easily move computation or data back and forth.

Kalpak Shah, CEO of Clogeny, gave a broad overview of the various options available in cloud computing infrastructure, platforms and software, and the questions you need to ask before you choose the one for you.

About the Speaker – Kalpak Shah

Kalpak Shah is Founder & CEO of Clogeny Technologies Pvt. Ltd. and guides the overall strategic direction of the company. Clogeny is focused on providing services and consultancy in the cloud computing and storage domains. He is passionate about the ground-breaking economics and technology afforded by the cloud computing platforms. He has been working on various cloud platforms including IaaS, PaaS and SaaS vendors.

You can also follow @clogeny and @kalpakshah on twitter.

Live-Blog: Overview of High Performance Computing by Dr. Vipin Chaudhary

August 14, 2010 | Live Blogging, Technology | crl, Events, hpc, liveblogging, supercomputing, Technology | Navin Kabra

(This is a live-blog of Dr. Vipin Chaudhary’s talk on Trends in High Performance Computing, organized by the IEEE Pune sub-section. Since this is being typed while the talk is going on, it might not be as well organized, or as coherent as other PuneTech articles. Also, links will usually be missing.)

Live-blog of a talk by Dr. Vipin Chaudhary, CEO of CRL, on High Performance Computing at Institute of Engineers, Pune. CRL are the makers of Eka, one of the world's fastest privately funded supercomputers.

Myths about High Performance Computing:

Commonly associated with scientific computing
Only used for large problems
Expensive
Applicable to niche areas
Understood by only a few people
Lots of servers and storage
Difficult to use
Not scalable and reliable

This is not the reality. HPC is:

Backbone for national development
Will enable economic growth. Everything from toilets to potato chips is designed using HPC
Lots of supercomputing is throughput computing – i.e. used to solve lots of small problems
“Mainstream” businesses like Walmart, and entertainment companies like DreamWorks Studios, use HPC.
(and a bunch of other reasons that I did not catch)

China is really catching up in the area of HPC. And Vipin correlates China’s GDP with the development of supercomputers in China. Point: technology is a driver for economic growth. We need to also invest in this.

Problems solved using HPC:

Movie making (like Avatar)
Real time data analysis
- weather forecasting
- oil spill impact analysis
- forest fire tracking and monitoring
- biological contamination prediction
Drug discovery
- reduce experimental costs through simulations
Terrain modeling for wind-farms
- e.g. optimized site selection, maintenance scheduling
- and other alternate energy sources
Geophysical imaging
- oil industry
- earthquake analysis
Designing airplanes (Virtual wind tunnel)

Trends in HPC.

The Manycore trend.

Putting many CPUs inside a single chip. Multi-core is when you have a few cores; manycore is when you have many, many cores. This has challenges. Programming manycore processors is very cumbersome. Debugging is much harder. For example, if you need to get good performance out of these chips, then you need to do parallel assembly programming. Parallel programming is hard. Assembly programming is hard. Both together will kill you.

This will be one of the biggest challenges in computer science in the near future. A typical laptop might have 8 to 10 processes running concurrently. So there is automatic parallelism, as long as the number of cores is less than 10. But as chips get 30, 40 cores or more, individual processes will need to be parallel. This will be very challenging.
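As a rough illustration of what "individual processes will need to be parallel" means, the sketch below splits a single computation across several worker processes. It is a toy example and was not shown in the talk: the function name `parallel_sum_of_squares` and the choice of 4 workers are assumptions made for illustration, and it relies on `fork`, so it runs only on Unix-like systems.

```ruby
# Toy example (not from the talk): sum of squares 1..n, split over
# `workers` child processes. Unix-only, because it uses fork.
def parallel_sum_of_squares(n, workers)
  slice = (n.to_f / workers).ceil          # how many numbers each worker gets
  readers = (0...workers).map do |i|
    lo = i * slice + 1
    hi = [lo + slice - 1, n].min
    reader, writer = IO.pipe
    fork do
      reader.close
      partial = (lo..hi).sum { |x| x * x } # this part runs in the child process
      writer.write(partial.to_s)
      writer.close
      exit!(0)                             # leave the child without running at_exit hooks
    end
    writer.close                           # parent keeps only the read end
    reader
  end
  total = readers.sum { |r| r.read.to_i }  # combine the partial results
  Process.waitall                          # reap the child processes
  total
end

total = parallel_sum_of_squares(1000, 4)
```

Even in this tiny case the programmer has to partition the work, move the partial results between processes, and combine them, which is the extra burden the talk is pointing at.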

Oceans of Data but the Pipes are Skinny

Data is growing fast: in the sciences, humanities, commerce, medicine, and entertainment. The amount of information being created in the world is huge: emails, photos, audio, documents etc. Genomic (bio-informatics) data is also huge.

Note: data is growing way, way faster than Moore’s law!

Storing things is not a problem – we have lots of disk space. Fetching and finding stuff is a pain.

Challenges in data-intensive systems:

Amount of data to be accessed by the application is huge
This requires huge amounts of disk, and very fat interconnects
And fast processors to process that data

Conventional supercomputing was CPU bound. Now, we are in the age of data-intensive supercomputing. Difference: old supercomputing had storage elsewhere (away from the processor farm). Now the disks have to be much closer.

Conventional supercomputing was batch processed. Now, we want everything in real-time. Need interactive access. To be able to run analytic and ad hoc queries. This is a new, and difficult challenge.

While Vipin was on the faculty at SUNY Buffalo, they started the Data-Intensive Discovery Initiative (Di2). Now, CRL is participating. The data sets are large and ever-changing. Collecting and maintaining data is of course a major problem, but the primary focus of Di2 is searching this data, e.g. security (finding patterns in huge logs of user actions). This requires a new architecture, different from traditional supercomputing, and the resulting Di2 system significantly outperforms the traditional system.

This also has applications in marketing analysis, financial services, web analytics, genetics, aerospace, and healthcare.

High Performance Cloud Services at CRL

Cloud computing makes sense. It is here to stay. But energy consumption of clouds is a problem.

Hence, CRL is focusing on a green cloud. What does that mean?

Data center optimization:

Power consumption optimization on hardware
Optimization of the power system itself
Optimized cooling subsystem
CFD modeling of the power consumption
Power dashboards

Workflow optimization (reduce computing resource consumption via efficiencies):

Cloud offerings
Virtualization
Workload based power management
Temperature aware distribution
Compute cycle optimization

Green applications being run in CRL

Terrain modeling
Wind farm design and simulation
Geophysical imaging
Virtual wind tunnel

Summary of talk

Manycore processors are here to stay
- Programmability has to improve
- Must match application requirements to processor architecture (one size does not fit all)
Computation has to move to where the data is, and not vice versa
Data scale is the biggest issue
- must co-locate data with computing
Cloud computing will continue to grow rapidly
- Bandwidth is an issue
- Security is an issue
- These issues need to be solved

The Rise and Fall of Google Wave

August 10, 2010 | In Depth, Technology | communication, google, web | Navin Kabra

(In this guest post, Markus Hegi, partially-Pune-based CEO of partially-Pune-based company Colayer, laments the death of Google Wave, and points out that the concept behind the Wave is right. Google should have re-launched a new, improved Wave, he feels, because the world does need a paradigm shift in business communications. This article is a shortened & modified version of a post published on ex.colayer.com)

3 days ago, Google announced that it would stop the development of Wave and would stop supporting it by the end of the year. Even though the buzz about Wave and the (visible) progress of Wave had been low for the last few months, the shutdown is surprising: I would have expected a re-launch, a change of the architecture, integration with gmail – anything but a complete halt. The concept behind Wave is right and ahead of its time – and Google could have been a leading player in this space!

When I looked at Wave for the first time, right after the announcement one year ago, it struck me how similar the concepts were to what we had been working on for years with Colayer. I started Colayer in 1999, having suffered the mess of email communication myself. As a travelling business consultant I was convinced that this cannot be the way we will communicate in future! This is fundamentally wrong! – I mean: the basic idea of SENDING information on the web is wrong! (You GO TO and ARE ON Facebook, twitter, yahoo – you don’t ‘download’ it.) Google Wave addresses exactly these same issues.

We were excited to see what approach Google would take to implement the new paradigm of online communication, but we also realized quickly that the product at this stage would not be usable, for 3 main reasons:

The Technical Architecture was too heavy and complex
The Operability – The way to operate the tool was limiting
The Notification – the way the users would be notified about updates in their many waves.

If you used this product in a real-world scenario with heavy communication, it would not work! – But Wave was at its very start. We thought Google would quickly realize the problems and implement solutions for them – and with their market power, Google would be able to initiate the paradigm shift in online communication.

But after the Wave launch, it seemed that innovation stopped. Yes, there was development, and improvements and many extensions were released. But the above 3 problems were not addressed. They couldn’t be solved through improvements or extensions, but needed fundamental shifts in the product design, which never happened. And as many users seemed to lose patience too, Google pulled the plug for poor user adoption after only one year.

What went wrong? – Gartner has a valid point: “Startup innovation” simply has no place in a large enterprise software company. Well, this is not exactly what Gartner writes, but this is essentially the meaning: Either you are in the business of breaking & paradigm-shifting innovation (Startups), or you are serving a large base of enterprise customers – Doing both together is almost impossible, because there is no breaking innovation without disrupting your customers. After Wave was launched, even though it was still tagged as ‘beta’, the team could not just say to its 100’000 users: “you know, we just realized that the architecture has a fundamental problem – let’s start it all over again …!” – which we, in a small company, did several times …

Maybe another problem of Wave was that Google chose the wrong market: Wave was intended for the broad consumer market, as well as for enterprises – But the paradigm shift happens elsewhere first: If you observe today’s kids and young nerds, you can imagine how the next generation of businesses will use online communication: Email for them is ‘lame’ and just used for communication with outsiders, older people and the ‘conservative’ business world. Why would you need email anyway in a world of Facebook & Foursquare?

After 10 years, we are still in the beginning of the massive paradigm shift of online communication. I am eager to see, who will join the journey next!

About Google Wave

Wave is a web application for real-time communication and collaboration.

(See one of the most popular videos explaining the basic concepts of Wave)

Announced in May 2009, Wave attracted a lot of attention for a couple of months. The project was stopped by Google after just a little more than one year for poor user adoption.

About the author – Markus Hegi

Markus Hegi founded Metalayer (now renamed to Colayer) 10 years ago. The Colayer platform is a software technology to create collaborative web sites.

Colayer is a Swiss-Indian company with headquarters in Zurich, Switzerland and a development center in Pune, India. Markus has been ‘commuting’ between Zurich and Pune for 10 years and spends almost half of his time here in Pune. See his LinkedIn profile, or follow him on Twitter.

About Colayer vs Google Wave:

See an overview of articles about Colayer vs Google Wave on colayer.com.

Mentor India internship for tech students: entrance exam on 4th July

June 29, 2010 | Miscellaneous, Technology | education, students, systems, web | Navin Kabra

For the last few years, Pune startup KQ Infotech has been running Mentor India, a year-long, free internship program for technology students, aimed at giving students a very strong base in systems programming, web development, or web design and animation.

The next batch of Mentor India is starting in July, and the entrance exam is on 4th July. Interested students may register here

From Mentor India’s webpage:

Mentor India is a fusion of the cultural heritage of the ancient Gurukul system with modern methods of teaching and learning. This unique concept emphasizes experiential learning by the Shishya with the Guru skillfully facilitating it. Thus, students of this program would be paying from the day one but in the form of their contribution on live projects.

Being a Software Development & Consulting firm Knowledge Quest Infotech has a strong background of technology, thus enabling students in cultivating their technical roots.

Here are highlights of the program from the KQ InfoTech website:

Mentor India Program ensures that the students learn and earn with upcoming technology
Students don’t need to pay any fees and their training is paid by work on live projects
Students can start earning within 6 months of their program based on performance
Exposure to live projects along one year work experience
Placement opportunity with KQ Infotech and other technology companies
Course completion certification

Any student with one of these degrees is eligible: MCA, MSc CS/IT/Tech, MCM, BE/BTech. Candidates will be selected on the basis of a written technical and aptitude test, followed by a technical and personal interview.

The syllabus for the technical test covers:

C Programming Language
Data Structures and Algorithms
Operating systems
Quantitative & Analytical Reasoning

Apply here

Meeting Report: Pune Rails Meetup (Dec 2009)

December 21, 2009 | Event Report, Featured, Technology | community, dynamic programming languages, rails, ruby, user groups, web | Navin Kabra

(This is a report of Pune Ruby on Rails meetup that happened on 12th December. This report was originally written by Gautam Rege on his blog, and is reproduced here with permission for the benefit of PuneTech readers.)

It was great to be a part of the Pune Rails Meetup which was held yesterday (19th December, 2009) at ThoughtWorks, Pune. It was an idea initiated by Anthony Hsiao of Sapna Solutions which has got the Pune Rails community up on their feet. Helping him organize was a pleasure!

It was great to see almost 35 people at this meet; it was probably more than we expected. It was also heartening to see a good mix in the crowd – professionals in rails, students working in rails and students interested in rails – not to forget entrepreneurs, who were very helpful.

Proceedings began with Vincent and _______ (fill in the gaps please; I am really lousy with names) from ThinkDRY, who gave an excellent presentation on BlankApplication – a CMS++ that they are developing. I say CMS++ because it's not just another CMS but has quite a lot of ready-to-use features that get developers jump-started. There were interesting discussions regarding how ‘workspaces’ are managed and how it's indeed easier to manage websites.

After this technical talk, I spoke next on my experience at the Lone Star Ruby Conference in Texas. I tried to keep the session interactive with the intention of telling everyone how important it is to know and use Ruby effectively while working in Rails. Dave Thomas’s references to the ‘glorious imperfection’ of Ruby did create quite a buzz. To quote a little from Dave’s talk:

name {}

This is a method which takes a block as a parameter, but the following line is a method which takes a hash as a parameter! A simple pair of parentheses makes all the difference!

name ( {} )
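A small runnable sketch of this difference follows. The `name` method here is hypothetical, written only to report what kind of argument it received; it is not code from Dave's talk.

```ruby
# Hypothetical method for illustration: reports what it was called with.
def name(arg = nil, &block)
  if block
    :block
  elsif arg.is_a?(Hash)
    :hash
  else
    :nothing
  end
end

with_block = name {}     # the braces are parsed as an empty block
with_hash  = name({})    # inside parentheses, the braces are an empty Hash literal
```

Here `with_block` ends up as `:block` and `with_hash` as `:hash`, even though the two calls differ only by the parentheses.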

Similarly, the following line calls a method m(), divides its result by ‘n’, and then divides that result by ‘o’

m/n/o

but add a space before the first slash and it's a method m() which takes a regular expression as a parameter!

m /n/o
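Both readings can be checked with a few hypothetical definitions. The methods `m`, `n` and `o` below are invented so that each spelling is valid Ruby; they are not from the talk. Run with warnings enabled, Ruby may flag the second form as an ambiguous first argument, which is rather the point.

```ruby
# Hypothetical definitions, chosen only so that both spellings are valid Ruby.
def m(pattern = nil)
  pattern.nil? ? 100 : pattern   # returns 100 when called without an argument
end

def n
  5
end

def o
  2
end

division = m/n/o     # parsed as (m() / n()) / o(): plain integer division
regexp   = m /n/o    # parsed as m(/n/o): a Regexp literal with the "o" flag
```

With these definitions `division` is an Integer (100 / 5 / 2) while `regexp` is a Regexp whose source is `n`.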

It was nice to see everyone get involved in these interactive sessions. More details about my experience at LSRC are here.

After this there was another technical talk about a multi-app architecture that has been developed by Sapna Solutions. Anthony and Hari gave a talk on this and it was very interesting to see it work. By getting opensource applications like shopify, CMS and other social networking apps to work with a shared plugin and a single database, it's possible to create a mammoth application which is easily customizable and scalable.

Hari did mention a few problems, like complexity in migrations and custom routes, which they currently ‘work around’ but for which they would prefer a cleaner approach. Some good suggestions were provided by Scot from ThoughtWorks regarding databases. I suggested some metaprogramming to align models. With git submodules and rake scripts to sync up data, this indeed seems to have a lot of potential.

There were some new entrepreneurs from ______ who have already developed a live application in Merb which they discussed and explained details of. It was good to hear about how they managed performance and scalability testing. The Q&A forum which was the next event was extremely interactive. Some of the discussions were:

Which are really great CMS in Rails?

There were some intense discussions regarding RadiantCMS, Adva and even BlankApp. The debate came down to a ‘programmable CMS’ versus WYSIWYG. Those who care more about content management prefer CMSs like Drupal and Joomla. Those who prefer more customization via programming and code prefer Radiant. This topic could not be closed and is still open for discussion. Do comment with your views – I am a Radiant fan.

What about testing? Cucumber, Rspec, others?

Usually it's still ad hoc: testing is expensive for smaller firms, so ad hoc black-box testing is what gets done. I opined that Cucumber and RSpec ROCK! Cucumber is great for scenario testing and for testing controller logic and views. RSpec is great for direct model access, and Cucumber can make great use of Webrat for browser testing.

In RSpec, when do we use mocks and stubs?

It was suggested that mocks and stubs should be used when the models and code are not ready yet. If the code is ready, it's probably fine not to use mocks and stubs at all. Comments welcome on this!
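To make the suggestion concrete, here is a minimal hand-rolled stub in plain Ruby, with no RSpec required. The `SignupService` and `FakeMailer` classes are hypothetical, invented for this illustration; the point is only that code can be exercised before the real collaborator has been written.

```ruby
# Hypothetical service under test; it depends on a mailer that is not written yet.
class SignupService
  def initialize(mailer)
    @mailer = mailer
  end

  def register(email)
    return false unless email.include?("@")
    @mailer.deliver_welcome(email)
    true
  end
end

# Hand-written stand-in for the missing mailer: records calls instead of sending mail.
class FakeMailer
  attr_reader :delivered

  def initialize
    @delivered = []
  end

  def deliver_welcome(email)
    @delivered << email
  end
end

mailer   = FakeMailer.new
service  = SignupService.new(mailer)
accepted = service.register("dev@example.com")
rejected = service.register("not-an-email")
```

Checking `mailer.delivered` afterwards shows which calls the service made. RSpec's test doubles play the same role with less boilerplate.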

How do you do stress testing?

Stress testing, concurrency testing and performance testing can be done using httperf. It was interesting to note that ____ have actually done their own implementation for stress and concurrency testing. I recommended they open source it.

How are events, scheduled jobs and delayed jobs handled?

This was my domain. Using delayed_job is the way to go. Following the leaders (github) and using Redis and Resque would be great too, but definitely not BackgrounDRb or direct cron!
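The common idea behind delayed_job and Resque is to record a job during the request and run it later in a separate worker. The sketch below shows only that idea, using Ruby's built-in `Queue` and a thread. It is not the API of either library, and `WelcomeEmailJob` is a hypothetical job class.

```ruby
require "thread"  # Queue is built in on modern Rubies; this require is harmless there

# Hypothetical job: any object that responds to #perform can be queued.
WelcomeEmailJob = Struct.new(:email) do
  def perform
    "welcome mail sent to #{email}"
  end
end

jobs    = Queue.new   # thread-safe FIFO holding pending jobs
results = Queue.new   # collects what the worker produced

# The worker runs outside the request path and drains the queue.
worker = Thread.new do
  while (job = jobs.pop) != :stop
    results << job.perform
  end
end

# "Request" side: enqueue the job and carry on immediately.
jobs << WelcomeEmailJob.new("dev@example.com")
jobs << :stop         # sentinel that tells the worker to finish
worker.join

outcome = results.pop
```

A real job library adds what this sketch leaves out: persistence of queued jobs, retries on failure, and workers running in separate processes.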

What project management tools do you use? Pivotal Tracker, Trac, Mingle?

Pivotal Tracker suits startup needs. Mingle rocks but becomes expensive. Scott ? Dhaval from TW mentioned how easy it was to co-ordinate and ‘mingle’ with their 200-strong team over distributed geographies.

Which SCM do you use? git, svn, cvs?

People have been very comfortable with git, and more and more are migrating from svn to git. It was heartening to see that nobody uses CVS. Jaju (I may have misspelt the name) gave an excellent brief about how code and diffs can be squashed and ‘diff’ed with another repository before the final merge and push to the master. Dhaval gave an idea about how they effectively used git for managing their 1GB source code (wow!)

Some pending questions – probably in next meet-up

Which hosting service do you use and why?
TDD or BDD?

Suggestions are welcome!

About the Author – Gautam Rege

Gautam Rege is the co-founder and managing director at Josh Software, Pune.

Gautam has an engineering degree in Computer Science from PICT, Pune. In his 9 years in the IT industry, he has worked in companies like Symantec, Zensar and Cybage before starting Josh 2 years ago.

Gautam’s technical knowledge spans from various languages like C, C++, Perl, Python and Java to software expertise in various industry domains like Finance, Manufacturing, Insurance and even advertising.

As with the company name, Gautam has a lot of ‘josh’ about new and emerging technologies. His company is one of the few which works almost exclusively in Ruby on Rails, the cutting edge web technology that has taken the industry by storm.

(Comments on this article are closed. Please comment at the location of the original article)

punetech.com

Connecting together Pune's Technologists