Category Archives: Technology

Growing a Community Powered Website

(In this article, Manas Garg, a regular contributor to PuneTech, explores the various factors involved in the growth of a community powered website. These ideas are relevant to any website/company that expects to get a lot of its content from the actions of its users – and there are a number of such sites, from SadakMap and JustMeans to the Pune OpenCoffee Club, and of course, PuneTech itself. Even otherwise, these are important issues that any technologist living in a web-2.0 world must understand.)

Community Powered Websites (CPWs) are all the rage today. And there are good reasons for that. First, you only build the website; the content (which is the primary value of these sites) comes from people. These people don’t charge you anything; in fact, you can make some money by running ads to these very people.

Secondly, the people who bring in the content also become the users of the website. In other words, people bring the content, people consume the content, and you just provide the framework for doing that in the form of a website. Great!

Two primary factors contributing to the success of a Community Powered Website (CPW) are its tendency to grow and its immunity to abuse. This is, of course, in addition to the functional value that this website has.

Growth for a CPW means more data, more contributors and more users. Simple. And immunity to abuse means that when bad people come to your site to do bad things, your site can shrug off these attacks and get on with life. For Wikipedia, a bad thing is someone putting spam on a page. For Twitter, a bad thing is someone hacking the system and making thousands of people follow him/her.

In this article, I have put down some of my thoughts on how we can make a CPW “tending to grow”. I do not claim expertise in this area. Nor do I claim to be exhaustive. I am just trying to make sense of the way the web is evolving today, and community power is a very interesting phenomenon within that.

So, let’s start…

For any CPW, we anyway have to do things which people find valuable/useful and for which they would want to use the website in the first place. For instance, Facebook, Delicious, Twitter and Wikipedia have some fundamental value for which people would like to use them. On top of that functional value, there is a social design which makes them “tending to grow”.

A simple example is Blogger. It has some functional value (i.e. a blogging platform) for which people use it. But as long as the game is purely functionality based, people will choose Blogger only if its functionality is the best. Tomorrow, if a new blogging platform with better functionality comes along, new blogs may use that platform. That’s the reason the Blogger team is adding some social touches, so that more and more people “choose” Blogger if their contacts are already on Blogger.

So, this is the “tendency to grow”. It is outside the purview of functionality. And it’s becoming more and more important because it’s becoming very easy for anyone to match a given set of functionality.

Now, let’s look at the contributors to this “tendency to grow”…

The Network Effect

In short, the network effect is when a service becomes more valuable as more people use it, which in turn increases its adoption and hence its value. This creates a self-sustaining loop. The loop doesn’t run forever, since there is eventually a limit to the final value, but it can certainly take us very far.

The general purpose social networking sites are the best examples of the network effect. The more users we have, the better the chances of getting more users. That’s why they have grown phenomenally in a short time span. Delicious doesn’t trigger the network effect even though it is social. There is no reason for me to join Delicious even if all my friends are using it. On the other hand, I would naturally join LinkedIn because all my “connections” are using LinkedIn. Blogger, by becoming more social, is trying to bring in the network effect.

How to bring in the network effect is a subject worth another complete article or maybe a book. Suffice it to say that the network effect has to be designed for in any CPW. Once we have modeled our website, we can test that model (mentally, of course) for what kind of network effect it can produce. If we are building a CPW but don’t design it for the network effect, we are limiting the mileage we can get out of it.

Ease of contribution

It’s difficult to have a general purpose definition of what a contribution is, as it depends on the website. For Flickr, photographs are contributions; for Facebook, pretty much everything a user does on the site is a contribution. Even visiting someone’s profile on Facebook is a contribution, as the very fact that you visited that profile is shown on it.

On every Wikipedia page, you’ll see a clear “Edit” link to edit that page. For every section within the page, the edit link for that section is well placed. It almost “invites” you to edit. When the very design of a website has a look that invites you to contribute, it’s got the tendency to grow 🙂

For receiving contributions, there are two possibilities –

  1. Unintentional contribution. We contribute bookmarks to Delicious for our own purposes. We contribute photographs to Flickr for our own purposes. While we are doing our own things, unintentional contributions are being made to the system. When we share something with our friends on Facebook, the system is getting richer automatically even though the users are not working towards making the system richer 🙂
  2. Intentional contribution. Wikipedia is a place where people specifically contribute with the intention of making the system richer. It’s not like sharing something with friends or saving something for future reference. There is an explicitness here.

Needless to say, it’s easier to get people on board when their contribution is unintentional i.e. they are doing their own thing and the system just gets richer. This lends a greater tendency to grow to the CPW.

I am sure there are several other aspects of making a CPW tend to grow that have escaped my limited knowledge. Will some people with experience in this area throw a little light on them?

About the Author – Manas Garg

About the author: Manas is interested in a variety of things like psychology, philosophy, sociology, photography, movie making etc. But since there are only 24 hours in a day and most of it goes in sleeping and earning a living, he amuses himself by writing software, reading a bit and sharing his thoughts.

Why Python is better than Java for Object-Oriented Design

Dhananjay Nene recently switched over to Python and has discovered that he is much happier writing programs in Python. We covered his first article in the series, and the end of that post gives an idea of why we think you should listen to him, and also subscribe to his blog. In the next article in the Python vs. Java series, he takes a few design principles of object-oriented programming and shows how to implement those using sample code in Java and Python. 

An excerpt to whet your appetite:

Well, static typed languages use polymorphism as a powerful mechanism of extensibility. In other words, in many cases the extensions are likely to be newer derived types. Thus design the rest of your code to work on the base type and introduce the newer derived types later as required without having to necessarily change existing code. However static languages primarily depend upon inheritance as the vehicle for delivering polymorphism. Dynamic languages on the other hand depend upon duck typing. Duck typing supports polymorphism without using inheritance. In this context you need the same set of relevant methods to be implemented in each of the extension classes. The role of the abstract base class or interface as the one which specifies the contract / api has been made redundant. You can still choose to define a base class / interface if you want to, but you no longer have to. 
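
(A small illustrative sketch of my own, not taken from Dhananjay's article: the class and function names below are invented. The two classes share no base class or interface, yet both satisfy the caller because each happens to implement the same method.)

```python
# Duck typing: no common base class or declared interface is needed.
class FileStore:
    def save(self, data):
        print(f"writing {len(data)} bytes to disk")

class S3Store:
    def save(self, data):
        print(f"uploading {len(data)} bytes to S3")

def archive(store, data):
    # Any object with a .save() method is acceptable here.
    store.save(data)

archive(FileStore(), b"hello")
archive(S3Store(), b"hello")
```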

The full article is a must read if you are a student of programming languages in general, and object-oriented programming in particular. If you are neither of those things, and if you plan to be in the software field for a while, then you seriously need to ask yourself, “Why not!?“.

Why Python is better than Java?

Dhananjay Nene recently switched over to Python and has discovered that he is much happier writing programs in Python. He has a detailed post over at his blog on the reasons:

I think the most dominant impression from the last few months is that python does make programming feel a lot more easier and often more enjoyable. The feeling is not very different between riding a bicycle without gears then riding one with gears. In the latter case one just feels one can cover a lot more distance much more easily though any physicist will tell you the actual effort is not particularly different. It just feels like one has a much bigger toolbox (ie a wider assortment of tools) to work with and therefore the task seems simpler. Why do I think that way ? I believe the following features of python do help (in no particular order) :

* Concise Coding style : The code typically is much more concise, with much lesser verbosity
* Dynamic typing : You really do not need to worry about declaring data types and making sure the inheritance hierarchies especially for all the interfaces and implementations well laid out. The various objects do not even need to be in the same inheritance hierarchy – so long as they can respond to the method, you can call it. This is a double edge sword, but that doesn’t take away the fact that programming under dynamic types environment does seem a lot easier.
* Easier runtime reflection : Java seems to have all the reflection capabilities but I think these are just way too painful to use as compared to python. In python the entire set of constructs (classes, sequences etc.) are available for easy reflection. In case you need to use metaprogramming constructs, python really rocks.
* More built in language capabilities : Items such a list comprehensions, ability to deal with functions as first class objects etc. give you a broader vocabulary to work with.
* Clean indentation requirement : It took me about 2-3 days to get over it but, it seems that python code is much easier to read since if you do not indent it correctly it will be rejected.
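
As a rough illustration of the kind of conciseness the quoted list refers to, here are a few generic Python snippets of my own (not taken from the quoted post), showing a list comprehension, a function used as a first-class object, and simple runtime reflection:

```python
# List comprehension: squares of the even numbers below 10, in one line
squares = [n * n for n in range(10) if n % 2 == 0]
print(squares)                                  # [0, 4, 16, 36, 64]

# Functions as first-class objects: pass a function as an argument
def apply_twice(f, x):
    return f(f(x))

print(apply_twice(lambda n: n + 3, 10))         # 16

# Easy runtime reflection: inspect an object's methods by name
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()
print([name for name in dir(g) if not name.startswith("_")])   # ['hello']
print(getattr(g, "hello")())                                    # 'hi'
```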

I am a Perl person myself, and think similar thoughts about Perl, and I don’t really care for the forced indentation of Python. But Perl is really for disciplined programmers who don’t get carried away and start doing all the weird things that the language allows. For the undisciplined folks, I guess the forced indentation of Python is probably a good way to keep them in check.

Anyway,  read the whole article. You should also read the post he wrote at the time he chose Python for this next project. In fact, subscribe to his blog. He writes detailed and insightful articles that, as a techie, you would do well to read. If you are interested in programming languages, I would recommend reading “Contrasting java and dynamic languages”, and “Performance Comparison – C++ / Java / Python / Ruby/ Jython / JRuby / Groovy”. And if you are a blogger, check out his tips for software/programming blogging.

Dhananjay is a Pune-based software engineer with 17 years in the field. He is passionate about software engineering, programming, design and architecture. For more info, check out his PuneTech wiki profile.

Why do we need server virtualization

Virtualization is fast emerging as a game-changing technology in the enterprise computing space. What was once viewed as a technology useful for testing and development is going mainstream and is affecting the entire data-center ecosystem. This article on the important use-cases of server virtualization by Milind Borate, is the second in PuneTech’s series of articles on virtualization. The first article gave an overview of server virtualization. Future articles will deal with the issue of management of virtual machines, and other types of virtualization.

Introduction

Is server virtualization a new concept? It isn’t, because traditional operating systems do just that. An operating system provides a virtual view of a machine to the processes running on it. Resources are virtualized.

  • Each process gets a virtual address space.
  • A process’ access privileges control what files it can access. That is storage virtualization.
  • The scheduler virtualizes the CPU so that multiple processes can run without conflicting with each other.
  • Network is virtualized by providing multiple streams (for example, TCP) over the same physical link.

Storage and network are weakly virtualized in traditional operating systems because some global state is shared by all processes. For example, the same IP address is shared by all processes. In the case of storage, the same namespace is used by all processes. Over time, some files/directories become de-facto standards. For example, all processes look at the same /etc/passwd file.
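
(As an aside, the per-process virtual address space is easy to demonstrate on any POSIX system. In this small sketch of my own, the child's write to a variable is invisible to the parent, because each process gets its own copy of memory after fork():)

```python
import os

# After fork(), parent and child have separate virtual address spaces,
# so the child's assignment to `counter` is not visible to the parent.
counter = 0

pid = os.fork()
if pid == 0:                                    # child process
    counter = 100
    print("child sees counter =", counter)      # 100
    os._exit(0)

os.waitpid(pid, 0)
print("parent still sees counter =", counter)   # 0
```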

Today, the term “server virtualization” means running multiple OSs on one physical machine. Isn’t that just adding one more level of virtualization? An additional level generally means added costs, lower performance, and higher maintenance. Why then is everybody so excited about it? What is it that server virtualization provides in addition to traditional OS offerings? An oversimplified version of the question is: If I can run two processes on one OS, why should I run two OSs with one process each? This document enumerates the drivers for running multiple operating systems on one physical machine, presenting a use case, evaluating the virtualization-based solution, suggesting alternatives where appropriate and discussing future trends.

Application Support

Use case: I have two different applications. One runs on Windows and the other runs on Linux. The applications are not resource intensive and a dedicated server is under-utilized.

Analysis: This is a weak argument in an enterprise environment because enterprises want to standardize on one OS and one OS version. Even if you find Windows and Linux machines in the same department, the administrators are two different people. I wonder if they would be willing to share a machine. On the other hand, you might find applications that require conflicting versions of, say, some library, especially on Linux.

Alternative solution: Wine allows you to run Windows applications on Linux. Cygwin allows you to run Linux applications on Windows. Unfortunately, it’s not the same as running the application directly on the required OS. I won’t bet that a third party application would run out of the box under these virtual environments.

Future: Some day, developers will get fed up of writing applications for a particular OS and then porting them to others. Java provides us with a host/OS-independent virtual environment. Java wants programmers to write code that is not targeted at a particular OS. It succeeded in some areas, but there is still a lot of software written for a particular OS. Why did everybody not move to Java? I guess because Java does not let me do everything that I can do using OS APIs. In a way, that is Java’s failure in providing a generic virtual environment. In future, we will see more and more software developed over OS-independent APIs. Databases would be the next target for establishing generic APIs.

Conflicting Applications

Use case: I have two different applications. If I install both on the same machine, both fail to work. In fact, they might actually work, but the combination is not supported by my vendor.

Analysis: In the current state of affairs, an OS is not just hardware virtualization. The gamut of libraries, configuration files, daemons is all tied up with an OS. Even though an application does not depend on the exact kernel version, it very much depends on the library versions. It’s also possible that the applications make conflicting changes to some configuration file.

Alternative solution: OpenVZ modifies Linux to provide multiple “containers” inside the same OS. The machine runs a single kernel but provides multiple isolated environments. Each isolated environment can run an application that would be oblivious to the other containers.
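
(OpenVZ itself relies on its own kernel patches, but mainline Linux later gained namespaces, which provide a similar kind of isolation and are the building block of most modern containers. The sketch below is my own illustration, not OpenVZ code; it assumes a Linux host, root privileges and Python 3.3+, and the hostname is just a placeholder. The child process moves into a private UTS namespace, so changing its hostname does not affect the rest of the system.)

```python
import ctypes, os, socket

# Clone flags from <linux/sched.h> (Linux-specific constants)
CLONE_NEWUTS = 0x04000000   # private hostname namespace
CLONE_NEWNS  = 0x00020000   # private mount namespace

libc = ctypes.CDLL("libc.so.6", use_errno=True)

pid = os.fork()
if pid == 0:
    # Child: detach into its own namespaces (needs root / CAP_SYS_ADMIN)
    if libc.unshare(CLONE_NEWUTS | CLONE_NEWNS) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed; run as root on Linux")
    socket.sethostname("container-101")          # invisible outside the namespace
    print("child sees hostname:", socket.gethostname())
    os._exit(0)

os.waitpid(pid, 0)
print("parent still sees hostname:", socket.gethostname())
```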

Future: I think operating systems need to support containers by default. The process-level isolation provided at the memory and CPU level needs to be extended to storage and network as well. On the other hand, I also hope that application writers desist from depending on shared configuration and shared libraries, and pay some attention to backward compatibility.

Fault Isolation

Use case: In case an application or the operating system running the application faults, I want my other applications to run unaffected.

Analysis: A faulty application can bring down the entire server, especially if the application runs in a privileged mode or can be attacked over a network. A kernel driver bug or operating system bug can also bring down a server. Operating systems are getting more stable, though, and servers going down due to an operating system bug are rare nowadays.

Alternative solution: Containers can help here too. Containers provide better isolation amongst applications running on the same OS. But bugs in kernel-mode components cannot be addressed by containers.

Future: In the near future, we are likely to see micro-kernel-like architectures around virtual machine monitors. Lightweight operating systems could be developed to work only with virtual machine monitors. Such a solution will provide fault isolation without incurring the overheads of a full operating system running inside a virtual machine.

Live Application Migration

Use case: I want to build a datacenter with utility/on-demand/SLA-based computing in mind. To achieve that, I want to be able to live-migrate an application to a different machine. I can run the application in a virtual machine and live-migrate the virtual machine.

Analysis: The requirement is to migrate an application. But, migrating a process is not supported by existing operating systems. Also, the application might do some global configuration changes that need to be available on the migration target.

Alternative solution: OpenVZ modifies Linux to provide multiple “containers” inside the same OS. OpenVZ also supports live migration of a container.

Future: As discussed earlier, operating systems need to support containers by default.

Hardware Support

Use case: My operating system does not support the cutting edge hardware I bought today.

Analysis: Here again, I’m not bothered about the operating system. But, my applications run only on this OS. Also, enterprises like to use the same OS version throughout the organization. If an enterprise sticks to an old OS version, that version does not work with the new hardware it wants to buy. If the enterprise is willing to move to a newer OS, that OS does not work with the existing old hardware.

But the real issue here is the lack of standardization across hardware and driver development models. I fail to understand why every wireless LAN card needs a different driver. Can all hardware vendors not standardize the IO ports and commands so that one generic driver works for all cards? On the other hand, every OS, and even every OS version, has a different driver development model. That means every piece of hardware requires a different driver for each OS version.

Alternative solution: I cannot think of a good alternative solution. One specific issue, the unavailability of wireless LAN card drivers for Linux, is addressed by NdisWrapper. NdisWrapper allows us to access a wireless card on Linux by loading a Windows driver.

Future: We either need hardware-level standardization or the ability to run the same driver on all versions of all operating systems. It would be good to have wrappers, like NdisWrapper, for all types of drivers and all operating systems. A hardware driver should write to a generic API provided by the wrapper framework. The generic API should be implemented by the operating system vendors.

Software Development Environment

Use case: I want to manage hardware infrastructure for software development. Every developer and QA engineer needs dedicated machines. I can quickly provision a virtual machine when the need arises.

Analysis: Software under development fails more often than a released product. Software developers and QA engineers want an isolated environment for their tests, to correctly attribute bugs to the right application. Also, software development environments require frequent reprovisioning, as the product under development needs to be tested under different operating systems.

Alternative solution: Containers would work for most software development. I think, the exception is kernel level development.

Future: Virtual machines found an instant market in software QA labs. Virtual machines will continue to flourish in this market.

Application Configuration Replica

Use case: I want to ship some data to another machine. Instead of setting up an identical application environment on the other machine to access the data, I want to ship the entire machine image itself. Shipping a physical machine image does not work because of hardware-level differences. Hence, I want to ship a virtual machine image.

Analysis: This is another hard reality of life. Data formats are not compatible across multiple versions of a software product. Portable data formats are used by human-readable documents. File-system data formats are also stable to a large extent, and you can mount a FAT file-system or an ISO 9660 file-system on virtually any version of any operating system. The same level of compatibility is not established for other structured data. I don’t see that happening in the near future. Even if this hurdle is crossed, you need to bother about correctly shipping all the application configuration, which itself could be different for the same software running on different OSs.

Alternative solution: OpenVZ container could be a light-weight alternative to a complete virtual machine.

Future: The future seems inclined towards “computing in a cloud”. Network bandwidth is increasing, and so is the trend towards outsourced hosting. Mail and web services have been outsourced for a long time. Oracle On Demand allows us to outsource database hosting. Google (Writely) wants us to outsource document hosting. Amazon allows us to outsource both storage and computation. In future, we will be completely oblivious to the location of our data and applications. The only process running on your laptop will be an improved web browser. In that world, only the system software engineers who build these datacenters will be worried about hardware and operating system compatibilities. But they too will not be overly bothered, because data-center consolidation will reduce the diversity in hardware and OS.

Thin Client Desktops

Use case: I want to replace desktop PCs with thin clients. A central server will run a VM for each thin client. The thin client will act as a display server.

Analysis: Thin clients could bring down the maintenance costs substantially. Thin client hardware is more resilient than a desktop PC. Also, it’s easier to maintain the software installed on a central server than managing several PCs. But, it’s not required to run a full virtual machine for each thin client. It’s sufficient to allow users to run the required applications from a central server and make the central storage available.

Alternative solution: Unix operating systems are designed to be server operating systems, and thin X terminals are still prevalent in the Unix desktop market. Microsoft Windows, the most prevalent desktop OS, is designed as a desktop OS, but Microsoft has also added substantial support for server-based computing. Microsoft’s Terminal Services allow multiple users to connect to a Windows server and launch applications from a thin client. Several commercial thin clients can work with Microsoft Terminal Services or similar services provided by other vendors.

Future: Before the world moves to computing in a global cloud, an intermediate step would be enterprise-wide desktop application servers. Thin clients would become prevalent due to reduced maintenance costs. I hope to see Microsoft come up with better licensing for server-based computing. On Unix, floating application licenses are the norm. With a floating application license, a server (or a cluster of servers) can run only a fixed number of application instances as per the license; it does not matter which user or thin client launches the application. Such floating licensing from Microsoft would help.

Conclusion

Server virtualization is a “heavy” solution for the problems it addresses today. These problems could be addressed by operating systems in a more efficient manner with the following modifications:

  • Support for containers.
  • Support for live migration of containers.
  • Decoupling of hardware virtualization and other OS functionalities.

If existing operating systems muster enough courage to deliver these modifications, server virtualization will have a tough time. It’s unrealistic to expect complete overhauls of existing operating systems, though. It’s possible to implement containers as a part of the OS, but decoupling hardware virtualization from the OS is a hard job. Instead, we are likely to see new lightweight operating systems designed to run only in a server virtualization environment. Such a lightweight operating system will have the following characteristics:

  • It will do away with functionality already implemented in the virtual machine monitor.
  • It will not worry about hardware virtualization.
  • It might be a single-user operating system.
  • It might expect all processes to be co-operative.
  • It will have a minimal kernel-mode component. It will be mostly composed of user-mode libraries providing OS APIs.

Existing virtual machine monitors would also take up more responsibility in order to support lightweight operating systems:

  • Hardware support: The hardware supported by a VMM will be of primary importance. The OS only needs to support the virtual hardware made visible by the VMM.
  • Complex resource allocation and tracking: I should get finer control over the resources allocated to virtual machines and be able to track resource usage. This involves CPU, memory, storage and network.

I hope to see a lightweight OS implementation targeted at server virtualization in the near future. It would be a good step towards modularizing operating systems.

Acknowledgements

Thanks to Dr. Basant Rajan and V. Ganesh for their valuable comments.

About the Author – Milind Borate

Milind Borate is the CTO and VP of Engineering at Pune-based continuous data protection startup Druvaa. He has over 13 years of experience in enterprise product development and delivery. He worked at Veritas Software as Technical Director for SAN-FS and served on the board of the Veritas patent filter committee. Milind has filed over 15 patent applications (4 allotted) and co-authored “Undocumented Windows NT” in 1998. He holds a BE (CS) degree from the University of Pune and an MTech (CS) degree from IIT Bombay.

This article was written when Milind was at Coriolis, a startup he co-founded before Druvaa.


Cloud Computing and High Availability

This article, discussing strategies for achieving high availability of applications based on cloud computing services, is reprinted with permission from the blog of Mukul Kumar of Pune-based ad optimization startup PubMatic.

Cloud computing has become very widespread, with startups as well as divisions of banks, pharmaceutical companies and other large corporations using it for computing and storage. Amazon Web Services has led the pack with its innovation and execution, with services such as the S3 storage service, the EC2 compute cloud, and the SimpleDB online database.

Many options exist today for cloud services, spanning hosting, storage and hosted applications. Some examples are below:

  • Hosting: Amazon EC2, MOSSO, GoGrid, AppNexus, Google AppEngine, flexiscale
  • Storage: Amazon S3, Nirvanix, Microsoft Mesh, EMC Mozy, MOSSO CloudFS
  • Applications: opSource, Google Apps, Salesforce.com

[A good compilation of cloud computing is here, with a nice list of providers here. Also worth checking out is this post.]

The high availability of these cloud services becomes more important with some of these companies relying on these services for their critical infrastructure. Recent outages of Amazon S3 (here and here) have raised some important questions such as this – S3 Outage Highlights Fragility of Web Services and this.

[A simple search on search.twitter.com can tell you things that you won’t find on web pages. Check it out with this search, this and this.]

There has been some discussion on the high availability of cloud services and some possible solutions. For example the following posts – “Strategy: Front S3 with a Caching Proxy” and “Responding to Amazon’s S3 outage“.

Here are some thoughts on how these cloud services can be made highly available by following the traditional path of redundancy.

[Image: Basic cloud computing architectures config #1 to #3]

The traditional way of using AWS S3 is to use it with AWS EC2 (config #0). Configurations such as those on the left ensure that your computing and storage do not depend on the same service provider. Config #1, config #2 and config #3 mix and match some of the more flexible computing services with storage services. In theory, the compute and the storage can each be separately replaced by a colo service.

[Image: Cloud computing HA configuraion #4]

The configurations on the right are examples of providing high availability by making a “hot-standby”. Config #4 makes the storage service hot-standby and config #5 separates the web-service layer from the application layer, and makes the whole application+storage layer as hot-standby.

A hot-standby requires three things to be configured – rsync, monitoring and switchover. rsync needs to be configured between the hot-standby servers, to make sure that most of the application and data components on the standby are up to date with the online server. So for example in config #4 one has to rsync ‘Amazon S3’ to ‘Nirvanix’ – that’s pretty easy to set up. In fact, if we add more automation, we can “turn off” a standby server after making sure that the data source is synced up. Though that assumes that the server provisioning time is an acceptable downtime, i.e. the RTO (Recovery Time Objective) is within acceptable limits.
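
As a rough sketch of what keeping the standby storage synced looks like in code (my own illustration: `primary` and `standby` are hypothetical client objects, and their `list_keys()`/`get()`/`put()`/`delete()` methods are placeholders for whatever S3 or Nirvanix SDK calls you actually use), note that the sync interval effectively becomes the standby's RPO:

```python
import time

def sync_once(primary, standby):
    """Copy anything the standby is missing and drop anything the primary deleted."""
    primary_keys = set(primary.list_keys())
    standby_keys = set(standby.list_keys())

    for key in primary_keys - standby_keys:
        standby.put(key, primary.get(key))       # copy missing objects

    for key in standby_keys - primary_keys:
        standby.delete(key)                      # mirror deletions

def sync_forever(primary, standby, interval_seconds=300):
    # Periodic sync: the standby lags the primary by at most `interval_seconds`
    while True:
        sync_once(primary, standby)
        time.sleep(interval_seconds)
```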

[Image: Cloud computing Hot Standby Config #5]
This also requires that you monitor each of the web services. One might have to do service heartbeating – this has to be designed for the application, and designed differently for monitoring Tomcat, MySQL, Apache or their sub-components. In theory it would be nice if a cloud computing service exported status APIs, for example for http://status.aws.amazon.com/, http://status.mosso.com/ or http://heartbeat.skype.com/. However, most of the time the status page is updated well after the service goes down, so that wouldn’t help much.
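
A minimal heartbeat loop might look like the sketch below (my own illustration: the health-check URL is a placeholder for an endpoint that exercises your own web/app/storage stack, precisely because the provider's status page lags behind real outages):

```python
import time
import urllib.request

HEALTH_URL = "http://www.example.com/healthcheck"   # placeholder endpoint
FAILURES_BEFORE_ALERT = 3

def is_healthy(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:            # connection refused, DNS failure, timeout, HTTP error
        return False

def monitor(poll_interval=30):
    failures = 0
    while True:
        if is_healthy(HEALTH_URL):
            failures = 0
        else:
            failures += 1
            if failures >= FAILURES_BEFORE_ALERT:
                print("service down: page the on-call and consider switchover")
                failures = 0
        time.sleep(poll_interval)
```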

Switchover from the online server/service to the hot-standby would probably have to be done by hand. This requires a handshake with the upper layer so that requests stop going to the old service and start going to the new one when you trigger the switchover. This might become interesting with stateful services, and also where you cannot drop any packets, so quiescing of requests may have to be done before the switchover takes place.

[Image: Cloud computing multi-tier config #6]
Above are two configurations of multi-tiered web services, where each tier is built on a different cloud service. This is a theoretical configuration, since there are only a few good cloud services today. But it may represent a possible future where the space becomes fragmented, with many service providers.

[Image: Multi-tier cloud computing with HA]
Config #7 is config #6 with hot-standby for each of the service layers. Again this is a theoretical configuration.

Cost Impact
Any of the hot-standby configurations would have a cost impact – adding an extra layer of high availability immediately adds to the cost, at least doubling the cost of the infrastructure. This cost increase can be reduced by making only those parts of your infrastructure highly available that affect your business the most. It depends on how much business impact a downtime causes, and therefore how much money can be spent on the infrastructure.

One of the ways to make the configurations more cost effective is to make them active-active, also called a load-balanced configuration – these configurations make use of all the allocated resources and send traffic to both servers. This configuration is much more difficult to design – for example, if you put the hot-standby storage in an active-active configuration, then every “write” (DB insert) must go to both storage servers, and a write must not be reported complete until it has been applied on all replicas (also called mirrored write consistency).
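
The mirrored-write rule can be sketched in a few lines (my own illustration: the replica objects and their `put()`/`get()` methods are hypothetical placeholders). A write is acknowledged only after every replica has applied it, while reads can be load-balanced across replicas:

```python
import random

class MirroredStore:
    def __init__(self, replicas):
        self.replicas = replicas           # e.g. clients for two storage services

    def write(self, key, value):
        # The write must succeed on every replica before we acknowledge it
        for replica in self.replicas:
            replica.put(key, value)
        return "ack"

    def read(self, key):
        # Once writes are mirrored, any replica can serve the read
        return random.choice(self.replicas).get(key)
```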

Cloud Computing becoming mainstream
As cloud computing becomes more mainstream, larger web companies may start using these services and may put a part of their infrastructure on a compute cloud. For example, I can imagine a cloud dedicated to “data mining” being used by several companies; it may have servers with large HDDs and lots of memory and may specialize in cluster software such as Hadoop.

Lastly, I would like to cover my favorite topic – why would I still use services that cost more for my core services instead of using cloud computing?

  1. The most important reason would be 24×7 support. Hosting providers such as ServePath and Rackspace provide support. When I call support at 2 PM India time, they have a support guy picking up my call – that’s a great thing. Believe me, 24×7 support is a very difficult thing to do.
  2. These hosting providers give me more configurability for RAM/disk/CPU
  3. I can have more control over the network and storage topology of my infrastructure
  4. Point #2 above can give me consistent throughput and latency for I/O access, and network access
  5. These services give me better SLAs
  6. Security

About the Author

Mukul Kumar is a founding engineer and VP of Engineering at PubMatic. He is based in Pune and responsible for PubMatic’s engineering team. Mukul was previously the Director of Engineering at PANTA Systems, a high performance computing startup. Before that he joined Veritas India as the 13th employee and was Director of Engineering for the NetBackup group, one of Veritas’ main products. He has filed for 14 patents in systems software, storage software, and application software. He proudly proclaims his love of π and can recite it to 60 digits. Mukul is a graduate of IIT Kharagpur with a degree in electrical engineering.

Mukul blogs at http://mukulblog.blogspot.com/, and this article is cross posted from there.


AirTight Networks offers Wireless Security as an online service

Pune-based startup AirTight Networks, which provides wireless security products, has announced that it is making wireless security available as an online service. The customer has to buy some wireless sensors (little plug-n-play hardware accessories) and attach them to appropriate machines in their company, respond to a few questions about their wireless setup, and that’s it. Within a few days they begin to receive wireless security reports. There are no servers or software to buy, configure, or administer – because all the data analysis and report generation is hosted on AirTight’s servers over the internet.

The major benefits of this are ease of installation, ease of use, and most importantly the investment needed can be ramped up gradually. The simplest system costs just $2 per day as opposed to the upfront $20000 capital investment that would be required otherwise. In addition there is a free 30-day trial. This makes it easy for enterprises that are interested in wireless security but are worried about paying too much for something that they are unsure about.

The services provided are vulnerability assessment (“There are hackers outside your office on the North side!”), vulnerability remediation (“And I’ve blocked their wireless signals! Yippie!”), and regulatory compliance (“And here is a report you can show SOX auditors to prove that you’ve done all that’s humanly possible to protect customer data”). Each of these three is a separate offering that is priced independently.

Over at NetworkWorld, FarPoint Group’s Craig Mathias gushes breathlessly over this offering:

this was a smack-myself-in-the-forehead moment – why not provision IDS/IPS as a service, effectively leasing the infrastructure and offering the rest as a managed service? This is positively brilliant, and AirTight Networks has now done precisely this with their new SpectraGuard Online service, launched today.

[…]

I’ve seen a number of security-as-a-service offerings for small wireless LANs, but this is the first time I’ve seen such a service for large organizations. And I’m willing to bet this business model could become very popular indeed. As WLAN technology continues to change rapidly, and as one is never, ever “done” when it comes to security, AirTight has broken some important new ground here. The question, of course, is how this model might extend to other elements of network infrastructure. And it just might.

See the full press release for more details of this news. See PuneTech wiki’s AirTight page for a quick overview of AirTight.

Building EKA – The world’s fastest privately funded supercomputer

Eka, built by CRL, Pune, is the world’s 4th fastest supercomputer, and the fastest one that didn’t use government funding. This is the same supercomputer referenced in Yahoo!’s recent announcement about cloud computing research at the Hadoop Summit. This article describes some of the technical details of Eka’s design and implementation. It is based on a presentation by the Eka architects, organized by CSI Pune and MCCIA Pune.

Interconnect architecture

The most important decision in building a massively parallel supercomputer is the design of how the different nodes (i.e. processors) of the system are connected together. If all nodes are connected to each other, parallel applications scale really well (linear speedup), because communication between nodes is direct and has no bottlenecks. But unfortunately, building larger and larger such systems (i.e. ones with more and more nodes) becomes increasingly difficult and expensive because the complexity of the interconnect increases as n². To avoid this, supercomputers have typically used sparse interconnect topologies like Star, Ring, Torus (e.g. IBM’s Blue Gene/L), or hypercube (Cray). These are more scalable as far as building the interconnect for really large numbers of nodes is concerned. However, the downside is that nodes are not directly connected to each other and messages have to go through multiple hops before reaching the destination. Here, unless the applications are designed very carefully to reduce message exchanges between different nodes (especially those that are not directly connected to each other), the interconnect becomes a bottleneck for application scaling.
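
For a feel of the numbers, here is a back-of-the-envelope comparison of my own (deliberately not Eka's projective-geometry design, whose details are in the papers) of the link count of a full mesh versus a hypercube:

```python
import math

def full_mesh_links(n):
    # every node connected to every other node: grows as n^2
    return n * (n - 1) // 2

def hypercube_links(n):
    # n must be a power of two; each node has log2(n) links
    return n * int(math.log2(n)) // 2

for n in (16, 256, 1024):
    print(n, "nodes:", full_mesh_links(n), "mesh links vs",
          hypercube_links(n), "hypercube links")
# 1024 nodes: 523776 mesh links vs 5120 hypercube links
```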

In contrast to those systems, Eka uses an interconnect designed using concepts from projective geometry. The details of the interconnect are beyond the scope of this article. (Translation: I did not understand the really complex mathematics that goes on in those papers. Suffice it to say that before they are done, fairly obscure branches of mathematics get involved. However, one of these days, I am hoping to write a fun little article on how a cute little mathematical concept called Perfect Difference Sets (first described in 1938) plays an important role in designing supercomputer interconnects over 50 years later. Motivated readers are encouraged to try and see the connection.)

To simplify – Eka uses an interconnect based on Projective Geometry concepts. This interconnect gives linear speedup for applications but the complexity of building the interconnect increases only near-linearly.

The upshot of this is that to achieve a given application speed (i.e. number of teraflops), Eka ends up using fewer nodes than its counterparts. This means that it costs less and uses less power, both of which are major problems that need to be tackled in designing a supercomputer.

Handling Failures

A computer that includes 1000s of processors, 1000s of disks, and 1000s of network elements soon finds itself on the wrong side of the law of probability as far as failures are concerned. If one component of a system has a MTBF (mean time between failures) of 10000 hours, and the system has 3000 components, then you can start expecting things to fail once every 10 hours. (I know that the math in that sentence is probably not accurate, but ease of understanding trumps accuracy in most cases.)
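
For what it's worth, under the usual simplifying assumption of independent components with constant failure rates, the failure rates simply add up, which gives a figure closer to 3.3 hours, if anything worse than the 10 hours quoted above:

```python
# System MTBF under the standard "independent, constant failure rate" assumption
component_mtbf_hours = 10_000
components = 3_000

system_failure_rate = components / component_mtbf_hours   # 0.3 failures per hour
system_mtbf_hours = 1 / system_failure_rate

print(round(system_mtbf_hours, 1))   # ~3.3 hours between failures
```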

If an application is running on 500 nodes, and has been running for the last 20 hours, and one of the nodes fails, the entire application has to be restarted from scratch. And this happens often, especially before an important deadline.

A simple solution is to save the state of the entire application every 15 minutes. This is called checkpointing. When there is a failure, the system is restarted from the last checkpoint and hence ends up losing only 15 minutes of work. While this works well enough, it can get prohibitively expensive. If you spend 5 minutes out of every 15 minutes in checkpointing your application, then you’ve effectively reduced the capacity of your supercomputer by 33%. (Another way of saying the same thing is that you’ve increased your budget by 50%.)

The projective geometry architecture also allows for a way to partition the compute nodes in such a way that checkpointing and status saving can be done only for a subset of the nodes involved. The whole system need not be reset in case of a failure – only the related subset. In fact, with the projective geometry architecture, this can be done in a provably optimal manner. This results in improved efficiency. Checkpoints are much cheaper/faster, and hence can be taken more frequently. This means that the system can handle failures much better.

Again, I don’t understand the details of how projective geometry helps in this – if someone can explain that easily in a paragraph or two, please drop me a note.

The infrastructure

The actual supercomputer was built in just 6 weeks. However, other aspects took much longer. It took a year of convincing to get the project funded. And another year to build the physical building and the rest of the infrastructure. Eka uses

  • 2.5MW of electricity
  • 400ton cooling capacity
  • 10km of electrical cabling
  • 10km of ethernet cabling
  • 15km of infiniband cabling

The computing infrastructure itself consists of:

  • 1800 blades, 4 cores each, 3 GHz for each core
  • HP SFS clusters
  • 28TB memory
  • 80TB storage. Simple SATA disks. 5.2Gbps throughput.
  • Lustre distributed file-system
  • 20Gbps infiniband DDR. Eka was on the cutting edge of Infiniband technology. They sourced their infiniband hardware from an Israeli company and were amongst the first users of their releases – including beta, and even alpha quality stuff.
  • Multiple Gigabit ethernets
  • Linux is the underlying OS. Any Linux will work – RedHat, SuSe, your favorite distribution.

It’s the software, stupid!

One of the principles of the Eka project is to be the one-stop shop for tackling problems that require huge amounts of computational power. Their tagline for this project has been: from atoms to applications. They want to ensure that the project takes care of everything for their target users, from the hardware all the way up to the application. This meant that they had to work on:

  • High speed low latency interconnect research
  • System architecture research
  • System software research – compilers etc.
  • Mathematical library development
  • Large scientific problem solving.
  • Application porting, optimization and development.

Each of the bullet items above is a non-trivial bit of work. Take for example “Mathematical library development.” Since they came up with a novel architecture for the interconnect for Eka, all parallel algorithms that run on Eka also have to be adapted to work well with that architecture. To get the maximum performance out of your supercomputer, you have to rewrite all your algorithms to take advantage of the strengths of your interconnect design while avoiding the weaknesses. Requiring users to understand and code for such things has always been the bane of supercomputing research. Instead, the Eka team has gone about providing mathematical libraries of the important functions needed by applications, specifically tailored to the Eka architecture. This means that people who have existing applications can run them on Eka without major modifications.

Applications

Of the top 10 supercomputers in the world, Eka is the only system that was fully privately funded. All other systems used government money, so all of them are for captive use. This means that Eka is the only system in the top 10 that is available for commercial use without strings attached.

There are various traditional applications of HPC (high-performance computing) (which is what Eka is mainly targeted towards):

  • Aerodynamics (Aircraft design). Crash testing (Automobile design)
  • Biology – drug design, genomics
  • Environment – global climate, ground water
  • Applied physics – radiation transport, supernovae, simulate exploding galaxies.
  • Lasers and Energy – combustion, ICF
  • Neurobiology – simulating the brain

But as businesses go global and start dealing with huge quantities of data, it is believed that Eka-like capabilities will soon be needed to tackle these business needs:

  • Integrated worldwide supply chain management
  • Large scale data mining – business intelligence
  • Various recognition problems – speech recognition, machine vision
  • Video surveillance, e-mail scanning
  • Digital media creation – rendering; cartoons, animation

But that is not the only output the Tatas expect from their investment (of $30 million). They are also hoping to tap the expertise gained during this process for consulting and services:

  • Consultancy: Need & Gap Analysis and Proposal Creation
  • Technology: Architecture & Design & Planning of high performance systems
  • Execution: Implement, Test and Commissioning of high performance system
  • Post sales: HPC managed services, Operations Optimization, Migration Services
  • Storage: Large scale data management (including archives, backups and tapes), Security and Business Continuity
  • Visualization: Scalable visualization of large amounts of data

and more…

This article is based on a presentation given by Dr. Sunil Sherlekar, Dr. Rajendra Lagu, and N. Seetha Rama Krishna, of CRL, Pune, who built Eka. For details of their background, see here. However, note that I’ve filled in gaps in my notes with my own conjectures, so errors in the article, if any, should be attributed to me.

Understanding RPO and RTO in backups

This post is based on an article posted by Jaspreet Singh on the Druvaa Blog. Druvaa is a Pune-based startup based on Continuous data protection (CDP) technology.

Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are some of the most important parameters of a disaster recovery or data protection plan. These objectives guide enterprises in choosing an optimal data backup (or rather restore) plan.

RPO – Recovery Point Objective (wikipedia)

“Recovery Point Objective (RPO) describes the amount of data lost – measured in time. Example: After an outage, if the last available good copy of data was from 18 hours ago, then the RPO would be 18 hours.”

In other words, it is the answer to the question – Up to what point in time can the data be recovered?

RTO – Recovery Time Objective (wikipedia)

“The Recovery Time Objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in continuity.

[…]

It should be noted that the RTO attaches to the business process and not the resources required to support the process.”

In other words, it is the answer to the question – How much time did it take to recover after notification of a business process disruption?

The RTO/RPO and the results of the Business Impact Analysis (BIA) in its entirety provide the basis for identifying and analyzing viable strategies for inclusion in the business continuity plan. Viable strategy options would include any which would enable resumption of a business process in a time frame at or near the RTO/RPO. This would include alternate or manual workaround procedures and would not necessarily require computer systems to meet the objectives.

There is always a gap between the actuals (RTA/RPA) and the objectives, introduced by the various manual and automated steps needed to bring the business application up. These actuals can only be exposed by disaster and business disruption rehearsals.

Some Examples –

Traditional Backups

In traditional tape backups, if your backup plan takes 2 hours for a scheduled backup at 0600 hours and 1800 hours, then a primary site failure at 1400 hours would leave you with the option of restoring from the 0600-hours backup, which means an RPA of 8 hours and an RTA of 2 hours.
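
The same arithmetic, spelled out as a tiny sketch (assuming the restore itself takes roughly as long as the 2-hour backup):

```python
# Times are hours on a 24-hour clock
backup_schedule = [6, 18]      # backups start at 0600 and 1800
failure_time = 14              # primary site fails at 1400
restore_duration = 2           # hours to restore from the tape

last_good_backup = max(t for t in backup_schedule if t <= failure_time)

rpa_hours = failure_time - last_good_backup   # data lost: 14 - 6 = 8 hours
rta_hours = restore_duration                  # downtime to recover: 2 hours

print("RPA:", rpa_hours, "hours; RTA:", rta_hours, "hours")
```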

Continuous Replication

Replication provides much tighter RPO guarantees, as the target system contains a mirrored image of the source. The RPA values depend upon how fast the changes are applied and whether the replication is synchronous or asynchronous. The RTO, in turn, depends only on how soon the data on the target/replicated site can be made available to the application.

About Druvaa Replicator

Druvaa Replicator is a Continuous Data Protection and Replication (CDP-R) product which near-synchronously and non-disruptively replicates changes on a production server to a target site and provides point-in-time snapshots for instant data access.

The partially synchronous replication ensures that the data is written to a local or remote cache (caching server) before the application’s local write completes. This ensures RPO guarantees of up to 5 seconds. The CDP technology (still in beta) enables up to 1024 snapshots at the target storage, which helps the admin access the current or any past point-in-time consistent image of the data instantly, ensuring an RTO of under 2 seconds.

More Information – http://www.druvaa.com/products/replicator/