Category Archives: Overviews

Type in 15 languages using Lipikaar – winner of Manthan award

Pune-based startup Lipikaar, creator of software that allows typing in 15+ Indian languages using a standard English keyboard, and one of the companies selected for this July’s proto.in, is in the news again – for winning the Manthan award.

The Manthan Award is a first-of-its-kind initiative to recognize the best practices in e-Content. It was launched in 2004 by the Digital Empowerment Foundation in partnership with the World Summit Award and the American India Foundation. The Manthan Award South Asia 2008 received 284 nominations from 8 countries across 13 categories. Participating countries included India, Afghanistan, Bangladesh, Bhutan, Nepal, Pakistan and Sri Lanka. The award cites Lipikaar for its coverage (15+ languages), its ease of use, and its applicability to the masses.

Here is a profile of Lipikaar from the PuneTech wiki:

Lipikaar sells software tools that allow a simple method for typing in Hindi (and 15 other Indic languages) on an ordinary keyboard. It requires no learning, and within a few seconds you will be able to type in Hindi any word that you can imagine.

Lipikaar aims to differentiate itself from other transliteration-based competitors through its patented technology, which lets anybody enter text without knowing any English – a requirement for most other methods.
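Lipikaar’s exact method is patented and not public, but the general idea of repeat-keypress text entry can be illustrated with a toy sketch (the key-to-character mappings below are invented for illustration):

```python
# Toy sketch of repeat-keypress text entry. This is NOT Lipikaar's
# patented scheme -- just an illustration of the general idea: pressing
# the same key repeatedly cycles through related characters, so the user
# never needs to know English spellings or transliteration rules.

# Hypothetical key groups mapping an ordinary key to Devanagari letters.
KEY_GROUPS = {
    "k": ["क", "ख"],  # ka, kha
    "g": ["ग", "घ"],  # ga, gha
    "t": ["त", "थ"],  # ta, tha
}

def type_sequence(keystrokes):
    """A run of N identical keys selects the (N-1)-th character in that
    key's group (wrapping around)."""
    out, i = [], 0
    while i < len(keystrokes):
        key = keystrokes[i]
        run = 1
        while i + run < len(keystrokes) and keystrokes[i + run] == key:
            run += 1
        group = KEY_GROUPS.get(key, [key])
        out.append(group[(run - 1) % len(group)])
        i += run
    return "".join(out)

print(type_sequence("kkg"))  # ख (two presses of k) followed by ग
```

A real scheme also needs a way to type the same character twice in a row (e.g. a pause or separator key), plus handling for vowel signs and conjuncts; the point here is only that no knowledge of English is required.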

Features

Desktop Software

The Lipikaar technology is available as downloadable desktop software for Windows. It works as a keyboard overlay, which means that once installed, it allows you to input Indic language text into any application – Microsoft Office (e.g. Word, Excel), all websites, chat and e-mail.

Firefox Add-on

Lipikaar is also available as a Firefox add-on that allows users to enter Indic text to create emails, blogs, scraps, comments and chats, and to search in their favourite language on any website. Unlike the desktop software, this add-on is free.

Web Publisher Services

Webmasters wishing to allow local-language text input on their websites can avail of Lipikaar’s services; Lipikaar will work with the webmasters to integrate its technology into the website.

Languages Supported

Lipikaar supports 15 languages – Arabic, Hindi, Marathi, Sanskrit, Nepali, Konkani, Sindhi, Bengali, Gujarati, Punjabi, Tamil, Telugu, Kannada, Malayalam and Urdu.


Stop Virtual Machine Sprawl with Colama

This is a product-pitch for Colama, a product built by Pune-based startup Coriolis. For the basics of server virtualization, see our earlier guest posts: Introduction to Server Virtualization, and Why do we need server virtualization. Virtualization is fast emerging as a game-changing technology in the enterprise computing space, and Colama is trying to address an interesting new pain-point that is making its presence felt.

Virtualization is fast becoming an accepted standard for IT infrastructure. While it comes as a boon to the development and QA communities, IT practitioners are dealing with the pressing issue of virtual machine sprawl, which surfaces due to ad-hoc and hence uncontrolled adoption of virtualization. After describing the problem and its effects, this article outlines a solution called Colama, offered by Coriolis Technologies.

 

Virtual machines have been slipping in under the covers everywhere in the IT industry. Software developers like virtual machines because they can easily mimic a target environment. QA engineers prefer virtual machines because they can simultaneously test the software on different configurations. Support engineers can ensure reproducibility of an issue by pointing to an entire machine rather than detailing the individual steps and/or configuration requirements on a physical host. In many cases, adoption of virtual machines has been driven primarily by users’ choice rather than any coherent corporate strategy. The ad-hoc and uncontrolled use of virtual machines across the organization has resulted in a problem called virtual machine sprawl, which has become critical for today’s IT administrators.

Virtual machine sprawl is an unpleasant side effect of server virtualization and its near exponential growth over the years. Here are the symptoms:

  • At any given point, the virtual machines running in the organization are unaccounted for. Information like who created them and when, who used them, what configurations they have, what licensed software they use, whether security patches have been applied, and whether the data is backed up is not maintained or tracked anywhere.
  • Most commonly, people freely copy each other’s virtual machines, and no usage tracking or access control is in place.
  • Because storage is cheap, too many identical or similar copies of the same machines float around the organization. But a reduction in storage cost does not reduce the operational cost of storage management, search, backup, etc. Data duplication and redundancy are a problem even if storage is plentiful.
  • Because there is no mechanism to keep track of why an image was created, it is hard to figure out when it should be destroyed. People tend to forget what each image was created for, and keep them around just in case they are needed. This increases the storage requirements.
  • Licensing implications: a virtual machine copied from one with licensed (limited) software needs to be tracked for its life span in order to control the use of the licensed software.

    There are many players in the industry who address this problem. Most virtual lab management products are tied to one specific virtualization technology. For example, VMWare Lab Manager works only with VMWare’s virtualization technology. In a heterogeneous virtualization environment that is filled with Xen, VMWare, VirtualBox and Microsoft virtual machines, such an approach falls short.

    Colama is Coriolis Technologies’ solution to this problem. Colama manages the life cycle of virtual machines across an organization while staying agnostic to the underlying virtualization technology.

     

    Here are some of the features of Colama:

  • Basic SCM for virtual machines: check in, check out, clone, tag and comment on virtual machines to track their revisions.
  • Image inspection: Colama provides automatic inspection, extraction and display of image-related data, like OS version, software installed, etc., and also facilitates search on the extracted data. For example, you can search for the virtual machines that have Windows 2003 Server with Service Pack 4 and Oracle 10g installed!
  • Web-based interface: navigate through your organization’s virtual machine repository using just a web browser.
  • Ownership and access control: create a copy of a machine for yourself and decide who can use “your” machine.
  • De-duplication: copying/cloning virtual machines happens without any additional storage requirement.
  • Physical machine provisioning (lab management): spawn a virtual machine of your choice on a physical host that is available and ‘ready’.
  • Management reports for auditing and compliance: user activity reports, virtual machine history, health information (up/down time) of virtual machines, license reports of the virtual machines, etc.
  • Virtualization agnostic: works with virtual machines from all known vendors.
    Please note: This product-pitch, by Barnali Ganesh, co-founder of Coriolis, is featured on PuneTech because the editors found the technology interesting (or more accurately, the pain-point it is addressing). PuneTech is a purely non-commercial website and does not accept any consideration (monetary or otherwise) for any of the content on the site. If you would like your product to be featured on the front page, send us an article and we’ll publish it if we feel it’s interesting to our readers. You can always add a page to the PuneTech wiki yourself, as long as you follow the “No PR” guidelines.
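Two of the features listed above – revision tracking via check-in, and de-duplication of cloned images – can be sketched in a few lines. This is a simplified illustration, not Colama’s actual implementation; all class and method names are made up:

```python
import hashlib

# Simplified sketch of check-in/clone with content-hash de-duplication.
# Not Colama's implementation -- names and structure are illustrative.

class VMRepository:
    def __init__(self):
        self.blobs = {}      # content hash -> image bytes, stored once
        self.revisions = {}  # vm name -> list of (comment, content hash)

    def checkin(self, name, image_bytes, comment=""):
        digest = hashlib.sha256(image_bytes).hexdigest()
        self.blobs.setdefault(digest, image_bytes)  # dedup: skip known blobs
        self.revisions.setdefault(name, []).append((comment, digest))
        return digest

    def clone(self, src, dst, comment="clone"):
        # A clone just references the source's latest blob:
        # zero additional image storage.
        _, digest = self.revisions[src][-1]
        self.revisions.setdefault(dst, []).append((comment, digest))

    def checkout(self, name, rev=-1):
        return self.blobs[self.revisions[name][rev][1]]

repo = VMRepository()
repo.checkin("build-vm", b"disk-image-v1", "initial import")
repo.clone("build-vm", "qa-vm")
print(len(repo.blobs))  # 1 -- the clone consumed no extra storage
```

Keying storage by a content hash is what makes “too many identical copies floating around” cheap: identical images collapse to a single stored blob no matter how many users clone them.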

    Pune-based Harbinger wins award for learning software

    Pune-based Harbinger group, creator of the Raptivity e-learning software, has just won an award from the “Chief Learning Officer” magazine for “Clinical Challenge,” an innovative online learning program that it created with Philips Healthcare, designed to pioneer the use of gaming in healthcare education. This program tries to combine entertainment with challenge to test knowledge on complex clinical subjects. 

    From the press release:

    As can be seen on the Philips Learning Center web site www.theonlinelearningcenter.com, complex subjects such as anatomy, physiology, pathology and imaging specific challenges have been made engaging and interesting using a variety of game formats including popular TV game shows, common board games, challenging brainteasers, immersive simulation games and so forth. The tools have also provided a means to present advanced clinical images in a format that allows for exploration via drag and drop, hot spots and auto-solve techniques. 

    Use of games for learning appears to be one of the areas that Raptivity is focusing on, having earlier partnered with a US non-profit organization, the Entomological Foundation, to design and develop game-based online learning products targeted at 3rd and 4th grade children in the US. The games teach the children about “the exciting world of insects and their role in our environment.”

    The main features of Raptivity:

    • Leverage Raptivity’s library of pre-built interactions based on best practices in instructional design.
    • Completely customize each interaction.
    • Create a single Flash file for your eLearning interactivity.
    • Track completion status, score and responses for each interactivity.
    • Packages that allow easy creation of 3D content, learning games, conversion of videos to interactive videos, and simulations.

    Harbinger are also the creators of Flockpod, a web service that allows web publishers to create a space on their webpage where users can interact with each other right on the spot, without leaving the page.

    Stop terrorists from hacking into your company computers with AirTight networks?


    In a report titled “Wi-Fi networks extremely vulnerable to terror attacks,” the Economic Times points out that:

     

    The recent incident involving US national Kenneth Haywood, whose Internet Protocol (IP) address was allegedly used to send the terror e-mail prior to the Ahmedabad serial blasts, should be regarded as a wake up call. While this incident of wireless hacking took security agencies by surprise, lakhs of individuals and companies are actually exposed to a similar risk. Incidents of such hacking are common, but go unreported since they may not have such grave implications.

    The police version of the Haywood incident, as reported in the newspapers, is that suspected criminals allegedly hacked into the Wi-Fi network of his laptop and used it to send the terror e-mail. Prior to this hacking, Mr Haywood is said to have complained of high browsing bills. If this is to be believed, then one possibility is that Haywood may have left his access point open. The suspected terrorist could then have hooked on to this access point and sent the email, which then showed Haywood’s IP address as the originator. In hacking terminology, this is regarded as stealing bandwidth while impersonating Haywood.

    Wi-Fi hacking is an even bigger problem for companies that have many employees who take their laptops all over the place and might come back infected, or that have a number of access points that can be easy targets if not secured properly. This is the market that Pune-based AirTight Networks is going after:

    Hemant Chaskar, AirTight’s technology director, explained: “Companies earlier used firewalls, which prevented or regulated data access between internal systems and the external world. With the adoption of wireless, firewalls can be bypassed, exposing internal systems to free external access. External devices can access internal enterprise networks, while internal devices can also connect to networks outside the company’s premises in the absence of adequate security measures.”

    There are a few different capabilities that a company needs in order to tackle this threat. First, being able to detect that wireless intrusion is happening. Second, being able to physically (i.e. geographically) locate exactly where the threat is coming from. Third, being able to do something about it. And finally, for compliance with government regulations, being able to generate appropriate reports proving that you took all the appropriate steps to keep your company’s data secure from hackers. This last one is required whether you are worried about hackers or not, and is a huge pain.
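The first of these capabilities – detecting that an unauthorized device has appeared on the air – can be sketched as a simple allow-list diff. This is a toy illustration only; AirTight’s sensors do far more, such as locating and blocking devices, and the MAC addresses below are made up:

```python
# Toy rogue-AP detection: compare BSSIDs seen in a wireless scan against
# the list of access points the IT department has authorized.
# All MAC addresses here are invented examples.

AUTHORIZED_APS = {"00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f"}

def find_rogue_aps(observed_bssids):
    """Return BSSIDs observed on the air that are not authorized."""
    return sorted(set(observed_bssids) - AUTHORIZED_APS)

scan = ["00:1a:2b:3c:4d:5e", "de:ad:be:ef:00:01", "00:1a:2b:3c:4d:5f"]
print(find_rogue_aps(scan))  # ['de:ad:be:ef:00:01']
```

Real wireless intrusion detection must also catch authorized laptops connecting to outside networks, spoofed MAC addresses and ad-hoc connections, which is why dedicated sensors and analysis infrastructure are needed.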

    AirTight provides all these facilities and then goes one step further, which makes it unique. At $20,000 a pop, most small companies would balk at the price of the infrastructure required to achieve all this. So AirTight provides Wi-Fi security as an online service – you simply install a few sensors in your company, and everything else runs on AirTight’s servers. You pay a small monthly fee, as low as $60 per month, and you get full protection from Wi-Fi hacking while keeping the government happy with nice compliance reports.

    For more details on AirTight’s products, see the PuneTech wiki profile of AirTight.


    Pune startup presents at DEMOfall, San Diego


    Pune-based startup Maverick Mobile launched their latest product, Maverick Secure Mobile (MSM), at the DEMO conference earlier this week. DEMO is one of the premier conferences for new startups to launch their products. A video of their presentation is available from the DEMO site.

    Maverick Mobile is a Pune-based mobile services and products company. Maverick develops mobile applications (for example a mobile security application, and a mobile dictionary), mobile games (about a dozen of them), and also mobile content (mp3s, music videos, ringtones, wallpapers etc.)

    Products

    Applications

    Maverick Secure Mobile

    Maverick Secure Mobile is a security application that protects your handset as well as the data stored on it. Using MSM, one can retrieve the entire phone book remotely from a stolen or lost phone. MSM can also send reports of the thief’s activity via SMS to a designated reporting number. The owner of the device can lock or freeze the phone remotely. MSM can be used in case of theft, or for parental control.

    This product was launched at DEMOfall conference, September 2008, in San Diego.

    YO SMS

    Yo SMS is a peer-to-peer application that allows a user to attach backgrounds, sounds, audibles and smilies to regular SMS messages.

    Maverick Dictionary

    A dictionary of more than 1,45,000 words, with a user interface customized for mobile usage.

    Mobile Games

    Maverick has developed about a dozen mobile games, including their own versions of classics like Sudoku, Poker, Blackjack etc.

    Mobile Content

    • In India, Maverick Mobile is the first company to launch pre-loaded memory cards containing MP3 songs, music videos, video scenes, ringtones, wallpapers and games in the retail market.
    • Maverick has legal tie-ups with various film distribution houses for selling Bollywood content through memory cards.
    • In the span of 6 months, Maverick has built a customer base of more than 50,000 across different states of India.
    • Maverick has a strong distribution network of more than 130 distributors and 1,000 retailers.


    Supply Chain Management (SCM) Overview and SCM Development in Pune

    by Amit Paranjape

    Have you ever wondered how much planning and co-ordination it takes to roll out Indicas smoothly off the Tata Motors assembly line? Consider this – A typical automobile consists of thousands of parts sourced from hundreds of suppliers, and a manufacturing and assembly process that consists of dozens of steps. All these different pieces need to tie-in in an extremely well synchronized manner to realize the end product.

    How is this achieved? Well, like most complex business challenges, this too is addressed by a combination of efficient business processes and Information Technology. The specific discipline of software that addresses these types of problems is known as “Supply Chain Management” (SCM).

    Pune has a strong manufacturing base and leads the nation in the automotive and industrial sectors. Companies such as Tata Motors, Bajaj Auto, Kirloskar Oil Engines, Cummins, and Bharat Forge are headquartered in Pune. The manufacturing industry has complex production and materials management processes. This has resulted in a need for effective systems to help in decision making in these domains. The discipline that addresses these decision-making processes is referred to as ‘Advanced Planning & Scheduling’ (APS). APS is an important part of SCM. This article briefly discusses some of the basic concepts of SCM/APS, their high-level technology requirements, and mentions some Pune-based companies active in this area. Note that, given Pune’s manufacturing background, it is no accident that the city is also a leader in SCM-related software development in India.

    Introduction to SCM

    Supply chain management (SCM) is the process of planning, implementing and controlling the operations of the supply chain as efficiently as possible. Supply Chain Management spans all movement and storage of raw materials, work-in-process inventory, and finished goods from point-of-origin to point-of-consumption. SCM software focuses on supporting the above decision-making business processes, which cover demand management, distribution, logistics, manufacturing and procurement. APS specifically deals with the manufacturing processes. Note that SCM should be distinguished from ERP, which deals with automating business process workflows and transactions across the entire enterprise.

    ‘Decision Making’ is vital in SCM and leads to a core set of requirements for SCM software. Various decision making and optimization strategies are widely used. These include Linear Programming, Non-Linear Programming, Heuristics, Genetic Algorithms, Simulated Annealing, etc. These decision making algorithms are often implemented in C or C++. (In some cases, FORTRAN still continues to be leveraged for specific mathematical programming scenarios.) Some solutions use standard off-the-shelf optimization packages/solvers such as the ILOG Linear Programming Solver as a component of the overall solution.

    Consider a typical process/paint manufacturer such as Asian Paints. They make thousands of different end products that are supplied to hardware stores from hundreds of depots and warehouses, to meet the end consumer demand. The products are manufactured in various plants and then shipped to the warehouses in trucks and rail-cars. Each plant has various manufacturing constraints, such as 1) a given batch mixer can only make certain types of paints, and 2) to reduce mixer cleaning requirements, different color paints need to be produced in the order of lighter to darker shades. Now, to make it more interesting, there are many raw material constraints! Certain raw materials can only be procured with a long lead time. An alternative raw material might be available earlier, but it is very expensive! How do we decide? How many decisions are we talking about? And remember, these decisions have to be synchronized, since optimizing any one particular area in isolation can lead to extremely bad results for the others, and an overall sub-optimal solution. In optimization language, you can literally end up dealing with millions of variables in solving such a problem.
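The mixer-cleaning constraint above can be illustrated with a deliberately tiny heuristic. This is a toy, not a real APS solver, and the darkness values are invented:

```python
# Toy sequencing heuristic for one batch mixer: process paint batches
# from lighter to darker shades so the mixer never needs a full clean
# between consecutive batches.

def sequence_batches(batches):
    """batches: list of (name, darkness) with darkness on a 0-10 scale.
    Returns the batch names in processing order, lightest first."""
    return [name for name, _ in sorted(batches, key=lambda b: b[1])]

orders = [("navy blue", 9), ("cream", 1), ("sky blue", 4)]
print(sequence_batches(orders))  # ['cream', 'sky blue', 'navy blue']
```

A real solver would treat this as one constraint among thousands (mixer capabilities, raw material lead times, due dates), which is where the linear programming and heuristic techniques mentioned earlier come in.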

    SCM software also has a fairly specific set of GUI requirements. A typical factory planner deals with thousands of customer orders, machines, raw material parts and processing routings. Analyzing and acting on this information is often challenging. A rich, role-based user workflow for the planner is critical. GUIs are usually browser-based, with custom applets wherever functional richness is needed. In very specific cases, ‘thick’ desktop clients (typically developed in Java) are also required for some complex workflows. Alerts and problem-based navigation are commonly used to present large amounts of information in a prioritized, actionable format. Rich analytical OLAP-type capabilities are also required in many cases.

    Integration is an important part of SCM software architecture. SCM software typically interacts with various enterprise IT systems such as ERP, CRM, data warehouses, and other legacy systems. Many inter-enterprise collaboration workflows also require secure integration with customer/partner IT systems via the internet. Both batch and real-time integration workflows are required. Real-time integration can be synchronous or asynchronous. Batch data can sometimes (e.g. in retail SCM) run into terabytes and lead to batch uploads of millions of lines, so loading performance and error checking become very important.

    Consider a computer manufacturer such as Dell. They are renowned for pioneering the rapid-turnaround configure-to-order business. Dell’s assembly plants source material from different suppliers. To get maximum supply chain efficiency, they actively manage raw material inventory levels: any excess inventory results in locked-in capital and a reduction in Return on Investment (ROI). To achieve effective raw material inventory management, Dell needs to share its production and material requirements data with its suppliers so that they can supply parts at the right time. This requires seamless real-time collaboration between the Dell procurement planner and the suppliers. Data is shared securely via the internet, and rapid decisions such as changing quantities, selecting alternate parts, or selecting alternate suppliers are made in real-time.
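The kind of data Dell shares with its suppliers can be sketched as a net-requirements calculation. This is a drastically simplified illustration – real MRP adds lead times, safety stock, lot sizing and much more – and the bill of materials below is invented:

```python
# Toy net-requirements calculation: gross part demand from a bill of
# materials, minus on-hand inventory, floored at zero.

BILL_OF_MATERIALS = {"laptop": {"cpu": 1, "ram_stick": 2, "disk": 1}}

def net_requirements(product, planned_units, on_hand):
    """For each part, how many units must suppliers still deliver?"""
    needs = {}
    for part, per_unit in BILL_OF_MATERIALS[product].items():
        gross = per_unit * planned_units
        needs[part] = max(0, gross - on_hand.get(part, 0))
    return needs

# Plan 100 laptops with some stock already on hand.
print(net_requirements("laptop", 100, {"ram_stick": 50, "disk": 120}))
# {'cpu': 100, 'ram_stick': 150, 'disk': 0}
```

Sharing exactly this kind of per-part shortfall, recomputed as plans change, is what lets suppliers deliver parts at the right time without the manufacturer carrying excess inventory.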

    SCM in Pune

    Most of the large manufacturing companies in Pune leverage some kind of SCM software solution. These are typically sourced from SCM software industry leaders such as SAP, i2 and Oracle. In some cases, home-grown solutions are also seen.

    Many small and mid-sized software product companies in Pune are focused on the SCM domain. Some offer comprehensive end-to-end solutions, while others focus on specific industry niches. Note that, by their very nature, SCM processes are fairly complex and specifically tailored to individual companies. As a result, many SCM products are highly customizable and require varying degrees of onsite development. This makes software services an integral part of most of these SCM product companies.

    Pune-based FDS Infotech has been developing its SCM and ERP software suite for over a decade. They have a wide customer base in India. A representative example of their solution can be seen at Bharat Forge, where their SCM/APS solution is used to maximize the efficiency of the die shop. This is achieved through better schedule generation that considers all the requisite manpower, machine and raw-material constraints.

    Entercoms, also based in Pune, is primarily focused on the Service Parts Management problem in SCM. Their customers include Forbes Marshall and Alfa-Laval.

    SAS, a global leader in business intelligence and data-analytics software, also develops various SCM solutions, with a specific focus on the retail segment. Their retail solution addresses a wide variety of problems, such as deciding the right merchandizing strategies, planning the right assortments for the stores, forecasting the correct demand, etc. They boast a wide global customer base. Their Pune R&D center is involved in multiple products, including their retail solution.

    In addition to these three, many other small SCM software companies in Pune work on specific industry niches.

    About the Author

    Amit Paranjape is one of the driving forces behind PuneTech. He has been in the supply chain management area for over 12 years, most of it with i2 in Dallas, USA. He has extensive leadership experience across Product Management/Marketing, Strategy, Business Development, Solutions Development, Consulting and Outsourcing. He now lives in Pune and is an independent consultant providing consulting and advisory services for early stage software ventures. Amit’s interest in other fields is varied and vast, including General Knowledge Trivia, Medical Sciences, History & Geo-Politics, Economics & Financial Markets, Cricket.


    Designing for Usability

    (Manas Garg, who had organized the recent POCC meeting on Usability, was inspired to write this article after attending that session.)

    About the author: Manas is interested in a variety of things like psychology, philosophy, sociology, photography, movie making etc. But since there are only 24 hours in a day and most of it goes in sleeping and earning a living, he amuses himself by writing software, reading a bit and sharing his thoughts.

    Ultimately, we build systems so that they will be used at some point, and the systems we build will be used only if they are usable. If a system is not usable, why would anyone use it?

    It is very common to come across doors that are supposed to be pushed open (and there is a sticker close to the handle bar which says PUSH) but most people will pull them. It is equally common to see doors that are supposed to be pulled but people end up pushing them. It’s not really a problem with the people; it’s a problem with the design of the door. It’s a usability issue. There are also doors that get pushed exactly when they are meant to be pushed. It’s all in the design, isn’t it?

    When Gmail was launched, it was an instant hit. Even though it didn’t do as many things as other email services did at that point of time (like support for all browsers, drafts, etc.), it was still a revolution by itself. And the reason was simple – usability. What was the need for the Gnome project? Wasn’t it the usability of GNU/Linux for non-programmers?

    Two levels of system usability

    1. Conceptual usability

    A system is not an island. It is connected with various other entities by various means. And it shares one relationship or the other with those entities. A conceptual model effectively captures these various entities and what kind of relationship our system has with these external entities.

    The model of the Britannica encyclopedia is that a group of experts together create an encyclopedia on important topics, which can then be read by people. The model of Wikipedia is that people (and that includes everyone in the world) can help write the encyclopedia that everyone can read. The model of Google Knol is that experts write articles on specific subjects which everyone can read; readers can suggest improvements that the original authors can incorporate.

    The conceptual model must be made usable. The entire system will eventually be built on top of the conceptual model.

    2. Interface usability

    So, we have established that a system maintains some relationships with some external entities. These relationships are exposed through interfaces, and we need to think through the usability of those interfaces.

    Gmail is an excellent example of interface usability. As I mentioned earlier, there were several email services when Gmail was introduced but its interface usability was far superior. This is an example of how the same conceptual model can be presented to the user with completely different interfaces.

    The conceptual usability is a must to help the user understand the system. Wikipedia is an encyclopedia which can be read and modified by anyone. And interface usability is a must to help the user *do* something with the system. On each page of Wikipedia, everywhere your eye goes, you’ll find a link to edit the page or a section thereof. It almost invites you to modify the page.

    Methodology and Evaluation

    There are methodologies for designing usable systems. Usable systems are still created by people whose thought process naturally evaluates usability at every step; however, these methods can make the system designer better grounded in the real world, and more efficient too.

    Much like a system cannot be declared functional until it is tested, the usability of a system has to be tested with equal vigor. After all, if the system is not usable, who is going to use it (even if it is functional)? And there is only one way to test the usability of a system – by letting target users use it, without providing any guidance. Closely observing these users can often be an eye-opener.

    And the system designer/developer cannot verify the system’s usability themselves. In the course of development, the developer has learnt too much about the system and knows exactly how it works. The user will not have that much knowledge about the system and may not attempt to do things in the same way. If you don’t believe me, go and re-read the doors example.

    And what else?

    Ultimately, it is possible to educate people about how the system works. But their willingness to be educated will depend on why they need to use the system and how often. So keep education as the last resort.

    And first thing is still the last thing. We need to create usable systems because nobody uses unusable systems. And very few systems are usable by accident. Most of the usable systems are developed with usability as a focus.

    Further reading

    Technology overview – Druvaa Continuous Data Protection

    Druvaa, a Pune-based product startup that makes data protection (i.e. backup and replication) software targeted towards the enterprise market, has been all over the Indian startup scene recently. It was one of the few Pune startups to be funded in recent times (Rs. 1 crore by Indian Angel Network). It was one of the three startups that won the TiE-Canaan Entrepreneurial Challenge in July this year. It was one of the three startups chosen to present at the showcase of emerging product companies at the NASSCOM product conclave 2008.

    And this is not confined to national boundaries. It is one of only two (as far as I know) Pune-based companies to be featured in TechCrunch (actually TechCrunchIT), one of the most influential tech blogs in the world (the other Pune company featured in TechCrunch is Pubmatic).

    Why all this attention for Druvaa? Other than the fact that it has a very strong team that is executing quite well, I think two things stand out:

    • It is one of the few Indian product startups targeting the enterprise market. This is a very difficult market to break into, both because of the risk-averse nature of the customers and because of the very long sales cycles.
    • Unlike many other startups (especially consumer-oriented web-2.0 startups), Druvaa’s products require some seriously difficult technology.

    Rediff has a nice interview with the three co-founders of Druvaa – Ramani Kothundaraman, Milind Borate and Jaspreet Singh – which you should read to get an idea of their background, why they started Druvaa, and their journey so far. Druvaa also has a very interesting and active blog where they talk technology; it is worth reading on a regular basis.

    The rest of this article talks about their technology.

    Druvaa has two main products. Druvaa inSync allows enterprise desktop and laptop PCs to be backed up to a central server with over 90% savings in bandwidth and disk storage utilization. Druvaa Replicator allows replication of data from a production server to a secondary server near-synchronously and non-disruptively.

    We now dig deeper into each of these products to give you a feel for the complex technology that goes into them. If you are not really interested in the technology, skip to the end of the article and come back tomorrow when we’ll be back to talking about google keyword searches and web-2.0 and other such things.

    Druvaa Replicator

    Overall schematic set-up for Druvaa Replicator

    This is Druvaa’s first product, and is a good example of how something that seems simple to you and me can become insanely complicated when the customer is an enterprise. The problem seems rather simple: imagine an enterprise server that needs to be on, serving customer requests, all the time. If this server crashes for some reason, there needs to be a standby server that can immediately take over. This is the easy part. The problem is that the standby server needs to have a copy of all the latest data, so that no data is lost (or at least very little data is lost). To do this, the replication software continuously copies all the latest updates of the data from the disks on the primary server side to the disks on the standby server side.

    This is much harder than it seems. A naive implementation would simply ensure that every write of data done on the primary is also done on the standby storage at the same time (synchronously). This is unacceptable because each write would then take far too long, slowing down the primary server too much.

    If you are not doing synchronous updates, you need to start worrying about write order fidelity.

    Write-order fidelity and file-system consistency

    If a database writes a number of pages to the disk on your primary server, and if you have software that is replicating all these writes to a disk on a stand-by server, it is very important that the writes should be done on the stand-by in the same order in which they were done at the primary servers. This section explains why this is important, and also why doing this is difficult. If you know about this stuff already (database and file-system guys) or if you just don’t care about the technical details, skip to the next section.

    Imagine a bank database. Account balances are stored as records in the database, which are ultimately stored on the disk. Imagine that I transfer Rs. 50,000 from Basant’s account to Navin’s account. Suppose Basant’s account had Rs. 3,00,000 before the transaction and Navin’s account had Rs. 1,00,000. So, during this transaction, the database software will end up doing two different writes to the disk:

    • write #1: Update Basant’s bank balance to 2,50,000
    • write #2: Update Navin’s bank balance to 1,50,000

    Let us assume that Basant and Navin’s bank balances are stored on different locations on the disk (i.e. on different pages). This means that the above will be two different writes. If there is a power failure, after write #1, but before write #2, then the bank will have reduced Basant’s balance without increasing Navin’s balance. This is unacceptable. When the database server restarts when power is restored, it will have lost Rs. 50,000.

    After write #1, the database (and the file-system) is said to be in an inconsistent state. After write #2, consistency is restored.

    It is always possible that at the time of a power failure, a database might be inconsistent. This cannot be prevented, but it can be cured. For this, databases typically do something called write-ahead-logging. In this, the database first writes a “log entry” indicating what updates it is going to do as part of the current transaction. And only after the log entry is written does it do the actual updates. Now the sequence of updates is this:

    • write #0: Write this log entry “Update Basant’s balance to Rs. 2,50,000; update Navin’s balance to Rs. 1,50,000” to the logging section of the disk
    • write #1: Update Basant’s bank balance to 2,50,000
    • write #2: Update Navin’s bank balance to 1,50,000

    Now if the power failure occurs between writes #0 and #1 or between #1 and #2, then the database has enough information to fix things later. When it restarts, before the database becomes active, it first reads the logging section of the disk and checks whether all the updates that were claimed in the logs have actually happened. In this case, after reading the log entry, it needs to check whether Basant’s balance is actually 2,50,000 and Navin’s balance is actually 1,50,000. If they are not, the database is inconsistent, but it has enough information to restore consistency. The recovery procedure consists of simply going ahead and making those updates. After these updates, the database can continue with regular operations.

    (Note: This is a huge simplification of what really happens, and has some inaccuracies – the intention here is to give you a feel for what is going on, not a course lecture on database theory. Database people, please don’t write to me about the errors in the above – I already know; I have a Ph.D. in this area.)
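    The write-ahead-logging idea described above can be sketched in a few lines. This is a toy in-memory model of the bank-transfer example, not Druvaa's or any real database's implementation; all names and structures are illustrative.

```python
# Toy write-ahead logging sketch for the bank-transfer example.
# The "disk" is just two Python objects: a log and the data pages.

log = []                                         # the logging section of the disk
balances = {"Basant": 300000, "Navin": 100000}   # the data pages

def transfer(frm, to, amount):
    new_frm = balances[frm] - amount
    new_to = balances[to] + amount
    # write #0: record the intended updates BEFORE touching the data
    log.append({frm: new_frm, to: new_to})
    # writes #1 and #2: the actual updates (a crash may interrupt here)
    balances[frm] = new_frm
    balances[to] = new_to

def recover():
    # On restart, re-apply every logged update. Re-applying an update
    # that already happened is harmless, since it sets the same value.
    for entry in log:
        balances.update(entry)

transfer("Basant", "Navin", 50000)
recover()
print(balances)  # {'Basant': 250000, 'Navin': 150000}
```

    If a crash happens after write #0 but before writes #1 and #2, running `recover()` on restart replays the log entry and restores consistency, which is exactly the cure described above.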

    Note that in the above scheme the order in which writes happen is very important. Specifically, write #0 must happen before #1 and #2. If for some reason write #1 happens before write #0 we can lose money again. Just imagine a power failure after write #1 but before write #0. On the other hand, it doesn’t really matter whether write #1 happens before write #2 or the other way around. The mathematically inclined will notice that this is a partial order.

    Now if there is replication software that is replicating all the writes from the primary to the secondary, it needs to ensure that the writes happen in the same order. Otherwise the database on the stand-by server will be inconsistent, and can result in problems if suddenly the stand-by needs to take over as the main database. (Strictly speaking, we just need to ensure that the partial order is respected. So we can do the writes in this order: #0, #2, #1 and things will be fine. But #2, #0, #1 could lead to an inconsistent database.)
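    The partial-order constraint can be made concrete with a small checker. This is an illustrative sketch, not Druvaa's algorithm: it simply tests whether a proposed replay order on the standby respects a given set of "must happen before" pairs, such as write #0 preceding writes #1 and #2 in the bank example.

```python
# Check whether a replay order respects a partial order of writes.
# Writes are identified by number; `dependencies` is a set of
# (before, after) pairs that must hold in the replay.

def respects_partial_order(replay_order, dependencies):
    position = {write: i for i, write in enumerate(replay_order)}
    return all(position[before] < position[after]
               for before, after in dependencies)

# Write 0 is the log entry; writes 1 and 2 are the balance updates.
deps = {(0, 1), (0, 2)}

print(respects_partial_order([0, 1, 2], deps))  # True  - the original order
print(respects_partial_order([0, 2, 1], deps))  # True  - also safe (#1 and #2 commute)
print(respects_partial_order([2, 0, 1], deps))  # False - log written too late
```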

    Replication software that ensures this is said to maintain write order fidelity. A large enterprise that runs mission critical databases (and other similar software) will not accept any replication software that does not maintain write order fidelity.

    Why is write-order fidelity difficult?

    I can hear you muttering, “Ok, fine! Do the writes in the same order. Got it. What’s the big deal?” Turns out that maintaining write-order fidelity is easier said than done. Imagine that your database server has multiple CPUs. The different writes are being done by different CPUs. And the different CPUs have different clocks, so the timestamps used by them are not necessarily in sync. Multiple CPUs is now the default in server class machines. Further imagine that the “logging section” of the database is actually stored on a different disk. For reasons beyond the scope of this article, this is the recommended practice. So, the situation is that different CPUs are writing to different disks, and the poor replication software has to figure out what order this was done in. It gets even worse when you realize that the disks are not simple disks, but complex disk arrays that have a whole lot of intelligence of their own (and hence might not write in the order you specified), that there is a volume manager layer on the disk (which can be doing striping and RAID and other fancy tricks), and that there is a file-system layer on top of the volume manager layer that is buffering the writes — and you begin to get an idea of why this is not easy.

    Naive solutions to this problem, like using locks to serialize the writes, result in unacceptable degradation of performance.

    Druvaa Replicator has patent-pending technology in this area, where they are able to automatically figure out the partial order of the writes made at the primary, without significantly increasing the overheads. In this article, I’ve just focused on one aspect of Druvaa Replicator, just to give an idea of why this is so difficult to build. To get a more complete picture of the technology in it, see this white paper.

    Druvaa inSync

    Druvaa inSync is a solution that allows desktops/laptops in an enterprise to be backed up to a central server. (The central server is also in the enterprise; imagine the central server being in the head office, and the desktops/laptops spread out over a number of satellite offices across the country.) The key features of inSync are:

    • The amount of data being sent from the laptop to the backup server is greatly reduced (often by over 90%) compared to standard backup solutions. This results in much faster backups and lower consumption of expensive WAN bandwidth.
    • It stores all copies of the data, and hence allows timeline based recovery. You can recover any version of any document as it existed at any point of time in the past. Imagine you plugged in your friend’s USB drive at 2:30pm, and that resulted in a virus that totally screwed up your system. Simply use inSync to restore your system to the state that existed at 2:29pm and you are done. This is possible because Druvaa backs up your data continuously and automatically. This is far better than having to restore from last night’s backup and losing all data from this morning.
    • It intelligently senses the kind of network connection that exists between the laptop and the backup server, and will correspondingly throttle its own usage of the network (possibly based on customer policies) to ensure that it does not interfere with the customer’s YouTube video browsing habits.

    Data de-duplication

    Overview of Druvaa inSync. 1. Fingerprints computed on laptop sent to backup server. 2. Backup server responds with information about which parts are non-duplicate. 3. Non-duplicate parts compressed, encrypted and sent.

    Let’s dig a little deeper into the claim of 90% reduction of data transfer. The basic technology behind this is called data de-duplication. Imagine an enterprise with 10 employees. All their laptops have been backed up to a single central server. At this point, data de-duplication software can realize that there is a lot of data that has been duplicated across the different backups, i.e. the 10 different backups contain a lot of files in common: most of the files in the C:\WINDOWS directory, and all those large PowerPoint documents that got mail-forwarded around the office. In such cases, the de-duplication software can save disk space by keeping just one copy of the file and deleting all the other copies. In place of the deleted copies, it can store a shortcut indicating that if this user tries to restore this file, it should be fetched from the other backup and then restored.

    Data de-duplication doesn’t have to be at the level of whole files. Imagine a long and complex document you created and sent to your boss. Your boss simply changed the first three lines and saved it into a document with a different name. These files have different names, and different contents, but most of the data (other than the first few lines) is the same. De-duplication software can detect such copies of the data too, and is smart enough to store only one copy of this document in the first backup, and just the differences in the second backup.

    The way to detect duplicates is through a mechanism called document fingerprinting. Each document is broken up into smaller chunks. (How to determine what constitutes one chunk is an advanced topic beyond the scope of this article.) Now, a short “fingerprint” is created for each chunk. A fingerprint is a short string (e.g. 16 bytes) that is uniquely determined by the contents of the entire chunk. The computation of a fingerprint is done in such a way that if even a single byte of the chunk is changed, the fingerprint changes. (It’s something like a checksum, but a little more complicated, to ensure that two different chunks cannot accidentally have the same checksum.)
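    A fingerprint of this kind can be sketched with a truncated cryptographic hash. SHA-256 here stands in for whatever fingerprint function inSync actually uses (the article doesn't say); the 16-byte length matches the example size mentioned above.

```python
# A chunk "fingerprint": a 16-byte string determined by the chunk's contents.
import hashlib

def fingerprint(chunk: bytes) -> bytes:
    # Truncating a cryptographic hash keeps accidental collisions
    # between different chunks astronomically unlikely.
    return hashlib.sha256(chunk).digest()[:16]

a = fingerprint(b"quarterly report, page 1")
b = fingerprint(b"quarterly report, page 2")  # a single byte differs

print(len(a))   # 16
print(a == b)   # False: changing even one byte changes the fingerprint
```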

    All the fingerprints of all the chunks are then stored in a database. Now every time a new document is encountered, it is broken up into chunks, fingerprints are computed, and these fingerprints are looked up in the database of fingerprints. If a fingerprint is found in the database, then we know that this particular chunk already exists somewhere in one of the backups, and the database will tell us the location of the chunk. Now this chunk in the new file can be replaced by a shortcut to the old chunk. Rinse. Repeat. And we get 90% savings of disk space. The interested reader is encouraged to google Rabin fingerprinting, shingling, and rsync for hours of fascinating algorithms in this area. Before you know it, you’ll be trying to figure out how to use these techniques to find who is plagiarising your blog content on the internet.
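    The whole de-duplication loop can be sketched end to end. This toy version uses fixed-size chunks for simplicity (real systems, inSync presumably included, use content-defined chunking, which is what the Rabin fingerprinting pointer above is about); the chunk size and data are made up for illustration.

```python
# Toy de-duplication: fingerprint each chunk, store only chunks
# not already present, and keep a "recipe" of fingerprints per backup.
import hashlib

CHUNK_SIZE = 8
store = {}  # fingerprint -> chunk contents (the database of fingerprints)

def backup(data: bytes):
    recipe = []  # ordered fingerprints that reconstruct the document
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).digest()[:16]
        if fp not in store:
            store[fp] = chunk   # new chunk: must actually be stored (and sent)
        recipe.append(fp)       # duplicate chunk: the shortcut is enough
    return recipe

def restore(recipe):
    return b"".join(store[fp] for fp in recipe)

r1 = backup(b"AAAAAAAABBBBBBBBCCCCCCCC")  # three new chunks stored
r2 = backup(b"AAAAAAAABBBBBBBBDDDDDDDD")  # only one new chunk stored

print(len(store))                               # 4 chunks for 6 chunks of logical data
print(restore(r2) == b"AAAAAAAABBBBBBBBDDDDDDDD")  # True
```

    Six logical chunks were backed up, but only four were stored: the two backups share their first two chunks, which is exactly the savings de-duplication is after.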

    Back to Druvaa inSync. inSync does fingerprinting at the laptop itself, before the data is sent to the central server. So, it is able to detect duplicate content before it gets sent over the slow and expensive net connection and consumes time and bandwidth. This is in contrast to most other systems that do de-duplication as a post-processing step at the server. At a Fortune 500 customer site, inSync was able to reduce the backup time from 30 minutes to 4 minutes, and the disk space required on the server went down from 7TB to 680GB. (source.)

    Again, this was just one example used to give an idea of the complexities involved in building inSync. For more information on other distinguishing features, check out the inSync product overview page.

    Have questions about the technology, or about Druvaa in general? Ask them in the comments section below (or email me). I’m sure Milind/Jaspreet will be happy to answer them.

    Also, this long, tech-heavy article was an experiment. Did you like it? Was it too long? Too technical? Do you want more articles like this, or less? Please let me know.


    A common tech events calendar for Pune

    There is no single comprehensive source of information for all the events in Pune that are of interest to the Technology community. The PuneTech events page only carries information about events coming up soon. IT Vidya has an Events page but that is for events all over the country, and is also not comprehensive enough. Also, both of these are more like blogs than an events calendar, and are missing many features that a calendar should have.

    On the suggestion of Freeman Murray, we have started using upcoming.org as an events calendar for tech events in Pune. The Pune Tech Events Group on Upcoming will track all the tech events in Pune. This is a free, non-commercial, community driven initiative. Anybody can join the group. Anybody can add events. Anybody can subscribe to get updates.

    [edit] How to Join

    Just go to the group page on upcoming, and click on “join this group”.

    [edit] How to add an Event

    • Go to upcoming.org
    • Click on Add an Event
    • Fill out the event details. In case the venue is not yet decided, use “TBD Pune”
    • Complete the procedure for adding the event. This will result in a page getting created for this event. You’ll be taken to that page.
    • On the right side of this page, you’ll see links for “Send to Group” and “Add a Tag”
    • Important: Use the “Send to Group” link to add this event to the “Pune Tech Events Group”
    • Important: Use the “Add a Tag” link to add the tags “tech” and “pune” (and other relevant tags) to your event

    Or, simply send an email with the relevant details (date, time, place, description) to punetech and we’ll add it for you.

    If you are organizing a tech event in Pune, please consider taking some time out to add the event to this calendar for the benefit of the community.

    Company Profile: hover.in

    Go Ergo has an interview of the founders of Pune-based web startup hover.in. Excerpt:

    Q: Hover.in was born out of constant frustration that most bloggers and web publishers face regarding their content presentation and monetisation. Could you explain how hover bridges this gap?
    A: As a professional blogger, one of my main problems has been monetising (making money) from my blog without compromising on user experience. Normally, a visitor does not like to see too many ads on a blog or a website. Most of the new visitors will turn away from the blog on seeing too many ads. Hover was born to address this pain point!
    Hover.in is an in-text “customised content” and “customised ad” delivery platform for websites and blogs. It enables web publishers to link and monetise keywords or phrases within their existing content.
    Till date, in-text technology has been primarily used only to display contextual ads – mostly automated, without any publisher control. However, with hover.in, publishers can create and customise the content appearing within the hover bubble (hoverlet). Hover.in goes even further, allowing the publisher to change the look and feel of the hoverlet as per the theme of the website or blog.

    Read the full interview.

    hover.in provides web publishers and bloggers with in-text customized content display. It also provides opportunities for contextual in-text ads for increased revenues to publisher. More information about hover.in from the PuneTech wiki:

    Key Features

    • Complete control of what appears within the hover window, via an administrative panel
    • Publishers can choose between hundreds of 3rd party contextual widgets or browse applications within the hover.in community
    • Improve reader engagement by displaying targeted content for particular key phrase of choice, or choose default applications
    • Customize the interface, add your own content or integrate advertisements from third party ad networks

    Hover.in is currently in closed alpha and will be opening up shortly to selected beta users.
