Category Archives: Featured

The best articles

Liveblogging POCC’s Startup Speed Date – Meet Pune’s startups

I’m liveblogging the Pune OpenCoffee Club’s “Startup Speed Date” meetup. This is a meeting to get to know a bunch of Pune startups in a short amount of time. Here is a list of the startups, each with a short introduction.

Pringoo – personalized products. Go to their website, create your own T-shirt, using your own images or text. Or a mug, or mousepad, or keychain. Print it and have it delivered to your home. You can order even a single T-shirt, at reasonable rates.

Sokrati – a product for search engine marketing and optimization. Targeted towards SMBs. $1million revenues so far this year. Moved to Pune from Seattle last month.

Kaboodle – Social network for shopping. Each person can upload information about things they’ve bought, or want to buy. Can check the same info for friends and others. If you can’t decide what to buy, you can create a poll that your friends vote in. You can create a style statement by putting together an interesting assortment of products that would go well together. Clientele is mostly young women. One of the top sites in the US.

VirtuaResearch – SaaS for equity research. A web-based platform for getting equity research. They provide the website, and they also provide the actual equity analysis. In addition, they allow freelancers to add their own research, which others can use. Sort of a social network for equity analysis.

Lipikaar – Input text in 17 Indian languages, anywhere – websites, desktop software, blogs, Gmail, etc. A unique, patented key-entry method, different from all other competitors, and especially easy to use for people who do not know any English.

Hover.in – in-text, customized content- and ad-delivery platform. Widgets to insert in your blog which can automatically add content from various websites (e.g. Wikipedia) or a third-party ad network.

ThinkingSpace – ActiveCiti.com, a service for managing events and invites, and EventAZoo.com, a service for creating webpages for college festivals.

Chroma Systems – Image analysis software as well as hardware.

Alabot – Wants your computer / mobile phone to understand you when you talk to it. Natural language processing. For example, being able to send an SMS to buy train tickets without having to learn any specific command formats.

Markonix – Help startups with marketing their products in the US and elsewhere.

IndicTrans – A non-commercial group aiming to build ‘social capital’ by making communication and networking feasible and affordable among people who know Indian languages. This, they believe, is a primary requirement for a democratic regeneration of our society, as well as a condition for harmonious globalisation.

laxmiroad.in – Provides you with an ability to shop online at Laxmi Road shops (for example Chitale!) and get delivery within 24 hours.

startupforstartups – Helps wannabe entrepreneurs build the first prototype of their startup without having to spend a lot, build a team, or even quit their current job. See the PuneTech interview with the founder.

Wissen Technologies – Hukum Mere Aka is a learning program that sits in an instant-messenger window; it can talk to you, understand your commands, and fetch data from its database based on your queries.

Liveblogging CSI Pune Lecture: Applications of Business Intelligence

I am liveblogging CSI Pune’s lecture on Applications of Business Intelligence by Narender C.V. of SAS R&D India. These are quick and dirty notes of the lecture – not intended to be a well-organized article, but hopefully they give you enough of a flavor for the area to get you interested and excited enough to check it out on Google and Wikipedia.

The amount of data is doubling every 11 months. And we have easier and easier access to all this data from all over the world. The problem is making sense of all this data. The amount of time at our disposal remains the same. So we have to use sophisticated software and algorithms to figure out how to use this data to improve business and efficiency. That is Business Intelligence (BI).

This talk is the second in a series of talks on BI. PuneTech covered the first talk which gave an overview of BI and data warehousing. This lecture focuses on who uses BI and why. A major portion of this talk will be a bunch of examples of use of BI in real companies. So on to the examples:

Example 1: Getting a better grip on Reality (i.e. Seeing problems earlier)

The first case study focuses on using BI to simply get a good picture of the situation as it exists. Seeing reality. Last year, US-based companies paid $28 billion in servicing warranties or recalls. This is money you don’t really want to spend. The biggest problem is identifying these issues as early as possible – seeing reality early. Typically, an issue first appears. A little while later, the issue becomes visible to the company, and it is prioritized. Later it is “defined” and decisions are taken by the decision makers. Finally the issue is resolved, and money is paid out. A study by SAS shows that the “detect” part of this cycle takes about 90 days, the prioritize part takes 20 days, and the define part takes 75 days. That’s a total of 185 days to fix the problem.

A business intelligence system helps to reduce each phase of that sequence through better data gathering and statistical analysis. This results in 27 days for detection, 5 days for prioritization, and 46 days for definition, for a total of 78 days. This is a huge improvement, and each day saved is money saved.

How is this done? First, simple reports: defects per thousand, per product. A dashboard with easy-to-read defect reports. Then a library of reports that various people in the company can use easily to see and analyze defects and warranty claims. Then a statistical analysis engine to detect “emerging issues” – algorithms that can detect, from early trends, issues that are likely to become “big” later on. Text mining and analysis to read unstructured reports from service technicians and determine, simply by looking at the keywords, which product or part or defect was the cause of that particular incident. And there are other analytics, like forecasting and trend analysis. The bottom line? Shanghai GM was able to reduce detection and definition time by 70%, resulting in a 34% reduction in costs. Which is pretty cool for simply running a bunch of mathematical algorithms.
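The keyword-based text-mining step can be sketched in a few lines: scan each unstructured technician note for known keywords and tally which part each incident implicates. The keyword map and sample notes below are hypothetical illustrations, not SAS's actual vocabulary or algorithm.

```python
from collections import Counter

# Hypothetical keyword -> component map; a real system would use a much
# richer text-mining model, this just illustrates the idea.
KEYWORDS = {
    "squeal": "brake assembly",
    "brake": "brake assembly",
    "no-start": "battery",
    "battery": "battery",
    "wiper": "wiper motor",
}

def classify(note):
    """Return the component implicated by the first matching keyword, else None."""
    text = note.lower()
    for kw, component in KEYWORDS.items():
        if kw in text:
            return component
    return None

notes = [
    "Customer reports loud squeal when stopping",
    "Vehicle exhibits no-start condition in cold weather",
    "Replaced wiper motor, intermittent failure",
]

# Count incidents per implicated component across all notes
counts = Counter(c for c in map(classify, notes) if c)
print(counts.most_common())
```

Emerging-issue detection would then watch these per-component counts over time for early upward trends.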

Example 2: Manage and Align Resources to Strategy

Everybody agrees that it is important for a company to have a strategy. And that everyone should understand and execute according to that strategy. Obvious?

But here is the reality, based on a survey: only 5% of the workforce of a large company understands the company strategy. Only 25% of managers are incentivized based on the strategy. 60% of organizations do not link budgets to the strategy. And 86% of executive teams spend less than one hour per month discussing strategy.

How can BI help in this case?

It is possible to define objectives for each person/team in the company. Then it is possible to define how this objective can/should be measured. Then BI software can be used to capture and analyze this data, and figure out how everybody is contributing to the end objectives of the business.

Example 3: Retail Optimization

The problem to be solved: you need to stock the exact quantity that people are going to buy. Stock too much and you lose money on unsold items. Order too little and you get out-of-stock situations and lose potential profits. You need to be able to forecast demand, and optimize which sizes/assortments to stock. All of you must have had the experience of going to a shop, liking an item, and not finding it in your size. Sale lost. Profit lost. Can this loss be reduced?

Use BI for this. In one case study, a department store had been sending the same mix of sizes to all its stores. SAS clustered the stores into 7 different sub-groups, with a different size mix for each sub-group.
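Store clustering of this kind is typically done with an algorithm such as k-means over each store's historical size mix. The sketch below is a toy, pure-Python 1-D k-means run on a made-up "share of small sizes sold" figure per store – 3 clusters rather than 7, and all numbers invented, purely to illustrate the idea.

```python
def kmeans_1d(values, k, iters=20):
    """A very small 1-D k-means (assumes k >= 2); returns (centroids, labels)."""
    s = sorted(values)
    # Seed centroids at evenly spaced points of the sorted data
    centroids = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each store to its nearest centroid
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# Hypothetical share of "small" sizes in each store's past sales
small_share = [0.10, 0.12, 0.11, 0.45, 0.48, 0.80, 0.78]
centroids, labels = kmeans_1d(small_share, k=3)
print(labels)  # stores with similar size mixes land in the same cluster
```

Each resulting cluster would then be shipped its own size assortment instead of the one-size-mix-fits-all approach.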

Example 4: Personalized, real-time marketing

Take the example of marketing. Consider a traditional marketing mail sent from a company. Customers hate that and the success rate is a pathetic 3% or so. That’s just stupid, but exists when there is no alternative. Better is event based marketing. When you do something, it triggers a marketing push from the company. This is often convenient for the customer, and has a 20% success rate. But the best is customer initiated interaction which has a 40% success rate.

Note that as you go down that list, it gets more difficult to determine quickly, in real time, exactly what marketing message to push to the customer. If you call a pizza delivery place and they point out that they have a buy-one-get-one-free offer, it might or might not interest you. Better would be an offer focused specifically on your needs. Use BI to analyze individual customers, forecast their needs, and tailor the offer accordingly. An offer you cannot refuse.

Another example: a customer puts a digital camera in an online shopping cart. The online shopping software contacts the BI system for offers to push to the customer. It looks at the customer’s history and figures out that this is a non-tech-savvy customer who buys high-end products. The customer’s demographic information is also consulted, and finally some accessories are suggested. Since this is a very specific recommendation, it has a high chance of being accepted, which significantly increases the profit on this transaction.

Example 5: Understanding Customers

Mobile company, simplistic view: a customer is leaving, so offer them a lower-value plan. They might or might not leave. BI gives you better tools. Cost is not the only thing to play with. Understand why people are leaving, and also understand the effect of their leaving on your business. (Sometimes it might be best to let them leave.) And based on this, determine the best course of action – what / how much to offer them.

First, use predictive analysis to get an estimate of how much profit you are going to make from a customer over the course of next N years based on the data you have gathered about them so far. Use this figure, the “customer value”, to drive decisions on how much effort to expend on trying to get this customer to stay. Forget the low value customers, and focus on the high value ones!

Another possibility: suppose you have marketing money to spend on giving offers to some customers, and there are 3 different kinds of offers. Use BI analysis to figure out which offers to send to which customers, based on customer value and the chances of the customer accepting each offer. This optimizes the use of the “offer” dollars.
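One way to picture this: combine the estimated customer value with each offer's predicted acceptance probability and pick the offer with the highest expected net return. The customers, acceptance probabilities, and offer costs below are entirely hypothetical.

```python
# Hypothetical customers: predicted value if retained, and the modeled
# probability that each offer gets accepted.
customers = {
    "A": {"value": 1200.0,
          "accept": {"discount": 0.40, "upgrade": 0.10, "loyalty": 0.25}},
    "B": {"value": 300.0,
          "accept": {"discount": 0.20, "upgrade": 0.30, "loyalty": 0.30}},
}

OFFER_COST = {"discount": 50.0, "upgrade": 120.0, "loyalty": 20.0}

def best_offer(cust):
    """Pick the offer with the highest expected net return:
    expected return = customer value * P(accept) - offer cost."""
    def expected(offer):
        return cust["value"] * cust["accept"][offer] - OFFER_COST[offer]
    return max(OFFER_COST, key=expected)

for name, cust in customers.items():
    print(name, "->", best_offer(cust))
```

The high-value customer justifies a more expensive offer than the low-value one, which is exactly the "focus the offer dollars" point above.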


The next wave of entrepreneurship in India

Monish Darda, Director of Product Development at BMC BladeLogic and co-founder of WebSym, wrote an article on the next wave of entrepreneurship in India for CSI Pune’s quarterly newsletter Desktalk. Parts of that article are reproduced here with permission. You should be able to download the newsletter (containing the full article) from the CSI Pune website’s download section. (It’s the July-Aug-Sept 2008 newsletter.)

So let us take a look at tomorrow’s entrepreneur – I tend to see him (or her – the masculine is just convenience) in two colors – Mr E the risk taker, and Mr E, the man with the plan. Tomorrow’s risk taker is a person fresh out of college, with a few like-minded close friends and a couple of mentors, who want to do the next cool thing. The man with the plan is a youngish guy, probably back from the USofA, his future secured, with a plan that will leverage India for his next successful startup. Is anyone building up a services company? Well, yes and no. What I see in the future is services sold as a product. One in three entrepreneurs are going to be thinking about leveraging the labor cost differential at the low end of the value chain along with innovation that takes the returns to the high end of the value chain.

I see social networking sites with better ideas spawning in India, quickly gaining eyeballs, cheaper and faster than anywhere else except perhaps China. I see quicker and richer integration of media, with infotainment at the core. Indian entrepreneurs will be driving mid-range technology applications with a larger audience and higher success rates. You saw YouTube and Facebook become the rage and build value worth billions in a short span of time – hold your breath for the Indian versions; they are not too far away in the future.

I see mobile middle-ware and products – if every other Indian is going to have a mobile phone soon, the apps are not far away. Indian apps, in Indian languages, closer to the Indian psyche, driven by Indian technology entrepreneurs. And it is not far in the future that we will have our own Nokias and Sonys and Ericssons – hardware is sure to follow.

I am not a betting person, but if I were, I would bet on small, consumer shareware coming out of India in the near future – apps for the phone, the iPhone and the PC that make practical use of the now-ubiquitous personal computer. I see enterprise software being developed in India, owned in India, but still managed and sold principally outside India. Entrepreneurs are going to be eyeing the small, high-volume software-for-cash market, where services coupled with technology vie to increase the quality of life.

How will Technology Entrepreneurship benefit India?

My bet on the man with the plan is that he will drive the “real” technology – high tech technology creation and adaptation for grand socio-economic experiments, for logistics and the growing manufacturing industry; selling to corporates, multi-nationals and governments. He will be the guy attracting large investments and innovation dependent on infrastructure.

We are culturally a very adaptable, and very tolerant, people – the technology infusion of the future is going to bring about sweeping social change; most of which has already started. Look at what we did with the mobile phone – with SMS and the “missed calls” syndrome. We adapted the technology to suit our way of working, and kept driving costs to the ground. We will keep doing that with automobiles, phones, software, nuclear power and any other technology that we adapt as our own. We are very good at converting luxuries to needs, and that to me is key – the future of the Indian entrepreneur is dependent on this one factor. And I think this mass market has the potential to make billionaires out of ordinary people, with the spark to adapt and profit. And I believe with all its perceived ineptness, corruption and mismanagement, the government is somehow going to be the catalyst to make this happen, believe it or not. Perhaps all the impossible traffic and the constant load-shedding is already starting an entrepreneur somewhere on the road to his empire …

About the Author – Monish Darda

Monish is Director, Product Development at BMC BladeLogic. He set up the India operations of Storability Software, an East coast storage startup and was heading the group at Sun Microsystems when Storability underwent two acquisitions.

Monish is also the cofounder and Director of Websym Technologies.

He did his Master’s at Florida Atlantic University in Boca Raton, FL. He has architected large systems in the areas of customer acquisition, manufacturing and finance on J2EE and Microsoft platforms. He has also shared his experience in leading technologies in implementation and design through mentoring programs for senior developers and designers in top national and international software houses. He has implemented innovative processes and tools for distributed design and development across the US and Europe.


Stop Virtual Machine Sprawl with Colama

This is a product-pitch for Colama, a product built by Pune-based startup Coriolis. For the basics of server virtualization, see our earlier guest posts: Introduction to Server Virtualization, and Why do we need server virtualization. Virtualization is fast emerging as a game-changing technology in the enterprise computing space, and Colama is trying to address an interesting new pain-point that is making its presence felt.

Virtualization is fast becoming an accepted standard for IT infrastructure. While it comes as a boon to the development and QA communities, IT practitioners are dealing with the pressing issue of virtual machine sprawl, which surfaces due to ad-hoc and hence uncontrolled adoption of virtualization. While describing the problem and its effects, this article outlines a solution called Colama, offered by Coriolis Technologies.


Virtual machines have been slipping in under the covers everywhere in the IT industry. Software developers like virtual machines because they can easily mimic a target environment. QA engineers prefer virtual machines because they can simultaneously test the software on different configurations. Support engineers can ensure reproducibility of an issue by pointing to an entire machine rather than detailing the individual steps and/or configuration requirements on a physical host. In many cases, adoption of virtual machines has been driven primarily by users’ choice rather than any coherent corporate strategy. This ad-hoc and uncontrolled use of virtual machines across the organization has resulted in a problem called virtual machine sprawl, which has become critical for today’s IT administrators.

Virtual machine sprawl is an unpleasant side effect of server virtualization and its near exponential growth over the years. Here are the symptoms:

  • At any given point, the virtual machines running in the organization are unaccounted for. Information like who created them and when, who used them, what configuration/s they have, what licensed software they use, whether security patches have been applied, and whether the data is backed up is not maintained or tracked anywhere.
  • Most commonly, people freely copy each other’s virtual machines and no usage tracking and access control is in place.
  • Because of cheap storage, too many identical or similar copies of the same machines are floating across the organization. But reduction in storage cost does not reduce the operational cost of storage management, search, backup, etc. Data duplication and redundancy is a problem even if storage is plentiful.
  • Because there is no mechanism to keep track of why an image was created, it is hard to figure out when it should be destroyed. People tend to forget what each image was created for, and keep them around just in case they are needed. This increases the storage requirements.
  • Licensing implications: a virtual machine copied from one with licensed (limited) software needs to be tracked over its life span in order to control the use of the licensed software.

    There are many players in the industry who address this problem. Most of the virtual lab management products are tied to one specific virtualization technology. For example, the VMWare Lab Manager works for only VMWare virtualization technology. In a heterogeneous virtualization environment that is filled with Xen, VMWare, VirtualBox and Microsoft virtual machines, such an approach falls short.

    Colama is Coriolis Technologies’ solution to this problem. Colama manages the life cycle of virtual machines across an organization. While tracking virtual machines, Colama is agnostic to the virtualization technology.


    Here are some of the features of Colama:

  • Basic SCM for virtual machines: check in, check out, clone, tag, and comment on virtual machines to track their revisions.
  • Image inspection: Colama provides automatic inspection, extraction and display of image-related data, like OS version, software installed, etc., and also facilitates “search” on the extracted data. For example, you can search for the virtual machines that have Windows 2003 Server with Service Pack 4 and Oracle 10g installed!
  • Web based interface: Navigate through the virtual machine repository of your organization using just a web browser.
  • Ownership and access control: create a copy of a machine for yourself and decide who can use “your” machine.
  • De-duplication: Copying/Cloning virtual machines happens without any additional storage requirement.
  • Physical machine provisioning (lab management): spawn a virtual machine of your choice on a physical host that is available and ‘ready’.
  • Management reports for auditing and compliance: user activity reports, virtual machine history, health information (up/down time) of virtual machines, license reports of the virtual machines, etc.
  • Virtualization agnostic: works for virtual machines from all known vendors. 
    Please note: This product-pitch, by Barnali Ganesh, co-founder of Coriolis, is featured on PuneTech because the editors found the technology interesting (or more accurately, the pain-point it is addressing). PuneTech is a purely non-commercial website and does not take any considerations (monetary or otherwise) for any of the content on the site. If you would like your product to be featured on the front page, send us an article and we’ll publish it if we feel it’s interesting to our readers. You can always add a page to the PuneTech wiki by yourself, as long as you follow the “No PR” guidelines.
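The de-duplication feature listed above (cloning without extra storage) is commonly built on content-addressed storage: an image is split into blocks, each unique block is stored once under its hash, and a clone is just a second list of block hashes. The sketch below illustrates that general technique; it is not Colama's actual implementation.

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store: duplicate blocks cost nothing extra."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # sha256 hex digest -> block bytes, stored once

    def put(self, data):
        """Store an image; return its 'recipe' (ordered list of block hashes)."""
        recipe = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # only stored if not seen before
            recipe.append(h)
        return recipe

    def get(self, recipe):
        """Reassemble an image from its recipe."""
        return b"".join(self.blocks[h] for h in recipe)

store = DedupStore()
image = b"A" * 8192 + b"B" * 4096   # 3 blocks, two of them identical
r1 = store.put(image)               # original VM image
r2 = store.put(image)               # a "clone": stores no new blocks
print(len(store.blocks))            # 2 unique blocks backing 6 references
```

Real systems layer copy-on-write on top of this, so a clone only starts consuming storage once its blocks are modified.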

    Advice for entrepreneurs – Gireendra Kasmalkar

    Gireendra Kasmalkar (Giri), whose testing startup VeriSoft was recently acquired by SQS, was interviewed for CSI Pune’s quarterly newsletter Desktalk. Some quotes from that interview that should be especially inspirational are reproduced here with permission. You should be able to download the newsletter (containing the full article) from the CSI Pune website’s download section. (It’s the July-Aug-Sept 2008 newsletter.)

    About how your attitude towards something makes all the difference:

    Initially, testing was a new area and it was tough to convince people about it. Even today testing is sometimes looked down upon. My response: “I am glad you think of testing that way. That is what creates the opportunities for us!”

    About how to deal with commoditization in your domain:

    But the testing industry was maturing. Every company now had an ‘independent’ testing practice. To maintain our leadership, we had to specialize further. We did that horizontally in such specialized areas as automation, load / performance, security and usability testing, even publishing papers in international conferences.

    But we also realized we had to develop a vertical. It had to be a vertical right for our size, where we could have a shot at global leadership. After due consideration, we decided that Games was that vertical. It was not at all easy to enter, but today we test for the top 10 publishers in the video games industry and have authorization from all 3 major console manufacturers – Nintendo, Sony and Microsoft. In the online games industry, we not only test for game developers / publishers, but also for regulatory bodies for their compliance.

    On how a small startup can make an impact:

    For an entrepreneur, focusing on a niche is crucially important. One can hope to compete with and beat the bigger, already-established players only in a niche of one’s own strength. More often than not, this niche will be a new area, and the entrepreneur has only his / her own data and analysis to see the hidden potential in it.

    On the importance of networking for success:

    This is where networking plays a very important role. Networking in the industry can bring you data about the trends in the industry. And you can even get your analysis of that data, and your ideas validated using your network. I was lucky to have the IIT network, the Jnana Prabodhini network, the TCS network because of my earlier background. But I also actively worked to build new networks through forums such as SPIN, CSI and MCCIA. In fact, there was a group of 8-10 successful industry seniors, whom I met once a year, by formal appointment, specifically for discussing my activities and seeking their suggestions. To their credit, none of these people ever refused my request. I definitely became a better entrepreneur as a result of these meetings.

    From my own experience, I strongly recommend active participation in forums such as CSI.

    I particularly like the suggestion about meeting 8 to 10 successful industry seniors on a yearly basis. I had started doing this with about 2, and based on Giri’s suggestion, I plan on increasing that to 8 to 10.

    And I certainly agree with active participation in forums. Giri suggests CSI-Pune. I would strongly suggest getting involved in the Pune OpenCoffee Club. There are a whole bunch of other groups and organizations where you could get involved. Check out the Groups and Organizations Page from the PuneTech wiki.

    Final word from Giri:

    To me, entrepreneurship is less about making money and more about making a difference. (There must be easier ways of making money.) An entrepreneur provides a viable solution to a problem (need). On the flip side, the entrepreneur’s capabilities are stretched to the maximum extent possible, which I find very satisfying.

    About the Interviewee – Gireendra Kasmalkar

    Gireendra Kasmalkar (Giri) is the Founder Director of Prabodhan and Verisoft InfoSystems, and after the recent acquisition of Verisoft, is the MD and CEO of SQS India InfoSystems Pvt. Ltd. Giri has been the Chairman of the CSI Pune chapter for 2007-2008 and is also actively associated with other relevant industry forums like SPIN and MCCIA.

    Despite his packed schedule, professionals working with Giri will vouch for the energy, responsiveness and maturity that he brings to any activity.

    For a full profile and links, see Giri’s profile on the PuneTech wiki.

    Reading list for startup-founders

    Earlier, we published a list of free web services that a company can use to better run their business. However, tools do not really make or break a company. Having the right attitude, and making the right decisions at the right time is much more important. If you are a startup founder, it is not necessary that you learn everything from your mistakes at the school of hard knocks. You can try to benefit from the experiences of countless others who have learnt from their own successes and failures, and have survived to write blog posts about them. 

    To educate you, Sukshma.net (the blog of Santosh and Anjali, founders of Bookeazy and Lipikaar) has put together this list of articles and books that they found useful in their own journey. Here is the full list, shamelessly plagiarized from there (to save my lazy readers one extra click – I hope they don’t mind):

    Think Big!

    1st Month: Bullet-proof your idea.

    2nd Month: Raise Money, or scrape some together.

    3rd Month: Define your value proposition.

    4th Month: Iterative Innovation.

    5th Month: Advertising.

    7th Month: Momentum

    • 7 months in to my 2nd stint as Startup CEO – Jason Goldberg. Sorry, this link has gone off the internet. It used to be at http://blog.socialmedian.com/2008/08/7_months_into_my_2nd_stint_as.html but the whole blog disappeared after Jason sold social|median to Xing. If you’re able to find a copy of this article, please let us know. Lots of Pune people want to read it.

    9th Month: Profit for sustainability.

    Raising serious money from Professional Investors/Venture Capitalists.

    Master your disruptive contribution.

    Managing up and down.

    Other books worth reading.

    The Economics of Media and E-Commerce businesses.

    I need inspiration. I need to know this has happened to others. It’s never too late to start over.

    Read Founders@Work. Just turn to any story, any page and start reading. This is not to be read in one go, it’s most useful when you think you’re a complete idiot. Also, don’t forget to read Guy Kawasaki’s list of best pages from the book. Make your own list. We both have our list that we’ll put on the blog someday.

    Startup Resources from around the web.

    Remember to check the source, as they are promising to keep updating that list based on your suggestions (leave a comment).

    Pune’s too expensive, outsource to Nashik – Interview of Sushrut Bidwai (StartupForStartups)

    Nashik based StartupForStartups (SFS) is marketing itself as a “facility to help early stage companies with limited resources to build first cut of the product (V0.5/V1.0)”. Founder Sushrut Bidwai is a regular fixture at Pune startup events, and is trying to convince Pune-based startups to outsource work to his programmers in Nashik, promising that it will be cheaper than doing it themselves. PuneTech interviewed Sushrut to get a first-hand take on SFS’ value proposition.

    Can you give an overview of StartupForStartups?
    Many times people have good ideas, but don’t have the guts, or are shouldering family responsibilities which do not allow them to pursue these ideas further. In some cases none of the founding team members are from a tech background, so even getting a good CTO is difficult for them (salary-wise as well as skill-wise). StartupForStartups (SFS for short) is meant for such teams. It’s better to have something ready before taking the risk of quitting (high-paying) jobs. It gives you more insight into the product you are building as well as the domain you are targeting.
    What SFS does is provide the resources required to build the beta version which you will show to investors (if it’s a big product) or launch to a limited audience to see how the market reacts to it. After having this beta ready, and some reactions from the market or investors, it becomes easier to take the risk and pursue it further full time.

    Why are you doing this in Nashik? I would have thought that being in Pune or Bangalore (near all the startups, who are your customers) would make more sense for you?
    The problems with Pune and Bangalore are operational costs and resource costs. Also, Nashik will have lower attrition rates, and keeping people happy is easier. With technology pushing the boundaries, we have so many tools available that working in distributed teams is far easier. We can even do pair programming with two people sitting 5000 miles away from each other using WebEx/dimdim/Skype. It also provides a lot of cost advantages to the startups we are working with.

    As a customer, one of the worries I would have with StartupForStartups, is the availability of quality talent in Nashik. How are you tackling this?
    We have developed a unique training program called “Implementing Concepts” which all our engineers go through before joining any startup team. Even if a particular engineer has gone through it once for one project, he/she will go through it again using the technologies that are going to be used in the new project. It is a kind of very elaborate HelloWorld for a project. Also, my experience of working on tech products is that 80% of the work is trivial and 20% is core work which is complex. So even if a startup is working on a complex product, they can take the help of our resources for the other 80% of the work. This does not mean we do not have the expertise to take up complex work; it just means we are flexible and are okay with working as part of a larger team. Also, this is not an outsourcing model, it is a collaboration model: you know who is working on your product and what that person’s skills are, and you can choose from the pool available. Finally, to maintain a supply of quality talent, we are in the process of collaborating with colleges here in Nashik. We will take the training program to colleges and have students go through it while working on their final-year projects. Please note that we do not assign interns to the projects.

    Considering that most of your customers are early-stage startups who are strapped for cash, how do you plan on charging them for your services?
    Charging is transparent. We send details of the salaries we pay to the engineers assigned to work on a particular startup, plus typically 20% operational costs. The typical salaries engineers in Nashik expect, and are more than happy with, are much lower than in Pune/Bangalore. We already have the infrastructure and are building the team. We plan to build a team of 12-15 people by the end of December.

    Could you give us an idea of what kind of savings I can expect compared to outsourcing to a company in Pune? (Where are the savings coming from: lower salaries, or other factors too?)
    Lower costs for resources.
    Lower operational costs. Just to give you an indicator of the savings: an entry-level GWT/J2EE programmer will draw a salary in the range of 17-22K in Pune/Bangalore; the same programmer from Nashik will be more than content to work on a good startup team for around 9-10K. Plus you have to keep him/her happy so there is no attrition, and spend money on infrastructure like office space, furniture, hardware, software, electricity and lunch facilities, and many non-tangible costs like FBTs, Mediclaim facilities, etc. You save these costs by almost 70%.
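    A rough back-of-the-envelope check of that 70% figure (the salary ranges and the 20% operational charge are from the interview; the Pune overhead figure below is purely an illustrative guess):

```python
# Monthly figures in INR thousands. Salary ranges are from the interview;
# the Pune overhead estimate is an illustrative guess, not a quoted number.
pune_salary = 19.5                    # midpoint of the quoted 17-22K range
nasik_salary = 9.5                    # midpoint of the quoted 9-10K range
pune_overhead = 12.0                  # guess: office, hardware, FBT, Mediclaim...
nasik_overhead = nasik_salary * 0.20  # the 20% operational charge quoted above

pune_total = pune_salary + pune_overhead
nasik_total = nasik_salary + nasik_overhead
savings = 1 - nasik_total / pune_total
print(f"estimated savings: {savings:.0%}")
```

    With these assumptions the savings come out around 60-65%; a higher estimate of Pune overheads pushes the number towards the quoted 70%.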

    What do you see as your key competencies?

    • Experience of working in large scale product companies as well as early stage startups.
    • Top management has excellent problem-solving and product-design skills.
    • We understand working on an idea and the processes involved.
    • We are young, enthusiastic, full of energy, and love working on good ideas. (This is probably the most important quality.)

    In the context of StartupForStartups, I’ve heard you talk about having a startup ecosystem. Can you elaborate on that?
    This ecosystem is for people who are still in jobs and want to pursue ideas, though part of it can be used by full-time entrepreneurs.

    • We are signing an MoU with a financial and legal services company experienced in handling these services for startups.
    • We are creating a pool of consultants (architects, performance engineering, marketing/advertising, HR, viral video creators, etc.).
    • We are collaborating with engineering colleges here in Nashik. We have designed a unique training program, “Implementing Concepts”, which focuses on training engineering students in the latest technologies and early-stage product engineering.
    • We are talking to people who have expertise in mentoring early-stage companies even before the product is built or has evolved. We are in the early stages of discussions with these people, but hopefully it will happen.
    • We are building a tool (and looking to raise funds for it) designed around a process called Super-Agile, which I will be publishing shortly. The tool and process are targeted at early-stage product development. The tool will make writing code almost trivial, and even people from a non-tech background will, after a little training, be able to build a product's first cut on their own.
    • Network. Not all startup founders are well connected, and it takes a lot of time to connect with people who can help you build your startup. We can help there by connecting you with people we know, so you have a starting point.
    • Knowledge base. Startup entrepreneurs do not have time to go through lengthy tax material, or to design documents like offer letters, separation letters and NDAs. Over a period of time we will collect such documents into an inventory, which can be very valuable to startups.

    Note that the services mentioned above do not all come with a price tag: some are out of goodwill, some are for money 🙂 . Our main problem is that we are young and have the skill sets needed to pursue an idea and make it a successful business, but we do not have the idea. So we want to work with people who do.

    For those interested in meeting, Sushrut is in Pune today (25th September 2008) and is likely to attend the CSI Pune Seminar on Entrepreneurship. I know that there are a bunch of PuneTech readers who have an idea for a startup but haven’t made much progress yet because they haven’t been able to quit their job and take the plunge. If you are one of those, would you be willing to outsource some of this development to StartupForStartups or a similar company? Do you think this model will work? Let us know in the comments section below.

    Related Links

    Use Google Insights to find a niche market for your (non-web) product


    (In this interesting article, Trevas of Druvaa uses keyword search trend data from Google Insights and Google Labs Experimental Search to fine-tune his idea of exactly which market niche his products are most likely to find demand in.

    While search-term analysis is a very common technique used by web-based companies for search engine optimization and finding long-tail customers, what is surprising in this case is that the products Trevas wants to sell have nothing to do with the web. He is using keyword analysis simply to get a feel for which user needs seem to have been met in the last few years, and which seem to be increasingly unmet. That gives him ideas for potential niche markets in which to position his products. Even if you have no interest in laptop backup, disaster recovery and the other terms used in this article, you should still read it to get the hang of the technique, which can be applied in other fields. This article first appeared on Druvaa's blog and is reproduced with permission. For more information about Druvaa and its technology, see this in-depth PuneTech article.)

    While doing some keyword research for Druvaa, it became clear how interesting search engine statistics can be when you look closely at the data. From simple keyword suggestion tools and graphs, you can glean information you never thought possible.

    The terms “backup” or “recovery”, for instance, get over 300,000 searches per month each with Google. In other words people are searching for good solutions to keep their data safe. That information by itself is useful (at least to us), but it’s when you begin to look at more specific search terms that things really get interesting. In fact, you can even begin to clearly see trends within the industry when you compare specific terms over any given length of time.

    With a look at some simple charts, you can begin to see things like:

    • Interest in laptop backup solutions has greatly increased over the past 10 years.
    • Some users are finding solutions to their data backup needs and disaster recovery isn’t as much of a problem as it was 4 years ago (but it still is a problem).
    • Enterprise users who have laptops in the office are still seeking a suitable solution to their backup needs.
    • Enterprise users who have offsite backup needs are still seeking a solution to business continuity.

    To demonstrate how I can get all of that from a few search terms, let’s take a closer look at some charts.

    A Look at Trends Using Search Engine Statistics

    Using Google Labs and their experimental search tool, you come up with the following charts for the terms “data backup” and “laptop backup”. This particular tool uses search volumes, online news statistics, the number of websites, and more to show interest in any given topic. The charts clearly show that, while data backup has retained the same level of interest over the past 10 years, interest in laptop backup has increased and is still increasing.

    Of course, this idea makes sense. Laptops have decreased greatly in price since 1998, and as such have become a more common tool, both for enterprise users and at home. On the other hand, data integrity has been a problem for business users for a couple of decades now, so interest in the topic of “data backup” has remained relatively the same.

    This information alone isn’t necessarily new. It’s the reason we created Druvaa InSync in the first place. The industry needed a reliable data backup solution, which is also fast enough to work well with computers that are on the go.  To further look at what’s needed let’s look at some more charts. This time based on search volumes alone.

    Laptop Backup as Important as Ever

    Search volumes for any given term are an easy way to see what is happening within an industry, to gauge interest for a product or service, or even to see how one product relates to another. In the developed world more than 73% of the population has internet access, and over 88% of internet users go online when they seek a solution to a problem.

    With that in mind, let's briefly look at some search engine statistics. In this case I have used Google Insights to compare related search terms. The charts are based on normalized data over time. If you looked at the actual search volumes, they would have increased with time (since Internet use has grown). To get a more accurate picture, Insights uses normalized data displayed on a scale of 1 – 100.
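    Google does not publish its normalization method, but the basic idea of scaling volumes so that the busiest period maps to 100 can be sketched in a few lines (the function below is a guess at the idea, not Google's actual formula):

```python
def normalize(volumes, scale=100):
    """Scale raw search volumes so the peak period maps to `scale`.

    A rough sketch of the idea behind Insights' normalized charts,
    not Google's actual method.
    """
    peak = max(volumes)
    return [round(v * scale / peak) for v in volumes]

# hypothetical monthly search counts for one term
raw = [1200, 1500, 3000, 2400]
print(normalize(raw))  # -> [40, 50, 100, 80]
```

    Normalizing this way makes two terms with very different absolute volumes comparable on the same chart, which is exactly what the comparisons below rely on.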

    Click Here to See the Chart for Yourself

    The first chart compares the terms data backup and disaster recovery. There are two things that can be gained from this chart.

    • 1. Since search volumes for both terms have declined over the past few years, it shows that some users are finding solutions to their backup needs, and disaster recovery is less of a problem today than it was in 2004/2005.
    • 2. As the lines of the chart come together, they begin to show a direct correlation to each other. Very likely this is due to the fact that proper data backup is becoming the solution to disasters in the office. It really was only a few years ago that disaster recovery often meant taking that broken hard drive to have the data extracted. In the past couple of years, enterprise users have begun to see that simple backups are a cheaper (and more reliable) solution.

    Since the term data backup may also relate to home users, with the next chart I used the term “enterprise backup” and compared it to “laptop backup”. Again we can see a couple of things from this chart. Once again we see a slight decline in the search volumes for enterprise backup. This confirms the idea that some enterprise users are finding a suitable solution to their backup needs.

    Click Here to See the Chart for Yourself

    By adding the term laptop backup though, something else begins to become clear. The term started the chart off at 61 and finished three years later at 62. There have been slight ups and downs in search volumes, but overall they have remained relatively the same. The two terms also begin to correspond closely with each other as the chart moves through 2007 and into 2008. To me this says that these terms are also beginning to become synonymous.  In other words, although some enterprise users are finding a backup solution, those with laptops in the office aren’t.

    I could repeat these same results with terms like “offsite backup” or “remote backup”.

    With a simple look at search engine statistics we begin to see that enterprise users have a need for a laptop backup solution that works. With our own product, which provides 10x faster laptop backup and a 90% reduction in storage and bandwidth, there is a solution to suit.


    Why do we need server virtualization

    Virtualization is fast emerging as a game-changing technology in the enterprise computing space. What was once viewed as a technology useful for testing and development is going mainstream and is affecting the entire data-center ecosystem. This article on the important use-cases of server virtualization by Milind Borate, is the second in PuneTech’s series of articles on virtualization. The first article gave an overview of server virtualization. Future articles will deal with the issue of management of virtual machines, and other types of virtualization.

    Introduction

    Is server virtualization a new concept? It isn’t, because traditional operating systems do just that. An operating system provides a virtual view of a machine to the processes running on it. Resources are virtualized.

    • Each process gets a virtual address space.
    • A process’ access privileges control what files it can access. That is storage virtualization.
    • The scheduler virtualizes the CPU so that multiple processes can run without conflicting with each other.
    • Network is virtualized by providing multiple streams (for example, TCP) over the same physical link.

    Storage and network are weakly virtualized in traditional operating systems because some global state is shared by all processes. For example, the same IP address is shared by all processes. In the case of storage, the same namespace is used by all processes. Over time, some files/directories have become de-facto standards; for example, all processes look at the same /etc/passwd file.
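    The first point, virtual address spaces, is easy to demonstrate: after a fork, parent and child each get their own copy of memory, so a write in one is invisible to the other. A minimal Unix-only sketch (it uses os.fork):

```python
import os

balance = 100

pid = os.fork()
if pid == 0:
    # Child process: this write goes to the child's own address space.
    balance = 0
    os._exit(0)

os.waitpid(pid, 0)
# The parent's copy lives in a separate virtual address space, so the
# child's write never shows up here.
print(balance)  # -> 100
```

    The same isolation does not hold for storage and network: both processes above would still see the same file system and the same IP address, which is the weak spot the next paragraph's containers address.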

    Today, the term “server virtualization” means running multiple OSs on one physical machine. Isn’t that just adding one more level of virtualization? An additional level generally means added costs, lower performance, higher maintenance. Why then is everybody so excited about it? What is it that server virtualization provides in addition to traditional OS offerings? An oversimplified version of the question is: If I can run two processes on one OS, why should I run two OSs with one process each? This document enumerates the drivers for running multiple operating systems on one physical machine, presenting a use case, evaluating the virtualization based solution, suggesting alternates where appropriate and discussing future trends.

    Application Support

    Use case: I have two different applications. One runs on Windows and the other runs on Linux. The applications are not resource intensive and a dedicated server is under-utilized.

    Analysis: This is a weak argument in an enterprise environment because enterprises want to standardize on one OS and one OS version. Even if you find Windows and Linux machines in the same department, the administrators are two different people. I wonder if they would be willing to share a machine. On the other hand, you might find applications that require conflicting versions of, say, some library, especially on Linux.

    Alternative solution: Wine allows you to run Windows applications on Linux. Cygwin allows you to run Linux applications on Windows. Unfortunately, it's not the same as running the application directly on the required OS. I wouldn't bet on a third-party application running out of the box under these virtual environments.

    Future: Some day, developers will get fed up with writing applications for a particular OS and then porting them to others. Java provides us with a host/OS-independent virtual environment; it wants programmers to write code that is not targeted at a particular OS, and it has succeeded in some areas. But there is still a lot of software written for a particular OS. Why did everybody not move to Java? I guess because Java does not let me do everything that I can do using OS APIs. In a way, that is Java's failure to provide a generic virtual environment. In the future, we will see more and more software developed over OS-independent APIs. Databases would be the next target for establishing generic APIs.

    Conflicting Applications

    Use case: I have two different applications. If I install both on the same machine, both fail to work. In fact, they might actually work, but the combination is not supported by my vendor.

    Analysis: In the current state of affairs, an OS is not just hardware virtualization. The gamut of libraries, configuration files, daemons is all tied up with an OS. Even though an application does not depend on the exact kernel version, it very much depends on the library versions. It’s also possible that the applications make conflicting changes to some configuration file.

    Alternative solution: OpenVZ modifies Linux to provide multiple “containers” inside the same OS. The machine runs a single kernel but provides multiple isolated environments. Each isolated environment can run an application that would be oblivious to the other containers.

    Future: I think operating systems need to support containers by default. The process-level isolation provided at the memory and CPU level needs to be extended to storage and network as well. On the other hand, I also hope that application writers desist from depending on shared configuration and shared libraries, and pay some attention to backward compatibility.

    Fault Isolation

    Use case: In case an application or the operating system running the application faults, I want my other applications to run unaffected.

    Analysis: A faulty application can bring down an entire server, especially if the application runs in a privileged mode and can be attacked over a network. A kernel driver bug or operating system bug can also bring down a server, though operating systems are getting more stable and a server going down due to an operating system bug is rare nowadays.

    Alternative solution: Containers can help here too. Containers provide better isolation amongst applications running on the same OS. But bugs in kernel-mode components cannot be addressed by containers.

    Future: In the near future, we are likely to see micro-kernel-like architectures around virtual machine monitors. Lightweight operating systems could be developed to work only with virtual machine monitors. Such a solution will provide fault isolation without incurring the overhead of a full operating system running inside a virtual machine.

    Live Application Migration

    Use case: I want to build a datacenter with utility/on-demand/SLA-based computing in mind. To achieve that, I want to be able to live-migrate an application to a different machine. I can run the application in a virtual machine and live-migrate the virtual machine.

    Analysis: The requirement is to migrate an application. But, migrating a process is not supported by existing operating systems. Also, the application might do some global configuration changes that need to be available on the migration target.

    Alternative solution: OpenVZ modifies Linux to provide multiple “containers” inside the same OS. OpenVZ also supports live migration of a container.

    Future: As discussed earlier, operating systems need to support containers by default.

    Hardware Support

    Use case: My operating system does not support the cutting edge hardware I bought today.

    Analysis: Here again, I'm not bothered about the operating system; but my applications run only on this OS. Also, enterprises like to use the same OS version throughout the organization. If an enterprise sticks to an old OS version, it does not work with new hardware to be bought; if it is willing to move to the newer OS, that does not work with the existing old hardware.

    But the real issue here is the lack of standardization across hardware or driver development models. I fail to understand why every wireless LAN card needs a different driver. Can hardware vendors not standardize the IO ports and commands so that one generic driver works for all cards? On the other hand, every OS, and even every OS version, has a different driver development model. That means every piece of hardware requires a different driver for each OS version.

    Alternative solution: I cannot think of a good alternative solution. One specific issue, the unavailability of wireless LAN card drivers for Linux, is addressed by NdisWrapper, which allows us to access a wireless card on Linux by loading a Windows driver.

    Future: We either need hardware-level standardization or the ability to run the same driver on all versions of all operating systems. It would be good to have wrappers, like NdisWrapper, for all types of drivers and all operating systems. A hardware driver should write to a generic API provided by the wrapper framework, and the generic API should be implemented by the operating system vendors.

    Software Development Environment

    Use case: I want to manage hardware infrastructure for software development. Every developer and QA engineer needs dedicated machines. I can quickly provision a virtual machine when the need arises.

    Analysis: Software under development fails more often than a released product. Software developers and QA engineers want an isolated environment for their tests, to correctly attribute bugs to the right application. Also, software development environments require frequent reprovisioning, as the product under development needs to be tested under different operating systems.

    Alternative solution: Containers would work for most software development. I think, the exception is kernel level development.

    Future: Virtual machines found an instant market in software QA labs. Virtual machines will continue to flourish in this market.

    Application Configuration Replica

    Use case: I want to ship some data to another machine. Instead of setting up an identical application environment on the other machine to access the data, I want to ship the entire machine image itself. Shipping a physical machine image does not work because of hardware-level differences; hence, I want to ship a virtual machine image.

    Analysis: This is another hard reality of life. Data formats are not compatible across multiple versions of a software product. Portable data formats are used by human-readable documents. File-system data formats are also stable to a large extent, and you can mount a FAT file-system or ISO 9660 file-system on virtually any version of any operating system. The same level of compatibility has not been established for other structured data, and I don't see that happening in the near future. Even if this hurdle is crossed, you need to worry about correctly shipping all the application configuration, which itself could be different for the same software running on different OSs.

    Alternative solution: OpenVZ container could be a light-weight alternative to a complete virtual machine.

    Future: The future seems inclined towards “computing in a cloud”. Network bandwidth is increasing, and so is the trend towards outsourced hosting. Mail and web services have been outsourced for a long time. Oracle On Demand allows us to outsource database hosting; Google (Writely) wants us to outsource document hosting; Amazon allows us to outsource both storage and computation. In the future, we will be completely oblivious to the location of our data and applications. The only process running on your laptop will be an improved web browser. In that world, only the system software engineers who build these datacenters will be worried about hardware and operating system compatibilities. But they will not be overly bothered either, because data-center consolidation will reduce the diversity in hardware and OSs.

    Thin Client Desktops

    Use case: I want to replace desktop PCs with thin clients. A central server will run a VM for each thin client. The thin client will act as a display server.

    Analysis: Thin clients could bring down the maintenance costs substantially. Thin client hardware is more resilient than a desktop PC. Also, it’s easier to maintain the software installed on a central server than managing several PCs. But, it’s not required to run a full virtual machine for each thin client. It’s sufficient to allow users to run the required applications from a central server and make the central storage available.

    Alternative solution: Unix operating systems are designed to be server operating systems, and thin X terminals are still prevalent in the Unix desktop market. Microsoft Windows, the most prevalent desktop OS, is designed as a desktop OS, but Microsoft has also added substantial support for server-based computing. Microsoft's Terminal Services allows multiple users to connect to a Windows server and launch applications from a thin client. Several commercial thin clients can work with Microsoft Terminal Services or similar services provided by other vendors.

    Future: Before the world moves to computing in a global cloud, an intermediate step will be enterprise-wide desktop application servers. Thin clients will become prevalent due to reduced maintenance costs. I hope to see Microsoft come up with better licensing for server-based computing. On Unix, floating application licenses are the norm: with a floating application license, a server (or a cluster of servers) can run only a fixed number of application instances as per the license, and it does not matter which user or thin client launches the application. Such floating licensing from Microsoft would help.

    Conclusion

    Server virtualization is a “heavy” solution for the problems it addresses today. These problems could be addressed by operating systems in a more efficient manner with the following modifications:

    • Support for containers.
    • Support for live migration of containers.
    • Decoupling of hardware virtualization and other OS functionalities.

    If existing operating systems muster enough courage to deliver these modifications, server virtualization will have a tough time. But it is unrealistic to expect complete overhauls of existing operating systems: it is possible to implement containers as part of an OS, but decoupling hardware virtualization from the OS is a hard job. Instead, we are likely to see new lightweight operating systems designed to run only in server virtualization environments. Such a lightweight operating system will have the following characteristics:

    • It will do away with functionality already implemented in virtual machine monitor.
    • It will not worry about hardware virtualization.
    • It might be a single user operating system.
    • It might expect all processes to be co-operative.
    • It will have a minimal kernel mode component. It will be mostly composed of user mode libraries providing OS APIs.

    Existing virtual machine monitors will also have to take up more responsibility in order to support lightweight operating systems:

    • Hardware support: The hardware supported by a VMM will be of primary importance. The OS only needs to support the virtual hardware made visible by VMM.
    • Complex resource allocation and tracking: I should get finer control over the resources allocated to virtual machines, and be able to track resource usage. This involves CPU, memory, storage and network.

    I hope to see a lightweight OS implementation targeted at server virtualization in the near future. It would be a good step towards modularizing operating systems.

    Acknowledgements

    Thanks to Dr. Basant Rajan and V. Ganesh for their valuable comments.

    About the Author – Milind Borate

    Milind Borate is the CTO and VP of Engineering at Druvaa, a Pune-based continuous-data-protection startup. He has over 13 years of experience in enterprise product development and delivery. He worked at Veritas Software as Technical Director for SAN-FS and served on the Veritas patent filter committee. Milind has filed over 15 patent applications (4 granted) and co-authored “Undocumented Windows NT” in 1998. He holds a BE (CS) degree from the University of Pune and an MTech (CS) degree from IIT Bombay.

    This article was written when Milind was at Coriolis, a startup he co-founded before Druvaa.


    Technology overview – Druvaa Continuous Data Protection

    Druvaa, a Pune-based product startup that makes data protection (i.e. backup and replication) software targeted at the enterprise market, has been all over the Indian startup scene recently. It was one of the few Pune startups to be funded in recent times (Rs. 1 crore by Indian Angel Network). It was one of the three startups that won the TiE-Canaan Entrepreneurial Challenge in July this year. And it was one of the three startups chosen to present at the showcase of emerging product companies at the NASSCOM Product Conclave 2008.

    And this is not confined to national boundaries. It is one of only two (as far as I know) Pune-based companies to be featured in TechCrunch (actually TechCrunchIT), one of the most influential tech blogs in the world (the other Pune company featured in TechCrunch is Pubmatic).

    Why all this attention for Druvaa? Other than the fact that it has a very strong team that is executing quite well, I think two things stand out:

    • It is one of the few Indian product startups targeting the enterprise market. This is a very difficult market to break into, both because of the risk-averse nature of the customers and because of the very long sales cycles.
    • Unlike many other startups (especially consumer oriented web-2.0 startups), Druvaa’s products require some seriously difficult technology.

    Rediff has a nice interview with the three co-founders of Druvaa, Ramani Kothundaraman, Milind Borate and Jaspreet Singh, which you should read to get an idea of their background, why they started Druvaa, and their journey so far. Druvaa also has a very interesting and active blog where they talk technology, which is worth reading on a regular basis.

    The rest of this article talks about their technology.

    Druvaa has two main products. Druvaa inSync allows enterprise desktop and laptop PCs to be backed up to a central server with over 90% savings in bandwidth and disk storage utilization. Druvaa Replicator allows replication of data from a production server to a secondary server near-synchronously and non-disruptively.

    We now dig deeper into each of these products to give you a feel for the complex technology that goes into them. If you are not really interested in the technology, skip to the end of the article and come back tomorrow, when we'll be back to talking about Google keyword searches and web 2.0 and other such things.

    Druvaa Replicator

    Overall schematic set-up for Druvaa Replicator

    This is Druvaa’s first product, and is a good example of how something that seems simple to you and me can become insanely complicated when the customer is an enterprise. The problem seems rather simple: imagine an enterprise server that needs to be on, serving customer requests, all the time. If this server crashes for some reason, there needs to be a standby server that can immediately take over. This is the easy part. The problem is that the standby server needs to have a copy of all the latest data, so that no data is lost (or at least very little data is lost). To do this, the replication software continuously copies all the latest updates of the data from the disks on the primary server to the disks on the standby server.

    This is much harder than it seems. A naive implementation would simply ensure that every write done on the primary is also done on the standby storage at the same time (synchronously). This is unacceptable because each write would then take unacceptably long, slowing down the primary server too much.

    If you are not doing synchronous updates, you need to start worrying about write order fidelity.
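    The asynchronous alternative can be sketched with a FIFO queue: the primary's write returns immediately, and a background worker applies the writes to the standby strictly in the original order. All names below are hypothetical; this is a toy illustration of the idea, not Druvaa Replicator's actual design:

```python
import queue
import threading

class AsyncReplicator:
    """Toy asynchronous replicator that preserves write order via a FIFO
    queue. A sketch of the idea only, not Druvaa Replicator's design."""

    def __init__(self, standby):
        self.standby = standby            # dict standing in for the standby disk
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, page, data):
        # The primary's write returns immediately; replication lags behind,
        # but the queue guarantees the standby sees writes in primary order.
        self.pending.put((page, data))

    def _drain(self):
        while True:
            page, data = self.pending.get()
            self.standby[page] = data     # applied in the original order
            self.pending.task_done()

    def flush(self):
        self.pending.join()               # wait until the standby catches up

standby_disk = {}
rep = AsyncReplicator(standby_disk)
rep.write("page-7", "balance=250000")
rep.write("page-9", "balance=150000")
rep.flush()
print(standby_disk)  # -> {'page-7': 'balance=250000', 'page-9': 'balance=150000'}
```

    The queue buys speed at the primary, but it is exactly this decoupling that makes write-order fidelity, discussed next, something the replicator must actively guarantee.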

    Write-order fidelity and file-system consistency

    If a database writes a number of pages to the disk on your primary server, and if you have software that is replicating all these writes to a disk on a stand-by server, it is very important that the writes should be done on the stand-by in the same order in which they were done at the primary servers. This section explains why this is important, and also why doing this is difficult. If you know about this stuff already (database and file-system guys) or if you just don’t care about the technical details, skip to the next section.

    Imagine a bank database. Account balances are stored as records in the database, which are ultimately stored on the disk. Imagine that I transfer Rs. 50,000 from Basant’s account to Navin’s account. Suppose Basant’s account had Rs. 3,00,000 before the transaction and Navin’s account had Rs. 1,00,000. So, during this transaction, the database software will end up doing two different writes to the disk:

    • write #1: Update Basant’s bank balance to 2,50,000
    • write #2: Update Navin’s bank balance to 1,50,000

    Let us assume that Basant and Navin’s bank balances are stored on different locations on the disk (i.e. on different pages). This means that the above will be two different writes. If there is a power failure, after write #1, but before write #2, then the bank will have reduced Basant’s balance without increasing Navin’s balance. This is unacceptable. When the database server restarts when power is restored, it will have lost Rs. 50,000.

    After write #1, the database (and the file-system) is said to be in an inconsistent state. After write #2, consistency is restored.

    It is always possible that at the time of a power failure, a database might be inconsistent. This cannot be prevented, but it can be cured. For this, databases typically do something called write-ahead-logging. In this, the database first writes a “log entry” indicating what updates it is going to do as part of the current transaction. And only after the log entry is written does it do the actual updates. Now the sequence of updates is this:

    • write #0: Write this log entry “Update Basant’s balance to Rs. 2,50,000; update Navin’s balance to Rs. 1,50,000” to the logging section of the disk
    • write #1: Update Basant’s bank balance to 2,50,000
    • write #2: Update Navin’s bank balance to 1,50,000

    Now if the power failure occurs between writes #0 and #1 or between #1 and #2, then the database has enough information to fix things later. When it restarts, before the database becomes active, it first reads the logging section of the disk and goes and checks whether all the updates that were claimed in the logs have actually happened. In this case, after reading the log entry, it needs to check whether Basant’s balance is actually 2,50,000 and Navin’s balance is actually 1,50,000. If they are not, the database is inconsistent, but it has enough information to restore consistency. The recovery procedure consists of simply going ahead and making those updates. After these updates, the database can continue with regular operations.

    (Note: This is a huge simplification of what really happens, and has some inaccuracies – the intention here is to give you a feel for what is going on, not a course lecture on database theory. Database people, please don’t write to me about the errors in the above – I already know; I have a Ph.D. in this area.)

    Note that in the above scheme the order in which writes happen is very important. Specifically, write #0 must happen before #1 and #2. If for some reason write #1 happens before write #0 we can lose money again. Just imagine a power failure after write #1 but before write #0. On the other hand, it doesn’t really matter whether write #1 happens before write #2 or the other way around. The mathematically inclined will notice that this is a partial order.
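    The write-ahead-logging scheme above can be sketched in a few lines of Python. This is a toy illustration under the same simplifications as the text (an in-memory dict standing in for the data pages, a list standing in for the logging section of the disk), not how a real database is implemented:

```python
log = []  # stands in for the logging section of the disk
balances = {"Basant": 300_000, "Navin": 100_000}  # stands in for data pages

def transfer(frm, to, amount):
    new_frm = balances[frm] - amount
    new_to = balances[to] + amount
    log.append({frm: new_frm, to: new_to})  # write #0: log entry goes first
    balances[frm] = new_frm                 # write #1
    balances[to] = new_to                   # write #2 (order vs #1 is free)

def recover():
    # On restart, replay logged updates that may not have reached the pages.
    for entry in log:
        for account, value in entry.items():
            balances[account] = value

transfer("Basant", "Navin", 50_000)
# Simulate a crash between write #0 and writes #1/#2 by undoing the updates:
balances.update({"Basant": 300_000, "Navin": 100_000})
recover()
print(balances)  # {'Basant': 250000, 'Navin': 150000}
```

    Note how the only ordering that matters is that the log append happens before the balance updates; recovery then makes the remaining writes idempotently.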

    Now if there is replication software that is replicating all the writes from the primary to the secondary, it needs to ensure that the writes happen in the same order. Otherwise the database on the stand-by server will be inconsistent, and can result in problems if suddenly the stand-by needs to take over as the main database. (Strictly speaking, we just need to ensure that the partial order is respected. So we can do the writes in this order: #0, #2, #1 and things will be fine. But #2, #0, #1 could lead to an inconsistent database.)

    Replication software that ensures this is said to maintain write order fidelity. A large enterprise that runs mission critical databases (and other similar software) will not accept any replication software that does not maintain write order fidelity.
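    The partial order described above can be made concrete with a small sketch. Each write carries the set of writes it must follow, and a replicator is free to apply the writes in any sequence that respects those dependencies (this data structure is invented for illustration, and is not Druvaa’s actual mechanism):

```python
# write id -> ids of writes that must be applied before it
deps = {
    0: set(),   # write #0: the log entry, depends on nothing
    1: {0},     # write #1 must come after the log write
    2: {0},     # write #2 must come after the log write
}

def respects_order(sequence, deps):
    """Return True if applying writes in this sequence honours the deps."""
    done = set()
    for w in sequence:
        if not deps[w] <= done:  # a prerequisite has not been applied yet
            return False
        done.add(w)
    return True

print(respects_order([0, 2, 1], deps))  # True: #0 before #1 and #2 is fine
print(respects_order([2, 0, 1], deps))  # False: #2 before #0 risks inconsistency
```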

    Why is write-order fidelity difficult?

    I can hear you muttering, “Ok, fine! Do the writes in the same order. Got it. What’s the big deal?” Turns out that maintaining write-order fidelity is easier said than done. Imagine that your database server has multiple CPUs. The different writes are being done by different CPUs. And the different CPUs have different clocks, so that the timestamps used by them are not necessarily in sync. Multiple CPUs is now the default in server class machines. Further imagine that the “logging section” of the database is actually stored on a different disk. For reasons beyond the scope of this article, this is the recommended practice. So, the situation is that different CPUs are writing to different disks, and the poor replication software has to figure out what order this was done in. It gets even worse when you realize that the disks are not simple disks, but complex disk arrays that have a whole lot of intelligence of their own (and hence might not write in the order you specified), and that there is a volume manager layer on the disk (which can be doing striping and RAID and other fancy tricks) and a file-system layer on top of the volume manager layer that is doing buffering of the writes, and you begin to get an idea of why this is not easy.

    Naive solutions to this problem, like using locks to serialize the writes, result in unacceptable degradation of performance.

    Druvaa Replicator has patent-pending technology in this area, where they are able to automatically figure out the partial order of the writes made at the primary, without significantly increasing the overheads. In this article, I’ve just focused on one aspect of Druvaa Replicator, just to give an idea of why this is so difficult to build. To get a more complete picture of the technology in it, see this white paper.

    Druvaa inSync

    Druvaa inSync is a solution that allows desktops/laptops in an enterprise to be backed up to a central server. (The central server is also in the enterprise; imagine the central server being in the head office, and the desktops/laptops spread out over a number of satellite offices across the country.) The key features of inSync are:

    • The amount of data being sent from the laptop to the backup server is greatly reduced (often by over 90%) compared to standard backup solutions. This results in much faster backups and lower consumption of expensive WAN bandwidth.
    • It stores all copies of the data, and hence allows timeline based recovery. You can recover any version of any document as it existed at any point of time in the past. Imagine you plugged in your friend’s USB drive at 2:30pm, and that resulted in a virus that totally screwed up your system. Simply use inSync to restore your system to the state that existed at 2:29pm and you are done. This is possible because Druvaa backs up your data continuously and automatically. This is far better than having to restore from last night’s backup and losing all data from this morning.
    • It intelligently senses the kind of network connection that exists between the laptop and the backup server, and will correspondingly throttle its own usage of the network (possibly based on customer policies) to ensure that it does not interfere with the customer’s YouTube video browsing habits.

    Data de-duplication

    Overview of Druvaa inSync. 1. Fingerprints computed on laptop sent to backup server. 2. Backup server responds with information about which parts are non-duplicate. 3. Non-duplicate parts compressed, encrypted and sent.

    Let’s dig a little deeper into the claim of 90% reduction of data transfer. The basic technology behind this is called data de-duplication. Imagine an enterprise with 10 employees. All their laptops have been backed up to a single central server. At this point, data de-duplication software can realize that there is a lot of data that has been duplicated across the different backups, i.e. the 10 different backups contain a lot of files in common: most of the files in the C:\WINDOWS directory, for instance, or all those large powerpoint documents that got mail-forwarded around the office. In such cases, the de-duplication software can save diskspace by keeping just one copy of the file and deleting all the other copies. In place of the deleted copies, it can store a shortcut indicating that if this user tries to restore this file, it should be fetched from the other backup and then restored.

    Data de-duplication doesn’t have to be at the level of whole files. Imagine a long and complex document you created and sent to your boss. Your boss simply changed the first three lines and saved it into a document with a different name. These files have different names, and different contents, but most of the data (other than the first few lines) is the same. De-duplication software can detect such copies of the data too, and is smart enough to store only one copy of this document in the first backup, and just the differences in the second backup.

    The way to detect duplicates is through a mechanism called document fingerprinting. Each document is broken up into smaller chunks. (How to determine what constitutes one chunk is an advanced topic beyond the scope of this article.) Now, a short “fingerprint” is created for each chunk. A fingerprint is a short string (e.g. 16 bytes) that is uniquely determined by the contents of the entire chunk. The computation of a fingerprint is done in such a way that if even a single byte of the chunk is changed, the fingerprint changes. (It’s something like a checksum, but a little more complicated to ensure that two different chunks cannot accidentally have the same checksum.)

    All the fingerprints of all the chunks are then stored in a database. Now every time a new document is encountered, it is broken up into chunks, fingerprints are computed, and these fingerprints are looked up in the database of fingerprints. If a fingerprint is found in the database, then we know that this particular chunk already exists somewhere in one of the backups, and the database will tell us the location of the chunk. Now this chunk in the new file can be replaced by a shortcut to the old chunk. Rinse. Repeat. And we get 90% savings of disk space. The interested reader is encouraged to google Rabin fingerprinting, shingling, and rsync for hours of fascinating algorithms in this area. Before you know it, you’ll be trying to figure out how to use these techniques to find who is plagiarising your blog content on the internet.
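    A stripped-down version of this pipeline is easy to sketch. Real systems use content-defined chunking (e.g. Rabin fingerprinting) so that inserting a few bytes doesn’t shift every chunk boundary; the fixed-size chunks and the tiny chunk size below are simplifications for illustration only:

```python
import hashlib

CHUNK_SIZE = 8  # absurdly small for illustration; real chunks are KBs

def fingerprint(chunk):
    # A cryptographic hash stands in for the fingerprint: change even one
    # byte of the chunk and the fingerprint changes.
    return hashlib.sha256(chunk).hexdigest()

def backup(data, store):
    """Split data into chunks; send only chunks the server hasn't seen."""
    recipe, sent = [], 0  # recipe = fingerprints needed to rebuild the file
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp not in store:  # a duplicate chunk is never sent again
            store[fp] = chunk
            sent += len(chunk)
        recipe.append(fp)
    return recipe, sent

store = {}
doc = b"quarterly report: all numbers are final and approved by the board"
recipe1, sent1 = backup(doc, store)   # first backup sends everything
doc2 = b"QUARTERLY report: all numbers are final and approved by the board"
recipe2, sent2 = backup(doc2, store)  # only the changed chunks go over
print(sent1, sent2)
```

    The second backup transfers only the chunks touched by the edit at the start of the document; everything else is already in the store and is referenced by fingerprint.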

    Back to Druvaa inSync. inSync does fingerprinting at the laptop itself, before the data is sent to the central server. So, it is able to detect duplicate content before it gets sent over the slow and expensive net connection and consumes time and bandwidth. This is in contrast to most other systems that do de-duplication as a post-processing step at the server. At a Fortune 500 customer site, inSync was able to reduce the backup time from 30 minutes to 4 minutes, and the disk space required on the server went down from 7TB to 680GB. (source.)

    Again, this was just one example used to give an idea of the complexities involved in building inSync. For more information on other distinguishing features, check out the inSync product overview page.

    Have questions about the technology, or about Druvaa in general? Ask them in the comments section below (or email me). I’m sure Milind/Jaspreet will be happy to answer them.

    Also, this long, tech-heavy article was an experiment. Did you like it? Was it too long? Too technical? Do you want more articles like this, or less? Please let me know.
