<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>punetech.com &#187; backup</title>
	<atom:link href="http://punetech.com/tag/backup/feed/" rel="self" type="application/rss+xml" />
	<link>http://punetech.com</link>
	<description>Connecting together Pune&#039;s Technologists</description>
	<lastBuildDate>Tue, 07 Feb 2012 02:55:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Pune-based Druva get $12M in Series B from Nexus/Sequoia &#8211; This time it&#8217;s official</title>
		<link>http://punetech.com/pune-based-druva-get-12m-in-series-b-from-nexussequoia-this-time-its-official/</link>
		<comments>http://punetech.com/pune-based-druva-get-12m-in-series-b-from-nexussequoia-this-time-its-official/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 02:45:00 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Money Matters]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=3357</guid>
		<description><![CDATA[Pune-based Druva software, which makes enterprise backup software, has just closed a $12 million round of funding from Nexus Venture Partners and existing investor Sequoia. In April 2010, they had raised $5 million from Sequoia and the Indian Angel Network. This funding is going to be used by Druva to make a strong push into [...]]]></description>
			<content:encoded><![CDATA[<p>Pune-based Druva software, which makes enterprise backup software, has just closed a $12 million round of funding from <a href="http://www.nexusvp.com/">Nexus Venture Partners</a> and existing investor Sequoia. In April 2010, they had raised $5 million from Sequoia and the Indian Angel Network. </p>
<p>This funding is going to be used by Druva to make a strong push into cloud-based backup. Cloud infrastructure for a bandwidth- and storage-intensive application like backup can be a significant expense, as, of course, can sales and marketing. </p>
<p>A few weeks back a partially inaccurate version of this story had been leaked by Economic Times and was <a href="http://punetech.com/backup-software-provider-druva-com-get-10-million-funding-from-nexus/">reported by PuneTech</a>, but we &#8220;withdrew&#8221; the story after Druva called us up and let us know that it was premature to talk about it. Talking about a company&#8217;s funding round before everything is finalized and the money is in the bank is dangerous for a number of reasons including:</p>
<ul>
<li>Funding is a tricky thing and there are no guarantees until the money is in the bank. Many things can, and do, go wrong. One bad day on the stock market can cause VCs to reconsider any deals that are not final.</li>
<li>From the time the startup receives a term sheet from the VC until the deal is finalized, there is usually a <em>no shopping</em> clause which prevents the startup from talking about the details of the deal with anybody else. This is to ensure that the startup does not use this offer to try and create a bidding war between VCs. Hence, if the details leak out, the VCs might feel that the startup is trying to violate the no shopping clause.</li>
<li>Most importantly, if word leaks out that a VC is funding a company for amount X, then in the next few days it is possible that the VC&#8217;s contacts in the industry (probably other VCs) keep saying &#8220;Why are you paying X? I don&#8217;t think it is worth more than Y&#8221; and this can cause the VC to reconsider the deal. This is very dangerous for the startup.</li>
</ul>
<p>This time, however, the news is official (and is actually better than the deal reported by Economic Times).</p>
<p>As for what Druva does exactly, and why it is one of our favorite Pune companies, <a href="http://punetech.com/backup-software-provider-druva-com-get-10-million-funding-from-nexus/">just read the previous article, which had a bunch of links</a>. Here are some other interesting tidbits about Druva:</p>
<ul>
<li>&#8220;Druva&#8217;s disruptive innovation reduces the storage footprint and bandwidth requirement for backup by orders of magnitude compared to other industry solutions&#8221; -Jishnu Bhattacharjee, Nexus</li>
<li>Druva, founded in 2007, has amassed more than 750 customers and protects more than 300,000 endpoints (<em>i.e.</em> servers, laptops, PCs) worldwide</li>
<li>InSync&#8217;s global, source-based deduplication reduces bandwidth and storage by 90 percent while providing 100 percent accuracy for Microsoft Outlook and Office applications</li>
</ul>
<p><a href="http://www.nexusvp.com/news-details.asp?id=107">Here&#8217;s the full press release regarding this news</a></p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/pune-based-druva-get-12m-in-series-b-from-nexussequoia-this-time-its-official/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>News of Druva&#8217;s funding was inaccurate and premature</title>
		<link>http://punetech.com/news-of-druvas-funding-was-inaccurate-and-premature/</link>
		<comments>http://punetech.com/news-of-druvas-funding-was-inaccurate-and-premature/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 07:13:28 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Money Matters]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=3293</guid>
		<description><![CDATA[On Friday, based on an Economic Times report, we reported that Pune-based enterprise backup software provider Druva has received $10 million in funding from Nexus VP. Unfortunately this news appears to be inaccurate. Here is a comment from Jaspreet Singh, CEO of Druva: Thanks Navin, but this news is not very accurate. This was unethically [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, based on an <a href="http://economictimes.indiatimes.com/tech/ites/nexus-venture-to-invest-10-mn-in-druva-software/articleshow/9489240.cms">Economic Times report</a>, we reported that Pune-based enterprise backup software provider Druva has received $10 million in funding from Nexus VP. Unfortunately this news appears to be inaccurate.</p>
<p>Here is a comment from Jaspreet Singh, CEO of Druva:</p>
<blockquote>
<p>Thanks Navin, but this news is not very accurate. This was unethically leaked and then misreported by Peerzada (abrar.shz@timesgroup.com) of ET for some cheap thrills.</p>
<p>Not sure when would people this these grow up and stop screwing lives of entrepreneurs who are already fighting against all the odds.</p>
<p>You have been a great supporter and I would give you a call sometime next week to give accurate information and some more good news.</p>
</blockquote>
<p>Basically, Druva is indeed in an advanced stage in their second round funding process, but it is not done yet, and they cannot talk about the details of the amount or the investors involved. The details that came out in the ET report are inaccurate.</p>
<p>We wish Druva luck, and hope to hear the official good news sometime soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/news-of-druvas-funding-was-inaccurate-and-premature/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Backup Software Provider Druva.com get $10 million funding from Nexus</title>
		<link>http://punetech.com/backup-software-provider-druva-com-get-10-million-funding-from-nexus/</link>
		<comments>http://punetech.com/backup-software-provider-druva-com-get-10-million-funding-from-nexus/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 08:46:40 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Money Matters]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=3285</guid>
		<description><![CDATA[Update: It appears that the report in ET, on which this article is based, was inaccurate. Please see this update. Pune-based startup Druva, which sells enterprise backup software, has just closed a second round of funding worth $10 million from Nexus Venture Partners, reports Economic Times. In April 2010, Druva had raised $5 million from [...]]]></description>
			<content:encoded><![CDATA[<p><em><strong>Update</strong>: It appears that the report in ET, on which this article is based, was inaccurate. Please <a href="http://punetech.com/news-of-druvas-funding-was-inaccurate-and-premature/">see this update</a>.</em></p>
<p>Pune-based startup <a href="http://druva.com">Druva</a>, which sells enterprise backup software, has just closed a second round of funding worth $10 million from <a href="http://www.nexusvp.com/">Nexus Venture Partners</a>, <a href="http://economictimes.indiatimes.com/tech/ites/nexus-venture-to-invest-10-mn-in-druva-software/articleshow/9489240.cms">reports Economic Times</a>. </p>
<p>In April 2010, Druva had raised $5 million from Sequoia and the Indian Angel Network. At that time, <a href="http://punetech.com/world-class-software-products-can-come-out-of-india-interview-with-ceo-of-druva/">these are the reasons we gave for why we liked Druva</a>:</p>
<blockquote>
<ul>
<li>Druva is a purely homegrown startup. This is not a company started by someone in the US setting up a development center in India.</li>
<li>Druva is a product startup. It is not a services company. Hence, it has a potential for exponential growth and returns.</li>
<li>Druva is not done by serial entrepreneurs. The co-founders are all first-time entrepreneurs who quit their big-company jobs to start Druva. This should give hope to all the first-time entrepreneurs in Pune.</li>
</ul>
</blockquote>
<p>Druva has been one of PuneTech&#8217;s favorite startups and we have covered it extensively in the past, so, frankly, there isn&#8217;t much new that we&#8217;ll be able to say about it. Instead, we&#8217;ll simply point readers to the older articles:</p>
<ul>
<li><a href="http://punetech.com/world-class-software-products-can-come-out-of-india-interview-with-ceo-of-druva/">&#8220;World-class software products can come out of India&#8221; &#8211; Interview with Jaspreet Singh, CEO of Druva</a></li>
<li><a href="http://punetech.com/druvaa-from-protoin-presenter-to-protoin-sponsor-in-18-months/">Druva: From proto.in presenter to proto.in sponsor in 18 months</a></li>
<li><a href="http://punetech.com/understanding-data-de-duplication/">Understanding Data De-duplication</a> &#8211; an article about an underlying technology that gives Druva an edge over its competitors</li>
<li><a href="http://punetech.com/technology-overview-druvaa-continuous-data-protection/">Technology Overview &#8211; Druva Continuous Data Protection</a> &#8211; An article about Continuous Data Protection, the first product that Druva came out with. As far as we understand, Druva is no longer selling this product. This is an interesting lesson on how software startups have to &#8216;pivot&#8217; and change their product line in response to market demands, and how things can go in a completely different direction than what founders originally envisaged</li>
<li><a href="http://punetech.com/understanding-rpo-and-rto-in-backups/">Understanding RTO and RPO in backups</a> &#8211; A simple tech overview of the important parameters on which a backup solution should be evaluated.</li>
</ul>
<p>We also want to point out that Druva is one of the sponsors of <a href="http://punetech.com/why-you-should-register-to-attend-python-conference-pune-sept-2011-right-now/">PyCon &#8211; the International Python Conference that&#8217;s happening in Pune next month</a>.</p>
<p>We wish <a href="http://druva.com">Druva</a> luck, and although getting another round of VC funding is not as good an indicator of success as an IPO or an acquisition, we would still like to repeat what we said in April 2010: </p>
<blockquote>
<ul>
<li>We now have in our midst a startup success story that will hopefully inspire a 100 new software product startups in Pune. </li>
</ul>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/backup-software-provider-druva-com-get-10-million-funding-from-nexus/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Pune Startup launches Vaultize &#8211; Cloud-Based Enterprise Backup &amp; DR</title>
		<link>http://punetech.com/pune-startup-launches-vaultize-cloud-based-enterprise-backup-dr/</link>
		<comments>http://punetech.com/pune-startup-launches-vaultize-cloud-based-enterprise-backup-dr/#comments</comments>
		<pubDate>Tue, 02 Aug 2011 05:39:47 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Overviews]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[disaster recovery]]></category>
		<category><![CDATA[entreprise]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[product]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=3271</guid>
		<description><![CDATA[Pune startup Anoosmar Technologies has just come out of stealth mode and announced the public beta of Vaultize, which they describe as: Vaultize is next generation data protection: cloud-based backup and disaster recovery that also enables collaboration between users, synchronization of devices and sharing over web. Vaultize turns your zero-returns investment in backup into an [...]]]></description>
			<content:encoded><![CDATA[<p>Pune startup <a href="http://anoosmar.com">Anoosmar Technologies</a> has just come out of stealth mode and announced the public beta of <a href="http://vaultize.com">Vaultize</a>, which they describe as:</p>
<blockquote>
<p>Vaultize is next generation data protection: cloud-based backup and disaster recovery that also enables collaboration between users, synchronization of devices and sharing over web. Vaultize turns your zero-returns investment in backup into an asset that improves availability, increases productivity and makes sharing easy.</p>
</blockquote>
<p>Anoosmar Technologies was founded by <a href="http://www.linkedin.com/in/aakekre">Anand Kekre</a> and <a href="http://in.linkedin.com/in/ankurp">Ankur Panchbudhe</a>, both of whom are Pune old-timers with ex-Veritas (Symantec) and ex-McAfee backgrounds. Both of them have been in the data protection, security, and storage space for over 10 years, and have deep expertise in enterprise infrastructure software. Between them they have 64 US patents.</p>
<p>Before you dismiss Vaultize by comparing it with Dropbox, or &lt;insert your favorite cloud-based backup service here&gt;, remember that Vaultize is not a consumer product &#8211; it is targeting the enterprise space. In that sense, I see Vaultize as more of a competitor to Pune&#8217;s <a href="http://druva.com">Druva</a>. However, given the backgrounds of the founders of Druva and founders of Vaultize, I would be tempted to guess that Druva is likely to be more interested in enterprise backup, and replication and generally areas more to do with performance and availability in an enterprise, while Vaultize is likely to move more in the direction of archiving, and e-discovery and generally areas more to do with risk management and legal compliance. But that&#8217;s pure speculation &#8211; I might be wrong.</p>
<p>Also check out the <a href="http://www.vaultize.com/customers/">customer case studies page</a> and the <a href="http://www.vaultize.com/management-team/">management team page</a>.</p>
<p>Druva is one of the few Pune software product companies that has received funding from well known VCs, and hence, Anoosmar, which has a similar pedigree and similar target markets, is a company to watch closely.</p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/pune-startup-launches-vaultize-cloud-based-enterprise-backup-dr/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>&#8220;World-class software products can come out of India&#8221; &#8211; Interview with CEO of Druva</title>
		<link>http://punetech.com/world-class-software-products-can-come-out-of-india-interview-with-ceo-of-druva/</link>
		<comments>http://punetech.com/world-class-software-products-can-come-out-of-india-interview-with-ceo-of-druva/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 11:24:11 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[In Depth]]></category>
		<category><![CDATA[Interview]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=2150</guid>
		<description><![CDATA[We now have in our midst a startup success story that will hopefully inspire a 100 new software product startups in Pune. PuneTech and the Pune Open Coffee Club both started about 2 years ago, and the steadily increasing memberships and vitality of these communities point to a very strong startup community in Pune. However, [...]]]></description>
			<content:encoded><![CDATA[<p>We now have in our midst a startup success story that will hopefully inspire a 100 new software product startups in Pune.</p>
<p>PuneTech and the <a href="http://punetech.com/wiki/POCC">Pune Open Coffee Club</a> both started about 2 years ago, and the steadily increasing memberships and vitality of these communities point to a very strong startup community in Pune. However, throughout those two years, one question always cast a doubt on the long-term potential of this startup ecosystem. And that question was: Where are the success stories?</p>
<div id="attachment_2151" class="wp-caption alignright" style="width: 248px"><a href="http://punetech.com/tag/backup"><img class="size-full wp-image-2151" title="logo-druva" src="http://punetech.com/wp-content/uploads/2010/04/logo-druva.gif" alt="Druva Software is a Pune-based backup software product startup. Click on the logo to see all PuneTech articles about backup software (mostly about Druva)" width="238" height="49" /></a><p class="wp-caption-text">Druva Software is a Pune-based backup software product startup. Click on the logo to see all PuneTech articles about backup software (mostly about Druva)</p></div>
<p>Druva software (previously known as Druvaa) which just <a href="http://economictimes.indiatimes.com/Infotech-Software/Young-tech-cos-stir-investor-interest/articleshow/5764980.cms">closed a $5 million round of funding led by Sequoia Capital</a> answers that question. Of course, getting a round of VC funding is not as good an indicator of success as <a href="http://punetech.com/top-ways-in-which-persistent-and-anand-deshpande-have-benefitted-the-pune-tech-community/">an IPO</a> or an acquisition. And of course, there have been other successes in the past. But still this news is great, for the following reasons:</p>
<ul>
<li>Druva is a purely homegrown startup. This is not a company started by someone in the US setting up a development center in India.</li>
<li>Druva is a product startup. It is not a services company. Hence, it has a potential for exponential growth and returns.</li>
<li>Druva is not done by serial entrepreneurs. The co-founders are all first-time entrepreneurs who quit their big-company jobs to start Druva. This should give hope to all the first-time entrepreneurs in Pune.</li>
<li>There haven&#8217;t been many high-profile successes in recent times, and this one comes as a breath of fresh air.</li>
</ul>
<p>Druva has been one of PuneTech&#8217;s favorite startups. With <a href="http://punetech.com/tag/backup/">5 different PuneTech articles</a>, this is probably the company that has received maximum coverage from us. And a quick look at the articles gives hints as to why:</p>
<ul>
<li>It is a product company, which is always more interesting than a services company; it&#8217;s especially interesting to watch the product evolve over time.</li>
<li>It requires some very complex technology, not something that any company could easily build. Plus, they are happy to write detailed technical articles about the technology that underlies their products.</li>
<li>It has repeatedly featured in high profile startup events in India, from <a class="zem_slink" title="Proto.in" rel="homepage" href="http://proto.in/">proto.in</a> to the NASSCOM summit</li>
</ul>
<p>PuneTech spoke to <a href="http://punetech.com/druvaa-from-protoin-presenter-to-protoin-sponsor-in-18-months/">Jaspreet Singh</a>, CEO of Druva, over the phone, and here are some quick notes based on this conversation. There are a number of unique features here that other Pune entrepreneurs would do well to take note of.</p>
<p><strong>On the current state of the company</strong></p>
<p>Druva has a $2.5 million revenue run rate, coming from more than 400 customer deployments. Most of this is from their flagship product, the <em>inSync</em> remote laptop disk-to-disk backup solution. Recently they also introduced <em>Phoenix</em>, a remote server disk-to-disk backup solution. They have about 23 employees, most of them in Pune, with a few sales people elsewhere. The product is developed entirely in Pune.</p>
<p><strong>How do they manage enterprise support for 400 customers with such a small employee base?</strong></p>
<p>Although supporting their customers is a very high priority for Druva, one of the things they focus on very hard is to make the product very easy to use and very easy to support &#8211; so that to a large extent, most of their customers don&#8217;t really require any support. They have a &#8220;release often&#8221; philosophy which ensures that customers always have the latest, bug-fixed, version of the software.</p>
<p>Another area in which they put a lot of effort is ensuring that the product is easy to install. A lot of their <a href="http://www.druva.com/case-studies/customer-testimonials">customer testimonials</a> speak of how easy it was to self-install the software. By contrast, the comparable software from the more established players in the market requires professional services help for installation.</p>
<p><strong>How do they manage sales without a strong US/Europe presence?</strong></p>
<p>Instead of the tradition of <em>hand-holding</em> that is a common feature of enterprise sales in this domain, Druva decided to go a different route. They made their software freely downloadable from the web, and made it easy to install and try. As a result, most of their customers approach them after having first tried the product out via the website. And many of their sales, even large ones, have happened over Skype/email, with no in-person customer visits.</p>
<p><strong>How do they compete with the large <a class="zem_slink" title="Multinational corporation" rel="wikipedia" href="http://en.wikipedia.org/wiki/Multinational_corporation">MNCs</a>, the established players in this market?</strong></p>
<p>We were very surprised to learn that Druva does not try to compete with the incumbents on cost. Jaspreet told us that in fact the average Druva sale tends to be 3x more expensive than the comparable offering from the established players. Druva scores on ease of use, simplicity, and most importantly, the technology.</p>
<p>Jaspreet points out that one of Druva&#8217;s strong points is the easy-to-use source-level de-duplication. This means that when backing up a laptop, they can ignore duplicate content even before the data is sent to the remote backup server. Specifically, consider the gigabytes of Windows operating system files on your laptop. Most of these files are likely to be identical across all laptops of a company. Druva&#8217;s software would know beforehand that there is a copy of those files on the backup server, and would never send them across. Such optimizations ensure that backing up 15 terabytes of data from a number of different laptops results in only about 2 or 3 terabytes being sent across the network. This results in an increase in speed and a reduction in both the network bandwidth and the disk space consumed.</p>
<p>By contrast, traditional backup systems do de-duplication at the destination. This means that all the data is sent to the server over the network, and only then is the server able to remove duplicate content. As a result, the speed and network bandwidth improvements are lost.</p>
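<p>To make the difference between the two approaches concrete, here is a minimal Python sketch of source-side versus destination-side de-duplication. It is our own illustration and not Druva&#8217;s implementation: the <code>BackupServer</code> class, its <code>missing</code> and <code>upload</code> calls, the use of whole files as the unit of de-duplication, and the SHA-256 fingerprints are all assumptions made for the example.</p>

```python
import hashlib


def fingerprint(data):
    """Content hash used to recognise identical data (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class BackupServer:
    """Hypothetical backup server that keeps one copy per fingerprint."""

    def __init__(self):
        self.chunks = {}          # fingerprint to stored bytes
        self.bytes_received = 0   # network traffic that reached the server

    def missing(self, fingerprints):
        """Tell the client which fingerprints the server has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def upload(self, fp, data):
        """Receive data over the 'network'; duplicates are discarded on arrival."""
        self.bytes_received += len(data)
        self.chunks[fp] = data


def backup_source_dedup(server, files):
    """Source-side: ask first, then send only what the server lacks."""
    by_fp = {fingerprint(data): data for data in files.values()}
    for fp in server.missing(by_fp.keys()):
        server.upload(fp, by_fp[fp])


def backup_target_dedup(server, files):
    """Destination-side: send everything, let the server drop duplicates."""
    for data in files.values():
        server.upload(fingerprint(data), data)


# Two laptops that share the same operating system files.
os_files = {"kernel.dll": b"K" * 1000, "shell.dll": b"S" * 1000}
laptop_a = {**os_files, "a.doc": b"A" * 100}
laptop_b = {**os_files, "b.doc": b"B" * 100}

source = BackupServer()
backup_source_dedup(source, laptop_a)
backup_source_dedup(source, laptop_b)

target = BackupServer()
backup_target_dedup(target, laptop_a)
backup_target_dedup(target, laptop_b)

print(source.bytes_received, target.bytes_received)
```

<p>In this toy run both servers end up storing the same four unique files, but the source-side client sends 2,200 bytes over the network while the destination-side client sends 4,200, because the second laptop never re-sends the shared operating system files.</p>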
<p>Also, claims Jaspreet, Druva&#8217;s backups are fully searchable &#8211; a feature that is not available with most competitors.</p>
<p><strong>What is their primary challenge currently?</strong></p>
<p>Jaspreet says that they want to build a high-quality, world-class product, and for that he needs lots of high-quality, world-class people. While they&#8217;ve obviously managed to build a team like that which got them so far, they need many more such people in the coming days, and that&#8217;s a significant challenge. He says that it is difficult, if not impossible to find &#8220;readymade&#8221; world-class talent here (even when &#8220;world-class&#8221; salary and/or equity is offered!). Instead, he feels that the only approach that works is to find individuals (whether freshers or industry veterans) who have the right attitude and potential and then nurture them into the required shape.</p>
<p><em>(As an aside, we&#8217;d like to point out that this is a pattern. Pretty much every startup we talk to mentions hiring of high-quality people as one of their primary challenges. This is a problem that needs a solution, and I&#8217;m hoping that some entrepreneur in Pune is looking at this as an opportunity.)</em></p>
<p>Parting thoughts: In the Druva co-founders, we have people who have been through the entire process, from zero to VC-funding, in Pune, recently. And they are nice guys. Pune entrepreneurs should take advantage of this, and flock to them for guidance, advice and mentorship.</p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/world-class-software-products-can-come-out-of-india-interview-with-ceo-of-druva/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Druvaa: From proto.in presenter to proto.in sponsor in 18 months</title>
		<link>http://punetech.com/druvaa-from-protoin-presenter-to-protoin-sponsor-in-18-months/</link>
		<comments>http://punetech.com/druvaa-from-protoin-presenter-to-protoin-sponsor-in-18-months/#comments</comments>
		<pubDate>Mon, 27 Jul 2009 09:54:50 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Interview]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[protodotin]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=1529</guid>
		<description><![CDATA[Pune-based backup software startup Druvaa has gone from being a 3-person startup that presented at proto.in 18 months back, to a 16-person company that is profitable, and sponsored proto.in this weekend. We caught up with Jaspreet Singh of Druvaa during proto and had a conversation with him about how they are doing. Note: Please [...]]]></description>
			<content:encoded><![CDATA[<p>Pune-based <a class="zem_slink" href="http://en.wikipedia.org/wiki/Backup_software" title="Backup software" rel="wikipedia">backup software</a> startup <a href="http://punetech.com/wiki/Druvaa">Druvaa</a> has gone from being a 3-person startup that presented at <a class="zem_slink" href="http://en.wikipedia.org/wiki/Proto.in" title="Proto.in" rel="wikipedia">proto.in</a> 18 months back, to a 16-person company that is profitable, and sponsored <a href="http://punetech.com/attend-protoin-from-home-follow-the-live-online-coverage/">proto.in this weekend</a>. We caught up with Jaspreet Singh of Druvaa during proto and had a conversation with him about how they are doing. </p>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/Ux7VzPwZzKM&amp;hl=en&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed src="http://www.youtube.com/v/Ux7VzPwZzKM&amp;hl=en&amp;fs=1&amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object><br />
Note: Please turn up the volume. The sound quality is not-so-great. (Hopefully future videos will be better.)</p>
<p>Please also check out the older PuneTech articles about Druvaa:</p>
<ul>
<li><a href="http://punetech.com/understanding-data-de-duplication/">Understanding Data De-duplication</a></li>
<li><a href="http://punetech.com/technology-overview-druvaa-continuous-data-protection/">Technology Overview &#8211; Druvaa Continuous Data Protection</a></li>
<li><a href="http://punetech.com/druvaa-presentation-video/">Druvaa Presentation at proto.in 3</a></li>
<li><a href="http://punetech.com/understanding-rpo-and-rto-in-backups/">Understanding RPO and RTO in Backups</a></li>
</ul>
<p>Interesting note: You&#8217;ll notice that over the years, Druvaa has shifted gears from selling continuous protection (which they started off with) to remote backups (which is their primary product now). This is a feature of any startup &#8211; adapting to the needs of the market. </p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/druvaa-from-protoin-presenter-to-protoin-sponsor-in-18-months/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Understanding Data De-duplication</title>
		<link>http://punetech.com/understanding-data-de-duplication/</link>
		<comments>http://punetech.com/understanding-data-de-duplication/#comments</comments>
		<pubDate>Thu, 15 Jan 2009 01:42:30 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[In Depth]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=876</guid>
		<description><![CDATA[Druvaa is a Pune-based startup that sells fast, efficient, and cheap backup (Update: see the comments section for Druvaa&#8217;s comments on my use of the word &#8220;cheap&#8221; here &#8211; apparently they sell even in cases where their product is priced above the competing offerings) software for enterprises and SMEs. It makes heavy use of data [...]]]></description>
			<content:encoded><![CDATA[<p><em><a href="http://punetech.com/wiki/Druvaa"><img class="alignright" title="Druvaa Logo" src="http://farm4.static.flickr.com/3187/2802701750_e748d19789_o.gif" alt="" width="226" height="49" /></a><a title="PuneTech profile of Druvaa backup software" href="http://punetech.com/wiki/Druvaa">Druvaa</a> is a Pune-based startup that sells fast, efficient, and cheap <a class="zem_slink" title="Backup software" rel="wikipedia" href="http://en.wikipedia.org/wiki/Backup_software">backup</a> (<strong>Update</strong>: see the comments section for Druvaa&#8217;s comments on my use of the word &#8220;cheap&#8221; here &#8211; apparently they sell even in cases where their product is priced above the competing offerings) software for enterprises and SMEs. It makes heavy use of data de-duplication technology to deliver on the promise of speed and low-bandwidth consumption. In this article, reproduced with permission <a href="http://blog.druvaa.com/2009/01/09/understanding-data-deduplication/">from their blog</a>, they explain what exactly data de-duplication is and how it works.<br />
</em></p>
<h2>Definition of Data De-duplication</h2>
<blockquote><p><a class="zem_slink" title="Data deduplication" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_deduplication">Data deduplication</a> or Single Instancing essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy (single instance) of the data to be stored. However, indexing of all data is still retained should that data ever be required.</p></blockquote>
<p><strong>Example</strong><br />
A typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just a reference back to the one saved copy, reducing storage and bandwidth demand to only 1 MB.</p>
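<p>The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any product: the class name <code>SingleInstanceStore</code>, the mailbox paths and the use of SHA-256 as the checksum are all hypothetical choices made for the sketch.</p>

```python
import hashlib


class SingleInstanceStore:
    """Toy store that keeps one physical copy of each distinct payload."""

    def __init__(self):
        self.blobs = {}   # checksum -> payload, stored once
        self.index = {}   # item name -> checksum (the retained reference)

    def save(self, name, payload):
        checksum = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(checksum, payload)  # no-op if already stored
        self.index[name] = checksum

    def load(self, name):
        checksum = self.index[name]
        return self.blobs[checksum]

    def stored_bytes(self):
        return sum(len(payload) for payload in self.blobs.values())


attachment = bytes(1024 * 1024)            # stand-in for a 1 MB attachment
store = SingleInstanceStore()
for mailbox in range(100):
    store.save(f"mailbox-{mailbox}/report.ppt", attachment)

# prints: 100 references, 1048576 bytes stored
print(len(store.index), "references,", store.stored_bytes(), "bytes stored")
```

<p>Every mailbox can still retrieve its attachment through the index, even though only one copy occupies storage.</p>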
<h2>Technological Classification</h2>
<p>The practical benefits of this technology depend upon various factors like –</p>
<ol>
<li><strong>Point of Application</strong> &#8211; Source vs Target</li>
<li><strong>Time of Application</strong> &#8211; Inline vs Post-Process</li>
<li><strong>Granularity</strong> &#8211; File vs Sub-File level</li>
<li><strong>Algorithm</strong> &#8211; Fixed-size blocks vs Variable-length data segments</li>
</ol>
<p>A simple relation between these factors can be explained using the diagram below -</p>
<p style="text-align: center;"><a href="http://blog.druvaa.com/wp-content/uploads/2009/01/dedup-tree.jpg"><img class="size-full wp-image-78 aligncenter" title="Deduplication Technological Classification" src="http://blog.druvaa.com/wp-content/uploads/2009/01/dedup-tree.jpg" alt="Deduplication Technological Classification" width="404" height="280" /></a></p>
<h3>Target vs Source based Deduplication</h3>
<p><strong>Target based deduplication</strong> acts on the target data <a class="zem_slink" title="Data storage device" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_storage_device">storage media</a>. In this case the client is unmodified and not aware of any deduplication. The deduplication engine can be embedded in the hardware array, which can then be used as a NAS/SAN device with deduplication capabilities. Alternatively, it can be offered as an independent software or hardware appliance which acts as an intermediary between the backup server and the storage arrays. In both cases it improves only the storage utilization.</p>
<p style="text-align: center;"><a href="http://blog.druvaa.com/wp-content/uploads/2009/01/target-source-dedup.jpg"><img class="size-full wp-image-98 aligncenter" title="Target Vs Source Deduplication" src="http://blog.druvaa.com/wp-content/uploads/2009/01/target-source-dedup.jpg" alt="Target Vs Source Deduplication" width="563" height="154" /></a></p>
<p>In contrast, <strong>source based deduplication</strong> acts on the data at the source, before it’s moved. A deduplication-aware backup agent installed on the client backs up only unique data. The result is improved bandwidth and storage utilization, at the cost of additional computational load on the backup client.</p>
<h3>Inline vs Post-process Deduplication</h3>
<p>In target based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as and when it is sent to the target) or after it has been stored in the target storage.</p>
<p>The former is called <strong>inline deduplication</strong>. The obvious advantages are -</p>
<ol>
<li>Increase in overall efficiency as data is only passed and processed once</li>
<li>The processed data is instantaneously available for post-storage processes like recovery and replication, reducing the <a title="Understanding RPO and RTO" href="http://blog.druvaa.com/2008/03/22/understanding-rpo-and-rto/" target="_blank">RPO and RTO</a> window.</li>
</ol>
<p>The disadvantages are -</p>
<ol>
<li>Decrease in write throughput</li>
<li>Extent of deduplication is less &#8211; only the fixed-length block deduplication approach can be used</li>
</ol>
<p>Inline deduplication only processes incoming raw blocks and has no knowledge of the files or the file structure. This forces it to use the fixed-length block approach (discussed in detail later).</p>
<div class="mceTemp mceIEcenter">
<dl id="attachment_111" class="wp-caption aligncenter" style="width: 510px;">
<dt class="wp-caption-dt"><a href="http://blog.druvaa.com/wp-content/uploads/2009/01/inline-post-dedup.jpg"><img class="size-full wp-image-111" title="Inline Vs Post Process Deduplication" src="http://blog.druvaa.com/wp-content/uploads/2009/01/inline-post-dedup.jpg" alt="Inline Vs Post Process Deduplication" width="500" height="95" /></a></dt>
</dl>
</div>
<p><strong>Post-process deduplication</strong> acts asynchronously on the stored data. Its advantages and disadvantages are the exact opposite of those listed above for <em>inline deduplication</em>.</p>
<h3>File vs Sub-file Level Deduplication</h3>
<p>The duplicate removal algorithm can be applied at the full file or sub-file level. Full file level duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the existing checksums of already backed up files. It’s simple and fast, but the extent of deduplication is very limited, as it does not address the problem of duplicate content found inside different files or data-sets (e.g. emails).</p>
<p>The sub-file level deduplication technique breaks the file into smaller fixed or variable size blocks, and then uses a standard hash based algorithm to find identical blocks.</p>
<h3>Fixed-Length Blocks vs Variable-Length Data Segments</h3>
<p>The fixed-length block approach, as the name suggests, divides the files into fixed-size blocks and uses a simple checksum (MD5/SHA etc.) based approach to find duplicates. Although it’s possible to look for repeated blocks, the approach provides very limited effectiveness. The reason is that the primary opportunity for data reduction is in finding duplicate blocks in two transmitted datasets that are made up mostly &#8211; but not completely &#8211; of the same data segments.</p>
<p style="text-align: center;"><a href="http://blog.druvaa.com/wp-content/uploads/2009/01/file-bocks.jpg"><img class="size-full wp-image-83 aligncenter" title="Data Sets and Block Allignment" src="http://blog.druvaa.com/wp-content/uploads/2009/01/file-bocks.jpg" alt="Data Sets and Block Allignment" width="321" height="193" /></a></p>
<p>For example, similar data blocks may be present at different offsets in two different datasets. In other words the block boundary of similar data may be different. This is very common when some bytes are inserted in a file: when the changed file is processed again and divided into fixed-length blocks, every block from the insertion point onward appears to have changed.</p>
<p>Therefore, two datasets with a small amount of difference are likely to have very few identical fixed length blocks.</p>
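<p>The alignment problem is easy to reproduce. The sketch below is illustrative only: the 64-byte block size, the seeded random test data and the helper name <code>block_digests</code> are assumptions made for the demonstration. It compares what happens to fixed-size blocks when one byte is overwritten in place versus when one byte is inserted.</p>

```python
import hashlib
import random

BLOCK_SIZE = 64   # assumed block size for the demonstration


def block_digests(data):
    """Checksums of consecutive fixed-size blocks."""
    return {hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)}


rng = random.Random(0)                       # seeded so runs are repeatable
original = bytes(rng.randrange(256) for _ in range(1024))   # 16 blocks

# Same length: only the block containing byte 10 changes.
overwritten = original[:10] + bytes([original[10] ^ 0xFF]) + original[11:]
# One byte longer: everything after offset 10 shifts by one position.
inserted = original[:10] + b"X" + original[10:]

base = block_digests(original)
unchanged_after_overwrite = len(base & block_digests(overwritten))
unchanged_after_insert = len(base & block_digests(inserted))

# prints: 15 0
print(unchanged_after_overwrite, unchanged_after_insert)
```

<p>An in-place overwrite leaves 15 of the 16 blocks recognisable, while a single inserted byte leaves none, even though the two files differ by one byte in both cases.</p>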
<p><strong>Variable-Length Data Segment technology</strong> divides the data stream into variable length data segments using a methodology that can find the same block boundaries in different locations and contexts. This allows the boundaries to “float” within the data stream so that changes in one part of the dataset have little or no impact on the boundaries in other locations of the dataset.</p>
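<p>One common way to obtain such floating boundaries is content-defined chunking: a cut is made wherever a hash of the last few bytes meets some condition, so the boundary depends on the data itself rather than on its offset. The sketch below is a simplified illustration of that idea, not the vendor's algorithm. The window size, the divisor and the use of CRC32 recomputed at every position are assumptions chosen for brevity; practical systems use a rolling hash so that each byte is examined only once.</p>

```python
import hashlib
import random
import zlib

WINDOW = 16     # bytes of context examined at each position (assumed value)
DIVISOR = 64    # about one boundary per 64 bytes on random-looking data


def chunks(data):
    """Cut wherever the hash of the preceding WINDOW bytes is a multiple of DIVISOR."""
    cuts = [i for i in range(WINDOW, len(data))
            if zlib.crc32(data[i - WINDOW:i]) % DIVISOR == 0]
    edges = [0] + cuts + [len(data)]
    return [data[a:b] for a, b in zip(edges, edges[1:])]


def digests(data):
    return {hashlib.sha256(c).hexdigest() for c in chunks(data)}


rng = random.Random(0)                       # seeded so runs are repeatable
original = bytes(rng.randrange(256) for _ in range(4096))
inserted = b"X" + original                   # one byte added at the front

# Chunks of the original that can no longer be found after the insertion.
lost = digests(original) - digests(inserted)
print(len(digests(original)), "chunks,", len(lost), "changed by the insertion")
```

<p>Because every boundary is decided by the 16 bytes that precede it, the inserted byte can disturb only the first chunk; all later boundaries simply move along with the data, and their chunks are found again.</p>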
<h2>ROI Benefits</h2>
<p>Each organization has a capacity to generate data. The extent of savings depends upon – but is not directly proportional to – the number of applications or end users generating data. Overall, the deduplication savings depend upon the following parameters –</p>
<ol>
<li>No. of applications or end users generating data</li>
<li>Total data</li>
<li>Daily change in data</li>
<li> Type of data (emails/ documents/ media etc.)</li>
<li> Backup policy (weekly-full – daily-incremental or daily-full)</li>
<li> Retention period (90 days, 1 year etc.)</li>
<li>Deduplication technology in place</li>
</ol>
<p>The actual benefits of deduplication are realized once the same dataset is processed multiple times over a span of time for weekly/daily backups. This is especially true for <em>variable length data segment</em> technology which has a much better capability for dealing with arbitrary byte insertions.</p>
<p><strong>Numbers</strong><br />
While some vendors claim 1:300 ratios of bandwidth/storage saving, our customer statistics show that the results are <strong>between 1:4 and 1:50</strong> for source based deduplication.</p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/understanding-data-de-duplication/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Technology overview &#8211; Druvaa Continuous Data Protection</title>
		<link>http://punetech.com/technology-overview-druvaa-continuous-data-protection/</link>
		<comments>http://punetech.com/technology-overview-druvaa-continuous-data-protection/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 05:56:25 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[In Depth]]></category>
		<category><![CDATA[Overviews]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[ian]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[startups]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://punetech.com/?p=208</guid>
		<description><![CDATA[Druvaa, a Pune-based product startup that makes data protection (i.e. backup and replication) software targeted towards the enterprise market, has been all over the Indian startup scene recently. It was one of the few Pune startups to be funded in recent times (Rs. 1 crore by Indian Angel Network). It was one of the three [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://druvaa.com/"><img class="alignright" title="Druvaa Logo" src="http://farm4.static.flickr.com/3187/2802701750_e748d19789_o.gif" alt="" width="226" height="49" /></a></p>
<p><a href="http://punetech.com/wiki/Druvaa">Druvaa</a>, a Pune-based product startup that makes data protection (<em>i.e.</em> backup and replication) software targeted towards the enterprise market, has been all over the Indian startup scene recently. It was one of the few Pune startups to be funded in recent times (<a href="http://www.livemint.com/2008/01/17222252/Indian-Angel-Network-backs-Pun.html">Rs. 1 crore by Indian Angel Network</a>). It was one of the three startups that <a href="http://www.pluggd.in/2008/07/tie-canaan-entrepreneurial-challenge-2008-winners-announced">won</a> the <a href="http://www.tienewdelhi.org/canaan/">TiE-Canaan Entrepreneurial Challenge</a> in July this year. It was one of the three startups chosen to present at the <a href="http://blog.nasscom.in/emerge/2008/08/11/nasscom-showcase-of-emerging-product-companies/">showcase of emerging product companies</a> at the <a href="http://www.nasscom.in/Nasscom/Templates/CustomEvents.aspx?id=54184">NASSCOM product conclave</a> 2008.</p>
<p>And this is not confined to national boundaries. It is one of only two (as far as I know) Pune-based companies to be <a href="http://www.techcrunchit.com/2008/07/09/druvaa-dusts-for-fingerprints-to-save-bandwidth-and-storage/">featured in TechCrunch</a> (actually TechCrunchIT), one of the most influential tech blogs in the world (the other Pune company featured in TechCrunch is <a href="http://punetech.com/wiki/Pubmatic/">Pubmatic</a>).</p>
<p>Why all this attention for Druvaa? Other than the fact that it has a very strong team that is executing quite well, I think two things stand out:</p>
<ul>
<li>It is one of the few Indian product startups that are targeting the enterprise market. This is a very difficult market to break into, both because of the risk-averse nature of the customers and because of the very long sales cycles.</li>
<li>Unlike many other startups (especially consumer oriented web-2.0 startups), Druvaa&#8217;s products require some seriously difficult technology.</li>
</ul>
<p>Rediff has a <a href="http://www.rediff.com/getahead/2008/mar/03druvaa.htm">nice interview</a> with the three co-founders of Druvaa, <a href="http://punetech.com/wiki/Ramani_Kothandaraman">Ramani Kothandaraman</a>, <a href="http://punetech.com/wiki/Milind_Borate">Milind Borate</a> and <a href="http://punetech.com/wiki/Jaspreet_Singh">Jaspreet Singh</a>, which you should read to get an idea of their background, why they started Druvaa, and their journey so far. <a href="http://blog.druvaa.com/">Druvaa also has a very interesting and active blog</a> where they talk technology, which is worth reading on a regular basis.</p>
<p>The rest of this article talks about their technology.</p>
<p>Druvaa has two main products. <a href="http://www.druvaa.com/products/insync.html"><strong>Druvaa inSync</strong></a> allows enterprise desktop and laptop PCs to be backed up to a central server with over 90% savings in bandwidth and disk storage utilization. <a href="http://www.druvaa.com/products/replicator.html"><strong>Druvaa Replicator</strong></a> allows replication of data from a production server to a secondary server near-synchronously and non-disruptively.</p>
<p>We now dig deeper into each of these products to give you a feel for the complex technology that goes into them. If you are not really interested in the technology, skip to the end of the article and come back tomorrow when we&#8217;ll be back to talking about google keyword searches and web-2.0 and other such things.</p>
<h3>Druvaa Replicator</h3>
<div class="wp-caption alignright" style="width: 253px"><img title="Druvaa Replicator architecture" src="http://farm4.static.flickr.com/3110/2801854421_13fcc0c588_o.gif" alt="Overall schematic set-up for Druvaa Replicator" width="243" height="339" /><p class="wp-caption-text">Overall schematic set-up for Druvaa Replicator</p></div>
<p>This is Druvaa&#8217;s first product, and is a good example of how something that seems simple to you and me can become insanely complicated when the customer is an enterprise. The problem seems rather simple: imagine an enterprise server that needs to be on, serving customer requests, all the time. If this server crashes for some reason, there needs to be a standby server that can immediately take over. This is the easy part. The problem is that the standby server needs to have a copy of all the latest data, so that no data is lost (or at least very little data is lost). To do this, the replication software continuously copies all the latest updates of the data from the disks on the primary server side to the disks on the standby server side.</p>
<p>This is much harder than it seems. A simple implementation would simply ensure that every <em>write</em> of data that is done on the primary is also done on the standby storage at the same time (<em>synchronously</em>). This is unacceptable because each write would take unacceptably long and this would slow down the primary server too much.</p>
<p>If you are not doing synchronous updates, you need to start worrying about write order fidelity.</p>
<h4>Write-order fidelity and file-system consistency</h4>
<p>If a database writes a number of pages to the disk on your primary server, and if you have software that is replicating all these writes to a disk on a stand-by server, it is very important that the writes should be done on the stand-by in the same order in which they were done at the primary servers. This section explains why this is important, and also why doing this is difficult. If you know about this stuff already (database and file-system guys) or if you just don&#8217;t care about the technical details, skip to the next section.</p>
<p>Imagine a bank database. Account balances are stored as records in the database, which are ultimately stored on the disk. Imagine that I transfer Rs. 50,000 from Basant&#8217;s account to Navin&#8217;s account. Suppose Basant&#8217;s account had Rs. 3,00,000 before the transaction and Navin&#8217;s account had Rs. 1,00,000. So, during this transaction, the database software will end up doing two different writes to the disk:</p>
<ul>
<li>write #1: Update Basant&#8217;s bank balance to 2,50,000</li>
<li>write #2: Update Navin&#8217;s bank balance to 1,50,000</li>
</ul>
<p>Let us assume that Basant and Navin&#8217;s bank balances are stored on different locations on the disk (i.e. on different pages). This means that the above will be two different writes. If there is a power failure, after write #1, but before write #2, then the bank will have reduced Basant&#8217;s balance without increasing Navin&#8217;s balance. This is unacceptable. When the database server restarts when power is restored, it will have lost Rs. 50,000.</p>
<p>After write #1, the database (and the file-system) is said to be in an <em>inconsistent</em> state. After write #2, <em>consistency</em> is restored.</p>
<p>It is always possible that at the time of a power failure, a database might be <em>inconsistent</em>. This cannot be prevented, but it can be cured. For this, databases typically do something called <a class="zem_slink" title="Write ahead logging" rel="wikipedia" href="http://en.wikipedia.org/wiki/Write_ahead_logging">write-ahead-logging</a>. In this, the database first writes a &#8220;log entry&#8221; indicating what updates it is going to do as part of the current transaction. And only after the log entry is written does it do the actual updates. Now the sequence of updates is this:</p>
<ul>
<li>write #0: Write this log entry &#8220;Update Basant&#8217;s balance to Rs. 2,50,000; update Navin&#8217;s balance to Rs. 1,50,000&#8221; to the logging section of the disk</li>
<li>write #1: Update Basant&#8217;s bank balance to 2,50,000</li>
<li>write #2: Update Navin&#8217;s bank balance to 1,50,000</li>
</ul>
<p>Now if the power failure occurs between writes #0 and #1 or between #1 and #2, then the database has enough information to fix things later. When it restarts, before the database becomes active, it first reads the logging section of the disk and goes and checks whether all the updates that were claimed in the logs have actually happened. In this case, after reading the log entry, it needs to check whether Basant&#8217;s balance is actually 2,50,000 and Navin&#8217;s balance is actually 1,50,000. If they are not, the database is inconsistent, but it has enough information to restore consistency. The recovery procedure consists of simply going ahead and making those updates. After these updates, the database can continue with regular operations.</p>
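<p>For readers who prefer code, here is a toy simulation of the three writes and the recovery step. It carries the same simplifications as the description above: the in-memory <code>disk</code> and <code>log</code> stand in for persistent storage, and the function names and the <code>crash_after</code> parameter are invented for this sketch.</p>

```python
def transfer(disk, log, crash_after=None):
    """Issue writes #0, #1, #2 in order; stop early to mimic a power failure."""
    writes = [
        lambda: log.append({"Basant": 250000, "Navin": 150000}),  # write #0
        lambda: disk.update(Basant=250000),                       # write #1
        lambda: disk.update(Navin=150000),                        # write #2
    ]
    for number, write in enumerate(writes):
        write()
        if number == crash_after:
            return            # power fails right after this write


def recover(disk, log):
    """On restart, redo every logged update; redoing is harmless if already done."""
    for entry in log:
        disk.update(entry)


def balances_after_crash(crash_after):
    disk = {"Basant": 300000, "Navin": 100000}
    log = []
    transfer(disk, log, crash_after)
    recover(disk, log)
    return disk


# Each line ends with {'Basant': 250000, 'Navin': 150000}
for crash_after in (0, 1, 2):
    print(crash_after, balances_after_crash(crash_after))
```

<p>Wherever the crash lands after write #0, replaying the log brings both balances to their intended values, so no money goes missing.</p>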
<p>(Note: This is a huge simplification of what really happens, and has some inaccuracies &#8211; the intention here is to give you a feel for what is going on, not a course lecture on database theory. Database people, please don&#8217;t write to me about the errors in the above &#8211; I already know; I have a Ph.D. in this area.)</p>
<p>Note that in the above scheme the order in which writes happen is very important. Specifically, write #0 must happen before #1 and #2. If for some reason write #1 happens before write #0 we can lose money again. Just imagine a power failure after write #1 but before write #0. On the other hand, it doesn&#8217;t really matter whether write #1 happens before write #2 or the other way around. The mathematically inclined will notice that this is a <a href="http://mathworld.wolfram.com/PartialOrder.html">partial order</a>.</p>
<p>Now if there is replication software that is replicating all the writes from the primary to the secondary, it needs to ensure that the writes happen in the same order. Otherwise the database on the stand-by server will be inconsistent, and can result in problems if suddenly the stand-by needs to take over as the main database. (Strictly speaking, we just need to ensure that the partial order is respected. So we can do the writes in this order: #0, #2, #1 and things will be fine. But #2, #0, #1 could lead to an inconsistent database.)</p>
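<p>The partial order for this example is small enough to check exhaustively. In the sketch below, the constraint list <code>MUST_PRECEDE</code> and the function name are made up for illustration; the constraints themselves are the ones stated above (write #0 before #1 and before #2).</p>

```python
from itertools import permutations

# (earlier, later) pairs: the log entry must reach the disk before either update.
MUST_PRECEDE = [(0, 1), (0, 2)]


def respects_write_order(applied):
    """True if the stand-by applied the writes in an order the primary allows."""
    position = {write: i for i, write in enumerate(applied)}
    return all(position[a] < position[b] for a, b in MUST_PRECEDE)


safe_orders = [order for order in permutations((0, 1, 2))
               if respects_write_order(order)]

# prints: [(0, 1, 2), (0, 2, 1)]
print(safe_orders)
```

<p>Of the six possible orders, only the two that put write #0 first are safe to apply on the stand-by.</p>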
<p>Replication software that ensures this is said to maintain <em>write order fidelity</em>. A large enterprise that runs mission critical databases (and other similar software) will not accept any replication software that does not maintain write order fidelity.</p>
<h4>Why is write-order fidelity difficult?</h4>
<p>I can hear you muttering, &#8220;Ok, fine! Do the writes in the same order. Got it. What&#8217;s the big deal?&#8221; Turns out that maintaining write-order fidelity is easier said than done. Imagine that your database server has multiple CPUs. The different writes are being done by different CPUs. And the different CPUs have different clocks, so that the timestamps used by them are not necessarily in sync. Multiple CPUs are now the default in server class machines. Further imagine that the &#8220;logging section&#8221; of the database is actually stored on a different disk. For reasons beyond the scope of this article, this is the recommended practice. So, the situation is that different CPUs are writing to different disks, and the poor replication software has to figure out what order this was done in. It gets even worse when you realize that the disks are not simple disks, but complex disk arrays that have a whole lot of intelligence of their own (and hence might not write in the order you specified), and that there is a volume manager layer on the disk (which can be doing striping and RAID and other fancy tricks) and a file-system layer on top of the volume manager layer that is doing buffering of the writes, and you begin to get an idea of why this is not easy.</p>
<p>Naive solutions to this problem, like using locks to serialize the writes, result in unacceptable degradation of performance.</p>
<p>Druvaa Replicator has patent-pending technology in this area, where they are able to automatically figure out the partial order of the writes made at the primary, without significantly increasing the overheads. In this article, I&#8217;ve just focused on one aspect of Druvaa Replicator, just to give an idea of why this is so difficult to build. To get a more complete picture of the technology in it, see <a href="http://www.druvaa.com/documents/wp_druvaa_replicator_technology.pdf">this white paper</a>.</p>
<h3>Druvaa inSync</h3>
<p>Druvaa inSync is a solution that allows desktops/laptops in an enterprise to be backed up to a central server. (The central server is also in the enterprise; imagine the central server being in the head office, and the desktops/laptops spread out over a number of satellite offices across the country.) The key features of inSync are:</p>
<ul>
<li>The amount of data being sent from the laptop to the backup server is greatly reduced (often by over 90%) compared to standard backup solutions. This results in much faster backups and lower consumption of expensive <a href="http://en.wikipedia.org/wiki/Wide_area_network">WAN</a> bandwidth.</li>
<li>It stores all copies of the data, and hence allows timeline based recovery. You can recover any version of any document as it existed at any point of time in the past. Imagine you plugged in your friend&#8217;s USB drive at 2:30pm, and that resulted in a virus that totally screwed up your system. Simply use inSync to restore your system to the state that existed at 2:29pm and you are done. This is possible because Druvaa backs up your data continuously and automatically. This is far better than having to restore from last night&#8217;s backup and losing all data from this morning.</li>
<li>It intelligently senses the kind of network connection that exists between the laptop and the backup server, and will correspondingly throttle its own usage of the network (possibly based on customer policies) to ensure that it does not interfere with the customer&#8217;s YouTube video browsing habits.</li>
</ul>
<h4>Data de-duplication</h4>
<div class="wp-caption alignnone" style="width: 560px"><img title="Druvaa inSync overview" src="http://farm4.static.flickr.com/3127/2801899527_a52a4d2882_o.gif" alt="Overview of Druvaa inSync. 1. Fingerprints computed on laptop sent to backup server. 2. Backup server responds with information about which parts are non-duplicate. 3. Non-duplicate parts compressed, encrypted and sent. " width="550" height="537" /><p class="wp-caption-text">Overview of Druvaa inSync. 1. Fingerprints computed on laptop sent to backup server. 2. Backup server responds with information about which parts are non-duplicate. 3. Non-duplicate parts compressed, encrypted and sent. </p></div>
<p>Let&#8217;s dig a little deeper into the claim of 90% reduction of data transfer. The basic technology behind this is called <a href="http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci1248105,00.html"><em>data de-duplication</em></a>. Imagine an enterprise with 10 employees. All their laptops have been backed up to a single central server. At this point, data de-duplication software can realize that there is a lot of data that has been duplicated across the different backups. <em>i.e.</em> the 10 different backups contain a lot of files that are common. Most of the files in the C:\WINDOWS directory. All those large powerpoint documents that got mail-forwarded around the office. In such cases, the de-duplication software can save disk space by keeping just one copy of the file and deleting all the other copies. In place of the deleted copies, it can store a shortcut indicating that if this user tries to restore this file, it should be fetched from the other backup and then restored.</p>
<p>Data de-duplication doesn&#8217;t have to be at the level of whole files. Imagine a long and complex document you created and sent to your boss. Your boss simply changed the first three lines and saved it into a document with a different name. These files have different names, and different contents, but most of the data (other than the first few lines) is the same. De-duplication software can detect such copies of the data too, and is smart enough to store only one copy of this document in the first backup, and just the differences in the second backup.</p>
<p>The way to detect duplicates is through a mechanism called document fingerprinting. Each document is broken up into smaller chunks. (How to determine what constitutes one chunk is an advanced topic beyond the scope of this article.) Now, a short &#8220;fingerprint&#8221; is created for each chunk. A fingerprint is a short string (<em>e.g.</em> 16 bytes) that is determined by the contents of the entire chunk. The computation of a fingerprint is done in such a way that if even a single byte of the chunk is changed, the fingerprint changes. (It&#8217;s something like a checksum, but a little more complicated, to make it extremely unlikely that two different chunks accidentally end up with the same fingerprint.)</p>
<p>All the fingerprints of all the chunks are then stored in a database. Now every time a new document is encountered, it is broken up into chunks, fingerprints are computed, and these fingerprints are looked up in the database of fingerprints. If a fingerprint is found in the database, then we know that this particular chunk already exists somewhere in one of the backups, and the database will tell us the location of the chunk. Now this chunk in the new file can be replaced by a shortcut to the old chunk. Rinse. Repeat. And we get 90% savings of disk space. The interested reader is encouraged to google Rabin fingerprinting, shingling, Rsync for hours of fascinating algorithms in this area. Before you know it, you&#8217;ll be trying to figure out how to use these techniques to find who is plagiarising your blog content on the internet.</p>
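<p>A bare-bones version of such a fingerprint database might look like the sketch below. Everything in it is an assumption made for illustration: chunks are cut at a fixed 64 bytes (real chunking is the advanced topic mentioned above), the 16-byte fingerprint is a BLAKE2b digest, and the class name <code>ChunkIndex</code> is invented. It is not a description of how inSync works internally.</p>

```python
import hashlib

CHUNK_SIZE = 64   # hypothetical; fixed-size cuts keep the sketch short


def fingerprint(chunk):
    """16-byte fingerprint determined by the whole chunk."""
    return hashlib.blake2b(chunk, digest_size=16).digest()


class ChunkIndex:
    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes, stored once

    def backup(self, data):
        """Store unseen chunks; return the fingerprints that act as shortcuts."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = fingerprint(chunk)
            self.chunks.setdefault(fp, chunk)   # skip chunks already stored
            recipe.append(fp)
        return recipe

    def restore(self, recipe):
        return b"".join(self.chunks[fp] for fp in recipe)


lines = [b"line %04d of a long and complex document\n" % n for n in range(200)]
original = b"".join(lines)
edited = original.replace(b"line 0000", b"LINE 0000", 1)   # same length, new opening

index = ChunkIndex()
recipe_a = index.backup(original)
before = len(index.chunks)
recipe_b = index.backup(edited)
new_chunks = len(index.chunks) - before

# prints: 1 new chunk(s) stored for the edited copy
print(new_chunks, "new chunk(s) stored for the edited copy")
```

<p>The edited copy costs one extra chunk of storage, yet both versions can be rebuilt in full from their lists of fingerprints.</p>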
<p>Back to Druvaa inSync. inSync does fingerprinting at the laptop itself, before the data is sent to the central server. So, it is able to detect duplicate content before it gets sent over the slow and expensive net connection and consumes time and bandwidth. This is in contrast to most other systems that do de-duplication as a post-processing step at the server. At a Fortune 500 customer site, inSync was able to reduce the backup time from 30 minutes to 4 minutes, and the disk space required on the server went down from 7TB to 680GB. (<a href="http://www.druvaa.com/documents/Druvaa_inSync_Overview_and_Advantage.pdf">source</a>.)</p>
<p>Again, this was just one example used to give an idea of the complexities involved in building inSync. For more information on other distinguishing features, check out the <a href="http://www.druvaa.com/documents/Druvaa_inSync_Overview_and_Advantage.pdf">inSync product overview page</a>.</p>
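<p>The three-step exchange shown in the inSync overview figure (send fingerprints, learn which parts are new, upload only those) can be mimicked with a small in-process sketch. The class and function names are hypothetical, each byte string stands in for one chunk, and encryption is left out entirely; the point is only to show that duplicates never cross the network.</p>

```python
import hashlib
import zlib


def fingerprint(chunk):
    return hashlib.blake2b(chunk, digest_size=16).digest()


class BackupServer:
    def __init__(self):
        self.chunks = {}                      # fingerprint -> chunk

    def unknown(self, fingerprints):
        """Step 2: report which fingerprints the server has never seen."""
        return [fp for fp in fingerprints if fp not in self.chunks]

    def receive(self, compressed_by_fp):
        for fp, blob in compressed_by_fp.items():
            self.chunks[fp] = zlib.decompress(blob)


def laptop_backup(server, chunks):
    by_fp = {fingerprint(c): c for c in chunks}
    needed = server.unknown(list(by_fp))      # steps 1 and 2: fingerprints only
    # Step 3: compress and send just the new chunks (encryption omitted here).
    upload = {fp: zlib.compress(by_fp[fp]) for fp in needed}
    server.receive(upload)
    return len(upload)                        # chunks that crossed the network


server = BackupServer()
first = laptop_backup(server, [b"windows system file", b"shared slides", b"notes of user A"])
second = laptop_backup(server, [b"windows system file", b"shared slides", b"notes of user B"])

# prints: 3 1
print(first, second)
```

<p>The second laptop uploads only the one chunk the server has not seen before; the two shared chunks are recognised from their fingerprints alone.</p>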
<p>Have questions about the technology, or about Druvaa in general? Ask them in the comments section below (or <a href="mailto:navin@punetech.com">email me</a>). I&#8217;m sure Milind/Jaspreet will be happy to answer them.</p>
<p>Also, this long, tech-heavy article was an experiment. Did you like it? Was it too long? Too technical? Do you want more articles like this, or fewer? Please <a href="mailto:navin@punetech.com">let me know</a>.</p>
<p><strong>Related articles:</strong></p>
<ul>
<li><a href="http://www.techcrunchit.com/2008/07/09/druvaa-dusts-for-fingerprints-to-save-bandwidth-and-storage/">TechCrunchIT coverage of Druvaa</a></li>
<li><a href="http://punetech.com/druvaa-presentation-video/">Video of Druvaa&#8217;s product pitch</a></li>
<li><a href="http://punetech.com/understanding-rpo-and-rto-in-backups/">Understanding RPO and RTO in backups</a></li>
<li><a href="http://blog.druvaa.com/2008/08/23/pc-backup-six-must-have-features/">PC Backups &#8211; Six must have features</a> &#8211; from the Druvaa Blog</li>
<li><a href="http://blog.druvaa.com/2008/08/04/druvaa-insync-v21-released/">The fast now gets faster &#8211; Druvaa inSync 2.1 releases</a> &#8211; from the Druvaa Blog</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/technology-overview-druvaa-continuous-data-protection/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Druvaa Presentation Video</title>
		<link>http://punetech.com/druvaa-presentation-video/</link>
		<comments>http://punetech.com/druvaa-presentation-video/#comments</comments>
		<pubDate>Fri, 11 Apr 2008 20:37:22 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Overviews]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[profile]]></category>
		<category><![CDATA[protodotin]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[startups]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/druvaa-presentation-video/</guid>
		<description><![CDATA[This is a video that Pune backup and replication startup Druvaa made at the proto.in event in January 2008. Click here if you cannot see the video above. From Druvaa&#8217;s Blog. See also other proto.in videos from YouTube.]]></description>
			<content:encoded><![CDATA[<p>This is a video that <a href="/wiki/Druvaa">Pune backup and replication startup Druvaa</a> made at the <a href="http://proto.in/">proto.in</a> event in January 2008:</p>
<p><object width="425" height="355"><param name="movie" value="http://www.youtube.com/v/fxmXG0A_fZs&#038;hl=en"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/fxmXG0A_fZs&#038;hl=en" type="application/x-shockwave-flash" wmode="transparent" width="425" height="355"></embed></object></p>
<p>Click <a href="http://www.youtube.com/v/fxmXG0A_fZs">here</a> if you cannot see the video above. From <a href="http://blog.druvaa.com/2008/04/09/druvaa-presentation-protoin/">Druvaa&#8217;s Blog</a>. See also <a href="http://blog.proto.in/2008/04/09/protoin-videos-are-out/">other proto.in videos</a> from <a href="http://www.youtube.com/user/startupsindia">YouTube</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/druvaa-presentation-video/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Understanding RPO and RTO in backups</title>
		<link>http://punetech.com/understanding-rpo-and-rto-in-backups/</link>
		<comments>http://punetech.com/understanding-rpo-and-rto-in-backups/#comments</comments>
		<pubDate>Mon, 24 Mar 2008 10:26:05 +0000</pubDate>
		<dc:creator>Navin Kabra</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[backup]]></category>
		<category><![CDATA[cdp]]></category>
		<category><![CDATA[disaster recovery]]></category>
		<category><![CDATA[druvaa]]></category>
		<category><![CDATA[rpo]]></category>
		<category><![CDATA[rto]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://punetech.com/understanding-rpo-and-rto-in-backups/</guid>
		<description><![CDATA[This post is based on an article posted by Jaspreet Singh on the Druvaa Blog. Druvaa is a Pune-based startup built on continuous data protection (CDP) technology. Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most important parameters of a disaster recovery or data protection plan. These objectives guide the [...]]]></description>
			<content:encoded><![CDATA[<p><em>This post is based on an article posted by <a href="/wiki/Jaspreet_Singh">Jaspreet Singh</a> on the <a href="http://blog.druvaa.com/2008/03/22/understanding-rpo-and-rto/">Druvaa Blog</a>. <a href="/wiki/Druvaa">Druvaa</a> is a <a href="/wiki/Category:Startups">Pune-based startup</a> built on <a href="/wiki/CDP">continuous data protection (CDP)</a> technology.</em></p>
<p>Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most important parameters of a disaster recovery or data protection plan. These objectives guide enterprises in choosing an optimal data backup (or rather restore) plan.</p>
<p><strong>RPO &#8211; Recovery Point Objective</strong> (<a href="http://en.wikipedia.org/wiki/Recovery_Point_Objective" title="Wikipedia RPO definition" target="_blank">wikipedia</a>)</p>
<blockquote><p>“Recovery Point Objective (RPO) describes the amount of data lost &#8211; measured in time. Example: After an outage, if the last available good copy of data was from 18 hours ago, then the RPO would be 18 hours.”</p></blockquote>
<p>In other words, it is the answer to the question &#8211; <strong>“<em>Up to what point in time can the data be recovered?</em>”</strong></p>
<p><strong>RTO &#8211; Recovery Time Objective</strong> (<a href="http://en.wikipedia.org/wiki/Recovery_Time_Objective" title="Wikipedia RTO definition" target="_blank">wikipedia</a>)</p>
<blockquote><p>“The Recovery Time Objective (RTO) is the duration of time and a service level within which a <a href="http://en.wikipedia.org/wiki/Business_process" title="Business process">business process</a> must be restored after a disaster in order to avoid unacceptable consequences associated with a break in <a href="http://en.wikipedia.org/wiki/Business_continuity" title="Business continuity">continuity</a>.</p>
<p>[...]</p>
<p>It should be noted that the RTO attaches to the business process and not the resources required to support the process.”</p></blockquote>
<p>In other words, it is the answer to the question &#8211; <strong>“<em>How much time can you take to recover after notification of a business process disruption?</em>”</strong></p>
<p>The RTO/RPO and the results of the <a href="http://en.wikipedia.org/wiki/Business_Impact_Analysis" title="Wikipedia BIA definition" target="_blank">Business Impact Analysis</a> (BIA) in its entirety provide the basis for identifying and analyzing viable strategies for inclusion in the business continuity plan. Viable strategy options would include any which would enable resumption of a business process in a time frame at or near the RTO/RPO. This would include alternate or manual workaround procedures and would not necessarily require computer systems to meet the objectives.</p>
<p>There is always a gap between the actuals (<strong>RTA/RPA</strong>, the recovery time and recovery point actually achieved) and the objectives, introduced by the various manual and automated steps needed to bring the business application up. These actuals can only be exposed by disaster and business disruption rehearsals.</p>
<p><strong>Some Examples &#8211; </strong></p>
<p><em>Traditional Backups</em></p>
<p>In traditional tape backups, if each backup takes 2 hours and backups are scheduled at 0600 hours and 1800 hours, then a primary site failure at 1400 hours leaves you with the option of restoring from the 0600 hours backup. That means an RPA of 8 hours and, assuming the restore takes about as long as the backup, an RTA of 2 hours.</p>
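<p>The arithmetic in the tape backup example above can be checked with a short Python sketch. The function name <code>recovery_point_actual</code> and the calendar date are made up for illustration, and the sketch rests on three assumptions that the example only implies: a backup captures the data as of its start time, a backup is usable only once it has finished, and a restore takes as long as a backup.</p>

```python
from datetime import datetime, timedelta

BACKUP_DURATION = timedelta(hours=2)   # from the example: each backup takes 2 hours
RESTORE_DURATION = timedelta(hours=2)  # assumption: a restore takes as long as a backup


def recovery_point_actual(backup_starts, failure):
    """Data lost: time from the start of the last completed backup to the failure.

    Assumes a backup captures the data as of its start time and is usable
    only once it has finished.
    """
    usable = [s for s in backup_starts if s + BACKUP_DURATION <= failure]
    return failure - max(usable)


day = datetime(2008, 3, 24)  # arbitrary date, chosen only for illustration
backup_starts = [
    day - timedelta(hours=6),   # 1800 hours the previous day
    day + timedelta(hours=6),   # 0600 hours
    day + timedelta(hours=18),  # 1800 hours
]
failure = day + timedelta(hours=14)  # primary site fails at 1400 hours

rpa = recovery_point_actual(backup_starts, failure)
rta = RESTORE_DURATION
print("RPA:", rpa, "RTA:", rta)
```

<p>With these inputs the RPA comes out to 8 hours and the RTA to 2 hours, matching the example. Under the same assumptions, a failure at 0700 hours, while the 0600 backup is still running, would force a restore from the previous evening&#8217;s backup and push the RPA to 13 hours.</p>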
<p><em>Continuous Replication</em></p>
<p>Replication provides better RPO guarantees, as the target system contains a mirrored image of the source. The RPA values depend upon how fast the changes are applied and on whether the replication is synchronous or asynchronous. The RTO depends only on how soon the data on the target/replicated site can be made available to the application.</p>
<h3>About Druvaa Replicator</h3>
<p>Druvaa Replicator is a Continuous Data Protection and Replication (CDP-R) product which near-synchronously and non-disruptively replicates changes on the production server to a target site and provides point-in-time snapshots for instant data access.</p>
<p>The partial synchronous replication ensures that the data is written to a local or remote cache (caching server) <u><em>before</em></u> the application can write locally. <strong>This ensures an RPO of up to 5 seconds</strong>. The CDP technology (still in beta) enables up to 1024 snapshots at the target storage, which lets the admin instantly access the <strong>current or any past point-in-time</strong> consistent image of the data, <strong>ensuring an RTO of under 2 seconds</strong>.</p>
<p>More Information &#8211; <a href="http://www.druvaa.com/products/replicator/" title="Druvaa Replicator">http://www.druvaa.com/products/replicator/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://punetech.com/understanding-rpo-and-rto-in-backups/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

