Tag Archives: backup

Druva, Pune-based backup software startup, gets $25M in Series D Funding

TechCrunch reports that Druva, the Pune-based backup software startup, has just received $25MM in Series D funding from Sequoia, along with Nexus Venture Partners and Tenaya Capital, bringing their total funding to $92MM.

Usually, online backup providers get compared with Dropbox, Box.com, and other famous consumer-oriented backup providers, but in keeping with Pune’s tradition of Enterprise Software products, Druva is differentiating itself by focusing on the B2B market:

First of all, they have concentrated strictly on the enterprise market, foregoing the SMB and consumer markets that bring with them the lure of big user numbers, but lower revenue.

Secondly, rather than being strictly a backup tool or sync and share, they have chosen a different route, what he says is more intelligent than simply offering “a data graveyard.” Instead, they look at data protection and governance on mobile devices, working with eDiscovery vendors like Recommind and AccessData to help companies entangled in litigation isolate and remove content involved in the lawsuit from the affected mobile devices with minimum possible disruption to the owner.

And thirdly they can provide mobile device recovery if a device is lost to get a user back up and running quickly, and they can help IT assess what if any essential data might have been compromised.

and, as a result, they’re playing in a slightly different market than traditional cloud storage/backup providers:

puts them more in competition with traditional backup/governance/eDiscovery vendors like EMC, Symantec and HP than cloud storage vendors like Dropbox and Box. In fact, he says his sales typically involve both legal and IT, so it’s a bit of a different play than pure cloud storage would suggest.

The approach has gotten the attention of customers like NASA, Pfizer, Dell and Hitachi among others, and they have gained 900 customers since their last funding round, growing from 2100 to 3000 enterprise customers in the last 10 months.

Read the full article.

Or check out all articles on PuneTech tagged ‘backup’ – you’ll get a history of Druva over the years.

LiveBlog: Intelligence at the Edge

This is a live-blog of the event organized by @NexusVP, with the CTOs of @DruvaInc, @Helpshift, and @Uniken_Inc, talking about “Intelligence at the Edge” – i.e. the increasing amount of enterprise data that is now found in mobiles, laptops, and other devices of their employees, and how that is changing the world of enterprise software.

The panel consisted of these people:

  • Jishnu Bhattacharjee (@b_jishnu), of Nexus Venture Partners.
  • Sanjay Deshpande, CEO and Chief Innovation Officer at Uniken, a Pune-based enterprise security company.
  • BG (@ghoseb), CTO and Co-founder at Helpshift, a Pune-based company that provides a software platform that allows mobile app developers to incorporate high quality customer service and support into their apps.
  • Milind Borate, CTO and Co-Founder at Druva, a Pune-based company that provides backup solutions for the enterprise.

Here is a random list of interesting stuff said during this discussion:

  • More and more data and intelligence is being pushed at the edges of the corporate networks. Translation: Imagine a large company. It has an IT department that runs many servers and complicated applications in their labs and data centers. In the past, most of the data and intelligence of the enterprise was in these servers. But in recent times, the devices in the hands of the employees (the desktops, laptops, mobile phones) have more and more powerful apps, more sensitive data, and more unique data (i.e. data which is not replicated on the servers). This is the “edge” of the enterprise.
  • What does Druva do? Druva looks at data that is sitting on laptops, mobiles, and other devices at the edge from 4 different angles:
    • Backup of the data
    • Data theft prevention if the data falls in the wrong hands
    • Analyzing the data on all these devices and providing intelligence (actionable insights)
    • Being able to share that data with others: colleagues within the company, but also outside – customers, vendors

  • What does Helpshift do? Helpshift built an SDK that mobile developers can download and incorporate into their app to automatically and easily get very sophisticated customer service into their app. For example:
    • Reduce customer service calls through the use of in-app FAQs, which can easily be updated by the developer – updates to the FAQ can be pushed to all customers’ mobiles automatically
    • When a customer reports an issue, the Helpshift runtime uses breadcrumbs to keep track of what the customer was doing just before hitting the issue, so that without any extra effort on the part of the customer, details of the device, the configuration and what exactly caused the bug are sent to the server
    • Now they are focusing on building machine-learning based higher level features. Their bigger customers have millions of daily users and get thousands of support issues per day. So, they need sophisticated analysis to figure out the common patterns.
    • 80% of Helpshift’s market is the US and the remaining 20% is from the rest of the world, including Europe and Latin America
    • 80% of the money comes from iPhone users. But Android is still young, and growing.
  • What does Uniken do? Uniken realized that most of the technology on the internet has been driven by media companies who want to sell ads on their websites, and maximize the number of users, whereas enterprises (like banks) are trying to use the same internet to give a very secure experience to their (captive) users. There is a mismatch here, and what the enterprises need is a much more secure environment where they have much more control over all the pieces in the chain – including the network and the devices being used by the customers. This is the area Uniken is in.
  • Indian market vs US market: In India, there is a software/web/mobile market, but most of it is consumer oriented. The B2B software market is still not really well developed, and it is not easy to make much money here.
    • 60% of Druva’s revenue comes from the US, 30% from Europe, and 10% comes from the rest of the world (India included).
    • Druva started off trying to sell in the Indian market. They tried in-person enterprise sales, and had a tough time. In the meantime, they started getting enquiries from the US from people who had simply downloaded their software, tried it out, liked it, and wanted to buy it. Over time, this increased, and they soon realized that the US was where the real market was.
    • One of the key things that helped them was that they built software that was very easy to download off the web and install without requiring any help from the company itself. This was unheard of in the enterprise backup business (which was dominated by companies like Symantec/Veritas, EMC etc.)
    • Druva used Google AdWords very effectively to market its products. The big players like Symantec/Veritas and EMC have very large sales organizations with great reach, and it would have been very difficult for Druva to compete with them in terms of reach of their salespeople. But Google AdWords allowed them to reach out to customers all over the world.
  • BigData is big. The number of devices (mobiles, laptops, desktops) that people are using is so huge, that with even minimal intelligence in each device the amount of data is huge – petabytes.
    • Collect as much data as possible. You will find uses for it later.
    • Don’t worry about where/how to store the data. Just store it in flat files initially, and then later you can figure out where to put it to analyze it.
    • No single software will solve all your problems. Use everything – SQL, NoSQL, Hadoop, etc.
    • What has made this possible is the fact that all these devices are now internet connected, and hence all the data can be collected and stored centrally in the cloud. Further, again because of the internet connectivity, it is possible to push software updates to the devices, so the data collection abilities can be continuously upgraded.
  • How has Uniken managed to sell into the Indian enterprise market? It is currently 100% in the Indian market – and it sees India as a big market, with lots of potential. Most Indian software startups struggle with this (as seen by Druva’s experience above). You need to do this:
    • In any company, identify the right person – the one who has enough vision to do things differently, try new products, and who can also get things done in that company
    • Choosing the right champion in the customer company is key
    • Keep meeting the right people, keep selling them your story, keep plugging away, until the sale happens
    • Think of an enterprise sale as dating with a long-term relationship in mind
    • Have lots of patience. Don’t give up. India is a market that requires a lot of patience.

Interview with Vaultize: Pune based Enterprise File Sharing and Sync Service

PuneTech has just learnt that Pune-based Enterprise file sharing and sync (EFSS) startup Vaultize has just received funding from Tata Capital Innovations Fund.

Vaultize builds cloud-based solutions for enterprise endpoint (i.e. laptop/desktop/mobile devices) sync, backup, encryption, security, and anytime/anywhere access to corporate file servers – all of this in a way that is visible to, and under the control of administrators in the company.

Through Vaultize’s endpoint encryption, sensitive corporate information remains encrypted on endpoints, ensuring protection against unauthorized access and potential data leakage from lost or stolen devices. In addition to on-disk protection, Vaultize’s patent-pending encryption technology used in file sharing and backup ensures that the data is encrypted or decrypted only on endpoints – guaranteeing end-to-end protection.

Vaultize, which so far has a presence only in India (with some sales outside via channel partners) will use the funding to scale up its expansion across the world, with immediate plans to establish operations in the US and Europe in conjunction with channel partners. The company will also use the funds towards building up its sales, marketing and engineering teams, and to enhance its global partner program.

PuneTech spoke with CEO and co-founder Anand Kekre. Here are some excerpts from the conversation:

Question: Another Pune startup in the storage and backup space, Druva.com, has been in the limelight in the last few years. How is Vaultize different?

Actually, we are not in the same market as Druva. Druva is more of a backup solution. Also, while they do have cloud-based backup, their focus traditionally has been on on-premise backup.

By contrast, we are in what Gartner calls the EFSS (Enterprise File Sync and Share) space. We are more concerned with providing access to enterprise data from any device, from anywhere, without compromising on security. We ensure the enterprise data can be accessed from any device – including personal mobile devices – while ensuring that the data is encrypted at all times, whether it is being sent over the network, or when it is stored on the disk in the device.

There are two major things we do that are unique to our solution. First, all the data that is being shared, synced, or in general being moved around via Vaultize is encrypted at source. This ensures that the data is never at risk once it leaves the device. Specifically, any data going over the network, or stored on Vaultize servers is always encrypted and the encryption key is only available at the endpoints (i.e. devices). Second, we do data de-duplication at source. That is, speed/latency and network bandwidth consumption are greatly improved by detecting whether the Vaultize servers already have a copy of the data that needs to be sent/synced (for example, the same attachment being shared by various people), and only sending across the unique content. And this is achieved without losing the benefits of encryption-at-source, using patent-pending technology.

Question: So, your software can ensure that use of mobile devices with enterprise data is secure?

Across the world, there has been a proliferation of consumer file sharing and Bring-Your-Own-Device (BYOD) trends, and this has resulted in an increase in data loss, security and compliance risks.

There are two different aspects to ensuring security for BYOD devices. First, the enterprise needs to ensure that it is safe to allow a mobile device to connect to the enterprise network – i.e. it is an authorized device, and it only has authorized applications, and more specifically, does not have viruses and malware. This area is called Mobile Device Management (MDM). Vaultize does not deal with this issue.

Once a device has been allowed to connect to the network, Vaultize ensures that the data on the device is safe and secure by encrypting all the sensitive data on the disk, by being able to sync data across various devices and geographies, and by providing secure (via encryption) access to the data from anywhere, in a way that complies with all the enterprise security policies.

And it does all of this in a way that can be easily managed and controlled by the enterprise IT administrators.

Question: What is your team size currently, and how are you planning on expanding it?

Currently, we are about 15 people, all in India. Over the next year, we hope to expand our team to about 30-35 people. We will be looking to expand not only in the area of sales and marketing, but also engineering, QA, and support.

For more information about Vaultize, see http://www.vaultize.com

Pune-based Druva gets $12M in Series B from Nexus/Sequoia – This time it’s official

Pune-based Druva Software, which makes enterprise backup software, has just closed a $12 million round of funding from Nexus Venture Partners and existing investor Sequoia. In April 2010, they had raised $5 million from Sequoia and the Indian Angel Network.

This funding is going to be used by Druva to make a strong push into cloud-based backup. Cloud infrastructure for a bandwidth- and storage-intensive application like backup can be a significant expense, and of course, so can sales and marketing.

A few weeks back a partially inaccurate version of this story had been leaked by Economic Times and was reported by PuneTech, but we “withdrew” the story after Druva called us up and let us know that it was premature to talk about it. Talking about a company’s funding round before everything is finalized and the money is in the bank is dangerous for a number of reasons including:

  • Funding is a tricky thing and there are no guarantees until the money is in the bank. Many things can, and do go wrong. One bad day on the stock market can cause VCs to reconsider any deals that are not final.
  • From the time the startup receives a term-sheet from the VC until the deal is finalized, there is usually a no-shopping clause which prevents the startup from talking about the details of the deal with anybody else. This is to ensure that the startup does not use this offer to try and create a bidding war between VCs. Hence, if the details leak out, the VCs might feel that the startup is trying to violate the no-shopping clause.
  • Most importantly, if word leaks out that a VC is funding a company for amount X, then in the next few days it is possible that the VC’s contacts in the industry (probably other VCs) keep saying “Why are you paying X? I don’t think it is worth more than Y?” and this can cause the VC to reconsider the deal. This is very dangerous for the startup.

This time, however, the news is official (and is actually better than the deal reported by Economic Times).

As for what Druva does exactly, and why it is one of our favorite Pune companies, just read the previous article, which had a bunch of links. Here are some other interesting tidbits about Druva:

  • “Druva’s disruptive innovation reduces the storage footprint and bandwidth requirement for backup by orders of magnitude compared to other industry solutions” -Jishnu Bhattacharjee, Nexus
  • Druva, founded in 2007, has amassed more than 750 customers and protects more than 300,000 endpoints (i.e. servers, laptops, PCs) worldwide
  • InSync’s global, source-based deduplication reduces bandwidth and storage by 90 percent while providing 100 percent accuracy for Microsoft Outlook and Office applications

Here’s the full press release regarding this news

News of Druva’s funding was inaccurate and premature

On Friday, based on an Economic Times report, we reported that Pune-based enterprise backup software provider Druva has received $10 million in funding from Nexus VP. Unfortunately this news appears to be inaccurate.

Here is a comment from Jaspreet Singh, CEO of Druva:

Thanks Navin, but this news is not very accurate. This was unethically leaked and then misreported by Peerzada (abrar.shz@timesgroup.com) of ET for some cheap thrills.

Not sure when would people like these grow up and stop screwing lives of entrepreneurs who are already fighting against all the odds.

You have been a great supporter and I would give you a call sometime next week to give accurate information and some more good news.

Basically, Druva is indeed in an advanced stage in their second round funding process, but it is not done yet, and they cannot talk about the details of the amount or the investors involved. The details that came out in the ET report are inaccurate.

We wish Druva luck, and hope to hear the official good news sometime soon.

Backup Software Provider Druva.com gets $10 million funding from Nexus

Update: It appears that the report in ET, on which this article is based, was inaccurate. Please see this update.

Pune-based startup Druva, which sells enterprise backup software, has just closed a second round of funding worth $10 million from Nexus Venture Partners, reports Economic Times.

In April 2010, Druva had raised $5 million from Sequoia and the Indian Angel Network. At that time, these are the reasons we gave for why we liked Druva:

  • Druva is a purely homegrown startup. This is not a company started by someone in the US setting up a development center in India.
  • Druva is a product startup. It is not a services company. Hence, it has a potential for exponential growth and returns.
  • Druva is not done by serial entrepreneurs. The co-founders are all first-time entrepreneurs who quit their big-company jobs to start Druva. This should give hope to all the first-time entrepreneurs in Pune.

Druva has been one of PuneTech’s favorite startups and we have covered it extensively in the past, so, frankly, there isn’t much new that we’ll be able to say about it. Instead, we’ll simply point readers to the older articles:

We also want to point out that Druva is one of the sponsors of PyCon – the International Python Conference that’s happening in Pune next month.

We wish Druva luck, and although getting another round of VC funding is not as good an indicator of success as an IPO or an acquisition, we would still like to repeat what we said in April 2010:

  • We now have in our midst a startup success story that will hopefully inspire a 100 new software product startups in Pune.

Pune Startup launches Vaultize – Cloud-Based Enterprise Backup & DR

Pune startup Anoosmar Technologies has just come out of stealth mode, and announced the public beta of Vaultize, which they describe as:

Vaultize is next generation data protection: cloud-based backup and disaster recovery that also enables collaboration between users, synchronization of devices and sharing over web. Vaultize turns your zero-returns investment in backup into an asset that improves availability, increases productivity and makes sharing easy.

Anoosmar Technologies has been founded by Anand Kekre and Ankur Panchbudhe, both of whom are Pune old-timers, with an ex-Veritas (Symantec), and ex-McAfee background. Both of them have been in the data protection, security, and storage space for over 10 years, and have deep expertise in enterprise infrastructure software. Between them they have 64 US patents.

Before you dismiss Vaultize by comparing it with Dropbox, or , remember that Vaultize is not a consumer product – it is targeting the enterprise space. In that sense, I see Vaultize as more of a competitor to Pune’s Druva. However, given the backgrounds of the founders of Druva and founders of Vaultize, I would be tempted to guess that Druva is likely to be more interested in enterprise backup, and replication and generally areas more to do with performance and availability in an enterprise, while Vaultize is likely to move more in the direction of archiving, and e-discovery and generally areas more to do with risk management and legal compliance. But that’s pure speculation – I might be wrong.

Also check out the customer case studies page and the management team page.

Druva is one of the few Pune software product companies that has received funding from well known VCs, and hence, Anoosmar, which has a similar pedigree and similar target markets, is a company to watch closely.

“World-class software products can come out of India” – Interview with CEO of Druva

We now have in our midst a startup success story that will hopefully inspire a 100 new software product startups in Pune.

PuneTech and the Pune Open Coffee Club both started about 2 years ago, and the steadily increasing memberships and vitality of these communities points to a very strong startup community in Pune. However, throughout those two years, one question always cast a doubt on the long-term potential of this startup ecosystem. And that question was: Where are the success stories?

Druva Software is a Pune-based backup software product startup. Click on the logo to see all PuneTech articles about backup software (mostly about Druva)

Druva software (previously known as Druvaa) which just closed a $5 million round of funding led by Sequoia Capital answers that question. Of course, getting a round of VC funding is not as good an indicator of success as an IPO or an acquisition. And of course, there have been other successes in the past. But still this news is great, for the following reasons:

  • Druva is a purely homegrown startup. This is not a company started by someone in the US setting up a development center in India.
  • Druva is a product startup. It is not a services company. Hence, it has a potential for exponential growth and returns.
  • Druva is not done by serial entrepreneurs. The co-founders are all first-time entrepreneurs who quit their big-company jobs to start Druva. This should give hope to all the first-time entrepreneurs in Pune.
  • There haven’t been many high-profile successes in recent times, and this one comes as a breath of fresh air.

Druva has been one of PuneTech’s favorite startups. With 5 different PuneTech articles, this is probably the company that has received maximum coverage from us. And a quick look at the articles gives hints as to why:

  • It is a product company, which is always more interesting than a services company; it’s especially interesting to watch the product evolve over time.
  • It requires some very complex technology, not something that any company could easily build. Plus, they are happy to write detailed technical articles about the technology that underlies their products.
  • It has repeatedly featured in high profile startup events in India, from proto.in to the NASSCOM summit

PuneTech spoke to Jaspreet Singh, CEO of Druva, over the phone, and here are some quick notes based on this conversation. There are a number of unique features here that other Pune entrepreneurs would do well to take note of.

On the current state of the company

Druva has a $2.5 million revenue run rate, coming from about 400+ customer deployments. Most of this is from their flagship product, the inSync remote laptop disk-to-disk backup solution. Recently they also introduced Phoenix, a remote server disk-to-disk backup solution. They have about 23 employees, most of them in Pune, with a few sales people elsewhere. The product is developed entirely in Pune.

How do they manage enterprise support for 400 customers with such a small employee base?

Although supporting their customers is a very high priority for Druva, one of the things they focus on very hard is to make the product very easy to use and very easy to support – so that to a large extent, most of their customers don’t really require any support. They have a “release often” philosophy which ensures that customers always have the latest, bug-fixed, version of the software.

Another area that they put a lot of effort in, is in ensuring that the product is easy to install. A lot of their customer testimonials speak of how easy it was to self-install the software. By contrast, the comparable software from the more established players in the market requires professional services help for installation.

How do they manage sales without a strong US/Europe presence?

Instead of the tradition of hand-holding that is a common feature of enterprise sales in this domain, Druva decided to go a different route. They made their software freely downloadable from the web, and made it easy to install and try. As a result, most of their customers approach them after having first tried the product out via the website. And many of their sales, even large ones, have happened over skype/email, with no in-person customer visits.

How do they compete with the large MNCs, the established players in this market?

We were very surprised to learn that Druva does not try to compete with the incumbents on cost. Jaspreet told us that in fact the average Druva sale tends to be 3x more expensive than the comparable offering from the established players. Druva scores on ease of use, simplicity, and most importantly, the technology.

Jaspreet points out that one of Druva’s strong points is the easy-to-use source-level de-duplication. This means that when backing up a laptop, they can ignore duplicate content even before the data is sent to the remote backup server. Specifically, consider the gigabytes of Windows operating system files on your laptop. Most of these files are likely to be identical across all laptops of a company. Druva’s software would know beforehand that there is a copy of those files on the backup server, and would never send them across. Such optimizations ensure that backing up 15 TeraBytes of data from a number of different laptops results in just about 2 or 3 TeraBytes being sent across the network. This results in an increase in speed, and a reduction in the network bandwidth and disk space consumed.

By contrast, traditional backup systems do de-duplication at the destination. Which means that all the data is sent to the server over the network, and only then is the server able to remove duplicate content. This means that the speed and network bandwidth improvements are lost.
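The difference can be sketched in a few lines of Python. This is only an illustration of the general idea of source-level de-duplication at whole-file granularity, not Druva's actual implementation: the `BackupServer` class, the `backup_from_source` function, and the sample file names are all hypothetical. The client sends checksums first, and uploads only the contents the server reports as missing.

```python
import hashlib


def digest(data):
    """Checksum used to identify a piece of content."""
    return hashlib.sha256(data).hexdigest()


class BackupServer:
    """Hypothetical server that stores each unique file content once."""

    def __init__(self):
        self.blobs = {}  # digest -> content

    def missing(self, digests):
        # Tell the client which contents the server does not have yet.
        return {d for d in digests if d not in self.blobs}

    def upload(self, d, data):
        self.blobs[d] = data


def backup_from_source(server, files):
    """Back up {path: bytes}; return the number of content bytes sent."""
    by_digest = {digest(data): data for data in files.values()}
    sent = 0
    # Only checksums cross the network before this point.
    for d in server.missing(by_digest.keys()):
        server.upload(d, by_digest[d])
        sent += len(by_digest[d])
    return sent


# Two laptops share the same (made-up) operating system files.
os_files = {f"C:/Windows/file{i}.dll": bytes([i]) * 1000 for i in range(50)}
laptop_a = {**os_files, "C:/Users/a/notes.txt": b"notes from A" * 10}
laptop_b = {**os_files, "C:/Users/b/plan.txt": b"plan from B" * 10}

server = BackupServer()
sent_a = backup_from_source(server, laptop_a)
sent_b = backup_from_source(server, laptop_b)
print("laptop A sent", sent_a, "bytes; laptop B sent", sent_b, "bytes")
```

The first laptop uploads everything, while the second uploads only its one unique file. With de-duplication at the destination, the second laptop would have sent all of its files and the server would have discarded the duplicates afterwards.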

Also, claims Jaspreet, Druva’s backups are fully searchable – a feature that is not available with most competitors.

What is their primary challenge currently?

Jaspreet says that they want to build a high-quality, world-class product, and for that he needs lots of high-quality, world-class people. While they’ve obviously managed to build a team like that which got them so far, they need many more such people in the coming days, and that’s a significant challenge. He says that it is difficult, if not impossible to find “readymade” world-class talent here (even when “world-class” salary and/or equity is offered!). Instead, he feels that the only approach that works is to find individuals (whether freshers or industry veterans) who have the right attitude and potential and then nurture them into the required shape.

(As an aside, we’d like to point out that this is a pattern. Pretty much every startup we talk to mentions hiring of high-quality people as one of their primary challenges. This is a problem that needs a solution, and I’m hoping that some entrepreneur in Pune is looking at this as an opportunity.)

Parting thoughts: In the Druva co-founders, we have people who have been through the entire process, from zero to VC-funding, in Pune, recently. And they are nice guys. Pune entrepreneurs should take advantage of this, and flock to them for guidance, advice and mentorship.


Druvaa: From proto.in presenter to proto.in sponsor in 18 months

Pune based backup software startup Druvaa has gone from being a 3-person startup that presented at proto.in 18 months back, to a 16-person company that is profitable, and sponsored proto.in this weekend. We caught up with Jaspreet Singh of Druvaa during proto and had a conversation with him about how they are doing.


Note: Please turn up the volume. The sound quality is not-so-great. (Hopefully future videos will be better.)

Please also check out the older PuneTech articles about Druvaa:

Interesting note: You’ll notice that over the years, Druvaa has shifted gears from selling continuous protection (which they started off with) to remote backups (which is their primary product now). This is a feature of any startup – adapting to the needs of the market.


Understanding Data De-duplication

Druvaa is a Pune-based startup that sells fast, efficient, and cheap backup (Update: see the comments section for Druvaa’s comments on my use of the word “cheap” here – apparently they sell even in cases where their product is priced above the competing offerings) software for enterprises and SMEs. It makes heavy use of data de-duplication technology to deliver on the promise of speed and low-bandwidth consumption. In this article, reproduced with permission from their blog, they explain what exactly data de-duplication is and how it works.

Definition of Data De-duplication

Data deduplication or Single Instancing essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy (single instance) of the data to be stored. However, indexing of all data is still retained should that data ever be required.

Example
A typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy reducing storage and bandwidth demand to only 1 MB.
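The attachment example can be reproduced with a small content-addressed store. This is a minimal sketch, assuming SHA-256 as the checksum; the `SingleInstanceStore` class and the mailbox names are made up for illustration. Note how the index still records all 100 logical copies, while the bytes are kept only once.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: one stored copy per unique content."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes (the single stored instance)
        self.index = {}  # logical name -> digest (the retained index)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.index[name] = digest

    def get(self, name):
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


MB = 1024 * 1024
attachment = b"x" * MB  # stand-in for a 1 MB attachment

store = SingleInstanceStore()
for i in range(100):
    store.put(f"mailbox-{i}/report.pdf", attachment)

logical_bytes = 100 * len(attachment)
print(logical_bytes // MB, "MB referenced,", store.stored_bytes() // MB, "MB stored")
```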

Technological Classification

The practical benefits of this technology depend upon various factors like –

  1. Point of Application – Source Vs Target
  2. Time of Application – Inline vs Post-Process
  3. Granularity – File vs Sub-File level
  4. Algorithm – Fixed size blocks Vs Variable length data segments

A simple relation between these factors can be explained using the diagram below –

[Figure: Deduplication Technological Classification]

Target Vs Source based Deduplication

Target-based deduplication acts on the target data storage media. In this case the client is unmodified and not aware of any deduplication. The deduplication engine can be embedded in the hardware array, which can be used as a NAS/SAN device with deduplication capabilities. Alternatively, it can also be offered as an independent software or hardware appliance which acts as an intermediary between the backup server and the storage arrays. In both cases it improves only the storage utilization.

[Figure: Target Vs Source Deduplication]

In contrast, source-based deduplication acts on the data at the source before it’s moved. A deduplication-aware backup agent is installed on the client, and it backs up only unique data. The result is improved bandwidth and storage utilization. But this imposes an additional computational load on the backup client.

Inline Vs Post-process Deduplication

In target-based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as it is sent to the target) or after it has been stored in the target storage.

The former is called inline deduplication. The obvious advantages are –

  1. Increase in overall efficiency as data is only passed and processed once
  2. The processed data is instantaneously available for post storage processes like recovery and replication reducing the RPO and RTO window.

The disadvantages are –

  1. Decrease in write throughput
  2. Extent of deduplication is less – only the fixed-length block deduplication approach can be used

Inline deduplication processes only incoming raw blocks and has no knowledge of the files or file structure. This forces it to use the fixed-length block approach (discussed in detail later).

Inline Vs Post Process Deduplication

Post-process deduplication acts asynchronously on the stored data, and its advantages and disadvantages are the exact opposite of those listed above for inline deduplication.
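The difference between the two modes can be sketched as follows. This is a toy model under stated assumptions: both targets receive fixed-size blocks, `InlineTarget` and `PostProcessTarget` are hypothetical names, and "peak bytes" simply tracks how much raw space each target needed at its worst moment.

```python
import hashlib


def digest(block):
    return hashlib.sha256(block).hexdigest()


class InlineTarget:
    """Deduplicates every block on the write path, before storing it."""

    def __init__(self):
        self.store = {}
        self.peak_bytes = 0

    def write(self, block):
        self.store.setdefault(digest(block), block)   # hashing slows each write
        self.peak_bytes = max(self.peak_bytes, self.used())

    def used(self):
        return sum(len(b) for b in self.store.values())


class PostProcessTarget:
    """Stores raw blocks first, deduplicates later in a separate pass."""

    def __init__(self):
        self.staging = []
        self.store = {}
        self.peak_bytes = 0

    def write(self, block):
        self.staging.append(block)                    # fast, no hashing yet
        self.peak_bytes = max(self.peak_bytes, self.used())

    def deduplicate(self):
        for block in self.staging:
            self.store.setdefault(digest(block), block)
        self.staging = []

    def used(self):
        return sum(len(b) for b in self.staging) + sum(
            len(b) for b in self.store.values()
        )


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
inline, post = InlineTarget(), PostProcessTarget()
for block in blocks:
    inline.write(block)
    post.write(block)
post.deduplicate()

print(inline.peak_bytes, post.peak_bytes, inline.used(), post.used())
```

Both targets end with the same amount of stored data; the post-process target needed room for the full raw stream in the meantime, and its data was not in deduplicated form until the second pass finished.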

File vs Sub-file Level Deduplication

The duplicate removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the checksums of files already backed up. It’s simple and fast, but the extent of deduplication is very low, as it does not address the problem of duplicate content found inside different files or data sets (e.g. emails).
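The limitation of whole-file checksums shows up as soon as two files differ by a single byte. A short sketch, with hypothetical file names and an assumed SHA-256 checksum:

```python
import hashlib


def dedupe_files(files):
    """files: dict of name -> bytes. Returns (unique store, catalog)."""
    store = {}     # checksum -> file data, one entry per distinct file
    catalog = {}   # name -> checksum
    for name, data in files.items():
        checksum = hashlib.sha256(data).hexdigest()
        store.setdefault(checksum, data)
        catalog[name] = checksum
    return store, catalog


body = b"quarterly numbers " * 100
files = {
    "a/report.doc": body,
    "b/report-copy.doc": body,          # exact duplicate: eliminated
    "c/report-v2.doc": body + b"!",     # one extra byte: stored in full
}
store, catalog = dedupe_files(files)
print(len(files), "files,", len(store), "stored")
```

The exact copy costs nothing extra, but the near-identical third file gets a different checksum and is stored again in its entirety.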

The sub-file level deduplication technique breaks the file into smaller fixed or variable size blocks, and then uses a standard hash-based algorithm to find identical blocks.

Fixed-Length Blocks v/s Variable-Length Data Segments

The fixed-length block approach, as the name suggests, divides files into blocks of a fixed size and uses a simple checksum (MD5, SHA etc.) to find duplicates. Although it can find repeated blocks, its effectiveness is very limited. The reason is that the primary opportunity for data reduction lies in finding duplicate blocks in two transmitted datasets that are made up mostly, but not completely, of the same data segments.

Data Sets and Block Alignment

For example, similar data blocks may be present at different offsets in two different datasets. In other words, the block boundaries of similar data may differ. This is very common when some bytes are inserted into a file: when the changed file is processed again and divided into fixed-length blocks, all blocks after the insertion point appear to have changed.

Therefore, two datasets with a small amount of difference are likely to have very few identical fixed length blocks.
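The alignment problem is easy to demonstrate. The sketch below is only an illustration: the test data is generated from SHA-256 digests so that it is deterministic and non-repeating, and the 4096-byte block size is an arbitrary assumption. It compares an in-place overwrite of one byte with an insertion of one byte at the start of the data.

```python
import hashlib

BLOCK = 4096


def sample_data(n=2048):
    # Deterministic, non-repeating 64 KiB test payload (32 bytes per digest).
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(n))


def block_digests(data, size=BLOCK):
    """Checksums of consecutive fixed-length blocks."""
    return {
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    }


original = sample_data()
# Same length: one byte flipped in place, block boundaries unchanged.
overwritten = original[:10] + bytes([original[10] ^ 0xFF]) + original[11:]
# One byte longer: every later byte shifts by one position.
inserted = b"X" + original

base = block_digests(original)
shared_after_overwrite = len(base & block_digests(overwritten))
shared_after_insert = len(base & block_digests(inserted))
print(len(base), shared_after_overwrite, shared_after_insert)
```

Overwriting a byte invalidates only the block that contains it, while inserting a byte shifts every boundary and leaves no block in common with the original.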

Variable-Length Data Segment technology divides the data stream into variable length data segments using a methodology that can find the same block boundaries in different locations and contexts. This allows the boundaries to “float” within the data stream so that changes in one part of the dataset have little or no impact on the boundaries in other locations of the dataset.
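One common way to obtain such floating boundaries is content-defined chunking with a rolling hash: a boundary is declared wherever the hash of the last few bytes matches a fixed bit pattern, so boundaries follow the content rather than the offset. The sketch below is a simplified illustration of that general idea, not the method of any particular product; the window size, mask, and size limits are arbitrary assumptions.

```python
import hashlib

WINDOW = 48                         # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 61) - 1
OUT = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window


def variable_chunks(data, mask=0x3FF, min_size=256, max_size=8192):
    """Split data where the rolling hash of the last WINDOW bytes hits the mask."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:     # window full: drop its oldest byte
            h = (h - data[i - WINDOW] * OUT) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0     # begin a new chunk
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digests(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}


def sample_data(n=2048):
    # Deterministic, non-repeating 64 KiB test payload.
    return b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(n))


original = sample_data()
modified = b"X" + original          # insert one byte at the very start

original_chunks = variable_chunks(original)
modified_chunks = variable_chunks(modified)
shared = len(digests(original_chunks) & digests(modified_chunks))
print(len(original_chunks), "chunks,", shared, "unchanged after the insertion")
```

Because each boundary decision depends only on the bytes inside the window, the chunking of the modified data falls back into step with the original shortly after the inserted byte, and most chunk checksums match again.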

ROI Benefits

Each organization has a capacity to generate data. The extent of savings depends on, but is not directly proportional to, the number of applications or end users generating data. Overall, the deduplication savings depend on the following parameters –

  1. No. of applications or end users generating data
  2. Total data
  3. Daily change in data
  4. Type of data (emails/ documents/ media etc.)
  5. Backup policy (weekly-full – daily-incremental or daily-full)
  6. Retention period (90 days, 1 year etc.)
  7. Deduplication technology in place

The actual benefits of deduplication are realized once the same dataset is processed multiple times over a span of time for weekly/daily backups. This is especially true for variable length data segment technology which has a much better capability for dealing with arbitrary byte insertions.
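To see why repeated backups matter, consider a deliberately simplified model, which is an assumption for illustration and not a measured result: a daily full backup of a dataset of fixed size, where a fixed fraction of the data changes each day and deduplication stores only the changed portion after the first backup. The function name and the sample figures below are hypothetical.

```python
def dedup_ratio(total_gb, daily_change, retention_days):
    """Logical-to-stored ratio under the simplified daily-full model.

    total_gb       -- size of the dataset being protected
    daily_change   -- fraction of the data that changes per day (0.02 = 2%)
    retention_days -- number of daily full backups kept
    """
    logical = total_gb * retention_days                  # without deduplication
    stored = total_gb + total_gb * daily_change * (retention_days - 1)
    return logical / stored


# Hypothetical figures: 100 GB, 2% daily change, 90-day retention.
ratio = dedup_ratio(100, 0.02, 90)
one_day = dedup_ratio(100, 0.02, 1)
print(round(ratio, 1), one_day)
```

With a single backup there is nothing to deduplicate against, so the ratio is 1; it grows with the retention period and shrinks as the daily change rate rises. Real results also depend on the data type and the deduplication technology, which this model ignores.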

Numbers
While some vendors claim bandwidth/storage saving ratios of 1:300, our customer statistics show results between 1:4 and 1:50 for source-based deduplication.
