Most PuneTech readers get PuneTech home-delivered by email or via RSS (free!). Which is great – that is how we recommend keeping in touch with what is going on in tech in Pune.
However, one disadvantage of that is that you miss out on all the comments that readers leave on the posts. Compared to some other sites, we do not get a whole lot of comments, but the ones we do get are usually good-quality comments that add something to the post, and we felt there was a need to let our readers get an RSS feed or a daily email of the latest comments. So here goes: Get comments on PuneTech posts by our readers via email or via RSS.
To motivate you, I have listed here a few recent posts that garnered some interesting comments/discussions, and where reading the comments would improve your understanding of the material covered in the parent post.
Also, we kept hearing from some of our readers that sometimes they would miss an event even though they had read about it in PuneTech because they forgot about it. To tackle this, we have started a (free) PuneTech events SMS reminder service. After you sign up, you’ll get an SMS when we find out about an interesting event, and then another SMS reminder one day before the event.
And, for the really selective and really motivated reader, we also provide the option of subscribing (RSS only!) to just articles from specific categories, or articles with a specific tag. For example, you can subscribe to the RSS feed for all articles in the “events” category, or just the “featured” articles, or you can subscribe to one of the many tags that we use, like all articles about startups, or linux, or those somehow relating to the tech community in Pune. Figuring out all the categories and tags, and the corresponding RSS feeds, is left as an exercise for the motivated reader. And this is only for particularly selective readers who do not wish to subscribe to our full feed (i.e. the regular feed of PuneTech articles).
Final note: the regular PuneTech articles subscription (email / RSS), the PuneTech comments subscription (email / RSS), and the PuneTech SMS reminder service are three different, independent services. Subscribing to one does not automatically subscribe you to the others. You have to subscribe to each one separately.
Proto.in is the premier conference for startups in India. The most recent edition was held in Bangalore last week. The next one is now being planned and the organizers are asking whether it should be in Pune or Mumbai.
Instead of making it simply a twitter popularity contest, I say let’s give them a host of compelling reasons why it must be Pune. In the comments to this post below, please suggest some good reasons why the next proto.in should be in Pune. I’ll collate the top reasons and create a new post out of them and forward it to the proto organizers.
To get you started, here are my reasons:
Pune the undiscovered country: If you go to a typical startup event (proto, headstart) in Bangalore/Mumbai/Delhi, you’ll run into the same faces over and over again. Pune is different. There is a lot that Pune has to offer that the rest of the ecosystem is not aware of (some examples below). Now would be a good time for proto to start the process of reaching out beyond the top 3 metros.
Pune is a hotbed of activity: Just look at the tech events calendar for Pune to get an idea of the various and varied activities. Multiple pages of them – note the Page 2, Page 3 etc at the bottom of that page.
Students: The future of the startup ecosystem is in the hands of students, and in our ability to get them interested in startups. And what better place to start than Pune? We have boatloads of students, who are enthusiastic and motivated. Gnunify, a Free and Open Source conference organized mostly by students of Pune, is expected to attract 600+ students.
These are the first few reasons that come to my mind. Please add to the list.
See the comments section for many more great reasons. A couple that I wanted to highlight right away:
Atul points out that: Pune has very few professional VC offices. VCs visiting Pune in the context of Proto.in might find obvious funding choices that they would have otherwise missed on.
Enthusiastic and others point out: It’s cheaper to organize it in Pune
Santosh points out: Pune OpenCoffee Club (550+) energetic Volunteers, Startups, and Entrepreneurs
and also: Doing it in Pune will definitely draw out techies in numbers for Startup Shotgun
and finally: Pune weather is far better than Bombay weather
And there are more below. And more keep coming in. If you are reading this in an email or RSS feed, please visit the website to see the full list of comments. You can subscribe to PuneTech comments feed (RSS, or email) too.
That’s right – gate-crash, like uninvited wedding guests. It’s free, and it’s an unconference. Gate-crashing will actually be in the spirit of the event.
And if you are a student, then you must go, because in that one day, you will get more education than an entire month of your so-called classes.
With this in mind, here are PuneTech’s top 5 reasons why students must gate-crash Drupal Camp
Surprise your future employers: If you are in software, no matter what job you get, sooner or later you will be asked to build or oversee the building of a website, and there you should surprise everyone by having it ready, singlehandedly in one week as against their projections of 2-months using 3 people.
Beat the recession: As IT budgets get cut around the world, the web-development community will be hard hit, and companies will look to reduce spending on expensive web development. That’s why you should become a ninja at inexpensive web development using Drupal and its wide array of readymade modules.
Use this knowledge when you don’t get a job: Hey, let’s face it, if you are a student right now, it is very likely that you are not getting campus placement this year (or if you have been placed already, there’s a chance that it will get delayed or canceled). Instead of sitting around moping, the smart thing to do would be to start your own website and get to work for yourself. And for this, events like Drupal camp are great, for knowledge, for ideas and for contacts. For more reasons, see “Top 10 reasons why now is the best time to start a business with Drupal.”
Because you can: Think of all the poor little underprivileged students in Bangalore/Delhi/Hyderabad/Mumbai who cannot attend Drupal camp even if they want to. Because Drupal camp doesn’t happen in Bangalore. It doesn’t happen in Mumbai. It’s happening in Pune, and you are getting it free. Don’t give up this chance, because in the future, you (or your employer) will be charged exorbitant amounts for events like these, and you’ll still not be able to go because you have a stupid deadline.
If you are already going for Drupal camp, then please add to the list of reasons in the comments below.
Drupal Camp details:
What: Drupal Camp India ’09. Drupal Camp is a free unconference, being conducted in Pune, with the objective to build up the community base and bring people closer. Details.
When: Saturday, 31st January, 10am to 5pm
Where: SICSR, 1st floor, Atur Centre, Gokhale Cross Road, Model Colony, Pune. Map.
Fees and Registration: This event is free for all to attend. Register here. Or not!
More from the website:
Drupal camp pune is an effort to pull in all drupal developers located in india to come together and cherish this wonderful CMS cum application framework. More than the sessions, its about interacting with fellow drupalers and listening to their experiences. This doesn’t discount us to not prepare session list, On our menu we have things like:
1> Advanced Module development
2> Site auto configuration using patterns.
3> Insight into Galaminds.
4> Managing staging,production and live sites specially syncing with Migraine.
5> Drupal on EC2
6> Case study on ILoveBolly
Pune is a hotbed of activity for CAD Software – both, for users as well as developers. We asked Yogesh Kulkarni, who has more than a decade of experience in this industry to team up with Amit Paranjape to give PuneTech readers an overview of this area.
CAD is defined as the use of computer technology to aid in the design of a part, a sub-assembly, or an entire product. Design can include Technical Drawings with Symbol based Representations, Visualization, 3D Rendering, and Simulation. Note, the term ‘Product’ could range from a small Widget, to an iPod, to a large Building. Components of CAD technologies have also found widespread use in somewhat unrelated fields such as Animation & Gaming.
Consider a World War II era vintage B-17 Flying Fortress bomber; probably the only bomber ever to be manufactured on an assembly line. How was it designed? Each and every part was painstakingly drawn on a drafting board. The various components and sub-assemblies were represented through various engineering drawing conventions. Yet the true visualization of how all these complex pieces fit and worked together was left to the minds of the top engineers. And what about the complex 3-D shapes such as the wings? How were they designed and tested? Actual wooden models had to be created to visualize their shapes and test out their air-flow characteristics in wind tunnels. You can think of an army of literally hundreds of Draftsmen working on various pieces of this complex machine. Cars were designed the same way. ‘Machine Designing’ had elements of ‘Art’ in it. This style of designing was with us until recently. It’s only in the past 2-3 decades (even more recent in many SMEs in India) that computers have started replacing these ubiquitous ‘A1’ sized drawing boards that ruled the designer’s shop for so many decades.
Fast forward to today, and now let’s look at how Boeing’s latest 787 Dreamliner is being designed. This truly 21st century aircraft is built with composites instead of the traditional aluminum structures, and a whole bunch of other interesting innovations. All put together, Boeing claims to improve fuel efficiency by over 20% compared to other modern day commercial airplanes. All the designs of the Dreamliner are done using CAD. From the smallest widget to the entire airframe, the drawing, designing, assembling, and visualization is done on computer monitors. These designs are also evaluated for their validity and performance via Computer Aided Engineering (CAE). CAE works in conjunction with CAD to simulate and analyze various mechanical and other aspects of the design. Similarly, Computer Aided Manufacturing (CAM) works closely with CAD to help manufacture the complex parts on Computer Numerically Controlled (CNC) machines.
CAD has evolved a great deal over the past few decades along with the rise in computing and graphics power. Earlier CAD solutions were simple 2-Dimensional solutions for drawing machines and structures. These still represented a big step forward over drawing boards in terms of ability to save, edit and reuse drawings. Initial 3-Dimensional solutions were based on ‘wireframe models’ and ‘surface modeling’. Loosely speaking, these represented the outer edges and the external surfaces of a solid object in mathematical terms. Real 3-D capability involves representing the real object as a solid model. Mathematically, this involves a series of complex equations and data points. This had to wait for computing power to catch-up. Only in the late 1980s did this power become available to a wider engineering community via desktop workstations.
At a high level, you can think of a CAD package as having 2 important pieces: 1) The backend mathematical engine and 2) The front-end graphical rendering service.
Earlier CAD programs were primarily written in FORTRAN. Present-day CAD packages are typically developed in C or C++. Rendering was not a strong point of the earlier solutions. However, over the past 2 decades, life-like rendering and simulation (rotation, motion, etc.) have become a reality. This capability has also taken this technology into the Animation & Gaming fields.
CAD works closely with other allied areas such as CAE (Computer Aided Engineering), CAM (Computer Aided Manufacturing), as well as PLM (Product Lifecycle Management). In fact, CAD/CAM or CAD/CAE are often used together to describe the entire workflow. In this section we will take a brief look at these allied areas. In future, PuneTech will feature more detailed and specific articles about each of these areas.
Computer Aided Engineering is the use of computer technology to support engineering tasks such as analysis, simulation and optimization. These tasks are often performed by the engineer in close synchronization with the actual CAD activities. An example of ‘Analysis’ could be leveraging mathematical techniques such as ‘FEA/FEM’ (Finite-Element-Analysis/Finite-Element-Method) for designing a safe Bridge. ‘Simulation’ can be used to study, on a computer screen, how a mechanical assembly with various moving parts works together, before actually building it. ‘Optimization’ can build on top of Analysis and Simulation to come up with the ‘most optimal’ design that meets the designer’s requirements. ‘Most Optimal’ could mean least weight, smallest number of parts, least friction, highest reliability, etc. depending on the designer’s primary objective.
Computer Aided Manufacturing is the use of computer technology to manufacture complex parts on automated machine tools. These machine tools are commonly referred to as ‘CNC’ or ‘Computer Numerically Controlled’ machines. Here’s a simple example. Let’s say an engineer has created a complex 3-D shape consisting of various contours for a new car’s exterior. The exterior parts are made by die-stamping in huge presses. The ‘dies’ are essentially molds made of hard metal. Principally, they are similar to a clay mold that is used to create various artifacts out of Plaster-of-Paris. These metal dies themselves have to be created by machining a ‘die-block’ to create a solid mirror image of the final part. This complex 3-D shape needs a sophisticated machine tool that can machine (cut/drill/shape) metal across multiple (3 or more) dimensions.
Controlling the motion of these machine tools is similar to controlling a robotic arm. CAM packages convert the solid designs in CAD packages into a set of coordinates and path instructions, along with desired speeds & acceleration/deceleration profiles for the machine tools, and communicate these instructions to the CNC machines.
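To make "a set of coordinates and path instructions" a little more concrete, here is a minimal Python sketch that turns a simple circular contour into a list of G-code-style moves. The `G0` (rapid move), `G1` (linear move) and `F` (feed rate) words are standard G-code, but the function name `circle_toolpath`, the choice of a circle, and the feed value are assumptions made purely for illustration. A real CAM package also handles tool geometry, acceleration profiles, 3-D surfaces and machine-specific dialects, none of which is shown here.

```python
import math


def _fmt(value):
    # Round to 3 decimals; adding 0.0 turns a rounded "-0.0" into "0.0".
    return f"{round(value, 3) + 0.0:.3f}"


def circle_toolpath(cx, cy, radius, segments, feed_mm_per_min):
    """Approximate a circle by straight segments and emit G-code-style lines.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    points = []
    for i in range(segments + 1):  # +1 so the path closes on its start point
        angle = 2 * math.pi * i / segments
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))

    start_x, start_y = points[0]
    # Rapid move (no cutting) to the start of the contour.
    lines = [f"G0 X{_fmt(start_x)} Y{_fmt(start_y)}"]
    # Linear cutting moves along the contour at the requested feed rate.
    for x, y in points[1:]:
        lines.append(f"G1 X{_fmt(x)} Y{_fmt(y)} F{feed_mm_per_min}")
    return lines


if __name__ == "__main__":
    for line in circle_toolpath(0.0, 0.0, 10.0, 8, 300):
        print(line)
```

Increasing `segments` gives a smoother approximation of the contour at the cost of more instructions, which is the same trade-off a CAM package makes when it decides how finely to sample a curved surface.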
PLM or ‘Product Lifecycle Management’ is not as directly related to CAD as CAE or CAM are. Instead, PLM, as the name suggests, focuses on managing the entire lifecycle of designing activity across multiple groups and departments in a company. A complete design is not limited to the machine designer. Various other players come into the picture. These include Purchasing Managers who have to source design components and sub-assemblies from vendors, Cost Accountants who want to keep a tab on the overall material and manufacturing costs of a design, Compliance Experts who want to review the design for various agency compliance requirements, etc. Similarly, there are requirements to maintain the design as it goes through various versions/upgrades through its life-cycle. PLM enables collaboration across different departments on the key aspects of the design. PLM also enables collaboration between designers in terms of sharing parts data, etc.
AutoCAD® by Autodesk is one of the most popular CAD packages out there. It focuses more on 2-D drawings such as part drawings, architect plans, electronic circuit designs, etc.
Packages like Catia® by Dassault, NX® by Siemens-Unigraphics, and Pro/E® by Parametric Technology Corporation are popular 3-D Solid Modeling Solutions. These solutions find wide use in Automotive, Aerospace and Other Manufacturing Industry Segments.
CAD in Pune
Due to the strong industrial and manufacturing base, Pune not only contains some of the biggest users of CAD/CAM software, but it also hosts some of the biggest developers of CAD/CAM software in the world.
Leaving sobriquets such as ‘Detroit of the East’ aside, it is safe to say that Pune is indeed the primary automotive hub of India. Pioneering Indian automotive companies such as Tata Motors, Bajaj Auto, Bharat Forge and Kirloskar Oil Engines are headquartered here. Along with these, a number of top multi-nationals such as Mercedes-Benz, General Motors and Volkswagen are also based here.
These big auto-majors, along with other industrial powerhouses such as Cummins Diesel, have created a strong industrial manufacturing ecosystem in Pune. These OEMs (Original Equipment Manufacturers) in turn drive requirements for sub-assemblies and components to Tier-1 and Tier-2 vendors.
A large number of small and mid-sized industrial automation companies have also sprung up in Pune. These companies design and develop various factory automation and material handling solutions for automotive as well as other industries.
Designing activity is important at all levels, in all these companies – big or small. As a result, Pune has become probably the biggest user of various 2-D and 3-D CAD applications and other associated CAE/CAM applications, in India.
However, the ecosystem for CAD doesn’t stop here! Given Pune’s dominance in Information Technology and the huge CAD users’ base, many CAD/CAM/CAE software companies worldwide have found Pune to be the natural choice for their R&D and Service Centers. All CAD majors described in the earlier section have some development presence in Pune. Pune also has software service companies focusing on this area, such as Geometric Systems.
Amit Paranjape is one of the driving forces behind PuneTech. He has been in the supply chain management area for over 12 years, most of it with i2 in Dallas, USA. For more details, see his PuneTech profile.
What: Technology trends in Business Intelligence by Prasad Kulkarni of SAS R&D India.
When: Friday January 30th, 2009, 6:30pm to 8:30pm
Where: Damle Hall, Damle Path, Behind Indsearch, Off Law College Road
Registration and Fees: Free for CSI Members & Students, Rs. 100 for others. Register here.
Details – Technology trends in Business Intelligence
This lecture will cover technological advances in BI domain. It will start with a discussion on general trends in BI and will relate them to technology. Primary focus is on different technologies used currently, their necessity and type of problem they are solving in the business intelligence domain. It will discuss areas like SOA (Service oriented architecture), SaaS (Software as a service), MDM (Master data management), Real time warehousing, Click stream data warehouses, Federated/integrated search, Web 2.0, Data visualization and so on. The participant will know how such technologies are solving problems specific to BI domain.
It is not necessary to have attended the previous lecture.
For more information about other lectures in this series, and in general other tech events in Pune, see the PuneTech events calendar.
About the speaker – Prasad Kulkarni
Prasad Kulkarni has been working with SAS Research And Development India Pvt. Ltd for the past 8 years as Associate Director – Platform Research and Development. He leads the core technology group at SAS R&D Pune. Prasad holds a postgraduate degree in computer management from the University Of Pune and has 12 years of experience in the field of information technology. He has worked with product development setups in India. With SAS his focus areas are Metadata Management, Data Warehousing, Data visualization and Data access.
Hover.in is a Pune startup that provides a service for web publishers (i.e. website/blog owners) to automatically insert extra content into their webpages, in the form of a bubble that appears when the mouse hovers over underlined words. The bubble can be informational (like a map appearing wherever the name of a place appears, or background information about a company appearing wherever the name of a company appears), or it can be a contextual, in-text advertisement from hover’s network of partners, and most importantly it is fully under the publisher’s control. While services like this have been around in other forms, hover.in believes that its ability to handle any language, and its focus on the Indian market, set it apart from the competition. See the PuneTech profile of hover.in to get a better idea of what hover.in provides.
PuneTech interviewed Arun Prabhudesai, CEO of Hover.in (he also runs the popular Indian business blog trak.in), to get a deeper look at hover.in. To be true to the “tech” part of PuneTech, we also asked some technical questions that were answered by Bhasker V. Kode (Bosky), CTO of Hover.
Q: Congratulations on getting funded – especially under these economic conditions. How do you plan on using this funding – what will be the focus areas?
The seed funding was finalized a few months back, before the whole “recession” thing started constantly ringing in our ears.
Actually, from hover.in’s perspective, we see this funding as more of a strategic investment, where Media2Win – being a leading digital media agency – will help us go to market. We have benefitted immensely from the experience Me2W brings to the table.
The funding is being mostly used to ramp up our technical resources and infrastructure.
Q: Your main “customers” are website publishers. Are you targeting any specific geography, like India (as the .in domain name would suggest)?
Hover.in’s in-text platform is global and open to web publishers and bloggers from all geographies. However, we are actively targeting the Indian market first. India currently does not have any in-text platform, and that puts us in a great position to capture this market. In fact, hover.in is the world’s first in-text platform that is also language agnostic, which opens up a large chunk of regional language websites.
Q: I keep hearing that “there isn’t enough money to be made from online advertisements alone in India, except for a few specific verticals.” And you seem to be going squarely after this market. What is your take on this issue?
You know, people have started talking about this because too many ad networks have come up in the last couple of years… I can count more than 15-odd on my fingers!
But if you look at the larger picture, online advertisements are the only ones that are growing year on year. Traditional advertising is hardest hit…
For us the advantage is, we DO NOT compete with traditional ad networks as they are 99% display advertising. We are in-text and this market has not even been tapped. From the publisher’s perspective, it is an additional channel for content and monetization.
For advertisers, this is the most targeted way of displaying their advertisements. Also, as we follow a CPA / CPC kind of model, advertisers get full return on their investment.
Q: If I remember right, you are using Erlang for your development – a very non-standard choice. Can you share the reasons behind the choice? Isn’t it difficult to get Erlang developers? In retrospect are you happy with this decision?
Erlang has been used to build fault-tolerant and distributed applications for quite some time in areas like telecom, especially for allowing highly granular choices in networking. Of late, projects like ejabberd, mnesia, yaws and tsung have shown how products spanning several hundred nodes can be implemented with the erlang stack, and in particular, web technologies.
It most definitely is a paradigm shift, courtesy of its functional programming concepts, and we are glad we took that decision because of its inherent focus on distributed systems. Although the erlang developer community in India is non-existent, with the right attitude towards learning, nowadays it doesn’t matter. Moreover, it only took a couple of months for our developers to get used to the semantics, following which, as with any stack, it’s about what you do with that expertise.
Erlang gives you that power, but at the same time – there are areas where it might not seem a right fit and perhaps look to perl or ruby for tasks that suit them. For example, we use python wherever it seems required as well. The good part is erlang open-source community has quite a closely-knit presence online, which does help quite a lot. We ourselves are now looking at contributing and opening up internal projects.
Q: One of the important challenges for hover.in will be scalability. How are you gearing up to handle that?
Right from day one, erlang based systems like ours are designed and built for horizontal scaling – which allows plug-n-play addition to our growing cluster. Regardless of the development stack you work on – some things need to be built in early, and that’s something we spent a whole lot of time fine-tuning during our first year.
Especially for us – where practically every page hit – for every one of our users – reflects a page visit to us where we need to compute and render hovers in a matter of milliseconds. To this end – before starting out application-logic, we first built out our own distributed priority-queuing systems, our own distributed crawler and various indexing mechanisms, a time-splicing based cpu allocation for various tasks, which made things like adding jobs, executing them a more controlled operation regardless of what the actual job is and has been handling burst mode quite well.
Moreover, we can also add workers on-the-fly to almost all major tasks, much like an Amazon EC2 instance, where each worker itself is supervised for crash recovery thanks to erlang’s Open Telecom Platform libraries and guidelines. Caching is something else we have worked on and continue to work on consistently. No matter how many papers, algorithms or implementations there are out there – every system needs to fine-tune its own unique set of optimizations vs compromises that reflect its infrastructure, traffic, memory & space needs, etc.
Having granular control of this is something that is a real challenge as well as a pleasure with the stack (Linux, Yaws, Mnesia, Erlang). We’ve also been quick to adopt cloud-computing initiatives like Amazon S3, and more recently CloudFront, for our static content delivery needs.
We’re also working on a parallel map-reduce implementation, exploring options with xmpp, and better logging for our developers to find and fix glitches or bottlenecks, eventually translating to a faster and better user experience for our users.
Q: You moved to Pune specifically to start hover.in. What made you choose Pune?
Yes, I did move to Pune to start hover.in, however, it would not be fair to say that is the only reason why I moved here. I have lived most of my formative years here in Pune, before going to USA. And as you know, once a Puneite, always a Puneite!
Actually we had to choose from 2 cities – Chennai (our co-founder, Bhasker VK, is from Chennai) and Pune. A few important aspects tilted the balance in favour of the latter: better weather, and Pune’s proximity to Mumbai, where the majority of big publishers, investors and advertisers have their offices. To add to it all, Pune has a great startup & tech community.
Q: In the journey so far, have you made any significant mistakes that you’d like to share with others, so they can learn from your experience?
Absolutely… Mistakes are an important aspect of the learning process, especially for first generation entrepreneurs like Bosky and me. I think “attention to detail” is one of the most important aspects that an entrepreneur should watch for. You need to have a clear, in-depth blueprint in your mind about the direction your startup is going to take, otherwise it’s very easy to fall off the cliff!
Optimizing, especially during these tough times – be it resources, infrastructure or even your time. Optimize everything. Startups can’t afford any leaks.
The third thing, and one which I don’t see very often: partner with other startups; see if there are any synergies between you. In most cases it is a win-win situation for both of them.
Q: Are you partnering with other startups? At this stage, would it be possible for you to share info about any of these partnerships?
Yes, we are… one example would be Alabot (another Pune startup -ed.), where we have got their NLP application (Travel bot) inside our hoverlet. So for any travel-related publisher, it becomes a boon. So it is a win-win situation for both of us.
Another example would be – before we got our own office, 2 other startups were kind enough to accommodate us for a few weeks. These kinds of partnerships, in any way possible, go a long way!
Q: What would your advice for struggling Pune entrepreneurs be?
Entrepreneurship is a roller coaster ride … It ain’t easy, but the thrills along the way make it all more than worth it!
Just jump into the rough waters and by the time you reach the other side, you will be glad you did it….
What: A lecture and demonstration of SAGE mathematical software by Dr. K.K. Surendran
When: Friday, 23 Jan, 4:30pm Where: Bhaskaracharya Pratishthana, 56/14, Erandavane, Damle Path, Off Law College Road, Pune Fees and Registration: This event is free for all. No registration required
Details: SAGE combines various open source mathematics software packages and seamlessly integrates their functionality into a common experience. Its aim is to create a Free Open Source equivalent to Magma, Mathematica, Maple and Matlab.
Sage was picked as the Hot Spot of the month in November 2008 by mathforum.org
The demo attempts to give an overview of SAGE, with the aim of helping mathematics students and teachers appreciate the relevance of this wonderful open source software in contemporary mathematics education and research.
For years, the common industry perception has been that MySQL is faster and easier to use than PostgreSQL. PostgreSQL is perceived as more powerful, more focused on data integrity, and stricter at complying with SQL specifications, but correspondingly slower and more complicated to use.
Like many perceptions formed in the past, these things aren’t as true with the current generation of releases as they used to be. DBAs, developers, and IT managers and decision-makers will benefit from this hour-long presentation about the pros and cons of using PostgreSQL or MySQL, which will include a discussion about the ongoing trend towards using open source in the enterprise.
About the Speaker – Jim Mlodgenski
Jim is one of EnterpriseDB’s first employees and joined the company in May, 2005. As Senior Database Architect he has been responsible for EnterpriseDB’s technical pre-sales, professional services, providing customized solutions and training.
Prior to joining EnterpriseDB, Jim was a partner and architect at Fusion Technologies, a technology services company founded by EnterpriseDB’s chief architect, Denis Lussier. For nearly a decade, Jim developed early designs and concepts for Fusion’s consulting projects and specialized in Oracle application development, Web development, and open source.
What: WATBlog Wednesday Pune – a “mixer” for Pune techies to enjoy each others’ company amidst free beer and snacks
When: Wednesday, 21st Jan, 7pm onwards
Where: Gaia Lounge, Garden of Eden, Sector 20 A, Near Kharadi Mundhwa Bridge, Kharadi, Chandan Nagar
Registration and Fees: This event is free for all, but entries are limited, so you must register here.
Why you should attend
We’ve had far too many tech events in Pune where everyone stands around seriously and exchanges business cards. Meeting in a more informal, more social atmosphere would be good for the community. So be there. And use twitter for carpooling.
Note: Those who are afraid of landing up at an event full of “boozers”, have no worries. This is not going to be like your college buddies’ drinking party where everybody gets pissed drunk and throws up on the couch. The free beer is there only to attract the crowd (and believe me it works, even on people with multiple successful startups behind them and millions in the bank), but drinking will be moderate, people will be polite, and there will be no fistfights. (At least amongst the Pune crowd; don’t know about the rowdies coming down from Mumbai…)
Druvaa is a Pune-based startup that sells fast, efficient, and cheap backup software for enterprises and SMEs. (Update: see the comments section for Druvaa’s comments on my use of the word “cheap” here; apparently they sell even in cases where their product is priced above the competing offerings.) It makes heavy use of data de-duplication technology to deliver on the promise of speed and low bandwidth consumption. In this article, reproduced with permission from their blog, they explain what exactly data de-duplication is and how it works.
Definition of Data De-duplication
Data deduplication or Single Instancing essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy (single instance) of the data to be stored. However, indexing of all data is still retained should that data ever be required.
A typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy, reducing storage and bandwidth demand to only 1 MB.
The practical benefits of this technology depend upon various factors like –
Point of Application – Source Vs Target
Time of Application – Inline vs Post-Process
Granularity – File vs Sub-File level
Algorithm – Fixed size blocks Vs Variable length data segments
A simple relation between these factors can be explained using the diagram below –
Target Vs Source based Deduplication
Target-based deduplication acts on the target data storage media. In this case the client is unmodified and is not aware of any deduplication. The deduplication engine can be embedded in the hardware array, which can then be used as a NAS/SAN device with deduplication capabilities. Alternatively, it can be offered as an independent software or hardware appliance that acts as an intermediary between the backup server and the storage arrays. In both cases it improves only the storage utilization.
In contrast, source-based deduplication acts on the data at the source, before it is moved. A deduplication-aware backup agent is installed on the client, and it backs up only unique data. The result is improved bandwidth and storage utilization. However, this imposes additional computational load on the backup client.
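As a rough illustration of the source-based flow, the Python sketch below has the client checksum its blocks, ask the target which checksums it has not seen, and send only those blocks. The `BackupTarget` class, its `missing` and `upload` calls, and the 4096-byte block size are assumptions made up for this example; they are not Druvaa’s actual protocol.

```python
import hashlib


class BackupTarget:
    """Hypothetical backup server that keeps one copy of each block."""

    def __init__(self):
        self.blocks = {}  # checksum -> block content

    def missing(self, digests):
        """Return the checksums the target has not stored yet."""
        return [d for d in digests if d not in self.blocks]

    def upload(self, digest, block):
        self.blocks[digest] = block


def source_side_backup(data, target, block_size=4096):
    """Back up `data`, sending only unseen blocks; return payload bytes sent."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # The checksumming happens on the client: this is the extra CPU load.
    by_digest = {hashlib.sha256(b).hexdigest(): b for b in blocks}
    sent = 0
    for digest in target.missing(by_digest):
        target.upload(digest, by_digest[digest])
        sent += len(by_digest[digest])
    return sent


target = BackupTarget()
monday = b"A" * 4096 + b"B" * 4096
tuesday = monday + b"C" * 4096

print(source_side_backup(monday, target))   # 8192: everything is new
print(source_side_backup(monday, target))   # 0: nothing unique to send
print(source_side_backup(tuesday, target))  # 4096: only the new block
```

The second and third calls show where the bandwidth saving comes from: duplicate blocks never leave the client, at the cost of hashing every block locally.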
Inline Vs Post-process Deduplication
In target-based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as and when it is sent to the target) or after it has been stored in the target storage.
The former is called inline deduplication. The obvious advantages are –
Increase in overall efficiency as data is only passed and processed once
The processed data is instantaneously available for post storage processes like recovery and replication reducing the RPO and RTO window.
The disadvantages are –
Decrease in write throughput
Lower extent of deduplication, since only the fixed-length block approach can be used
Inline deduplication processes only incoming raw blocks and has no knowledge of the files or the file structure. This forces it to use the fixed-length block approach (discussed in detail later).
Post-process deduplication acts asynchronously on the stored data, and its advantages and disadvantages are the exact opposite of those listed above for inline deduplication.
File vs Sub-file Level Deduplication
The duplicate removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the existing checksums of already backed-up files. It’s simple and fast, but the extent of deduplication is very low, as it does not address the problem of duplicate content found inside different files or data-sets (e.g. emails).
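A minimal Python sketch of full-file deduplication follows. The `FileLevelStore` class and its method names are hypothetical, invented for this illustration, and SHA-256 stands in for whichever checksum a real product would use. It replays the 100-attachment email example and then shows the weakness: a one-byte change produces a completely new checksum.

```python
import hashlib


class FileLevelStore:
    """Toy single-instance store: one stored copy per distinct file content."""

    def __init__(self):
        self.blobs = {}    # checksum -> file content, stored once
        self.catalog = {}  # file name -> checksum (the index is always kept)

    def backup(self, name, data):
        """Record the file; return True only if new content had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[name] = digest
        return is_new

    def restore(self, name):
        return self.blobs[self.catalog[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = FileLevelStore()
attachment = b"x" * (1024 * 1024)  # the same 1 MB attachment
for i in range(100):
    store.backup(f"mailbox-{i}/report.pdf", attachment)
print(store.stored_bytes())  # 1048576: one copy, not 100

# A single changed byte defeats whole-file matching: a full second copy is stored.
store.backup("mailbox-0/report-v2.pdf", attachment[:-1] + b"y")
print(store.stored_bytes())  # 2097152
```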
The sub-file level deduplication technique breaks the file into smaller fixed-size or variable-size blocks, and then uses a standard hash-based algorithm to find similar blocks.
Fixed-Length Blocks vs Variable-Length Data Segments
The fixed-length block approach, as the name suggests, divides the files into fixed-size blocks and uses a simple checksum-based (MD5/SHA etc.) approach to find duplicates. Although it’s possible to look for repeated blocks this way, the approach provides very limited effectiveness. The reason is that the primary opportunity for data reduction is in finding duplicate blocks in two transmitted datasets that are made up mostly, but not completely, of the same data segments.
For example, similar data blocks may be present at different offsets in two different datasets. In other words, the block boundaries of similar data may be different. This is very common when some bytes are inserted in a file: when the changed file is processed again and divided into fixed-length blocks, every block from the insertion point onwards appears to have changed.
Therefore, two datasets with a small amount of difference are likely to have very few identical fixed length blocks.
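The boundary-shift problem is easy to reproduce. The following Python sketch uses an 8-byte block size and SHA-256 purely for illustration (the function names are made up for this example). Appending data leaves the existing blocks intact, but inserting a single byte at the front shifts every boundary.

```python
import hashlib


def fixed_block_hashes(data, block_size=8):
    """Checksum every fixed-size block of `data`, in order."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def shared_blocks(old, new, block_size=8):
    """Count blocks of `new` whose checksum already exists among blocks of `old`."""
    known = set(fixed_block_hashes(old, block_size))
    return sum(h in known for h in fixed_block_hashes(new, block_size))


original = bytes(range(64))      # 8 blocks of 8 bytes each
appended = original + b"tail"    # new data at the end
inserted = b"X" + original       # one byte inserted at the start

print(shared_blocks(original, appended))  # 8: old blocks are still recognised
print(shared_blocks(original, inserted))  # 0: every boundary has shifted
```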
Variable-Length Data Segment technology divides the data stream into variable length data segments using a methodology that can find the same block boundaries in different locations and contexts. This allows the boundaries to “float” within the data stream so that changes in one part of the dataset have little or no impact on the boundaries in other locations of the dataset.
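One common way to get such “floating” boundaries is content-defined chunking: a rolling hash is computed over the last few bytes, and a segment ends wherever that hash matches a chosen bit pattern. The Python sketch below illustrates the general idea and is not the specific method used by Druvaa or any other vendor. The window size, mask, and hash constants are arbitrary assumptions, and a real implementation would also enforce minimum and maximum segment sizes.

```python
import hashlib
import random

WINDOW = 16          # bytes of context that decide where a boundary falls
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero (about 64-byte segments)
BASE = 257
MOD = (1 << 61) - 1


def variable_segments(data):
    """Split `data` where a rolling hash of the last WINDOW bytes hits the mask."""
    segments = []
    start = 0
    rolling = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            rolling = (rolling - data[i - WINDOW] * top) % MOD
        rolling = (rolling * BASE + byte) % MOD
        if i + 1 >= WINDOW and (rolling & MASK) == 0:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def segment_digests(data):
    return {hashlib.sha256(s).hexdigest() for s in variable_segments(data)}


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"INSERTED" + original  # 8 bytes inserted at the very start

old = segment_digests(original)
new = segment_digests(modified)
print(len(old), "segments;", len(old - new), "not found after the insertion")
```

Because each cut depends only on the bytes inside the window, the boundaries re-synchronise right after the inserted bytes. At most the first segment of the original fails to match, while every later segment is found again.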
Each organization has a capacity to generate data. The extent of savings depends upon, but is not directly proportional to, the number of applications or end users generating data. Overall, the deduplication savings depend upon the following parameters –
No. of applications or end users generating data
Daily change in data
Type of data (emails/ documents/ media etc.)
Backup policy (weekly-full – daily-incremental or daily-full)
Retention period (90 days, 1 year etc.)
Deduplication technology in place
The actual benefits of deduplication are realized once the same dataset is processed multiple times over a span of time for weekly/daily backups. This is especially true for variable length data segment technology which has a much better capability for dealing with arbitrary byte insertions.
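To see why repeated backups matter, here is a toy Python calculation. The dataset size, block size, and rate of change are all invented for illustration and are unrelated to any customer statistics. A 10-block dataset is backed up in full every day, with one block changing per day, and the ratio of logical bytes backed up to bytes actually stored grows as more backups are retained.

```python
import hashlib


def dedup_ratio(backups, block_size=4):
    """Logical bytes backed up divided by bytes actually stored."""
    seen = set()
    logical = stored = 0
    for data in backups:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            logical += len(block)
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(block)
    return logical / stored


def daily_fulls(days, blocks=10, block_size=4):
    """Simulate daily full backups in which one block changes each day."""
    data = bytearray(b"".join(bytes([i]) * block_size for i in range(blocks)))
    backups = []
    for day in range(days):
        if day > 0:
            slot = (day - 1) % blocks
            data[slot * block_size:(slot + 1) * block_size] = (
                bytes([100 + day]) * block_size)
        backups.append(bytes(data))
    return backups


print(dedup_ratio(daily_fulls(1)))   # 1.0: a single backup has nothing to match
print(dedup_ratio(daily_fulls(9)))   # 5.0: nine fulls, only 8 changed blocks added
print(dedup_ratio(daily_fulls(30)))  # keeps growing with a longer retention period
```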
While some vendors claim 1:300 ratios of bandwidth/storage savings, our customer statistics show that the results are between 1:4 and 1:50 for source-based deduplication.