Monthly Archives: May 2008

CSI-Pune’s ILM Seminar – A Report

CSI-Pune conducted a half-day workshop on Information Lifecycle Management (ILM). T.M. Ravi, founder and CEO of Mimosa Systems, gave the keynote presentation. There were product/project pitches from IBM, Zmanda and Coriolis, followed by a talk on storage trends by Abhinav Jawadekar. Finally, there was a panel discussion with representation from Symantec (V. Ganesh), BMC (Bladelogic; Monish Darda), Zmanda (K K George), IBM, Symphony (Surya Narayanan), and nFactorial (Hemant Joshi).

Here are my cryptic notes from the conference:

  • T.M. Ravi, CEO of Mimosa, gave a talk on what he sees as the challenges in storage/ILM. New requirements coming from the customers – huge amounts of user-generated unstructured data in enterprises, which must be managed properly for legal, security and business reasons. Interesting new trends coming from the technology side – new/cheap disks, de-duplication, storage-intensive apps (e.g. video), flash storage, green storage (i.e. energy-conscious storage), SaaS and storage in the cloud (e.g. Amazon S3). Based on this, storage software should focus on these things: 1. Increase the information content of data. 2. Improve security. 3. Reduce legal risk. He then segued into a pitch for Mimosa’s products, i.e. you must have an enterprise-wide archive with: 1. Continuous capture (i.e. store all versions of the data). 2. Full-text indexing of all the content, so that users can search by keyword. 3. Single instance storage (SIS), aka de-duplication, to reduce the storage requirements. 4. Retention policies. Mimosa is an archiving appliance that can be used for 1. e-discovery, 2. recovery, 3. end-user searches, 4. storage cost reduction.
  • Then there was a presentation from IBM on General Parallel File System (GPFS). Parallel, highly available distributed file system. I did not really understand how this is significantly different from all the other such products already out there. Also, I am not sure what part of this work is being done in Pune. Caching of files over WAN in GPFS (to improve performance when it is being accessed from a remote location) is being developed here (Ujjwal Lanjewar).
  • There was also a presentation on the SAN simulator tool. This is something that allows you to simulate a storage area network, including switches and disk arrays. It has been open-sourced and can be downloaded here. A lot of the work for this tool happens in Pune (Pallavi Galgali).
  • KKG from Zmanda demonstrated the recovery manager for MySQL. This whole product has been architected and developed in Pune.
  • Bernali from Coriolis demonstrated CoLaMa – a virtual machine lifecycle manager. This is essentially CVS for virtual machine images: version management software to keep track of all your VM images. Check out an image. Work on it. Check it in. A new version gets stored in the repository. And it only stores the differences between the images – so space savings. It auto-extracts info like OS info, patch level, etc.
  • Coriolis’ was the only live demo. The others were flash demos which looked lame (and had audio problems). Suggestion to all – if you are going to give a flash demo, at least turn off the audio and do the talking yourself. This would involve the audience much better.
  • Abhinav Jawadekar gave a nice introductory talk on the various interesting technologies and trends in storage. It would have been very useful for someone new to the field. However, in this case, I think it was wasted on an audience most of whom have been doing this for 5+ years. The only new stuff was in the last few slides, which were about energy-aware storage (aka green storage). (For example, he pointed out that data-center class storage in Pune is very expensive: because of power, cooling, UPS and genset, the operating costs of a 42U rack are $800 to $900 per month.)
  • The panel discussion touched upon a number of topics, not all of them interesting. I did not really capture notes of that.
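
Two of the notes above (single instance storage in the Mimosa talk, and CoLaMa storing only the differences between image versions) rest on the same basic idea: identical content is stored once. Here is a minimal sketch of that idea in Python. It is my own toy illustration, not how either product works; the class name, the fixed-size chunking and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only unseen chunks."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held, after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

If two versions of an image differ in only one chunk, checking in the second version adds just that one chunk to the store. Real products would have to deal with variable-size chunking, compression and metadata, none of which is attempted here.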

Overall, it was an interesting evening. With about 50 people attending, the turnout was a little lower than I expected. I’m not sure what needs to be done in Pune to get people to attend. If you have suggestions, let me know. If you are interested in getting in touch with any of the people mentioned above, let me know, and I can connect you.

PubMatic Launches Default Optimization Service to Recapture $1 Billion in Lost Advertising Revenue

A large number of websites rely on advertising as their primary source of income. Typically the ads are served by third-party ad networks (for example, Google’s AdSense). When an ad network is unable to find an appropriate ad for a specific page being served to a specific reader, it puts in default or public service ads (PSAs). This is not good because the website makes no money from PSAs.

Ad optimization service Pubmatic says that more than half the ads that get shown at an average website are PSAs:

Over the past several months we’ve studied just how severely default ads are affecting our publishers and the numbers are jaw dropping. We found that ad networks defaulted 56% of the time on average and as much as 87% of the time. We also found that the traditional static daisy chain of ad networks may be effective at reducing blank ads but is highly ineffective at maximizing a publisher’s revenue. Between 20% and 30% of publishers’ ad inventory is going to waste.

(Source.)

This is of course terrible, if true.

Now Pubmatic has just launched the default optimization service, which automates the process of contacting other ad networks whenever the ad being served by the primary network is a PSA:

PubMatic’s default optimization service automates the reselling process, allowing publishers to instantly redirect unsold ad inventory back to PubMatic, which fills that inventory with the highest paying ad impression every time. PubMatic’s new service is an automated solution for the billion-dollar loss that plagues the industry.
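
To make the difference between a static daisy chain and "highest paying ad" selection concrete, here is a small Python sketch. It is only an illustration of the two strategies as described in the quotes above, not PubMatic's actual algorithm; the function names, the (name, bid) representation and the numbers in the usage note are all invented for the example.

```python
from typing import Optional, Sequence, Tuple

# A network is modeled as (name, bid). The bid is the payout offered for
# this impression, or None if the network would serve a default ad/PSA.
Network = Tuple[str, Optional[float]]


def daisy_chain(networks: Sequence[Network]) -> Optional[Network]:
    """Static chain: walk the list in order, take the first paying ad."""
    for name, bid in networks:
        if bid is not None:
            return (name, bid)
    return None  # every network defaulted


def highest_paying(networks: Sequence[Network]) -> Optional[Network]:
    """Consider every network and pick the one with the highest bid."""
    paying = [(name, bid) for name, bid in networks if bid is not None]
    return max(paying, key=lambda network: network[1], default=None)
```

For a chain like `[("primary", None), ("second", 0.40), ("third", 1.10)]`, the daisy chain avoids the PSA but settles for `("second", 0.40)`, while the second function picks `("third", 1.10)`. That is the gap between "reducing blank ads" and "maximizing revenue" mentioned in the first quote.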

For more details, see the press release.
Related articles:
Pubmatic and Komli to power all eBay.in ads
PubMatic releases AdPrice Index: Sensex for Online Ads
See also: The PuneTech wiki profile of Pubmatic, and Mukul Kumar, head of engineering at Pubmatic (based in Pune).

Archival, e-Discovery and Compliance

Archival of e-mails and other electronic documents, and the use of such archives in legal discovery, is an emerging and exciting new field in enterprise data management. There are a number of players in this area in Pune, and it is, in general, a very interesting and challenging area. This article gives a basic background. Hopefully, this is just the first in a series of articles and future posts will delve into more details.

Background

In the US, and many other countries, when a company files a lawsuit against another, before the actual trial takes place, there is a pre-trial phase called “discovery”. In this phase, each side asks the other side to produce documents and other evidence relating to specific aspects of the case. The other side is, of course, not allowed to refuse and must produce the evidence as requested. This discovery also applies to electronic documents – most notably, e-mail.

Now unlike other documents, electronic documents are very easy to delete. And when companies were asked to produce certain e-mails in court as part of discovery, more and more of them began claiming that relevant e-mails had already been deleted, or they were unable to find the e-mails in their backups. The courts are not really stupid, and quickly decided that the companies were lying in order to avoid producing incriminating evidence. This gave rise to a number of laws in recent times which specify, essentially, that relevant electronic documents cannot be deleted for a certain number of years, that they should be stored in an easily searchable archive, and that failure to produce these documents in a reasonable amount of time is a punishable offense.

This is terrible news for most companies, because now, all “important” emails (for a very loose definition of “important”) must be stored for many years. Existing backup systems are not good enough, because those are not really searchable by content. And the archives cannot be stored on cheap tapes either, because those are not searchable. Hence, they have to be on disk. This is a huge expense. And a management nightmare. But failure to comply is even worse. There have been actual instances of huge fines (millions of dollars, and in one case, a billion dollars) imposed by courts on companies that were unable to produce relevant emails in court. In some cases, the company executives were slapped with personal fines (in addition to fines on the company).

On the other hand, this is excellent news for companies that sell archival software that helps you store electronic documents for the legally sufficient number of years in a searchable repository. The demand for this software, driven by draconian legal requirements, is HUGE, and an entire industry has burgeoned to service this market. Just e-mail archival will soon be a billion dollar market. (Update: Actually it appears that in 2008, archival software alone is expected to touch 2 billion dollars with a growth rate of 47% per year, and the e-discovery and litigation support software market will be 4 billion, growing at 27%. And this doesn’t count the e-discovery services market, which is much, much larger.) There are three major chunks to this market:

  • Archival – Ability to store (older) documents for a long time on cheaper disks in a searchable repository
  • Compliance – Ensuring that the archival store complies with all the relevant laws. For example, the archive must be tamperproof.
  • e-Discovery – The archive should have the required search and analysis tools to ensure that it is easy to find all the relevant documents required in discovery

Archival

Archival software started its life before the advent of these compliance laws. Basic email archival is simply a way to move all your older emails out of your expensive MS Exchange database onto cheaper, slower disks. Shortcuts are left in the main Exchange database so that if the user ever wants to access one of these older emails, it is fetched on demand from the slower archival disks. This is very much like a page fault in virtual memory. The net effect is that, for most practical purposes, you’ve increased the sizes of people’s mailboxes without a major increase in price, without a decrease in performance for recent emails, and with some decrease in performance for older emails.

Unfortunately, these guys had only middling success. Think about it – if your IT department is given a choice between spending money on an archival software that will allow them to increase your mailbox size, or simply telling all your users to learn to live with small mailbox sizes, what would they choose? Right. So the archival software companies saw only moderate growth.

All of this changed when the e-discovery laws came into effect. Suddenly, archival became a legal requirement instead of a good-to-have bonus feature. Sales exploded. Startups started. And it also added a bunch of new requirements, described in the next two sections.
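
The shortcut mechanism described in this section can be sketched in a few lines of Python. This is a toy model under my own assumptions (two in-memory dictionaries standing in for the primary mail store and the archive tier, and a hypothetical `TieredMailbox` class); it does not reflect the API of Exchange or of any archival product.

```python
class TieredMailbox:
    """Toy model of email archival with shortcuts (stubs) and recall."""

    _STUB = object()  # marker left in the primary store after archiving

    def __init__(self):
        self.primary = {}  # message id -> body, or _STUB if archived
        self.archive = {}  # message id -> body on the cheaper, slower tier
        self.recalls = 0   # how many reads had to go to the archive

    def deliver(self, msg_id, body):
        """New mail always lands in the fast primary store."""
        self.primary[msg_id] = body

    def archive_message(self, msg_id):
        """Move the body to the archive tier and leave a shortcut behind."""
        self.archive[msg_id] = self.primary[msg_id]
        self.primary[msg_id] = self._STUB

    def read(self, msg_id):
        """Return the body, fetching from the archive if only a stub is left."""
        body = self.primary[msg_id]
        if body is self._STUB:
            # The "page fault": the message is not resident in the primary
            # store, so it is fetched on demand from the slower tier.
            self.recalls += 1
            body = self.archive[msg_id]
        return body
```

Recent mail is served straight from `primary`; only reads of archived mail pay the extra cost, which is the trade-off described above.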

Compliance

Before I start, I should note that “IT Compliance” in general is really a huge area and includes all kinds of software and services required by IT to comply with any number of laws (like Sarbanes-Oxley for accounting, HIPAA for medical records, etc.). That is not the compliance I am referring to in this article. Here we only deal with compliance as it pertains to archival software.

The major compliance requirement is that for different types of e-mails and electronic documents, the laws prescribe the minimum number of years for which they must be retained by the company. And no company really wants to keep any of these documents a day more than is minimally required by the law. Hence, for each document, the archival software must maintain the date until which the document must be retained, and on that day, it must automatically delete that document. Except if the document is “relevant” to a case that is currently running. Then the document cannot be deleted until the case is over. This allows us to introduce the concept of a legal hold (or a deletion hold) that is placed on a document or a set of documents as soon as it is determined that it is relevant to a current case. The archival software ensures that documents with a deletion hold are not deleted even if their retention period expires. The deletion hold is only removed after the case is over.

The archival software needs to ensure that the archive is tamperproof. Even if the CEO of the company walks up to the system one day in the middle of the night, he should not be able to delete or modify anything.

Another major compliance requirement is that the archival software must make it possible to find “relevant” documents in a “reasonable” amount of time. The courts have some definition of what “relevant” and “reasonable” mean in this context, but we’ll not get into that. What it really means for the developers is that there should be a fairly sophisticated search facility that allows searches by keywords, by regular expressions, and by various fields of the metadata (e.g., find me all documents authored by Basant Rajan from March to September 2008).
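
The retention and legal hold rules described in this section boil down to a simple check per document. The following Python sketch shows that check under my own assumptions: the `ArchivedDocument` class, its field names and the `purge` helper are hypothetical, and a real archive would also need tamperproof storage and audit trails, which are not modeled here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set, Tuple


@dataclass
class ArchivedDocument:
    doc_id: str
    retain_until: date                            # end of mandatory retention
    holds: Set[str] = field(default_factory=set)  # ids of cases holding it

    def place_hold(self, case_id: str) -> None:
        """Mark the document as relevant to a running case."""
        self.holds.add(case_id)

    def release_hold(self, case_id: str) -> None:
        """Remove the hold once the case is over."""
        self.holds.discard(case_id)

    def can_delete(self, today: date) -> bool:
        """Deletable only once retention has expired and no hold remains."""
        return today >= self.retain_until and not self.holds


def purge(
    documents: List[ArchivedDocument], today: date
) -> Tuple[List[ArchivedDocument], List[ArchivedDocument]]:
    """Split documents into (kept, deleted) according to the rules above."""
    kept = [doc for doc in documents if not doc.can_delete(today)]
    deleted = [doc for doc in documents if doc.can_delete(today)]
    return kept, deleted
```

A document whose retention period has expired is still kept by `purge` for as long as any case has a hold on it, and becomes deletable as soon as the last hold is released.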

e-Discovery

Sadly, just having a compliant archive is no longer good enough. Consider a typical e-discovery scenario. A company is required to produce all emails authored by Basant Rajan pertaining to the volume manager technology in the period March to September 2008. Now just producing all the documents by Basant for that period which contain the words “volume manager” is not good enough. Because he might have referred to it as “VM”. Or he might have just talked about space optimized snapshots without mentioning the words volume manager. So, what happens is that all emails written by Basant in that period are retrieved, and a human has to go through each email to determine if it is relevant to volume manager or not. And this human must be a lawyer. Who charges $4 per email because he has to pay off his law school debt. For a largish company, a typical lawsuit might involve millions of documents. Literally. Now you know why there is so much money in this market.

Just producing all documents by Basant and dumping them on the opposing lawyers is not an option. Because the company does not want to disclose to the opposing side anything more than is absolutely necessary. Who knows what other smoking guns are present in Basant’s email?

Thus, a way for different archival software vendors to differentiate themselves is the sophistication they can bring to this process. The ability to search for concepts like “volume management” as opposed to the actual keywords. The ability to group/cluster a set of emails by concepts. The ability to allow teams of people to collaboratively work on this job. The ability to search for “all emails which contain a paragraph similar to this paragraph”. If you know how to do this last part, I know a few companies that would be desperate to hire you.
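
To give a flavour of the “paragraph similar to this paragraph” problem, here is a deliberately simple baseline in Python: break each paragraph into overlapping word sequences (shingles) and compare the sets using Jaccard similarity. This is a textbook technique that I am using purely as an illustration; I am not claiming any vendor does it this way, and the function names and the default shingle length are my own choices.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}  # short text: treat it as a single shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query, paragraphs, k=3):
    """Return (score, paragraph) pairs, most similar to the query first."""
    query_shingles = shingles(query, k)
    scored = [(jaccard(query_shingles, shingles(p, k)), p) for p in paragraphs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

This catches paragraphs that were copied and lightly edited, but it only matches surface wording. It would not connect “VM” with “volume manager”, which is exactly why the concept-level search described above is the hard (and valuable) part.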

What next?

In Pune, there are at least two companies, Symantec and Mimosa Systems, working in this area. (Mimosa’s President and CEO, T.M. Ravi, is currently in town and will give the keynote for CSI-Pune’s ILM Seminar this Thursday. It might be worth attending if you are interested in this area.) I believe that CT Summation’s CaseVault system also has a development team here, but I am unable to find any information about that – if you have a contact there, please let me know. For some (possibly outdated) details of the other (worldwide) players in this market, see this report from last year. If you are from one of these companies, and can write an article on what exactly your software does in this field, and how it is better than the others, please let me know.

I also had a very interesting discussion with Paul C. Easton of True Legal Partners, an e-Discovery outsourcing firm, where we talked about how they use archiving and e-discovery software, and more generally about legal outsourcing to India, the suitability of Pune for the same, competition from China, etc. I will write an article on that sometime soon – stay tuned (by subscribing to PuneTech by email or via RSS).

WiCamp – “From Ideas to Dollars” barcamp – 30 May

What: WiCamp is a barcamp with the theme “From Ideas to Dollars”, organized by Wipro Technologies. BarCamps are unconference-style meetings where like-minded people come together and share their learning on a topic of mutual interest through presentations, demos, videos, etc.

When: Friday, 30th May, 2pm to 7pm

Where: Wipro Technologies, Platinum Training hall, Tower-3, Lower Ground, Phase- I, MIDC, Rajiv Gandhi Infotech Park, Hinjwadi, Pune

Who can attend: This event is open to all. There are no fees. Please register at the WiCamp website.

Themes for WiCamp:

  • How firms arrive at value proposition for an innovation
  • Partnering to take ideas faster to market
  • Open Innovation
  • Measuring returns on innovation
  • Shrinking the time to market
  • Increasing hit-rate of ideas
  • Managing teams for innovation
  • Personal creativity to organizational creativity

More than the actual presentations, what I like about a barcamp is the people you meet in the corridors, and the enthusiasm and energy levels. You should definitely attend if you’ve never attended a barcamp before. For details, see the WiCamp wiki, which will keep getting updated until the last day.

CSI-Pune Seminar on Information Lifecycle Management – 29 May

What: Seminar on Information Lifecycle Management. ILM consists of all the technologies required during the lifetime of data stored in an enterprise: how data comes in, where it is stored, the storage hardware/software and architecture, how it is archived and backed up, retention policies, and deletion policies.

When: Thursday, 29 May 2008, 4pm to 9pm

Where: National Insurance Academy, 25, Balewadi, Baner Road

Fees: Rs. 400 for CSI members, Rs. 500 for others

Registration: register online

Detailed Program:

3:30 pm – 4:00 pm : Registration

4:00 pm – 4:15 pm : Inauguration and release of CSI Newsletter

4:15 pm – 5:15 pm : Keynote address – T. M. Ravi (President and CEO Mimosa Systems)

5:15 pm – 5:45 pm : Tea Break

5:45 pm – 6:45 pm
Demonstration of products – IBM, Zmanda Technologies, Coriolis

6:45 pm – 7:30 pm
Technical talk – Most promising new technologies in Storage and ILM space by Abhinav Jawadekar – Founder, Sound Paradigm, Software Engineering Services.

7:30 pm – 8:15 pm
Panel discussion – Technology Trends in Storage and its correlation with Career Opportunities. Panelists are Surya Narayanan (Symphony), K K George (Zmanda Technologies), Monish Darda (Bladelogic), Bhushan Pandit (Nes technologies), V. Ganesh (Symantec). Moderated by Hemant Joshi (nFactorial Software).

8:15 pm onwards: Dinner

The event is open to everybody, but you have to register online. Fees are Rs. 400 for members, Rs. 500 for non-members.

Commercial work pilots begin on Eka, Tata’s supercomputer

The Financial Express reports that Eka, the 4th fastest supercomputer in the world, built in Pune by Tata’s Computation Research Lab (CRL), has begun running pilot projects for various commercial entities. Excerpts:

According to sources close to the development, the main application areas are in aerospace and aerodynamics, automotive design and engineering, academics, animation, weather forecasting and so on

and

Although the company would use some of these application in house, be it for Tata Motors or Tata Elxsi, much of the revenues would flow in from outside the Tata Group, mostly from abroad. 

These would include aircraft design companies like Boeing, Lockheed and Airbus.

See also:
Building the world’s 4th largest supercomputer

121 Tech unveils easyTXT signage solutions | Televisionpoint.com News

From Televisionpoint.com:

121 Technologies has unveiled its Bluetooth/SMS enabled easyTXT smart signage solutions. easyTXT is a digitally enhanced media format for deployment at OOH locations such as retail spaces, airports, malls/multiplexes, exhibitions, events. The technology allows potential customers’ mobile phones to be registered in an intuitive manner to receive offers and notifications.

[…]

The technology is location specific. For instance, a pizza outlet in a food court in a mall can send out specific SMS’s about its meal deals. For this, the outlet has to install a light box with the easyTXT hardware unit mounted inside it. The unit has bright flashing LEDs that ensure maximum eyeballs see the signage.

Once near the signage, users need to turn on the Bluetooth feature on their mobile handsets, accept the incoming connection and enter the pairing code written on the signage to start receiving the offers/notifications from that brand. The pairing code ensures that only those people interested in receiving the offers get them.

Related articles:
InfoBeanz: Free web-based platform for “digital signage”

Monkiri adds Blogger/WordPress support – becomes more useful

When Pune-based social web-clipping service Monkiri launched a couple of weeks ago, I complained that its utility was very limited because it did not allow posting the clips to the user’s personal blog. They have moved quickly and fixed the problem. They now support Blogger and WordPress blogs. That should certainly increase its adoption rate.

See also: SocialMedian, a personalized news recommendation service, whose development is being done in Pune. There are some similarities with Monkiri (i.e. the social bookmarking aspect), but significant differences too (i.e. the personalized recommendation part). The other big difference is that SocialMedian has a world-wide focus, whereas Monkiri appears to have an India-only focus.

IdeaCampPune – a Report

These are the quick-n-dirty notes that I took during IdeaCampPune. (Actually, they are my live-tweets of the event.) I’ve tried to include relevant links where I could find them.

Generally, I think it went rather well. The organizers had stopped registration after 80 entries (there were a bunch of other people who wanted to get in, but couldn’t) – so I guess about 80 people attended. All the sessions were attended by a roomful of people and many sessions had lots of discussion and audience participation.

If you find any one of these initiatives interesting, and need help with getting in touch with the relevant person, let me know and I can try to put you in touch.

Notes:

  • First hour spent in “corridor discussions”. About a dozen ideas listed on the board so far.
  • Kaushik R (http://lin.cr/ru) et al trying to create a platform for structured participation of Industry in (Pune) colleges
  • IIT-B Alumni association inviting nominations for Innovations-2009 (http://innovations-pune.com/). I’ve heard good things about the Innovations program from a number of people. (Beware, it is not all IT/Tech, some of the innovations might be in a different field. But still interesting.)
  • Srikanth Sunderarajan (COO, Persistent) pushing people to be serious about their ideas – to think everything through.
  • Anupam Saraph, CIO of Pune, talking about “Design for Pune” (http://lin.cr/rv) – competition/game to design Pune of the future. Competition will have participants using SimCity (http://lin.cr/rw). Possible first prize – spend a day with Will Wright (top game designer). See my recent article about some of the initiatives that Anupam is currently spending his time on. 
  • Abhay Shete (http://lin.cr/s1) talking about the Semantic Web (http://lin.cr/ry); faceted browsing (http://lin.cr/rz)
  • A couple of students from COEP talking about Swarm Intelligence (http://lin.cr/s2). Complex behavior from simple pieces.
  • Anurag Agarwal is creating the “Mentor India Program”. Pick promising students from colleges and mentor them for a few years.
  • Shashikant Kore (http://lin.cr/s3) talking about SMS-based micropayments. I wonder if his new company (still in stealth) is about this.
  • The Design for Pune (http://lin.cr/rv) presentation repeated by popular demand (for those who missed it)
  • Aparna talking about “India, I Care” http://lin.cr/s5 – Lots of audience participation
  • Entrepreneurship cell IIT-B; promoting entrepreneurship amongst youth. http://genportal.org/ – funding, incubation, showcase, etc.
  • Freeman, talking about the use of mp3 players in (rural?) education (the “one mp3 player per child” project). http://clrindia.net/ is using radio to run an English teaching program. Digital Hall provides videos that teachers can use to run lessons. Teacher only has to answer questions/doubts. OLPC classroom near Karjat is also rather cool. Also see http://curriki.org/, a Wikipedia for curricula. All of these have some problems. That’s where mp3pc comes in. The web has a large amount of audio content that is freely downloadable and usable for education. Use cheap Chinese mp3 players for education! Easy to use, cheap, sturdy, easy to carry around. Many hours of education. Can use even while the child is doing something else (e.g. walking 5km to school). Freeman is hoping to do a pilot program with the 15 mp3 players he bought. Looking for a good institute in which to run it.
  • Aditi talking about “Generating Rural Income”. Can we tap traditional abilities of villagers/tribals to generate income? A number of interesting NGOs were mentioned during this talk, but I was unable to capture their names. Hopefully someone will blog about this in more detail.
  • Shyamal leads a discussion on various ways of saving power (mainly about our use of computers)
  • My own discussion on what we can do to improve participation in tech community events in Pune. Some interesting ideas that I hope to try out in the near future
  • Guna talking about “Lean Thinking”. The Toyota way. How to incorporate lean thinking in your own startup. Identify waste, measure it, and eliminate it.
  • One presentation on a single website where you can go to see all your e-mail. Everybody jumped on the poor guy. Surprisingly, in a day full of unconferenced talks, this was the only presentation that did not have interesting content.
  • bosky101 conducting a session on brainstorming. Creating random patterns of different ideas. Everybody having fun coming up with unrelated terms/areas/activities. 
  • Since a lot of the ideas presented today have been about social responsibility, Anurag felt that the audience would be interested in the other activity that he has been a part of for the last year: http://aksharbharati.org – the program he started for creating libraries for disadvantaged kids. They have created 50 libraries so far (using 40 to 50 people across 5 chapters). He hopes to double the numbers this year.
  • There was a wrap-up session at the end where people talked about which ideas they liked the best. Design for Pune was clearly one of the ideas that people were most excited about. Other honourable mentions: SMS micropayments, Mentor India, 1MP3PC.

A Vision for e-Governance in Pune

In an earlier article, I wrote about how Pune now has a CIO, who is pushing various initiatives to make Pune the city with the best use of technology for governance.

At my request, Dr. Anupam Saraph, the CIO of Pune, has written two articles about this aspect of his work. The first one is a vision piece painting a picture of Pune in 2015. An excerpt:

The pain of providing the same information over and over at different counters is history. The first time I registered myself to ilife, through my computer at home, I was asked to provide information to identify myself. I was requested to visit any one of the 14 ward offices to provide a photograph and my thumbprint to receive my Pune-card, my username and a password to access ilife. That was it.

My Pune-card provides me with cashless bus-travel, parking and entry into all electronic access public locations as well as electronic entry enabled private locations. It works as a cash-card and also replaces time-consuming procedures with countless forms to make applications. It simplifies and secures transactions as I can simply allow the service providers to swipe my card and take my thumbprint to access information. Only information that I have marked as allow through Pune-card will be accessed at points-of-transaction. The transaction is updated in my account on ilife.

If you read the whole article, you’ll notice that none of the ideas contained there are futuristic, or taken from sci-fi. They are all things that can be implemented relatively easily using today’s technology. All that is needed is execution and political will. And there are indications that the political will is there.

While a vision statement might be good as an inspiration, it is worthless without concrete short-term goals and projects. Dr. Saraph has written another article that lists some of the specific projects that are already underway. There is already industry interest for some of these projects, for example, Unwire Pune, and Pune Cards. Others, like Design for Pune and MyWard, will depend more upon community participation.

This is where you come in. All of these projects can do with help. From web-design and usability, to server and database tuning. Or, if you are a non-technology person, you can help with spreading the word, or simply by participating. I am planning to start a discussion on these topics at IdeaCampPune tomorrow (Saturday). Dr. Saraph will also try and attend those discussions. (Registration for that event is now closed, so you will not be able to attend unless you’ve already registered. However, if there is a good discussion, and any concrete actions result from it, I’ll write an article on that in the next week. Stay tuned. If you’ve already registered, please note that the venue has shifted to Persistent’s Aryabhatta facility near Nal Stop.)

SEAP is already behind these initiatives (in fact, the appointment of Anupam Saraph is a joint partnership between PMC, SEAP and Dr. Saraph). Civic commissioner Praveensinh Pardeshi is very supportive of the project. Companies like Persistent, Eclipsys and nVidia have already pitched in by providing free manpower or resources.

But given the scope of the project, more volunteers are welcome. I have already committed to spending some time every week on projects that can use my expertise, like Design for Pune and MyWard.

It is very easy to get cynical about any projects undertaken by the government. Especially PMC. And that was my first reaction too. However, I have now come to believe that a few people can make a difference. Participate. Enthusiastically. Passionately. Try to convince your friends. One out of 50 will join you. That might be enough. Isn’t it worth trying?

Related articles: