Monthly Archives: November 2008

What to expect at Barcamp Pune

Update: Our hearts go out to the people of Mumbai, but our bodies continue their day-to-day activities in Pune. Barcamp Pune 5 will happen. It starts at 10am, at SICSR (Symbiosis Institute of Computer Studies and Research), Model Colony, Atur Centre, Opp. Om Super Market, Pune (map).

Barcamp Pune 5 is this Saturday (29th November) and I am hoping that this time, a lot of people who are not familiar with the concept of barcamp will show up. Earlier, we wrote about what a barcamp is, and why you should attend. For those still sitting on the fence, wondering whether to attend, let me try and give an idea of what to expect at BCP5.

The barcamp page shows 250+ registrations. So expect 150 or more people to show up.

Expect chaos. I mean that in a good way. Conferences are thoughtfully organized by committees of experts to include topics that you should know about. Barcamp is not a conference – it’s an unconference. No suits, no committees. What you will get is talks on topics that you want to know about.

Expect a tweet-up. At 5:30pm. A meeting of people who use and love twitter. If you are not a twitter-user yet, you should be. Attend the tweet-up to find out why.

Expect talks on a very wide range of topics (mostly tech): how to secure your home wireless network from hackers, PHP, how to build iPhone applications, the semantic web, using maps in your applications and websites, robotics, bootstrapping your startup. And many more. It’s an unconference, so we can’t know in advance all the different presentations that will ultimately happen. And some of the presentations will actually happen with 4 people huddled around a laptop in a corner on the floor.

Expect enthusiasm. Expect to see lots of energy. People passionate about technology. Blogging. Web-2.0.

Expect to see people not just from Pune, but also Mumbai, Bangalore, Aurangabad, Hyderabad and other cities. Yes, barcamps are worth travelling 800km for.

Expect to see students, junior techies, people with 15+ years of experience, programmers, managers, designers, NGO volunteers, open source fanatics, .NET gurus, savvy businessmen and geeks.

Expect to make new friends. You’ll meet lots of interesting people. If you are thinking of a startup, you might meet your co-founder. If you are interested in Erlang, you might meet other Erlang enthusiasts. If you are a recruiter, you might meet your latest recruits. If you are single, you might meet your future partner (hey! it happens).

Expect to go off to dinner parties with strangers.

Expect to give and receive business cards – some old school habits die hard.

Expect people to take lots of photographs and upload them to flickr. Expect blog posts about barcamp to trickle in all through the next week. Expect wi-fi. Expect live-blogging. Expect twittering.

Expect struggling startups to demonstrate their work. Some of them will be doing this for the nth time, because they’ve been going to every barcamp in the country to get visibility for their startup. In other cases, you’ll be the first people in the world to see the new product.

Expect the unexpected.

Expect to learn. To teach. To discuss. To argue. To agree. To disagree. To clap. To laugh. To giggle. To boo (yes, that happens too.)

Expect the boring people to stay at home, and only the interesting people to show up.

Expect those who wanted to come but could not to be profoundly unhappy.

Expect those who did not attend to have missed something great.

An Introduction to Joomla! CMS

If you’ve been following the tech scene in Pune, you’d be aware of the tremendous success of PHPCamp Pune, with over 1000 registrants. One thing that quickly became clear during PHPCamp is the interest in having special interest groups for more specialized areas within PHP hacking – specifically Open Social, Drupal and Joomla!. To help you stay in touch, we asked Amit Kumar Singh, one of the primary movers behind PHPCamp and behind the Joomla Users Group, India, to give our readers an overview of Joomla! – what it is, and why it is so popular. This article is intentionally low-tech at our request – to give people just a quick overview of Joomla! If you want more details, especially technical deep dives, head over to Amit’s blog where he often has articles about Joomla!

Have you ever wondered how you can quickly build a website for yourself or your organization? If yes, then read on to find how you can do so.

What is Joomla!

Joomla! is an open source content management system (CMS), written in PHP, licensed under the GPL and managed by the OSM Foundation.

Joomla is the English spelling of the Swahili word jumla meaning “all together” or “as a whole”. You can read more about the history of Joomla at Wikipedia.

Well, in one word, the secret to building websites quickly and easily is Joomla!. It takes the pain out of building and maintaining websites. It is designed and built to make managing websites easier for a layman.

Where to use

It can be used to build

  • Personal Websites
  • Company’s Website
  • Small Business Websites
  • NGO Websites
  • Online magazines and publications websites
  • School and colleges Websites

This is basically a list of things that can be done with Joomla out of the box. Some of the core features of Joomla are

  • Article management
  • User registration and contacts
  • Themes
  • Search
  • Polling
  • Language support
  • Messaging
  • News Feeds and advertisement

If you need more, then you can easily extend Joomla to do a lot more things and even use the framework to build some powerful applications. For example, if you want to add additional fields to the user registration form you can use Community Builder, if you want to add an e-commerce shopping cart you can use VirtueMart, and if you want to add a forum you can use Fireboard.

You can also see how others are using Joomla at Joomla sites showcase forum.

How to Extend

For me the best part of using Joomla is that it is very easy to customize and enhance. You can find extensions for your needs by simply looking in JED. And just in case your need is really unique, you can extend Joomla to suit it by writing simple components and modules.

If you get stuck while building something you can always find help from the very active and helpful community members, either at the main Joomla Forum site or at Joomla User Group Pune.

About the Author – Amit Kumar Singh

Amit works as Technical Architect at Pune It Labs Pvt Ltd. He considers himself a jack-of-all-trades related to technology, and is trying to master PHP. Along with others he has started Joomla! Users Group Pune and is part of the un/organisers for PHPcamp, barcamp pune, opensocial developer garage, Joomladay. He has also created opensource plugins for Joomla, wordpress, jquery.

What is multi-core architecture and why you need to understand it

Dhananjay Nene has just written a brilliant article in which he gives a detailed overview of multi-core architectures for computer CPUs – why they came about, how they work, and why you should care. Yesterday, Anand Deshpande, CEO of Persistent Systems, while speaking at the IndicThreads conference on Java Technologies exhorted all programmers to understand multi-core architectures and program to take advantage of the possibilities they provide. Dhananjay’s article is thus very timely for both, junior programmers who wish to understand why Anand was attaching so much importance to this issue, and what they need to do about it, and also for managers in infotech to understand how they need to deal with that issue.

Dhananjay sets the stage with this lovely analogy where he compares the CPU of your computer with superman (Kal-El) and then multi-core is explained thus:

One fine morning Kal’s dad Jor-El knocked on your door and announced that Kal had a built in limitation that he was approaching, and that instead of doubling his productivity every year, he shall start cloning himself once each year (even though they would collectively draw the same salary). Having been used to too much of the good life you immediately exclaimed – “But thats preposterous – One person with twice the standard skill set is far superior to 2 persons with a standard skill set, and many years down the line One person with 64 times the standard skill sets is far far far superior to 64 persons with a standard skill set”. Even as you said this you realised your reason for disappointment and consternation – the collective Kal family was not going to be doing any lesser work than expected but the responsibility of ensuring effective coordination across 64, 128 and 256 Kals now lay upon you the manager, and that you realised was a burden extremely onerous to imagine and even more so to carry. However productive the Kal family was, the weakest link in the productivity was now going to be you the project manager. That in a nutshell is the multicore challenge, and that in a nutshell is the burden that some of your developers shall need to carry in the years to come.

What is to be done? First is to understand which programs are well suited to take advantage of a multi-core architecture, and which ones are not:

if Kal had been working on one single super complex project, the task of dividing up the activities across his multiple siblings would be very onerous, but if Kal was working on a large number of small projects, it would be very easy to simply distribute the projects across the various Kal’s and the coordination and management effort would be unlikely to increase much.

Dhananjay goes into more detail on this and many other issues, that I am skimming over. For example:

Some environments lend themselves to easier multi threading / processing and some make it tough. Some may not support multi threading at all. So this will constrain some of your choices and the decisions you make. While Java and C and C++ all support multi threading, it is much easier to build multi threaded programs in Java than in C or C++. While Python supports multi threading building processes with more than a handful of threads will run into the GIL issue which will limit any further efficiency improvements by adding more threads. Almost all languages will work with multi processing scenarios.
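To make the "large number of small projects" case concrete, here is a minimal Python sketch. It is not taken from Dhananjay's article; the function names and the prime-counting workload are made up for illustration. It hands a batch of independent CPU-bound tasks to a pool of workers. With ProcessPoolExecutor each worker is a separate process, so the GIL issue mentioned above does not serialize the work; ThreadPoolExecutor can be passed in instead for comparison.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Iterable, List, Type


def count_primes(limit: int) -> int:
    """A deliberately CPU-bound task: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_independent_tasks(
    limits: Iterable[int],
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    workers: int = 4,
) -> List[int]:
    """Spread independent tasks across workers; results keep the input order."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # Four unrelated "small projects", one per worker.
    print(run_independent_tasks([10_000, 20_000, 30_000, 40_000]))
```

Because the tasks share nothing, no coordination code is needed beyond collecting the results. A single large task would first have to be split into such independent pieces, which is where the real effort lies.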

If you are a programmer or a manager of one, you should read the entire article. In fact, as we mentioned in a previous PuneTech post (Why Python is better than Java), you should really subscribe to his blog. He writes detailed and insightful articles that, as a techie, you would do well to read. If you are interested in programming languages, I would recommend reading “Contrasting java and dynamic languages”, and “Performance Comparison – C++ / Java / Python / Ruby/ Jython / JRuby / Groovy”. And if you are a blogger, check out his tips for software/programming blogging.

Dhananjay is a Pune-based software engineer with 17 years in the field. He is passionate about software engineering, programming, design and architecture. For more info, check out his PuneTech wiki profile.

Introducing Ask PuneTech – Ask us anything and we’ll get you the answer

Update: Ask PuneTech is being replaced by ForPune.com, a site that is much better suited to answer questions you might have about Pune. Please ask a question there, and we (PuneTech and the rest of the tech community in Pune) will monitor that site and try to get an answer for you. Please make sure to tag your question “punetech” so that it gets read by the right people.

The rest of this page is kept for historical purposes, but we prefer that you use ForPune.com instead of the mechanisms described below.



Quick Summary for those too lazy to read the whole page: Ask PuneTech is a One-Stop Shop for all the information needed by infotech professionals and infotech startups in Pune. Ask us a question, or for a referral, or for some information, and we will find it for you, or we will put you in touch with a person who has previous experience in that area to guide you. Just send an email to ask@punetech.com.

Details: PuneTech has been running for 8 months now, and it has been very successful in filling the information gap that existed in the tech ecosystem in Pune. The PuneTech blog reaches about 700 people on a daily basis. The PuneTech wiki is slowly adding more and more useful data (see, for example, the page on tech user groups and organizations in Pune). The PuneTech calendar has become the most comprehensive source of information about tech events in the city.

However, one of the more unexpected ways in which we are helping people is in finding answers to our readers’ questions, or connecting them to the right people. In doing this, we have realized that there is a need for a service that connects people who are facing some problem with someone who has solved that problem in the past. The internet and Google are supposed to solve this problem, but it doesn’t work that way. The web is not particularly useful if you are looking for specific, local information (like, who is a good CA for STPI registration). And sometimes the problem is complicated and you need to talk to a human who can guide you. We noticed that we have been doing more and more of that in recent times, and we decided to formalize the concept and announce it. Thus was born the “Ask PuneTech” service.

Through PuneTech, we now know enough experienced people in the field of infotech that for any question you might have as a techie in Pune, or an infotech startup in Pune, we can connect you to the right person who can help you. More importantly, we know enough people who would happily help others like this for free – just for the satisfaction of preventing others from going through the same pain that they went through. So send the question to ask@punetech.com and we’ll get back to you with the answer.

This is a free service, but if you ask a question, you need to promise that when you get a solution to your problem, you will write it up in an article and send it to us for publishing on the PuneTech wiki for the benefit of all the others in the future. This way, the knowledgebase will continue to grow.

The rest of this page has more details in a Q&A format to make you think that you are talking to yourself:

Q: How does it work?

A: If you are a techie in Pune, or a Pune-based infotech startup, and you have some question or difficulty, send us the question at ask@punetech.com. If we already know the answer (i.e. someone has asked that question before) you’ll get the answer right away. If not, we’ll tap our network of volunteer experts, and try to get the answer for you. Or, we’ll put you in touch with a person who we believe will be able to guide you appropriately, and whose judgment we have some confidence in. We’ll provide you with the source of the information, and our confidence in the validity of the information, and/or our confidence in the reputation of the source.

This will be a completely free service, but there is one rule you have to follow: If you find a good enough solution to your problem (and assuming that the question / answer happens to be of general interest), you must write it up and send it to us so that we can post it to the PuneTech wiki for the benefit of the rest of the world. You must promise us this as a “payment” for using our service.

Sample questions:

1. Can you suggest a good consultant / service provider for the following:

a. Accounting

b. Company registration

c. STPI registration

d. Branding / marketing

e. Design (logo, website)

f. Intellectual property (copyright, trademarks, patents)

g. Legal questions (cyber law, privacy)

h. Facilities (rental accommodation, furniture, etc.)

2. Can you suggest a good free / paid service for

a. Web hosting

b. Broadband internet

c. Website development

d. Search Engine Optimization

3. Where can I find a Ph.D. in statistics / maths / data-mining in Pune to help me with xyz

4.

Warning: Ask PuneTech is not intended to be a jobs board. Do not post about open positions you might have. Do not advertise yourself as a candidate. Such requests will be summarily deleted. Requests for persons with specific skills will be entertained only if the skills required are very specific (e.g. good: looking for help with Erlang; bad: looking for Java expert) and for a short consultation (e.g. good: need help with installing SuSE on a Dell laptop, bad: need a QA person).

Q: How much do you charge for this?

A: This is a free service. We are a group of volunteers doing this because we really want the tech ecosystem in Pune to improve, become more vital, become more active, and in general increase the productivity of Pune’s techies. We personally benefit from this indirectly: vastly improved networking, increase in visibility, increase in exposure, having a finger on the pulse of Pune, and the undying gratitude of some of the smartest people in Pune.

Q: Do you take money from services/experts whom you are recommending?

A: No. PuneTech and “Ask PuneTech” are community-driven, non-commercial services that do not accept any money from any source for any purpose whatsoever. Services/sources will be recommended purely on the basis of quality of service provided.

Q: Why not create a wiki for this and be done? Why is this service needed?

A: We tried and it doesn’t work. For some reason, people don’t add content to a wiki. There isn’t enough critical mass, and/or visibility for this to work. So, to tackle that problem, we are trying this, where we manually connect the “askers” to the “askees”, and “force” the “askers” to update the wiki after the question has been answered.

Q: Why not have an open mailing list where anybody can post questions and answers?

A: One problem is that there are many people who are not aware of the appropriate mailing list or website or forum where the question can be asked. In such cases, the Ask PuneTech service will simply point you towards the appropriate forum where you should ask your question. And in case the question doesn’t get answered there, we will try to find an actual person to answer the question for you.

That brings us to the second problem: Sometimes posting to a mailing list works, but often it doesn’t. In other cases, there are many questions which do not get answers on the mailing list, in spite of the fact that the list does contain members who know the answers. It is often the case that the person who knows the answer would be happy to guide you if you called him up on the phone, but doesn’t have the time to type out the answer in email (or doesn’t read the mailing list so closely.) We are hoping to overcome this problem by introducing a personal touch.

Where possible, we will redirect you to an appropriate mailing list. e.g. if you have problems with connecting to BSNL broadband from your Ubuntu system, we’ll send you to the PLUG mailing list.

Q: How do you keep tabs on the quality / reliability of the information you provide?

A: We intend to tackle this in two ways. First, for all the suggestions you get, we’ll provide you with the source of the information, and our confidence in the validity of the information, and/or our confidence in the reputation of the source. Hopefully this will be enough for you to make a slightly more informed decision. We feel that this will be better than having no information at all.

Second, over time, we intend to build a database of information sources and our confidence in the accuracy of their information. After we suggest something to someone, we expect them to provide us feedback as to how well it worked out. We’ll collate this information over time so that the quality of the information improves.

Q: Why are you doing this manually? Shouldn’t you use technology / wisdom of the crowds / automation to do this more efficiently?

OR

Q: Is this the same as Yahoo!Answers / LinkedIn Answers / xyz Forums / CraigsList?

A: We believe that technological / wisdom-of-the-crowds solutions only work at a large scale (like Wikipedia / Yahoo!Answers). They do not work at a smaller scale (like Pune). Otherwise, a viable technology solution would have emerged by now. The lack of that has forced us to try this out.

That said, we definitely intend to leverage technology as much as possible in this endeavor. We just feel that initially this has to be done manually until it gathers enough of a critical mass.

Also, if you feel that a technology solution will beat this, we would encourage you to try. This is one argument we’d love to lose. We don’t care whether our approach wins or some other approach. We just want a solution to the problem of lack of reliable information.

Q: If this is free, it is not sustainable! This service will shut down after 6 months!

A: We have been doing http://punetech.com as a free service for almost a year now, and we have a pretty good feel for what is sustainable and what isn’t. In any case, we’ll worry about sustainability a little later. For now we are focusing on proving that there is a need, and people will use a service like this, and it will be useful. (We are convinced of that, but we need to prove it.) Once that has been proved, we can worry about how to sustain it.

Q: Since this is volunteer driven, it is not scalable!

A: See answer to previous question.

Q: How can I help?

A:

  1. Start using the service. The more people use it, the more useful the service becomes (and the knowledgebase grows)
  2. Make yourself available to us as an expert who can provide the answer (in whatever happens to be your area of expertise). We promise to not bother you too much. We’ll only forward you as many questions as you are willing to handle. If we don’t know you, we’ll initially put you down as an “untrusted source” and over time, your designation will (hopefully!) change to “reliable”.
  3. Tell all your friends about this service. (Actually, while you are at it, also tell your friends to subscribe to PuneTech at http://punetech.com/subscribe/ to get information about all the latest news, events, tech groups, startups and technologies in Pune.)

Q: Who is behind this service?

A: The questions asked to “Ask PuneTech” are answered by a network of volunteers across Pune, who all share a passion for technology, or a passion for Pune, or, in most cases, both.

The service is coordinated by the people who run PuneTech, the most comprehensive source of information about technology in Pune (if you find a better source please let us know!) It was started by Navin Kabra. Amit Paranjape is a key advisor and evangelist for PuneTech, and one of the top volunteers. Another special mention goes to the Pune OpenCoffee Club, which has a close symbiotic relationship with PuneTech. There are a number of other people, too numerous to mention, who contribute in varying degrees.

Both PuneTech and “Ask PuneTech” are intended to be community-driven, non-commercial services. To ensure that the content remains free from bias and vested interests, we do not accept any money from any source for any purpose whatsoever.

Open-source Code Camp last weekend – A report

The Pune Linux Users Group (PLUG) organized a code camp on Saturday, with the intention of getting a bunch of developers together to develop code, talk about code, and answer each other’s coding questions on specific coding projects.

Aditya Godbole, one of the PLUG members who attended, posted this overview of the event on the PLUG mailing list:

The following work was successfully done at the Code Camp –

  1. Abhijit Bhopatkar – Added some functionality to TeamGit. He was very excited about it and shot off a long mail to the list as soon as he finished it, so I’m not going to spend any more words on that. Refer to his earlier mail on the list.
  2. Guntapalli Karunakar – Started on something, but ended up spending most of the time in critical maintenance tasks of the Indlinux server!
  3. Ashish Bhujbal and Amit Karpe – Worked on an HCL prototype notebook. Tried to resolve some issues with the X display rotation and calibration of the touchscreen. Solved both issues. Were trying to finish solving a problem with hibernation before going into hibernation themselves.
  4. Aditya Godbole – Fixed 3 bugs in the lush-opencv package and added a utility function. One of the fix is already in the upstream cvs.

Of course, along with all this, we had a blast (which was the primary motive anyway).

Thanks to Manjusha for doing a bit of running around for organisation (in return for which we configured her ssh server 🙂 ). Thanks to Sudhanwa and Shantanoo for hanging around to give us company.

Abhijit Bhopatkar, whose work is mentioned in point #1 above, posted details of teamGit:

teamGit is a functional git gui written in qt. Its ultimate aim is to add functionality on top of git targeted at small closely knit teams.

After a successful codecamp session, I have tagged the repo with v0.0.8!!! You can now get the .deb from the ubuntu intrepid ppa: deb http://ppa.launchpad.net/bain-devslashzero/ubuntu. intrepid main
The package name is teamgit.

The main project website is http://www.devslashzero.com/teamgit

There are many small changes and feature adds you can take a look at repo here: http://gitorious.org/projects/teamgit/repos/mainline

The major feature add however is addition on **Advanced** menu.

This menu is constructed on the fly by parsing the output of ‘git help --all’. Then, when you click on a menu item, it issues git help for that command, parses the manpage and presents its options in a guified form. It even displays nice tooltips describing each option.

This is just a first stage of the planned feature. Ultimately this advanced menu will be just an ‘Admin’ feature. People will be able to save the selected options and parameters as ‘Recipes’ and can cook a nice recipes package particular to their needs/organisations.

The feature is not really complete yet, but you can issue simple commands using it. There _are_bugs_ but i couldn’t wait to showcase this nifty feature.

Check out the screenshot http://www.devslashzero.com/images/teamgitscreenshots/screenshot-teamgit22nov.png
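The idea behind the Advanced menu, building a command list from the text that ‘git help --all’ prints, can be sketched in a few lines. This is only an illustration in Python, not teamGit's actual code (teamGit is a Qt application). The function name is hypothetical, and the layout it expects (unindented section headings followed by indented "name, then description" lines) is an assumption based on what recent Git versions print; the output in 2008 was laid out differently.

```python
import re
from typing import Dict, List

# Assumed layout: section headings start in column 0; command lines are
# indented and hold a command name, a gap of spaces, then a description.
COMMAND_LINE = re.compile(r"^\s+(?P<name>[a-z][\w-]*)\s{2,}(?P<summary>\S.*)$")


def parse_help_all(text: str) -> Dict[str, List[str]]:
    """Group command names under the section heading that precedes them."""
    menu: Dict[str, List[str]] = {}
    section = "Commands"  # fallback if a command appears before any heading
    for line in text.splitlines():
        match = COMMAND_LINE.match(line)
        if match:
            menu.setdefault(section, []).append(match.group("name"))
        elif line.strip() and not line[0].isspace():
            section = line.strip()  # an unindented, non-empty line is a heading
    return menu


# A hand-written sample in the assumed layout, so the sketch runs without git.
SAMPLE = """\
Main Porcelain Commands
   add                  Add file contents to the index
   commit               Record changes to the repository

Ancillary Commands / Manipulators
   config               Get and set repository or global options
"""

if __name__ == "__main__":
    for heading, commands in parse_help_all(SAMPLE).items():
        print(heading, "->", ", ".join(commands))
```

To try it on real output, the sample could be replaced with the text captured from `subprocess.run(["git", "help", "--all"], capture_output=True, text=True).stdout`, though the regular expression may need adjusting for the installed Git version.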

Photos of the event have been posted on flickr (thanks G Karunakar).

PLUG also holds monthly meetings on the first Saturday of every month from 4pm to 6pm at SICSR. You can keep track of these and other tech events in Pune by following the PuneTech calendar, or by generally subscribing to the PuneTech feed.

81% of Pune’s Wi-Fi Networks are insecure – ClubHack report

Wi-Fi Security in Pune. Only the WPA encrypted access points (cream colored pie) are secure. Everything else is insecure.

ClubHack, the group hell-bent on hammering some sense of security hygiene into the heads of an ignorant and careless public, went around Pune making a note of how secure or insecure various Wi-Fi hotspots in the city were, and found that a full 50% were not protected at all, and another 31% were only weakly protected. That just leaves 19% adequately protected.

If you have no idea what I am talking about, here is a little bit of explanation. More and more users are now using wireless networking cards to get their internet access. In such a setup, there is a Wi-Fi card that goes into your desktop/laptop (most modern laptops have this built-in), and to complete the connection there is a device that needs to be plugged into your internet connection (i.e. your broadband cable, or telephone line). This device is called an access point (AP), and is typically a wireless router. The computer then communicates wirelessly with your wi-fi router to connect to the internet.

The above report points out that 50% of all wi-fi access points installed in Pune have nothing to prevent random third-party computers from connecting to the AP. That’s like leaving your front door open. Not only can they access the internet using your AP, but more importantly, it is very likely that they can access the other computers on your network, and can tap into the network traffic going back and forth between those computers and the internet. If you are unlucky, they can get access to sensitive data, like passwords to your email account, or worse, bank account. Or, if, like our government, you want to focus on the wrong thing, you can worry that THE TERRORISTS CAN USE YOUR NETWORK TO SEND BOMB THREATS!!! (and we dutifully reported that in PuneTech.)

Of the remaining, 31% think that they have protected their AP using encryption, but the encryption scheme they are using (WEP) is known to be very weak, and can be broken in a matter of minutes. Which means that a hacker (cracker actually) sitting in a car outside your building can easily break into the network without anybody realizing it.

How did ClubHack find out? This is what they did:

On 10th November 2008, ClubHack created a setup in a car which included laptops & GPS enabled devices for the exercise. The car was driven in all the popular areas which included IT parks, multiplexes, residential areas, markets, busy streets etc. While the car was driving at a normal speed, the GPS and wireless enabled devices sensed the availability of wireless signals on the road. These signals were then recorded with details like MAC address of the access point, name of the network, security used, longitude and latitude of the location where the signal of a particular network was highest.

And just in case anybody amongst you is thinking that what they did was illegal and actionable, don’t worry! They took permission of Pune Police to undertake this mission, and Pune Police actually sent an officer to accompany them. For some more details of their project and findings, you can check out the short report, or the full report (PDF).
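The headline percentages come down to a tally over the recorded access points. Here is a minimal sketch of that calculation in Python. It is not ClubHack's tooling: the record fields mirror the details listed above (MAC address, network name, security, location), but the class, the function name and the sample records are invented purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AccessPoint:
    mac: str         # MAC address of the access point
    ssid: str        # name of the network
    security: str    # "OPEN", "WEP" or "WPA"
    latitude: float
    longitude: float


def security_shares(points: Iterable[AccessPoint]) -> Dict[str, float]:
    """Percentage of unique access points (keyed by MAC) per security type."""
    unique = {p.mac: p for p in points}  # the same AP may be sighted many times
    counts = Counter(p.security for p in unique.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Made-up sightings, not data from the report; the last one repeats the first AP.
SAMPLE = [
    AccessPoint("00:00:00:00:00:01", "cafe", "OPEN", 0.0, 0.0),
    AccessPoint("00:00:00:00:00:02", "home", "OPEN", 0.0, 0.0),
    AccessPoint("00:00:00:00:00:03", "office", "WEP", 0.0, 0.0),
    AccessPoint("00:00:00:00:00:04", "lab", "WPA", 0.0, 0.0),
    AccessPoint("00:00:00:00:00:01", "cafe", "OPEN", 0.0, 0.0),
]

if __name__ == "__main__":
    print(security_shares(SAMPLE))
```

With the report's own numbers, the "insecure" figure in the title is simply the unprotected share plus the WEP share: 50% + 31% = 81%.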

What should you do? If you are reading PuneTech, then no doubt you are one of the smart ones who are in the 19% that use WPA based encryption. But just in case someone slipped through, what you need to do is educate yourself about wi-fi security issues, and ensure that you change the settings on your wi-fi access point to use one of the WPA based encryption schemes. (There are 6 or 7 variants like WPA-PSK, WPA2-Personal, etc. Any one of them will do.) And please change the default administrator password for your AP. And if you have no clue what I am talking about, get a friend who understands to help you. Or pony up the Rs. 1000 for the wi-fi security workshop that ClubHack is going to conduct next month, or the Rs. 8000 for the wi-fi security training that AirTight networks is going to conduct later this month. This last one is certainly recommended if you are the network admin for one of the IT companies that ClubHack managed to snag during their wardrive.

And just in case those of you in the remaining 19% are feeling very pleased with yourselves, I should also point out that security guru Bruce Schneier keeps his own wi-fi network open. His is a fascinating, insightful and different take on this issue that you should read. But in spite of Bruce’s sage advice, I keep my router protected with WPA. Because Bruce’s advice amounts to saying that I should leave my door open, but keep all my drawers, and cupboards, and closets and bedroom door locked, and the fridge and TV chained to the wall. I’m not a security guru, and I am sure I’ll leave some door open. I don’t want to take that chance.

Pune company watch: Companies that are doing work related to this area in Pune: Airtight Networks, Symantec, QuickHeal.

Data management and data quality in business intelligence

I am liveblogging CSI Pune’s lecture on Data Management and Data Quality in Business Intelligence, by Ashwin Deokar of SAS R&D Pune.

Huge amounts of data are being generated these days. Different technologies (from databases, to RFID tags and GPS units), different platforms (PCs, servers, cellphones), different vendors. And all this data is often duplicated and inconsistent. All of this data needs to be collected in one place, and cleaned up.

Why? Three reasons:

  • Competitive business environment: With better, and more granular data, you can increase your profits, and reduce costs. For example, Walmart forcing RFID tags on all items that are supplied to them by suppliers – and tracking their locations for very accurate and up-to-date inventory control
  • Regulatory and Compliance requirements: e.g. US government has seriously strict data gathering and storage requirements for hospitals (HIPAA). If you can’t generate this data, you go to jail. That certainly reduces your ability to increase profits.
  • Adherence to Industry standards: If you can’t produce and consume data in the format that everybody else understands, you can’t play with the big boys

The key areas of study in this area are:

  • Data governance: Policies that govern the use of data in an organization. Done usually from the point of view of increasing data security (prevent hackers from getting in, prevent data from leaking out inadvertently), ensuring compliance with regulations, and optimal use of data for organizational growth.
  • Data architecture and design: Overall architecture – data storage, ETL process design, BI architecture, etc.
  • Database management: Since there are huge quantities of data, making a mistake here will pretty much doom the whole project to failure through overload. Which database? Optimizing the performance. Backup, recovery, integrity management, etc.
  • Data security: Who should have access? Which data needs to be kept private?
  • Data quality: Lots of work needed to ensure that there is a single version of the truth in the data warehouse. Especially difficult for non-transactional data (i.e. data that is not there in a database). e.g. Ashwin Deokar is the same as A.P. Deokar. Need fancy software that will do these transformations on the data.
  • Data Warehousing and Business Intelligence: What this component does is covered in a previous PuneTech article.

Data Quality. Why this is an important problem:

  • 96000 IRS tax refund cheques did not get delivered because of incorrect addresses.
  • An acquiring company, which acquired another company mainly for its customer base, found that the acquisition was vastly overvalued – because they got 50% fewer customers than expected, due to duplicates in the database.
  • A cable company lost $500,000 because a mislabeled shipment resulted in a cable being laid at a wrong location.
  • A man defrauded a CD company by taking their “introductory” offer (of free CDs) over 1600 times, by registering that many different accounts with different addresses. Since he did not really have that many different addresses, he managed to fool their computers by making slightly different addresses using minor changes like extra punctuation marks, fictitious apartment numbers, slightly different spellings, etc. Total damage: $250,000.

There is a process, combining automated algorithms and human assistance, to help with improving data quality. And it is not just about duplicate data, or incorrect data. You also need to worry about missing data, and about fetching it from the appropriate “other” sources.

What do you do?

  • Clean up your data by standardizing it using rules – have canonical spellings for names, addresses, etc.
  • Use fancy algorithms to detect duplicates which are not obvious by just looking at the strings. For example, “IBM” and “International Business Machines” do not look similar. But if they have the same address, same number of employees, etc., then you can say they are the same. (And you can have thresholds that adjust the sensitivity of this matching process.)
  • Use external data to clean up the data for de-duplication. For example, US Postal service publishes a CD of every valid address in the US. Businesses buy that CD and use that to convert all their address data to this standard format. That will result in major de-duplication.
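
To make the first two steps concrete, here is a minimal Python sketch. It is not how SAS (or any product mentioned in the talk) does it; the abbreviation table, the field weights, the 0.8 threshold and the two sample records are all made up for illustration. It standardizes strings with a few rules, then scores two records field by field and calls them duplicates if the weighted score crosses the threshold.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule table: a real system would use a far larger rule set,
# or a reference database of valid addresses.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "apt": "apartment"}


def standardize(text):
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    words = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def similarity(a, b):
    """Similarity in [0, 1] between two values, compared after standardization."""
    return SequenceMatcher(None, standardize(str(a)), standardize(str(b))).ratio()


def match_score(rec_a, rec_b, weights):
    """Weighted average of the per-field similarities of two records."""
    total = sum(weights.values())
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def is_duplicate(rec_a, rec_b, weights, threshold=0.8):
    """Raising the threshold makes the matching stricter; lowering it, looser."""
    return match_score(rec_a, rec_b, weights) >= threshold


# Illustrative weights: address and employee count count for more than the name,
# because names like "IBM" vs. its expanded form look nothing alike as strings.
WEIGHTS = {"name": 1, "address": 3, "employees": 2}

# Made-up records (the address and employee count are not real data).
record_a = {"name": "IBM", "address": "12 Main St., Apt. 4", "employees": 1000}
record_b = {
    "name": "International Business Machines",
    "address": "12 main street apartment 4",
    "employees": 1000,
}

print(standardize(record_a["address"]))
print(round(match_score(record_a, record_b, WEIGHTS), 3))
print(is_duplicate(record_a, record_b, WEIGHTS))
```

With these weights the two records match at the default threshold even though the names barely resemble each other, which is the point of the IBM example. Note that the standardization step only removes variations like extra punctuation and abbreviations; fictitious apartment numbers or creative spellings need the fuzzy score (or external reference data) to catch.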

SAS provides tools for all the steps in this process. And since it has all the pieces, it has the advantage of ensuring that there is a single meta-data repository for all the steps in this process – which is a huge advantage. SAS has the best ETL tools. It also has a presence in analytics and BI. It has OLAP capabilities, but it really excels in business intelligence applications.

SAS R&D Pune has engineers working on various core products that are used in this area – meta-data, ETL, BI components. It also has a consulting group that helps companies deploy SAS products and use them – and that ends up working on all the parts of the data management / data quality process.

Upcoming conferences and tech events in Pune – Nov/Dec 2008

IdeaCamp Pune (source: InsideSocialWeb.com)
Idea Camp Pune, 2008. Photo courtesy InsideSocialWeb.com

The next couple of months are going to be rather active in Pune, with a host of really good conferences and events coming up. Some of these are free events, while others have a fee associated with them. We have written about some of them on PuneTech before, while some you’ll be hearing about for the first time. Some of them are for hardcore techies, while others are more tangential. In any case, there is something for everyone in here. Take this opportunity to improve your skills, or improve your business network. Except for power cuts, it is a great time to be a techie in Pune.

Nov 19 CSI Pune Lecture: Data Management for BI : Ashwin Deokar from SAS R&D Pune will talk about issues in data management in Business Intelligence. Free for members & students, Rs. 100 for others, Rs 50 for Persistent employees
Nov 22, 23 Code Camp: 24-hour code camp organized by Pune Linux Users Group. Free: anybody can attend.
Nov 22 Pune OpenCoffee Club Meeting – Pune Startup’s Pain Points : Get together with other startups in the Pune area and discuss solutions to common problems. Free, anybody can attend, no registration required.
Nov 25,26,27 IndicThreads Conference on Java Technologies: 3-day conference on Java; speakers from all over India. Fees range from Rs. 4000 to 8500 depending on various things.
Nov 27, 28 Conference on Advances in Usability Engineering: organized by Viswakarma Institute of Information Technology. Rs 3500 for professionals, Rs. 2000 for academics and Rs. 500 for students.
Nov 27, 28 Wi-Fi Security Training from AirTightNetworks: Airtight Networks has some of the best wi-fi security products in the world, and they have all been developed fully in Pune. Rs. 8000 before 21 Nov, Rs 10000 afterwards
Nov 27 World Usability Day, Pune 2008 (part of the Usability Conference): This event is a part of the Usability Engineering conference listed a couple of lines above; but this part of the conference (3pm to 6pm) is free and open to all.
Nov 29 Barcamp Pune 5: If you don’t know what a barcamp is, read this to find out and figure out why you should attend.
Dec 4,5,6,7 Pune Design Festival 2008: Fees and registration details not yet available
Dec 06+ ClubHack – 2-day InfoTech Security Conference: One day of presentations on security, and one day of workshops. INR 1000 for talk sessions, INR 1000 for each workshop. On the spot registration INR 1500
Dec 12+ Society of Technical Communication – 2-day conference on technical writing: Fees and registration details not yet available
Dec 17 CSI Pune Lecture: Data Management for BI: next in the Business intelligence series by SAS R&D India. Fees most likely: Rs. 100 for others, Rs 50 for Persistent employees
Dec 20 OpenSocial Developer Garage: Conference for OpenSocial developers and enthusiasts. This is a free conference, but by invitation only – Register here to be considered for invitation.

And there are some great events in January too.

Did we miss any? Please add them to the common tech events calendar of Pune. Or, send us a mail with details of the event, and we’ll add it.

BSNL to provide managed network services in Pune

The Indian Express reports that Pune has yet again been chosen as a city to introduce a new hi-tech service: Managed Network Services from BSNL. The basic idea is that instead of simply providing broadband and other types of network access, BSNL will take over the entire job of handling the network of small companies. The hardware, software and maintenance will be handled by BSNL, and the company just has to pay the monthly rent (starting at Rs. 8000 p.m.)


BSNL chairman and managing director Kuldeep Goyal on Friday launched the Managed Network Services from Pune. Calling this service a ‘new chapter’ in the history of BSNL, Goyal said, “We chose Pune to run the project on a pilot basis, as we are sure of getting a good feedback here.”

[…]

“The assured service with fixed monthly charges and no capital expenditure investment will reduce the entry barrier to the customer and help in expanding our service. Initially, we are targeting 50 small and medium enterprises as our customers.” he added.

The integrated secured data service is actually a bundle of hardware, connectivity package and completely managed services, including the 24×7 proactive monitoring of customer network and remote configuration of troubleshooting.

See full article.

Pune OpenCoffee Club meeting: Pune startup’s pain points

What: Pune OpenCoffee Club get-together. Solving the pain points of Pune’s startups

When: Saturday, 22nd November, 4pm – 7pm

Where: SICSR, Model Colony. Here is the map.

Registration and Fees: This event is free for everyone – no registration required

Details:

Are you a Pune startup struggling with issues that you shouldn’t need to struggle with? Electricity. Access to good service providers: CA, STPI registration, Website design, hosting? Difficulty with hiring? Need an SEO expert, but no idea who is reliable/worth it? Looking for someone to help you with facilities/furniture? Looking for Masters/PhD in mathematics/statistics to consult and don’t know where to look? It’s very likely that your peers have faced the same problems and some have found solutions the hard way. Let’s discuss in a group and look for specific suggestions from those who’ve been there, done that. We’ll try to moderate the discussion aggressively to ensure relevance, and prevent tangents and rambling, or pointless bitching. We’ll try to collect and tabulate the most useful answers and post them on the web for the benefit of others. If you are unable to attend the meeting, but do have a question that you would like to raise, you can email it to me (navin@punetech.com) and I’ll try to get it asked/answered.

Also, if you are a Pune startup and would like to demo your product to or present it to the community at this get-together, please get in touch with us.

The Pune OpenCoffee Club meets on the third Saturday of every month from 4 to 7 pm at SICSR. (This time, we moved it to the 4th Saturday due to a phantom conflict with Barcamp Pune.) In addition, there is Startup Cinema – information about which you can find on the POCC mailing list. Often, there are also other ad hoc meetings organized by members of POCC. See the PuneTech calendar for a comprehensive list of all upcoming tech events in Pune.