Category Archives: Overviews

Looking for a job? Try PuneStartupJobs

There’s a new jobs listing forum in town, and it contains postings of jobs that you will not find anywhere else. Check out the PuneStartupJobs mailing list (which is an initiative of the Pune Open Coffee Club).

POCC logo courtesy threenovember.com. POCC is an informal group of the Pune startup ecosystem, with more than 2500 members who either have their own startups, want to start one, or provide some service (or funding) to startups.

The Pune Open Coffee Club is an informal group for all those interested in the Pune startup ecosystem, and many of the startups on that group realized that the conventional avenues for job postings were either too ineffective, or too expensive for the smaller startups. To counter this problem, the PuneStartupJobs mailing list was started. In keeping with the philosophy behind the POCC, the PuneStartupJobs mailing list is also free – any POCC member can post job postings, and anybody can subscribe to receive updates.

Features of PuneStartupJobs:

  • Free: no fees for posting, no fees for subscribing.
  • Open: anybody can subscribe.
  • Focused: only Pune startups can post. (The occasional posting from elsewhere, e.g. Mumbai, gets through, but it’s largely local.)
  • Moderated: all posts are moderated, so no spam.
  • Digest: a weekly digest of PuneStartupJobs postings is auto-posted to the main PuneStartups mailing list, ensuring wider (but delayed) circulation to a larger group. (Thanks to Pune startup Thinking Space Technologies for implementing this functionality.)

So, if you’re likely to be interested in getting a job with a startup in Pune, if you know someone who might be interested, or if you’re simply curious about what kinds of people Pune’s startups are looking to hire, you should subscribe…

Pune Incubator body ISBA launches Rs. 55cr venture capital fund

The VC Circle blog has just posted information about a new venture capital fund launched by a Pune-based association, the Indian STEP and Business Incubators Association (ISBA). ISBA is an association of startup/business incubators, incubatee startups, and other people interested in this ecosystem.

The fund will focus on sector-agnostic investments in companies with no proven track record.

Arihant Group, a company engaged in steel manufacturing in Pune, has contributed a major chunk to this fund, while the other investor in the fund is Mumbai-based Adventa Infrastructure Pvt Ltd.

[…]

The fund looks at an average investment of Rs 2.5 crore, and expects an equity stake somewhere between 5% and 30% in investee companies.

The ISBA was set up in 2004 and aims to promote business incubation activities in the country through exchange of information, sharing of experience, and other networking assistance among Indian Business Incubators, Science and Technology Entrepreneurs Parks (STEPs) and other related organizations engaged in the promotion of start-up enterprises.

These are the planned activities of ISBA:

» Provide advice on the requirements and conditions for starting an incubator, creating a business plan, recruiting incubator managers, and other incubator development issues;
» Maintain and update a database containing the contact information of business incubation experts;
» Lobby for Indian incubators at national and international levels;
» Organize workshops, conferences, seminars, and training services;
» Publish a newsletter;
» Organize media conferences and other activities to create awareness about the incubator programme and encourage public participation.

For more information about ISBA, see its website.

Business Continuity Management Lifecycle and Key Contractual Requirements

(This overview of Business Continuity Management is a guest post by Dipali Inamdar, Head of IT Security at Geometric)

In emergency situations like pandemic outbreaks, power failures, riots, strikes, or infrastructure failures, it is important that your business does not stop functioning. A plan to ensure this is called a Business Continuity Plan (BCP), and it is of prime importance to your business to ensure minimum disruption and smooth functioning of your operations. Earlier, most companies would document business continuity plans only if their clients asked for it, and would focus mainly on IT recovery. But the scenario has changed: corporations of all sizes have now realized the importance of keeping their business functioning at all times, and hence they are working towards a well-defined business continuity management framework. Business continuity (BC) is often understood as a process to handle events that could disrupt business. However, BC is more than just recovery. The plan should also ensure proper business resumption after recovering from the disruption.

Business continuity management is a continuous life cycle as follows:

Business Continuity Planning Lifecycle

How does one start with BCM?

Business Impact Analysis (understanding the organization)

The first step is to conduct a Business Impact Analysis. This will help you identify critical business systems and processes and how their outage (downtime) could affect your business. You cannot put plans in place for all processes without considering the financial investment needed to do so. The CEO’s inputs and clients’ BC requirements also serve as inputs to the impact analysis.

Defining the plan (Determining BCM strategy)

The next step is to identify the situations that could lead to disruption of the identified critical processes.

The situations could be categorized as:

  • Natural and environmental: earthquakes, floods, hurricanes, etc.
  • Human related: strikes, terrorist attacks, pandemic situations, thefts, etc.
  • IT related: critical systems failure, virus attacks, etc.
  • Others: business competition, power failure, client BC contractual requirements

It might not be feasible to have a plan for each and every situation, since any plan you define must be practical to implement. Once the situations have been identified, one needs to identify the individual threats, their severity (how serious the impact on the business would be if the threat materializes), and their probability of occurrence (the likelihood of the threat materializing). Critical risks are then identified based on these severity and likelihood levels.
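As a purely illustrative sketch of that severity-and-likelihood arithmetic (the threats, the 1–5 scales, and the threshold below are hypothetical examples, not prescribed values), a simple risk register could be scored like this:

```python
# Illustrative risk-register scoring: risk score = severity x likelihood.
# The threats, the 1-5 scales, and the threshold are hypothetical examples.

threats = [
    # (threat, severity 1-5, likelihood 1-5)
    ("Power failure",            4, 4),
    ("Pandemic outbreak",        5, 2),
    ("Critical systems failure", 4, 3),
    ("Strike / civil unrest",    3, 2),
]

CRITICAL_THRESHOLD = 12  # treat anything at or above this score as a critical risk

for name, severity, likelihood in threats:
    score = severity * likelihood
    tag = "CRITICAL" if score >= CRITICAL_THRESHOLD else "monitor"
    print(f"{name:<26} severity={severity} likelihood={likelihood} score={score:<3} {tag}")
```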

Implementing the plan (Developing and implementing BCP response)

The identified risks and additional client-specific BCP requirements serve as inputs to the creation of BCPs. BCPs should focus on mitigation plans for the identified risks. The BCP should be comprehensive, detailing the roles and responsibilities of all the response teams. A proper budget needs to be allocated. Once the plan is documented, it should be implemented.

The measures implemented as per the BCP could include redundant infrastructure, Service Level Agreements (SLAs) signed with service providers, backup power supply, sending backup tapes to offshore sites, cross-training people, and keeping proper medicines or masks on hand for pandemic situations.

The BCP should also have proper plans in place to resume business as usual. Business resumption is a critical aspect of the business continuity framework.

Testing and improving plan (Exercising, maintaining and reviewing)

Once the plans are documented and implemented, they should be tested regularly. Tests can be scheduled, or conducted as and when the need arises. One can simulate different scenarios: moving people to other locations, taking primary infrastructure down, testing UPS and diesel generator capacity, call tree tests, evacuation drills, having senior-management backups take decisions, transport arrangements, etc.

The tests will help you identify areas of the BCP that need improvement. Gaps between the expected and actual results need to be analyzed, and the test results need to be published to senior management. The plan needs to be reviewed regularly to incorporate the latest threats and to put mitigations in place for the critical ones, which makes this a continuous lifecycle. One can schedule internal audits or apply for BS25999 certification to ensure proper compliance with BCP requirements.

Pune faces threats such as irregular power supply and pandemic outbreaks, which could lead to business disruptions. One needs to have detailed plans for critical threats to ensure continuity of critical operations. The plans should also have detailed procedures to ensure proper business resumption. Plans may be documented, but what ultimately matters is actual action during emergency situations.

Important note: Contractual requirements

When signing off on specific contractual requirements with clients, the following precautions must be taken:

  • Before signing stringent SLAs, check that there is a provision for exclusions or relaxations during disaster situations, as you will not be able to meet the SLAs during a disaster.
  • When BCP requirements are defined in client contracts, the responsibilities and expectations of the client should also be clearly documented and agreed, to ensure effective execution of the BCP.
  • BCP requirements can only be implemented effectively when proper budget allocations are planned, so cost negotiations with the client over specific BCP requirements are important. This is usually ignored, so the sales team should be apprised before agreeing on BCP requirements with the client.
  • Do not sign off on vague BCP requirements. They should be clear, specific, and practically achievable.
  • Before signing off on any contract that has a penalty clause, review it thoroughly to ensure that compliance with those clauses is practically possible.

About the author: Dipali Inamdar

Dipali Inamdar, Head – IT Security at Geometric Ltd, has more than 11 years of experience in the Information Technology and Information Security domains. She is a certified CISA, ISO27001 Lead Auditor, BS25999 Lead Auditor and ISO2000 Internal Auditor. She has worked on Information Security and Business Continuity Management across the BPO, IT/ITES, and finance sectors. She is currently operating out of Pune and is very passionate about her field. See her LinkedIn profile for more details.

Allow your marketing department to produce translated versions of your website with Dubzer

Do in half an hour what would normally take weeks!

Normally, making a website available in multiple languages is the last thing on the mind of a web developer. It costs time and money, is boring and error-prone work, and in the mind of a developer it does not add any new features or sexiness to the website. So the developer drags his or her feet over it.

Of course, marketing does not see things the same way. Large chunks of the market are excluded when a website talks just in English. And trying to get engineering to produce the other language versions is a struggle.

Like they say, the best person to do something is the person who is most passionate about it. Which means that it would be great if the marketing department could translate the website without having to involve the developers.

That is the promise of Dubzer, which launched at Demo Fall 09’s Alpha Pitch today and became the latest Pune startup to hit the world stage (after onion.tv, which debuted at TechCrunch50’s demopit). Dubzer comes from Santosh Dawara and Anjali Gupta, who previously brought us Lipikaar (the software for typing in Indian languages, which my mom loves) and Bookeazy (the much loved, but now dead, movie ticket booking service).

Dubzer will allow non-technical people to create translated versions of a website, or parts of a website, without requiring any significant changes to the backend of the website. It appears that this will be a hosted service: you provide Dubzer with the URL of your website, they crawl it, and they then provide you with an online platform where you can start translating and publishing portions of your website (starting with the most popular or most important pages first). There are a whole bunch of features indicating that enterprise users are also being targeted – specifically, the ability to translate intranets, and fine-grained access control (i.e. who has permission to translate which portions of the site). Another interesting feature is that they let you choose between different mechanisms of translation – free and paid methods such as machine-based (e.g. Google Translate), volunteer-driven (e.g. what Facebook is doing), crowd-sourced (as in Wikipedia), or professional translation (sometimes, you get what you pay for).

Unlike Bookeazy and Lipikaar, Dubzer is actually incubated by Persistent Systems, which means that the team sits on Persistent Systems’ premises (except Anjali, who has left Pune and traitorously defected to Bangalore). Their board of advisors includes Anand Deshpande, Founder and MD of Persistent; Abhijit Athavale, President of Markonix and creator of PuneChips; and Jugal Gupta, CEO of Databyte.

One interesting point to note: last year the Lipikaar founders ran into the problem of translating their own website into the various Indian languages (they are, after all, trying to sell software for writing in 18 different languages, so they had better have their own website in those languages). When they did not find a decent solution, they decided to build it themselves, and Dubzer was born. Similarly, Arun Prabhudesai was looking for a way to monetize his blog trak.in with in-text ads and did not find any appropriate solution. So he decided to build it himself, and hover.in was born.

There-in lies a lesson for us all…

Venture Center – Pune’s incubator for startups in biotech, chemical, materials sciences

Kaushik Gala, Business Development Manager at Venture Center is looking for all innovators in the areas of biology, chemical, and material sciences.

Venture Center is an incubator housed in NCL Pune, created with this purpose:

To nucleate and nurture technology and knowledge-based enterprises for India by leveraging the scientific and engineering competencies of the institutions in the region.

Envisioned Future: To be the organisation that will be credited with creating, shaping and sustaining a “Pune cluster” of innovative technology businesses with a significant economic impact regionally, nationally and globally within the next 20 years.

To find out more about Venture Center, we interviewed Kaushik Gala, the Business Development Manager of Venture Center. Here are excerpts from the interview:

1. What is Venture Center?

Entrepreneurship Development Center (‘Venture Center’) is a technology business incubator approved by the Department of Science & Technology, Government of India. Venture Center is incorporated as a Section 25 not-for-profit Company established under the Companies Act 1956.

Venture Center was setup with support from the Department of Science & Technology – National Science and Technology Entrepreneurship Development Board (DST-NSTEDB) and National Chemical Laboratory (NCL) (constituent lab of the Council of Scientific and Industrial Research).

2. What are the services that Venture Center will provide incubatee companies?

Venture Center provides:

  • Infrastructure – Dedicated labs, shared work-benches, analytical facilities, offices, hot-desks, etc.
  • Advisory – Intellectual property, business planning, startup nuts-and-bolts issues, etc.
  • Fund-raising – Seed-stage fund raising from various sources including government agencies (e.g. MoMSME), professional investors, etc.
  • Technology commercialization program (‘Lab2Mkt’)
  • Information and learning center – Library, databases, workshops, seminars, etc.

3. At what stage do you expect innovators and/or startup companies to approach you?

We offer resources and services at all stages of an early-stage technology startup – ranging from idea/conception to prototype to Series A/B financing.

4. Obviously you are not interested in incubating any and all startups? Can you describe, with some examples, what sectors you are limiting yourself to?

Our focus is on the areas of material, chemical and biological sciences and related engineering / software ventures. However, some of our services are open to all individual entrepreneurs and startups.

Specific examples include startups commercializing technologies related to surgical implants, membranes for water purification, CFD and modeling solutions, etc.

5. Are the innovators expected to move to Pune, into your facility, to avail of any of your services?

For startups that need our infrastructure facilities, being located in Pune is obviously preferable. However, for services such as advisory and fund-raising, they can be located outside Pune as well.

6. How is Venture Center funded? What are your long-term funding plans?

Venture Center is funded via:

  • A grant from DST-NSTEDB for start-up costs and operational expenses for the first 5 years
  • In-kind support from NCL
  • Donations from well-wishers

After the fifth year of operation, Venture Center is expected to become self-sufficient. Besides generating revenue from a variety of services, our long-term funding plans include:

  • Raising capital from governmental agencies and professional investors to set up an early-stage (‘seed’) fund for investment in technology ventures
  • Raising grant funds from governmental and corporate agencies to expand our services portfolio
  • Partnering with other R&D labs, domestic/foreign incubators, etc.

You can find out more about Venture Center at its website, which is packed with a huge amount of detail. Information about the executive team behind Venture Center is here.

optimization: a technical overview

(This is the fourth in the PuneTech series of articles on optimization by Dr. Narayan Venkatasubramanyan, an Optimization Guru and one of the original pioneers in applying Optimization to Supply Chain Management. The first one was an ‘overview’ case study of optimization. The second was architecture of a decision support system. The third was optimization and organizational readiness for change.

For Dr. Narayan Venkatasubramanyan’s detailed bio, please click here. For the full series of articles, click here.)

this is a follow-up to optimization: a case study. frequent references in this article to details in that article would make this one difficult to read for someone who hasn’t at least skimmed through that.

the problem of choice

the wikipedia article on optimization provides a great overview of the field. it does a thorough job by providing a brief history of the field of mathematical optimization, breaking down the field into its various sub-fields, and even making a passing reference to commercially available packages that help in the rapid development of optimization-based solutions. the rich set of links in this page lead to detailed discussions of each of the topics touched on in the overview.

i’m tempted to stop here and say that my job is done but there is one slight problem: there is a complete absence of any reference to helicopter scheduling in an offshore oil-field. not a trace!

this brings me to the biggest problem facing a young practitioner in the field: what to do when faced with a practical problem?

of course, the first instinct is to run with the technique one is most familiar with. being among the few in our mba program that had chosen the elective titled “selected topics in operations research” (a title that i’m now convinced was designed to bore and/or scare off prospective students who weren’t self-selected card-carrying nerds), we came to the problem of helicopter scheduling armed with a wealth of text-book knowledge.

an overview of linear programming

a series of linear constraints on two variables: the lines represent the constraints, the blue region is the set of all “permissible values”, and the objective function is used to choose one (“the most optimal”) of the blue points. image via wikipedia

having recently studied linear and integer programming, we first tried to write down a mathematical formulation of the problem. we knew we could describe each sortie in terms of variables (known as decision variables). we then had to write down constraints that ensured the following:

  • any set of values of those decision variables that satisfied all the constraints would correspond to a sortie
  • any sortie could be described by a permissible set of values of those decision variables

this approach is one of the cornerstones of mathematical programming: given a practical situation to optimize, first write down a set of equations whose solutions have a one-to-one correspondence to the set of possible decisions. typically, these equations have many solutions.

click here for an animated presentation that shows how the solutions to a system of inequalities can be viewed graphically.

the other cornerstone is what is called an objective function, i.e., a mathematical function in those same variables that were used to describe the set of all feasible solutions. the solver is directed to pick the “best” solution, i.e., one that maximizes (or minimizes) the objective function.

the set of constraints and the objective function together constitute a mathematical programming problem. the solution that maximizes (or minimizes) the objective function is called an optimal solution.
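in symbols, and only as a generic sketch (nothing here is specific to the helicopter problem), a mathematical program can be written as:

```latex
\begin{aligned}
\text{maximize (or minimize)} \quad & f(x_1, \ldots, x_n) \\
\text{subject to} \quad & g_i(x_1, \ldots, x_n) \le b_i, \qquad i = 1, \ldots, m
\end{aligned}
```

where x1, …, xn are the decision variables, the functions g_i encode the constraints, and f is the objective function.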

linear programming – an example

googling for “linear programming examples” leads to millions of hits, so let me borrow an example at random from here: “A farmer has 10 acres to plant in wheat and rye. He has to plant at least 7 acres. However, he has only $1200 to spend and each acre of wheat costs $200 to plant and each acre of rye costs $100 to plant. Moreover, the farmer has to get the planting done in 12 hours and it takes an hour to plant an acre of wheat and 2 hours to plant an acre of rye. If the profit is $500 per acre of wheat and $300 per acre of rye how many acres of each should be planted to maximize profits?”

the decisions the farmer needs to make are: how many acres of wheat to plant? how many acres of rye to plant? let us call these x and y respectively.

so what values can x and y take?

  • since we know that he has only 10 acres, it is clear that x+y must be no more than 10.
  • the problem says that he has to plant at least 7 acres. we have two choices: we can be good students and write down the constraint “x+y >= 7” or we can be good practitioners and demand to know more about the origins of this constraint (i’m sure every OR professional of long standing has scars to show from the times when they failed to ask that question.)
  • the budget constraint implies that 200x + 100y <= 1200. again, should we not be asking why this farmer cannot borrow money if doing so will increase his returns?
  • finally, the time constraint translates into x + 2y <= 12. can he not employ farm-hands to increase his options?
  • the non-negativity constraints (x, y >= 0) are often forgotten. in the absence of these constraints, the farmer could plant a negative amount of rye because doing so would seem to get him more land, more money, and more time. clearly, this is practically impossible.
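collecting these conditions, the feasible region is the set of points (x, y) that satisfy all of the following:

```latex
\begin{aligned}
x + y &\le 10 \\
x + y &\ge 7 \\
200x + 100y &\le 1200 \\
x + 2y &\le 12 \\
x &\ge 0, \quad y \ge 0
\end{aligned}
```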

as you will see if you were to scroll down that page, these inequalities define a triangular region in the x,y plane. all points on that triangle and its interior represent feasible solutions: i.e., if you were to pick a point, say (5,2), it means that the farmer plants 5 acres of wheat and 2 acres of rye. it is easy to confirm that this represents no more than 10 acres, no less than 7 acres, no more than $1200 and no more than 12 hours. but is this the best solution? or is there another point within that triangle that does better?

this is where the objective function helps. the objective is to maximize the profit earned, i.e., maximize 500x + 300y. from among all the points (x,y) in that triangle, which one has the highest value for 500x + 300y?
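for the curious, here is a small sketch of how this particular toy example could be fed to an off-the-shelf solver today (it uses scipy’s linprog, which obviously isn’t what we had back then; linprog minimizes by convention, so the profit coefficients are negated):

```python
# a minimal sketch of the farmer example using scipy's LP solver.
# linprog minimizes, so we minimize -(500x + 300y) to maximize profit.
from scipy.optimize import linprog

c = [-500, -300]                 # negated profit per acre of wheat (x) and rye (y)
A_ub = [[  1,   1],              # x + y <= 10          (total acreage)
        [ -1,  -1],              # x + y >= 7           (rewritten as -x - y <= -7)
        [200, 100],              # 200x + 100y <= 1200  (budget)
        [  1,   2]]              # x + 2y <= 12         (planting time)
b_ub = [10, -7, 1200, 12]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)           # expect roughly [4, 4] and a profit of 3200
```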

this is the essence of linear programming. LPs are a subset of problems that are called mathematical programs.

real life isn’t always lp

in practice, not all mathematical programs are equally hard. as we saw above, if all the constraints and the objective function are linear in the decision variables and if the decision variables can take on any real value, we have a linear program. this is the easiest class of mathematical programs. linear programming models can be used to describe, sometimes approximately, a large number of commercially interesting problems like supply chain planning. commercial packages like OPL, GAMS, AMPL, etc. can be used to model such problems without having to know much programming. packages like CPLEX can solve problems with millions of decision variables and constraints and produce an optimal solution in reasonable time. lately, there have been many open source solvers (e.g., GLPK) that have been growing in their capability and competing with commercial packages.

a cutting plane algorithm for solving an integer program: integer programming problems constrain the solution to specific discrete values. while the blue lines represent the “feasible region”, the solution is only allowed to take on values represented by the red dots. this makes the problem significantly more difficult. image via wikipedia

in many interesting commercial problems, the decision variables are required to take on discrete values. for example, a sortie that carries 1/3 of a passenger from point a to point b and transports the other 2/3 on a second flight from point a to point b would not work in practice. a helicopter that lands 0.3 at point c and 0.7 at point d is equally impractical. these variables have to be restricted to integer values. such problems are called integer programming problems. (there is a special class of problems in which the decision variables are required to be 0 or 1; such problems are called 0-1 programming problems.) integer programming problems are surprisingly hard to solve. such problems occur routinely in scheduling problems as well as in any problem that involves discrete decisions. commercial packages like CPLEX include a variety of sophisticated techniques to find good (although not always optimal) solutions to such problems. what makes these problems hard is the reality that the solution time for such problems grows exponentially with the size of the problem.
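to illustrate how that discreteness is expressed in practice, here is a small sketch using the open-source PuLP modelling library (again, nothing we had back then) that re-solves the earlier farmer example with the acreage forced to be integral; in this particular toy the continuous optimum happens to be integral anyway, so the restriction is shown purely for the mechanics:

```python
# a sketch of an integer program in PuLP; integrality is declared on the
# decision variables via cat="Integer".
from pulp import LpProblem, LpVariable, LpMaximize, LpStatus, value

prob = LpProblem("farmer", LpMaximize)
x = LpVariable("acres_wheat", lowBound=0, cat="Integer")
y = LpVariable("acres_rye",   lowBound=0, cat="Integer")

prob += 500 * x + 300 * y            # objective: total profit
prob += x + y <= 10                  # total acreage
prob += x + y >= 7                   # must plant at least 7 acres
prob += 200 * x + 100 * y <= 1200    # budget
prob += x + 2 * y <= 12              # planting time

prob.solve()
print(LpStatus[prob.status], value(x), value(y), value(prob.objective))
```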

another class of interesting commercial problems involves non-linear constraints and/or objective functions. such problems occur routinely in situations such as refinery planning, where the dynamics of the process cannot be described (even approximately) with linear functions. some non-linear problems are relatively easy because they are guaranteed to have unique minima (or maxima). such well-behaved problems are easy to solve because one can always move along an improving path and find the optimal solution. when the functions involved are non-convex, you could have local minima (or maxima) that are worse than the global minima (or maxima). such problems are relatively hard because short-sighted algorithms could find a local minimum and get stuck in it.

fortunately for us, the helicopter scheduling problem had no non-linear effects (at least none that we accounted for in our model). unfortunately for us, the discrete constraints were themselves extremely hard to deal with. as we wrote down the formulation on paper, it became quickly apparent that the sheer size and complexity of the problem was beyond the capabilities of the IBM PC-XT that we had at our disposal. after kicking this idea around for a bit, we abandoned this approach.

resorting to heuristics

we decided to resort to a heuristic approach, i.e., an approach that used a set of rules to find good solutions to the problem. the approach we took involved the enumeration of all possible paths on a search tree and then an evaluation of those paths to find the most efficient one. for example, if the sortie was required to start at point A and drop off m1 men at point B and m2 men at point C, the helicopter could

  • leave point A with the m1 men and proceed to point B, or
  • leave point A with the m2 men and proceed to point C, or
  • leave point A with the m1 men and some of the m2 men and proceed to point B, or
  • leave point A with the m1 men and some of the m2 men and proceed to point C, or
  • . . .

if we were to select the first possibility, it would drop off the m1 men and then consider all the options available to it (return to A for the m2 men? fly to point D to refuel?)

we would then traverse this tree enumerating all the paths and evaluating them for their total cost. finally, we would pick the “best” path and publish it to the radio operator.

at first, this may seem ridiculous. the explosion of possibilities meant that this tree was daunting.

there were several ways around this problem. firstly, we never really explicitly enumerated all possible paths. we built out the possibilities as we went, keeping the best solution found so far. although the number of possible paths that a helicopter could fly in the course of a sortie was huge, there were simple rules that directed the search in promising directions so that the algorithm could quickly find a “good” sortie. once a complete sortie had been found, the algorithm could then use it to prune searches down branches that seemed to hold no promise for a better solution. the trick was to tune the search direction and prune the tree without eliminating any feasible possibilities. of course, aggressive pruning would speed up the search but could end up eliminating good solutions. similarly, good rules to direct the search could help find good solutions quickly but could defer searches in non-obvious directions. since we were limited in time, the search tree was never completely searched, so if the rules were poor, good solutions could be pushed out so late in the search that they were never found, at least not in time to be implemented.
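stripped of every helicopter-specific detail, the skeleton of such a search looks roughly like the sketch below; the branching, completion, cost, and bounding rules are placeholders standing in for the actual rules we used:

```python
# a generic branch-and-prune skeleton: depth-first search that keeps the best
# complete solution found so far and prunes branches that cannot beat it.
# extensions, is_complete, cost, and lower_bound are problem-specific rules
# supplied by the caller; none of them are the actual 1985 rules.

def search(partial, extensions, is_complete, cost, lower_bound, best=None):
    if is_complete(partial):
        # keep this solution if it beats the best one seen so far
        if best is None or cost(partial) < cost(best):
            best = partial
        return best

    for candidate in extensions(partial):       # branching: promising moves first
        # pruning: skip branches that provably cannot improve on the best so far
        if best is not None and lower_bound(candidate) >= cost(best):
            continue
        best = search(candidate, extensions, is_complete, cost, lower_bound, best)

    return best
```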

one of the nice benefits of this approach was that it allowed the radio operator to lock down the first few steps in the sortie and leave the computer to continue to search for a good solution for the remainder of the sortie. this allowed the optimizer to continue to run even after the sortie had begun. this bought the algorithm precious time. allowing the radio operator the ability to override also had the added benefit of putting the user in control in case what the system recommended was infeasible or undesirable.

notice that this approach is quite far from mathematical programming. there is no guarantee of an optimal solution (unless one can guarantee that pruning was never too aggressive and that we exhaustively searched the tree, neither of which could be guaranteed in practical cases). nevertheless, this turned out to be quite an effective strategy because it found a good solution quickly and then tried to improve on the solution within the time it was allowed.

traditional operations research vs. artificial intelligence

this may be a good juncture for an aside: the field of optimization has traditionally been the domain of operations researchers (i.e., applied mathematicians and industrial engineers). even though the field of artificial intelligence in computer science has been the source of many techniques that effectively solve many of the same problems as operations research techniques do, OR-traditionalists have always tended to look askance at their lowly competitors due to the perceived lack of rigour in the AI techniques. this attitude is apparent in the wikipedia article too: after listing all the approaches that are born from mathematical optimization, it introduces “non-traditional” methods with a somewhat off-handed “Here are a few other popular methods:” i find this both amusing and a little disappointing. there have been a few honest attempts at bringing these two fields together but a lot more can be done (i believe). it would be interesting to see how someone steeped in the AI tradition would have approached this problem. perhaps many of the techniques for directing the search and pruning the tree are specific instances of general approaches studied in that discipline.

if there is a moral to this angle of our off-shore adventures, it is this: when approaching an optimization problem, it is tempting to shoot for the stars by going down a rigorous path. often, reality intrudes. even when making technical choices, we need to account for the context in which the software will be used, how much time there is to solve the problem, what are the computing resources available, and how it will fit into the normal routine of work.

other articles in this series

this article is the fourth in the series of short explorations related to the application of optimization. i’d like to share what i’ve learned over a career spent largely in the business of applying optimization to real-world problems. interestingly, there is a lot more to practical optimization than models and algorithms. each of the links below leads to a piece that dwells on one particular aspect.

optimization: a case study
architecture of a decision-support system
optimization and organizational readiness for change
optimization: a technical overview (this article)

About the author – Dr. Narayan Venkatasubramanyan

Dr. Narayan Venkatasubramanyan has spent over two decades applying a rare combination of quantitative skills, business knowledge, and the ability to think from first principles to real-world business problems. He currently consults in several areas including supply chain and health care management. As a Fellow at i2 Technologies, he tackled supply chain problems in areas as diverse as computer assembly, semiconductor manufacturing, consumer goods, steel, and automotive. Prior to that, he worked with several airlines on their aircraft and crew scheduling problems. He topped off his days at IIT-Bombay and IIM-Ahmedabad with a Ph.D. in Operations Research from the University of Wisconsin-Madison.

He is presently based in Dallas, USA and travels extensively all over the world during the course of his consulting assignments. You can also find Narayan on Linkedin at: http://www.linkedin.com/in/narayan3rdeye

Architecture of a decision-support system

(PuneTech is honored to have Dr. Narayan Venkatasubramanyan, an Optimization Guru and one of the original pioneers in applying Optimization to Supply Chain Management, as our contributor. I had the privilege of working closely with Narayan at i2 Technologies in Dallas for nearly 10 years.

For Dr. Narayan Venkatasubramanyan’s detailed bio, please click here.

This is the second in a series of articles that we will publish once a week for a month. The first one was an ‘overview’ case study of optimization. Click here for the full series.)

this is a follow-up to optimization: a case study. frequent references in this article to details in that article would make this one difficult to read for someone who hasn’t at least skimmed through that.


a layered view of decision-support systems

it is useful to think of a decision-support system as consisting of 4 distinct layers:

  1. data layer
  2. visibility layer
  3. predictive/simulation layer
  4. optimization layer

the job of the data layer is to capture all the data that is relevant and material to the decision at hand and to ensure that this data is correct, up-to-date, and easily accessible. in our case, this would include master/static data such as the map of the field, the operating characteristics of the helicopter, etc as well as dynamic data such as the requirements for the sortie, ambient conditions (wind, temperature), etc. this may seem rather obvious at first sight but a quick reading of the case study shows that we had to revisit the data layer several times over the course of the development of the solution.

as the name implies, the visibility layer provides visibility into the data in a form that allows a human user to exercise his/her judgment. very often, a decision-support system requires no more than just this layer built on a robust data layer. for example, we could have offered a rather weak form of decision support by automating the capture of dynamic data and presenting to the radio operator all the data (both static and dynamic), suitably filtered to incorporate only parts of the field that are relevant to that sortie. he/she would be left to chart the route of the helicopter on a piece of paper, possibly checking off requirements on the screen as they are satisfied. even though this may seem trivial, it is important to note that most decision-support systems in everyday use are rather lightweight pieces of software that present relevant data to a human user in a filtered, organized form. the human decision-maker takes it from there.

the predictive/simulation layer offers an additional layer of help to the human decision-maker. it has the intelligence to assess the decisions made (tentatively) by the user but offers no active support. for instance, a helicopter scheduling system that offers this level of support would present the radio operator with a screen on which the map of the field and the sortie’s requirements are depicted graphically. through a series of mouse-clicks, the user can decide whom to pick up, where to fly to, whether to refuel, etc. the system supports the user by automatically keeping track of the weight of the payload (passenger+fuel) and warning the user of violations, using the wind direction to compute the rate of fuel burn, warning the user of low-fuel conditions, monitoring whether crews arrive at their workplace on time, etc. in short, the user makes decisions, the system checks constraints and warns of violations, and provides a measure of goodness of the solution. few people acknowledge that much of corporate decision-making is at this level of sophistication. the widespread use of microsoft excel is clear evidence of this.

the optimization layer is the last of the layers. it wrests control from the user and actively recommends decisions. it is obvious that the effectiveness of optimization layer is vitally dependent on the data layer. what is often overlooked is that the acceptance of the optimization layer by the human decision-maker often hinges on their ability to tweak the recommendations in the predictive layer, even if only to reassure themselves that the solution is correct. often, the post-optimization adjustments are indispensable because the human decision-maker knows things that the system does not.
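to make the separation concrete, here is a purely illustrative sketch of the four layers as narrow programming interfaces; the class and method names are invented for this article, not taken from any actual system:

```python
# an illustrative separation of the four layers into narrow interfaces.
# all names and signatures here are hypothetical.
from abc import ABC, abstractmethod

class DataLayer(ABC):
    @abstractmethod
    def static_data(self): ...          # e.g. field map, helicopter characteristics
    @abstractmethod
    def dynamic_data(self): ...         # e.g. today's requirements, wind, temperature

class VisibilityLayer(ABC):
    @abstractmethod
    def present(self, data): ...        # filtered, organized view for the human user

class PredictiveLayer(ABC):
    @abstractmethod
    def evaluate(self, data, tentative_plan): ...   # check constraints, warn of violations

class OptimizationLayer(ABC):
    @abstractmethod
    def recommend(self, data): ...      # actively propose a plan the user can accept or tweak
```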

the art (and science) of modeling

the term “decision-support system” may seem a little archaic but i will use it here because my experience with applying optimization has been in the realm of systems that recommend decisions, not ones that execute them. there is always human intervention that takes the form of approval and overrides. generally speaking, this is a necessary step. the system is never all-knowing. as a result, its view of reality is limited, possibly flawed. these limitations and flaws are reflected in its recommendations.

this invites the question: if there are known limitations and flaws in the model, why not fix them?

this is an important question. the answer to this is not nearly as obvious as it may appear.

before we actually construct a model of reality, we must consciously draw a box around that portion of reality that we intend to include in the model. if the box is drawn too broadly, the model will be too complex to be tractable. if the box is drawn too tightly, vital elements of the model are excluded. it is rare to find a decision problem in which we find a perfect compromise, i.e., we are able to draw a box that includes all aspects of the problem without the problem becoming computationally intractable.

unfortunately, it is hard to teach the subtleties of modeling in a classroom. in an academic setting, it is hard to wrestle with the messy job of making seemingly arbitrary choices about what to leave in and what to exclude. therefore, most students of optimization enter the real world with the impression that the process of modeling is quick and easy. on the contrary, it is at this level that most battles are won or lost.

note: the term modeling is going to be unavoidably overloaded in this context. when i speak of models, students of operations research may immediately think in terms of mathematical equations. those models are still a little way down the road. at this point, i’m simply talking about the set of abstract interrelationships that characterize the behaviour of the system. some of these relationships may be too complex to be captured in a mathematical model. as a result, the mathematical model is yet another level removed from reality.

consider our stumbling-and-bumbling approach to modeling the helicopter scheduling problem. we realized that the problem we faced wasn’t quite a text-book case. our initial approach was clearly very narrow. once we drew that box, our idealized world was significantly simpler than the real world. our world was flat. our helicopter never ran out of fuel. the amount of fuel it had was never so much that it compromised its seating capacity. it didn’t care which way the wind was blowing. it didn’t care how hot it was. in short, our model was far removed from reality. we had to incorporate each of these effects, one by one, because their exclusion made the gap between reality and model so large that the decisions recommended by the model were grossly unrealistic.

it could be argued that we were just a bunch of kids who knew nothing about helicopters, so trial-and-error was the only approach to determining the shape of the box we had to draw.

not true! here’s how we could have done it differently:

if you were to examine what we did in the light of the four-layer architecture described above, you’d notice that we really only built two of the four: the data layer and the optimization layer. this is a tremendously risky approach, an approach that has often led to failure in many other contexts. it must be acknowledged that optimization experts are rarely experts in the domain that they are modeling. nevertheless, by bypassing the visibility and predictive layers, we had sealed off our model from the eyes of people who could have told us about the flaws in it.

each iteration of the solution saw us expanding the data layer on which the software was built. in addition to expanding that data layer, we had to enhance the optimization layer to incorporate the rules implicit in the new pieces of data. here are the steps we took:

  1. we added the fuel capacity and consumption rate of each helicopter to the data layer. and modified the search algorithm to “remember” the fuel level and find its way to a fuel stop before the chopper plunged into the arabian sea.
  2. we added the payload limit to the data layer. and further modified the search algorithm to “remember” not to pick up too many passengers too soon after refueling or risk plunging into the sea with 12 people on board.
  3. we captured the wind direction in the data layer and modified the computation of the distance matrix used in the optimization layer.
  4. we captured the ambient temperature as well as the relationship between temperature and maximum payload in the data layer. and we further trimmed the options available to the search algorithm.

we could have continued down this path ad infinitum. at each step, our users would have “discovered” yet another constraint for us to include. back in those days, ongc used to charter several different helicopter agencies. i remember one of the radio operators telling me that some companies were sticklers for the rules while others would push things to the limit. as such, a route was feasible or not depending on whether the canadian company showed up or the italian one did! should we have incorporated that too in our model? how is one to know?

this question isn’t merely rhetorical. the incorporation of a predictive/simulation layer puts the human decision-maker in the driver’s seat. if we had had a simulation layer, we would have quickly learned the factors that were relevant and material to the decision-making process. if the system didn’t tell the radio operator which way the wind was blowing, he/she would have immediately complained because it played such a major role in their choice. if the system didn’t tell him/her whether it was the canadian or the italian company and he didn’t ask, we would know it didn’t matter. in the absence of that layer, we merrily rushed into what is technically the most challenging aspect of the solution.

implementing an optimization algorithm is no mean task. it is hugely time-consuming, but that is really the least of the problems. optimization algorithms tend to be brittle in the following sense: a slight change in the model can require a complete rewrite of the algorithm. it is but human that once one builds a complex algorithm, one tends to want the model to remain unchanged. one becomes married to that view of the world. even in the face of mounting evidence that the model is wrong, one tends to hang on. in hindsight, i would say we made a serious mistake by not architecting the system to validate the correctness of the box we had drawn before we rushed ahead to building an optimization algorithm. in other words, if we had built the solution systematically, layer by layer, many of the surprises that caused us to swing wildly between jubilation and depression would have been avoided.

other articles in this series

this article is the second in a series of short explorations related to the application of optimization. i’d like to share what i’ve learned over a career spent largely in the business of applying optimization to real-world problems. interestingly, there is a lot more to practical optimization than models and algorithms. each of the links below leads to a piece that dwells on one particular aspect.
articles in this series:
optimization: a case study
architecture of a decision-support system (this article)
optimization and organizational readiness for change
optimization: a technical overview

About the author – Dr. Narayan Venkatasubramanyan

Dr. Narayan Venkatasubramanyan has spent over two decades applying a rare combination of quantitative skills, business knowledge, and the ability to think from first principles to real-world business problems. He currently consults in several areas including supply chain and health care management. As a Fellow at i2 Technologies, he tackled supply chain problems in areas as diverse as computer assembly, semiconductor manufacturing, consumer goods, steel, and automotive. Prior to that, he worked with several airlines on their aircraft and crew scheduling problems. He topped off his days at IIT-Bombay and IIM-Ahmedabad with a Ph.D. in Operations Research from the University of Wisconsin-Madison.

He is presently based in Dallas, USA and travels extensively all over the world during the course of his consulting assignments. You can also find Narayan on Linkedin at: http://www.linkedin.com/in/narayan3rdeye

Optimization: A case study

(PuneTech is honored to have Dr. Narayan Venkatasubramanyan, an Optimization Guru and one of the original pioneers in applying Optimization to Supply Chain Management, as our contributor. I had the privilege of working closely with Narayan at i2 Technologies in Dallas for nearly 10 years.

PuneTech has published some introductory articles on Supply Chain Management (SCM) and the optimization & decision support challenges involved in various real world SCM problems. Who better to write about this area in further depth than Narayan!

For Dr. Narayan Venkatasubramanyan’s detailed bio, please click here.

This is the first in a series of articles that we will publish once a week for a month. For the full series of articles, click here.)

the following entry was prompted by a request for an article on the topic of “optimization” for publication in punetech.com, a website co-founded by amit paranjape, a friend and former colleague. for reasons that may have something to do with the fact that i’ve made a living for a couple of decades as a practitioner of that dark art known as optimization, he felt that i was best qualified to write about the subject for an audience that was technically savvy but not necessarily aware of the application of optimization. it took me a while to overcome my initial reluctance: is there really an audience for this? after all, even my daughter feigns disgust every time i bring up the topic of what i do. after some thought, i accepted the challenge, as long as i could take a slightly unusual approach to a “technical” topic: i decided to personalize it by rooting it in a personal-professional experience. i could then branch off into a variety of different aspects of that experience, some technical, some not so much. read on …

background

the year was 1985. i was fresh out of school, entering the “real” world for the first time. with a bachelors in engineering from IIT-Bombay and a graduate degree in business from IIM-Ahmedabad, and little else, i was primed for success. or disaster. and i was too naive to tell the difference.

for those too young to remember those days, 1985 was early in rajiv gandhi‘s term as prime minister of india. he had come in with an obama-esque message of change. and change meant modernization (he was the first indian politician with a computer terminal situated quite prominently in his office). for a brief while, we believed that india had turned the corner, that the public sector companies in india would reclaim the “commanding heights” of the economy and exercise their power to make india a better place.

CMC was a public sector company that had inherited much of the computer maintenance business in india after IBM was tossed out in 1977. quickly, they broadened well beyond computer maintenance into all things related to computers. that year, they recruited heavily in IIM-A. i was one of an unusually large number of graduates who saw CMC as a good bet.

not too long into my tenure at CMC, i was invited to meet with a mid-level manager in the electronics & telecommunications department of the oil and natural gas commission of india (ONGC). the challenge he posed us was simple: save money by optimizing the utilization of helicopters in the bombay high oilfield.

the problem

the bombay high offshore oilfield, the setting of our story

the bombay high oilfield is about 100 miles off the coast of bombay (see map). back then, it was a collection of about 50 oil platforms, divided roughly into two groups, bombay high north and bombay high south.

(on a completely unrelated tangent: while writing this piece, i wandered off into searching for pictures of bombay high. i stumbled upon the work of captain nandu chitnis, ex-navy now ONGC, biker, amateur photographer … who i suspect is a pune native. click here for a few of his pictures that capture the outlandish beauty of an offshore oil field.)

movement of personnel between platforms in each of these groups was managed by a radio operator who was centrally located.

all but three of these platforms were unmanned. this meant that the people who worked on these platforms had to be flown out from the manned platforms every morning and brought back to their base platforms at the end of the day.

at dawn every morning, two helicopters flew out from the airbase in juhu, in northwestern bombay. meanwhile, the radio operator in each field would get a set of requirements of the form “move m men from platform x to platform y”. these requirements could be qualified by time windows (e.g., need to reach y by 9am, or not available for pick-up until 8:30am) or priority (e.g., as soon as possible). each chopper would arrive at one of the central platforms and get its instructions for the morning sortie from the radio operator. after doing its rounds for the morning, it would return to the main platform. at lunchtime, it would fly lunchboxes to the crews working at unmanned platforms. for the final sortie of the day, the radio operator would send instructions that would ensure that all the crews were returned safely to their home platforms before the chopper was released to return to bombay for the night.

the challenge for us was to build a computer system that would optimize the use of the helicopter. the requirements were ad hoc, i.e., there was no daily pattern to the movement of men within the field, so the problem was different every day. it was believed that the routes charted by the radio operator were inefficient. given the amount of fuel used in these operations, an improvement of 5% over what they did was sufficient to result in a payback period of 4-6 months for our project.

this was my first exposure to the real world of optimization. a colleague of mine — another IIM-A graduate — and i threw ourselves at this problem. later, we were joined by yet another guy, an immensely bright guy who could make the lowly IBM PC-XT — remember, this was the state-of-the-art at that time — do unimaginable things. i couldn’t have asked to be a member of a team that was better suited to this job.

the solution

we collected all the static data that we thought we would need. we got the latitude and longitude of the on-shore base and of each platform (degrees, minutes, and seconds) and computed the distance between every pair of points on our map (i think we even briefly flirted with the idea of correcting for the curvature of the earth but decided against it, perhaps one of the few wise moves we made). we got the capacity (number of seats) and cruising speed of each of the helicopters.
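for what it’s worth, here is a small sketch of one simple way such a distance matrix could be computed from latitudes and longitudes (an equirectangular approximation that, much like our code, does not bother with great-circle corrections; the coordinates below are made up):

```python
# a sketch of a flat-earth (equirectangular) distance matrix from lat/long pairs.
# good enough over the short distances within an offshore field; all coordinates
# below are made-up placeholders, not the actual platform locations.
import math

EARTH_RADIUS_KM = 6371.0

def flat_distance_km(p, q):
    """approximate distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2.0)
    y = lat2 - lat1
    return EARTH_RADIUS_KM * math.hypot(x, y)

platforms = {"base": (19.10, 72.83), "north": (19.68, 71.35), "south": (19.30, 71.30)}
dist = {(a, b): flat_distance_km(pa, pb)
        for a, pa in platforms.items() for b, pb in platforms.items()}
print(round(dist[("base", "north")], 1), "km")
```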

we collected a lot of sample data of actual requirements and the routes that were flown.

we debated the mathematical formulation of the problem at length. we quickly realized that this was far harder than the classical “traveling salesman problem”. in that problem, you are given a set of points on a map and asked to find the shortest tour that starts at any city and touches every other city exactly once before returning to the starting point. in our problem, the “salesman” would pick and/or drop off passengers at each stop. the number he could pick up was constrained, so this meant that he could be forced to visit a city more than once. the TSP is known to be a “hard” problem, i.e., the time it takes to solve it grows very rapidly as you increase the number of cities in the problem. nevertheless, we forged ahead. i’m not sure if we actually completed the formulation of an integer programming problem but, even before we did, we came to the conclusion that this was too hard of a problem to be solved as an integer program on a first-generation desktop computer.

instead, we designed and implemented a search algorithm that would apply some rules to quickly generate good routes and then proceed to search for better routes. we no longer had a guarantee of optimality but we figured we were smart enough to direct our search well and make it quick. we tested our algorithm against the test cases we’d selected and discovered that we were beating the radio operators quite handily.

then came the moment we’d been waiting for: we finally met the radio operators.

they looked at the routes our program was generating. and then came the first complaint. “your routes are not accounting for refueling!”, they said. no one had told us that the sorties were long enough that you could run out of fuel halfway, so we had not been monitoring that at all!

ONGC’s HAL Dhruv Helicopters on sorties off the Mumbai coast. Image by Premshree Pillai via Flickr

so we went back to the drawing board. we now added a new dimension to the search algorithm: it had to keep track of fuel and, if it was running low on fuel during the sortie, direct the chopper to one of the few fuel bases. this meant that some of the routes that we had generated in the first attempt were no longer feasible. we weren’t beating the radio operators quite as easily as before.

we went back to the users. they took another look at our routes. and then came their next complaint: “you’ve got more than 7 people on board after refueling!”, they said. “but it’s a 12-seater!”, we argued. it turns out they had a point: these choppers had a large fuel tank, so once they topped up the tank — as they always do when they stop to refuel — they were too heavy to take a full complement of passengers. this meant that the capacity of the chopper was two-dimensional: seats and weight. on a full tank, weight was the binding constraint. as the fuel burned off, the weight constraint eased; beyond a certain point, the number of seats became the binding constraint.
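
a small sketch of that two-dimensional capacity, with invented numbers: at any point in the sortie, the usable seat count is the lesser of the physical seats and whatever the weight allowance permits at the current fuel load.

```python
def max_passengers(fuel_kg, seats=12, max_payload_kg=1100, passenger_kg=80):
    """passengers the chopper can carry at a given fuel load (illustrative numbers only).

    the payload available for passengers shrinks as the fuel on board grows; the
    binding constraint is whichever is smaller, seats or weight allowance.
    """
    weight_limited = int((max_payload_kg - fuel_kg) // passenger_kg)
    return max(0, min(seats, weight_limited))

print(max_passengers(fuel_kg=600))  # full tank: weight binds, only 6 of the 12 seats usable
print(max_passengers(fuel_kg=100))  # near-empty tank: the 12 seats become the binding constraint
```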

we trooped back to the drawing board. “we can do this!”, we said to ourselves. and we did. remember, we were young and smart. and too stupid to see where all this was going.

in our next iteration, the computer-generated routes were coming closer and closer to the user-generated ones. mind you, we were still beating them on average, but our payback period was slowly growing.

we went back to the users with our latest and greatest solution. they looked at it. and they asked: “which way is the wind blowing?” by then, we knew not to ask “why do you care?” it turns out that helicopters always land and take off into the wind. for instance, if the chopper was flying from x to y and the wind was blowing from y to x, the setting was perfect. the chopper would take off from x in the direction of y and make a bee-line for y. on the other hand, if the wind was blowing from x to y, it would take off in a direction away from y, do a 180-degree turn, fly toward and past y, do yet another 180-degree turn, and land. given that, it made sense to keep the chopper generally flying a long string of short hops into the wind. when it could go no further because the fuel was running low, or needed to go no further in that direction because no passengers on board were headed that way, then and only then did it make sense to turn around and make a long hop back.
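
in effect, the cost of flying from x to y was no longer the same as the cost of flying from y to x. a toy illustration of how a leg’s cost can be made direction-dependent (the speeds and penalties here are invented, not what we used):

```python
import math

def leg_cost_minutes(dist_km, heading_deg, wind_from_deg,
                     cruise_kmph=220.0, turn_penalty_min=4.0):
    """illustrative asymmetric leg cost: cruise time plus a penalty for downwind legs.

    heading_deg   -- direction of flight for this leg
    wind_from_deg -- direction the wind is blowing FROM
    a leg flown roughly downwind needs the extra 180-degree turns described
    above, so it is charged a fixed time penalty on top of the cruise time.
    """
    cruise_min = 60.0 * dist_km / cruise_kmph
    # angle between the flight direction and the direction the wind blows toward
    rel = (heading_deg - (wind_from_deg + 180.0)) % 360.0
    downwind = math.cos(math.radians(rel)) > 0
    return cruise_min + (turn_penalty_min if downwind else 0.0)

print(leg_cost_minutes(50.0, heading_deg=90, wind_from_deg=90))   # into the wind: ~13.6 min
print(leg_cost_minutes(50.0, heading_deg=270, wind_from_deg=90))  # same leg downwind: ~17.6 min
```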

“bloody asymmetric distance matrix!”, we mumbled to ourselves. by then, we were beaten and bloodied but unbowed. we were determined to optimize these chopper routes, come hell or high water!

so back we went to our desks. we modified the search algorithm yet another time. by now, the code had grown so long that our program broke the limits of the editor in turbo pascal. but we soldiered on. finally, we had all of our users’ requirements coded into the algorithm.

or so we thought. we weren’t in the least bit surprised when, after looking at our latest output, they asked “was this in summer?”. we had now grown accustomed to this. they explained to us that the maximum payload of a chopper is a function of ambient temperature. on the hottest days of summer, choppers have to fly light. on a full tank, a 12-seater may now only accommodate 6 passengers. we were ready to give up. but not yet. back we went to our drawing board. and we went to the field one last time.

in some cases, we found that the radio operators were doing better than the computer. in others, we beat them. i can’t say no creative accounting was involved, but we did manage to eke out a few percentage points of improvement over the manually generated routes.

epilogue

you’d think we’d won this battle of attrition. we’d shown that we could accommodate all of their requirements. we’d proved that we could do better than the radio operators. we’d taken our machine to the radio operator’s cabin on the platform and installed it there.

we didn’t realize that the final chapter hadn’t been written. a few weeks after we’d declared success, i got a call from ONGC. apparently, the system wasn’t working. no details were provided.

i flew out to the platform. i sat with the radio operator as he grudgingly input the requirements into the computer. he read off the output from the screen and proceeded with his job. after the morning sortie was done, i retired to the lounge, glad that my work was done.

a little before lunchtime, i got a call from the radio operator. “the system isn’t working!”, he said. i went back to his cabin. and discovered that he was right. it wasn’t that our code had crashed; the machine simply wouldn’t boot. when you turned it on, all you got was a lone blinking cursor in the top left corner of the screen. apparently, there had been some kind of catastrophic hardware failure. in a moment of uncommon inspiration, i decided to open the box. i fiddled around with the cards and connectors, closed the box, and fired it up again. and it worked!

it turned out that the radio operator’s cabin was sitting right atop the industrial-strength laundry room of the platform. every time they turned on the laundry, everything in the radio room would vibrate. there was a pretty good chance that our PC would regress to a comatose state every time they did the laundry. i then realized that this was a hopeless situation. can i really blame a user for rejecting a system that was prone to frequent and total failures?

other articles in this series

this blog entry is intended to set the stage for a series of short explorations related to the application of optimization. i’d like to share what i’ve learned over a career spent largely in the business of applying optimization to real-world problems. interestingly, there is a lot more to practical optimization than models and algorithms. each of the links below leads to a piece that dwells on one particular aspect.

optimization: a case study (this article)
architecture of a decision-support system
optimization and organizational readiness for change
optimization: a technical overview

About the author – Dr. Narayan Venkatasubramanyan

Dr. Narayan Venkatasubramanyan has spent over two decades applying a rare combination of quantitative skills, business knowledge, and the ability to think from first principles to real-world business problems. He currently consults in several areas including supply chain and health care management. As a Fellow at i2 Technologies, he tackled supply chain problems in areas as diverse as computer assembly, semiconductor manufacturing, consumer goods, steel, and automotive. Prior to that, he worked with several airlines on their aircraft and crew scheduling problems. He topped off his days at IIT-Bombay and IIM-Ahmedabad with a Ph.D. in Operations Research from the University of Wisconsin-Madison.

He is presently based in Dallas, USA and travels extensively all over the world during the course of his consulting assignments. You can also find Narayan on Linkedin at: http://www.linkedin.com/in/narayan3rdeye


Beyond keyword search: Adding Findability to your information

(PuneTech reader Titash Neogi has been in the information architecture domain for many years and is passionate about making information more accessible to people. In recent times, he has been studying the difficulties of finding information within large enterprises and approaches to solving them. In this article, he gives us an overview of the concept of “findability” of information in an enterprise.)

One of the greatest human needs that have evolved in the 21st century is the need to know as much as possible about something before making a decision. While human beings have forever been driven towards learning and knowing more, in the last decade technology has added a new dimension to this.

There was a time when we could take the word of our neighbour, colleague, friend or the man at the grocery shop and reach a decision. Since information was only available in finite ways and in finite volumes, there was not much competitive edge or wide reaching impact.

The Internet has changed all of that. Today, the fear that there’s knowledge out there that could be relevant to our decisions, and that we are not using it and getting a lesser deal, haunts us all.

And this need, in turn, has fuelled the growth of information, making it more voluminous and more complex to deal with. Pick any topic and you will find thousands of pages on the internet related to it: facts, figures, opinions, comments, user reviews. Information, unlike wealth, has grown in direct proportion to its usage.

This information fire hose impacts both individuals and enterprises. While individuals crave to know as much as possible before committing to something, enterprises find their customers more demanding or their competition more informed.

Staying on top of this complex, voluminous tidal wave of information has become crucial for survival. In response, search companies have sprung up, with Google in the lead. For about a decade now, search engines of all sorts have been battling it out with terabytes of content on the Internet.

However, the Internet (and other networks within and outside the enterprise) is moving from being an information store to being a knowledge network. As the volume and complexity of knowledge grow, search as we know it today is becoming inadequate, and search companies are losing ground fast.

Search as a tool is fine for information stores, but poor for knowledge bases. Knowledge bases need to have findability. Search is used when you know exactly what you are looking for, and are trying to figure out where it is. Findability is when you have only a vague idea of what you want to achieve, and you rely on the knowledge base to guide you into the direction of finding more and more information that is useful and relevant to you.

Understanding Findability

Findability can mean different things to different people, or even to the same person at different times. Peter Morville, who is credited with coining the term findability, defines it as:

The quality of being locatable or navigable. At the item level, we can evaluate to what degree a particular object is easy to discover or locate. At the system level, we can analyze how well a physical or digital environment supports navigation and retrieval.

While a lot of people confuse findability with search, the two are really not the same thing. Search tries to solve the problem of locating information that you already know exists somewhere in a corpus.

Findability encompasses search, but also deals with the problem of how to make the searcher aware of other relevant information that they didn’t know existed in the corpus. Findability exposes the knowledge within a corpus.

For example, when you are looking for a home loan rate from IDBI bank, it’s a search problem: you want to locate the document or URL that lists IDBI’s interest rates. However, if you are new to the entire banking and housing scenario and don’t know the names of any banks in India, or what loans they offer, you are dealing with a findability problem. Solving it will certainly involve search, but you can immediately see that it also involves a lot of information engineering, semantic modelling and usability engineering.

So in a sense, Findability is the big daddy of search. The example above might sound very impractical, but you can easily abstract it and see that it applies to a lot of scenarios in different ways. The future is about solving findability problems.

While Findability concerns all of us in our everyday life, it poses some interesting challenges for modern enterprises. This article will try to focus on findability issues in the enterprise and some pointers at how to solve them.

Why is findability important for organizations?

Internally, as organizations grow in size and complexity while running short on budgets and time, no one can afford to waste energy and resources duplicating knowledge that already exists in the organization. Smart companies will leverage the knowledge within their workforce and beat their competition. Enterprise 2.0 is about knowledge competition: do we know what we know, and can we use it in new ways?

Externally, as product complexity increases and newer offerings come up, it is going to be a crucial challenge for a company to communicate with its customers and let them know the range of product offerings that they have. An effective findability solution allows customers to automatically explore new products and solutions that might have come up, and might be more relevant for them.

The four-headed monster

To outline a few typical findability problems in an enterprise:

Brute Force findability: This is the most elementary form of findability problem, and can be simply classified as search. It’s like performing a grep over the content base with a particular pattern.

“I remember that the document contained the word log = 2 in its text”

Today’s search engines are very good at solving these problems. In fact, this form of search has evolved largely because of the inability of search engines to understand natural language, leading users to rely on grep-like techniques to find documents and results as quickly as possible.

However, as organizations grow and we move from data to knowledge, this form of search will become increasingly difficult to scale and use.
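
To make the “grep over the content base” idea concrete, here is a minimal sketch over a hypothetical in-memory document store (not any particular product): all it can do is return documents containing the literal string the user happens to remember.

```python
def brute_force_search(documents, pattern):
    """Return the ids of documents whose raw text contains the literal pattern.

    documents -- mapping of document id -> full text
    This is all a pure keyword engine really guarantees: a literal match, with no
    notion of meaning, synonyms, or related material the user never asked about.
    """
    return [doc_id for doc_id, text in documents.items() if pattern in text]

docs = {
    "design_notes.txt": "we set log = 2 before running the regression",
    "faq.txt": "how do I migrate my web application from struts 1 to struts 2?",
}
print(brute_force_search(docs, "log = 2"))  # ['design_notes.txt']
```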

Knowledge findability: This is the next level of findability and comprises the most common “how do I” or “what is” kind of questions.

“How do I know if I really need to move my web application from struts 1 to struts 2?”

“What kind of data backup product should I buy if I am a SOHO with a limited budget?”

Index based search engines are trying various complex algorithms to solve these findability problems.

The last few years have seen a lot of semantic search solutions trying to tackle this problem by performing semantic analysis of content and indexing it based on meaning rather than sheer word statistics.
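
As a toy illustration of the difference (the concept map and scoring below are invented for this example, not any particular engine’s method): the query is expanded into related concepts before matching, so a document can surface even when it never contains the user’s exact words.

```python
# a tiny hand-built concept map standing in for a real semantic model
CONCEPTS = {
    "backup": {"backup", "restore", "archival", "disaster recovery"},
    "soho": {"soho", "small office", "home office", "small business"},
}

def expand(query_terms):
    """Replace each query term with its concept cluster when one is known."""
    expanded = set()
    for term in query_terms:
        expanded |= CONCEPTS.get(term, {term})
    return expanded

def rank(documents, query_terms):
    """Score documents by how many expanded concepts they mention."""
    terms = expand(query_terms)
    scores = {doc_id: sum(1 for t in terms if t in text.lower())
              for doc_id, text in documents.items()}
    return sorted(((s, d) for d, s in scores.items() if s > 0), reverse=True)

docs = {
    "dp_guide.html": "Disaster recovery and archival options for the small office",
    "pricing.html": "Enterprise licensing and volume pricing",
}
print(rank(docs, ["backup", "soho"]))  # the guide surfaces despite never saying 'backup'
```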

People/expertise findability: A lot of the time, we find people asking for in-house experts.

“Anyone who has worked on technology X + platform Y in the organisation”

Typically, today this is handled by word of mouth or the grapevine, which not only becomes impossible to scale in a cross-geography organisation, but is also inefficient and limited in scope.

A semantic analysis engine, plugged into an HRMS database or an organisation’s intranet, can solve this problem very effectively. An index-based search in a similar scenario is likely to pull up a lot of noise and irrelevant results rather than solve the problem.

Social findability: Findability that relates to knowledge implicit in the community.

“What do all Linux newbies read when they join the organisation?”

“What’s the best starting point for understanding deployment of Product X – the product guide or the support technote?”

No semantic or index-based search can ever completely fill this gap. A good approach to solving this problem would be to marry a social-tagging system such as del.icio.us or Digg with a semantic analysis engine. The findability solution would need to work as a facilitator that lets people share their personal experiences and knowledge around a product and build a knowledge community.

What becomes obvious from this discussion is that findability is not a single technology or solution that can be purchased over the counter, deployed, and then expected to perform wonders after a couple of hours of crawling or indexing. That is just plain vanilla search. Search can provide results to queries but not necessarily answers to questions. A good search engine can make your content searchable, but it does nothing to solve your findability problem.

The mistake that most organizations make today is to deploy a million-dollar search engine and then expect it to solve a problem it was never designed to solve: a findability problem.

Solving the Findability problem

A good findability platform needs to bring together expertise and lessons learnt from the fields of Semantic search, Usability, Information Architecture, Graphic Design and Text Engineering. And above all, people need to understand that a findability solution can never be a “one size fits all” solution. It can never be an appliance that you can deploy over your networks and forget about.

Think of a findability solution as an ERP solution. It needs various modules that can understand and talk to the different information stores in the organisation. The first step in solving an organization’s findability problem is to analyze its findability needs, and then deploy or develop all or some of the modules of that solution. Just as important is the right combination of content strategy, user query analysis, and search and interface design.

Who’s working on Findability?

There are a lot of people taking their own stab at findability. Solutions and products range from sophisticated semantic search applications to information architecture consulting firms. However, within the limited scope of this article, I would like to touch upon a few companies and individuals who stand out in their attempts to solve the findability problem.

First, Peter Morville at Semantic Studios is doing ground-breaking work in this area. The material he puts up on his site (www.findability.org) is exciting and educational. There are also a few start-ups in this space, but I would like to mention connectbeam (www.connectbeam.com), a Bay Area based start-up that caught my notice. They are trying to solve the social findability problem within the enterprise, and I found their approach unique.

Finally

I am old school, and I like to conclude my articles with something to think about. In a global, recession-ridden economy, findability affects your bottom line one way or another. To quote Peter Morville,

“You can’t use what you can’t find” (and neither can your customers)

About the author – Titash Neogi

Titash Neogi has been working with Symantec Corp (formerly VERITAS India) for the past six years in various roles across the customer support, knowledge management and content management divisions. At present he is the architect and lead developer for Symantec’s new semantic-search-based help system initiative.


Internet Traffic Tracking and Measurement

comScore Search Ratings, Dec. 2005-2006, Live,... Image by dannysullivan via Flickr

(As the web upgrades to web 2.0, figuring out the value of the companies serving this market becomes a difficult challenge. Since most web 2.0 companies are at an early stage of their evolution, they can’t be measured on the basis of the revenues they are earning. Instead, one needs to guess at future earnings by measuring what they have currently managed to capture, i.e. the number and demographics of visitors, and the amount of attention those visitors are paying to the site. Pune-based entrepreneur Vibhushan Waghmare, who has co-founded a marketing analytics startup, MQuotient, gives us an overview of this space, points out some problems, and wonders if there is an opportunity for an entrepreneur to step in and provide solutions.)

Introduction

A good product or service will always attract appreciation and success, but what makes it stand out from the crowd and fetch a premium is knowing exactly how much better it is than the rest. Qualitative strategy decisions are important for setting the direction, but real numbers, and insights derived from those numbers, are needed to know how fast or slowly one is moving in that direction and how far one is from the target.

This applies to online businesses as well. Unlike established brick-and-mortar businesses, which are judged primarily by monetary profitability, evolving online businesses are still searching for the right parameters by which to judge and measure success or failure.

In the run-up to the Dot Com bust of the early 2000s, we saw internet companies’ valuations shoot through the roof based on parameters like the eyeballs they generated and, hypothetically, could monetize. ExciteAtHome paid $780 million for BlueMountain.com, an online greeting cards company with 11 million monthly visitors and negligible revenues (it was sold to American Greetings two years later for just $35 million!). Back then, page-hits on the servers were what each site measured and investors bought into.

Today we are in the web 2.0 world, and the parameters for measuring the success of an online business have also evolved to version 2.0. Before Lehman Brothers folded up shop, we had valuations of these socionets soaring to astronomical levels, all based on the unique users they could generate. Page-hits have given way to page-views per unique user, and we now talk about more evolved, derived parameters such as the number of unique users visiting a site and the time spent by each unique user on it. With the ghosts of the Dot Com bust not yet laid to rest, investors and entrepreneurs are much more cautious and are becoming more scientific about tracking and measuring internet traffic. Still, every now and then we hear of socionets with their latest 2.0 apps being chased by good money because of their platform of engaged users, even though all those users do there is poke each other and take up some random quiz challenge. We all know the problems giants like Google face when it comes to monetizing a socionet like Orkut.com. Economists have predicted 8 of the last 5 economic meltdowns, and I don’t want to sound like one; I just want to point out the issues faced by online businesses today.

User Panel based traffic estimation

I was reminded of these measurement arguments last weekend, when I attended an interesting talk organized by the Pune OpenCoffee Club. The owner of a reputed online gaming portal spoke about the kind of traffic his games attract. He uses comScore extensively to benchmark himself in the online gaming world, and said that getting into the top 5 of comScore’s list of online gaming sites worldwide is the target he has set for himself. (I don’t know whether the list ranks sites by page views or by unique users.) Definitely a great target to chase!


While comScore does provide an elaborate analysis of website traffic and is considered a standard worldwide, before we set our business targets based on it we need to understand the methodology used for this tracking. comScore has a panel of around 2 million internet users worldwide (16,000 in India) who install comScore’s monitoring software on their computers. The software records which websites these users visit and how much time they spend on each site. comScore then uses extensive statistical methods to extrapolate these numbers to the behaviour of all internet users (not just comScore’s panel); more details on the methodology are here. They offer elaborate analyses such as time spent by each user, IP tracking, repeat visitors, incoming and outgoing traffic, and many more such details.
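
As a much-simplified sketch of what such extrapolation involves (the real comScore methodology is proprietary and far more sophisticated; all numbers below are invented): panelists are grouped into demographic segments, the visit rate observed within each segment is projected onto the estimated online population of that segment, and the segment estimates are added up.

```python
def estimate_site_visitors(panel_visits, panel_size, population):
    """Extrapolate panel-observed visitors to a population-level estimate.

    panel_visits -- segment -> number of panelists seen visiting the site
    panel_size   -- segment -> total panelists in that segment
    population   -- segment -> estimated internet users in that segment
    Each segment's visit rate within the panel is projected onto the full population.
    """
    return sum((panel_visits[seg] / panel_size[seg]) * population[seg]
               for seg in panel_visits)

# hypothetical numbers: 16,000 Indian panelists split into two segments
panel_visits = {"metro": 240, "non_metro": 60}
panel_size = {"metro": 10_000, "non_metro": 6_000}
population = {"metro": 30_000_000, "non_metro": 20_000_000}
print(estimate_site_visitors(panel_visits, panel_size, population))  # ~920,000 visitors
```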

But what needs to be noted is that comScore excludes traffic from cyber cafes and from users under age 15. For India, I am sure that is a sizeable mass of internet users. And when it comes to activities like online gaming, I am afraid the absolute numbers shown by comScore might be drastically removed from reality. The cyber cafe remains an important point of access for Indians, and excluding this traffic can lead to misleading conclusions. Internet use is taught in schools and, at least in cities, school kids use the internet extensively for both study and entertainment. In such a situation, excluding users under age 15 might not always give the best traffic numbers, especially for an activity like online gaming.


The other common option for tracking online traffic is Alexa.com. Alexa, again, is panel research, based on information gathered through a browser toolbar that its panel of users download and install. However, in the decade-plus I have spent on the internet, I have not seen a single browser with the Alexa toolbar installed. Apart from high-end internet users, I wonder whether an average internet user would actually go to www.alexa.com and download and install their toolbar.

There are some other tracking and measurement services available, but it is mostly Alexa and comScore that get quoted for such purposes.

One can argue that both comScore and Alexa work from a random sample, and hence the same reporting error would appear in the traffic measurements for all sites. Given this, Alexa and comScore can be used reliably to compare two internet destinations or to detect a deviation from the normal trend. For absolute numbers, however, I think a lot more needs to be done.

For developed countries, where most traffic originates from homes, schools or offices and very little from cyber cafes, these numbers might work; but for India, with its huge cyber cafe traffic, I believe a more extensive tracking system is required. Cyber cafes continue to be an important point of access, often the only access point in tier II and tier III cities. I have seen young school kids flocking to these cyber cafes, which serve more as gaming parlours; parents creating matrimony profiles for their children with the help of an assistant (generally the owner) at the cyber cafe; and young college students playing pranks on their friends through Orkut and getting their first exposure to mature content on the internet. comScore misses all of this traffic by excluding cyber cafes.

Although this traffic might not be very large in absolute terms, the general observation is that these new internet users (who learn how to use the internet in cyber cafes) are more likely to click on ads, since their online behaviour has not matured enough to distinguish an online advertisement from a genuine article. I once saw a school kid trying to fill up a life insurance form because the advertisement offered some lucky-draw prize for filling it in (of course, he never completed the form, for lack of a PAN number :-)). This audience is of the least interest to online advertisers and brands, since they hardly ever convert into transactions; yet these are the people most likely to click on all those online advertisements, and hence they form an important part of the online advertisement industry.

Is there an entrepreneurship opportunity here?

I am sure that hosting servers have the exact numbers for the traffic coming to them; the key is in profiling this traffic and consolidating and analysing the information into useful insights. Quite often, websites that try to track and measure their traffic resort to putting javascript on their pages for this purpose. This adds to page weight and slows down the site, a significant problem in a country like India where high-speed broadband is still a luxury. These efforts give reasonable tracking and measurement of traffic from the server side alone. However, to prove the worthiness of the traffic a website generates, the system needs to track the demographic details of that traffic. It should provide information about age, education profile, income level and other details that advertisers and investors would be interested in. Of course, proxy variables need to be used for this tracking, along with all the principles of market research and due care for the privacy of the user. The system should also be encompassing enough to take care of the diversity in internet usage that we see in India, in the developed western countries, and in non-English-speaking countries.
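
As a minimal sketch of the server-side starting point (a hypothetical access-log format; the harder demographic-profiling part is deliberately left out): plain web server logs already yield page views and a rough unique-visitor count without adding any javascript to the page.

```python
import re
from collections import defaultdict

# hypothetical combined-log-format line: ip, timestamp, request, status, size, referrer, user agent
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def summarize(log_lines):
    """Count page views and approximate unique visitors (distinct ip + user-agent pairs) per day."""
    views, visitors = defaultdict(int), defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        day = match.group("ts").split(":")[0]  # e.g. '12/Mar/2009'
        views[day] += 1
        visitors[day].add((match.group("ip"), match.group("ua")))
    return {day: (views[day], len(visitors[day])) for day in views}

sample = [
    '59.95.1.10 - - [12/Mar/2009:10:02:11 +0530] "GET /games HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '59.95.1.10 - - [12/Mar/2009:10:05:40 +0530] "GET /games/2 HTTP/1.1" 200 6120 "-" "Mozilla/4.0"',
]
print(summarize(sample))  # {'12/Mar/2009': (2, 1)}
```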

Creating such a tracking and measurement system for India would need investment, and given the current level of online advertisement spending in the country, it needs to be analyzed whether that investment is justified.
Do you guys see an entrepreneurship opportunity in this?

About the author – Vibhushan Waghmare

Vibhushan is a co-founder of MQuotient, a Pune-based startup that uses cutting-edge quantitative analytics and mathematical modeling to build software products for marketing analytics and, more broadly, to deliver solutions for enterprise marketing challenges. Before co-founding MQuotient, Vibhushan managed the Search product at Yahoo! India. He has an MBA from IIM Ahmedabad and an Electrical Engineering degree from REC, Nagpur. He has also held positions with Amdocs and Cognizant Technology Solutions. Check out his blog, his LinkedIn page, or his Twitter page for more about him.
