Tag Archives: csi

Overview of Business Intelligence and Data Warehousing

I am liveblogging the CSI Pune lecture on Business Intelligence and Data Warehousing. These are quick-n-dirty notes, so please forgive the uneven flow and typos. This page is being updated every few minutes.

There’s a large turnout – over 100 people here.

Business Intelligence is an area that covers a number of different technologies for gathering, storing, analyzing and providing access to data that will help a large company make better business decisions. It includes decision support systems (i.e. databases that run complex queries, as opposed to databases that run simple transactions), online analytical processing (OLAP), statistical analysis, forecasting and data mining. This is a huge market, with major players like Microsoft, Cognos, IBM, SAS, Business Objects and SPSS in the fray.
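To make the "complex queries vs. simple transactions" distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns and the sample rows are made up for illustration; they are not from the lecture.

```python
import sqlite3

# Hypothetical sales table, invented for this illustration.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
db.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [
        ("West", "loan", 100.0),
        ("West", "card", 50.0),
        ("East", "loan", 70.0),
        ("East", "loan", 30.0),
    ],
)

# Transactional style: touch a single row, looked up by its key.
one_sale = db.execute("SELECT amount FROM sales WHERE id = ?", (1,)).fetchone()

# Decision-support style: scan and aggregate many rows to answer a business question.
by_region = db.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The first query is what a production system does all day; the second is the kind of query a decision support system runs, and it gets expensive as the table grows.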

What kind of decisions does this help you with? How to cut costs. Better understanding of customers (which ones are creditworthy? which ones are most at risk of switching to a competitor’s product?). Better planning of the flow of goods or information in the enterprise.

This is not easy because the amount of data is exploding. There’s too much data. Humans can’t make sense of all of it.

To manage this kind of information you need a big storage platform and a systematic way of storing all the information and being able to analyze the data (with the aforementioned complex queries). Collect together data from different sources in the enterprise. Pull from various production servers and stick it into an offline, big, fat database. This is called a data warehouse.

The data needs to be cleaned up quite a lot before it is usable. Inconsistencies between data from different data sources. Duplicates. Mis-matches. If you are combining all the data into one big database, it needs to be consistent and without duplicates. Then you start analyzing the data. Either with a human doing various reports and queries (OLAP), or the computer automatically finding interesting patterns (data mining).
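As a rough illustration of the cleanup step, here is a minimal Python sketch that normalizes records from two sources and drops duplicates. The field names (`name`, `email`), the choice of email as the matching key, and the sample records are all assumptions made for this example; real cleanup rules are far more involved.

```python
def normalize(record):
    """Return a copy of the record with whitespace trimmed and case unified."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def merge_sources(*sources):
    """Combine records from several sources, keeping one record per email."""
    seen = {}
    for source in sources:
        for record in source:
            clean = normalize(record)
            # First occurrence wins; later duplicates are dropped.
            seen.setdefault(clean["email"], clean)
    return list(seen.values())


# Hypothetical extracts from two production systems.
crm = [{"name": "asha  kulkarni", "email": "Asha@Example.com "}]
billing = [
    {"name": "Asha Kulkarni", "email": "asha@example.com"},
    {"name": "ravi PATIL", "email": "ravi@example.com"},
]
customers = merge_sources(crm, billing)
```

Here the same customer appears in both sources with different spacing and capitalization, and ends up as a single consistent record.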

Business Intelligence is an application that sits on top of the Data Warehouse.

Lots of difficult problems to be solved.

Many different data sources: flat files, CSVs, legacy systems, transactional databases. Need to pick up updates from all these sources on a regular basis. How to do this incrementally and efficiently? How often – daily, weekly, monthly? Parallelized loading for speed. How to do this without slowing down the production system? Might have to do this during a small window at night. So now you have to ensure that the loading finishes in the given time window.
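One common way to load incrementally is to remember a "high-water mark" and pull only the rows changed since then. The sketch below is my own illustration using sqlite3, not something shown in the lecture; the table names (`orders`, `orders_fact`, `load_state`) and the `updated_at` column are assumptions.

```python
import sqlite3


def incremental_load(source, warehouse):
    """Copy only the source rows changed since the last successful load."""
    watermark = warehouse.execute("SELECT last_loaded FROM load_state").fetchone()[0]
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if changed:
        # Upsert the changed rows, then advance the watermark to the newest one.
        warehouse.executemany(
            "INSERT OR REPLACE INTO orders_fact (id, amount, updated_at) VALUES (?, ?, ?)",
            changed,
        )
        warehouse.execute("UPDATE load_state SET last_loaded = ?", (changed[-1][2],))
        warehouse.commit()
    return len(changed)


# Hypothetical production database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2008-08-25"), (2, 20.0, "2008-08-26")],
)

# Hypothetical warehouse, with a one-row table holding the watermark.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders_fact (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
warehouse.execute("CREATE TABLE load_state (last_loaded TEXT)")
warehouse.execute("INSERT INTO load_state VALUES ('')")

first = incremental_load(source, warehouse)   # initial load picks up both rows
source.execute("UPDATE orders SET amount = 25.0, updated_at = '2008-08-27' WHERE id = 2")
second = incremental_load(source, warehouse)  # next run picks up only the changed row
```

This only works if every source table reliably records when a row changed, which flat files and legacy systems often do not; that is part of why this is a hard problem.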

This is the first lecture of a 6-lecture series. The next lecture will be Business Applications of BI. This will give an idea of which industries benefit from BI, with specific examples: e.g. banking for assessing credit risk, fraud, etc. Then Data Management for BI: various issues in handling large volumes of data; data quality, transformation and loading. These are huge issues, and need to be handled very carefully to ensure that performance remains acceptable in spite of the huge volumes. The next lecture is technology trends in BI: where is this technology going in the future? Then one lecture on the role of statistical techniques in BI. You’ll need a bit of a statistical background to appreciate this lecture. And a final session on careers in BI. For the detailed schedule and other info on this series, see the Pune Tech events calendar, which is the most comprehensive source of tech events info for Pune.

SAS R&D India works on Business Applications of BI (5 specific verticals like banking), on Data management, on some of the solutions. A little of the analytics – forecasting. Not working on core analytics – that is only at HQ.

We are trying to get the slides used in this talk from the speaker. Hopefully in a few days. Please check back by Monday.


CSI Pune Lecture: Overview of Business Intelligence and Data Warehousing – 27 Aug 2008

Computer Society of India – Pune Chapter presents a lecture series on Data warehousing. This is the first lecture in that series:

What: Overview of Business Intelligence & Data warehousing by Vibhas Joshi, head of R&D at SAS R&D India.

When: Wednesday, August 27th, 2008, 6:30pm to 8:30pm
Where: Dewang Mehta Auditorium, Persistent Systems, Senapati Bapat Road
Entry: Free for CSI Members, Rs. 100 for others. Register here.

Details – Overview of BI & Data warehousing

Concepts of data warehouse, data marts, OLAP and data mining, understand relationship between transactional systems and data warehouse.

About the Speaker – Vibhas Joshi

Vibhas is with SAS R&D India as Head R&D, Program Manager – Industry Intelligence solutions, and Member of the Management Team.
Vibhas holds a Masters degree in Physics from the University of Pune, a Diploma in Computer Management from Jamnalal Bajaj Institute as well as a Masters in Management Studies from University of Pune. He is a certified PMP.

He has over 25 years of experience in IT. He has special skills in General Management, Program Management, Project Management, Software Product Development, Requirement Engineering, Database Management, Software Development Methodologies, and Infrastructure set-up.

Vibhas has conducted numerous training programs covering Project Management, Requirement Management and Software Engineering.

Vibhas in the course of his assignments has worked in the following business domains: Banking, Financial Services, Insurance, Manufacturing, Telecom.

For more information about other lectures in this series, and in general other tech events in Pune, see the tech events calendar at upcoming.


CSI-Pune’s ILM Seminar – A Report

CSI-Pune conducted a half-day workshop on Information Life-cycle management. T.M. Ravi, founder and CEO of Mimosa Systems gave the keynote presentation. There were product/project pitches from IBM, Zmanda, Coriolis. A talk on storage trends by Abhinav Jawadekar. Finally a panel discussion with representation from Symantec (V. Ganesh), BMC (Bladelogic; Monish Darda), Zmanda (K K George), IBM, Symphony (Surya Narayanan), and nFactorial (Hemant Joshi).

Here are my cryptic notes from the conference:

  • T.M. Ravi, CEO of Mimosa, gave a talk on what he sees as challenges in storage/ILM. New requirements coming from the customers – huge amounts of user-generated unstructured data in enterprises. Must manage it properly for legal, security and business reasons. Interesting new trends coming from the technology side – new/cheap disks. De-duplication. Storage intensive apps (e.g. video). Flash storage. Green storage (i.e. energy conscious storage). SaaS and storage in the cloud (e.g. Amazon S3). Based on this, storage software should focus on these things: 1. Increase information content of data. 2. Improve security. 3. Reduce legal risk. Now he segues into a pitch for Mimosa’s products, i.e. you must have an enterprise-wide archive: 1. Continuous capture (i.e. store all versions of the data). 2. Full text indexing of all the content, allowing users to search by keyword. 3. Single instance storage (SIS), aka de-duplication, to reduce the storage requirements. 4. Retention policies. Mimosa is an archiving appliance that can be used for 1. ediscovery, 2. recovery, 3. end-user searches, 4. storage cost reduction.
  • Then there was a presentation from IBM on General Parallel File System (GPFS). Parallel, highly available distributed file system. I did not really understand how this is significantly different from all the other such products already out there. Also, I am not sure what part of this work is being done in Pune. Caching of files over WAN in GPFS (to improve performance when it is being accessed from a remote location) is being developed here (Ujjwal Lanjewar).
  • There was also a presentation on the SAN simulator tool. This is something that allows you to simulate a storage area network, including switches and disk arrays. It has been open-sourced and can be downloaded here. A lot of the work for this tool happens in Pune (Pallavi Galgali).
  • KKG from Zmanda demonstrated the recovery manager for MySQL. This whole product has been architected and developed in Pune.
  • Bernali from Coriolis demonstrated CoLaMa – a virtual machine lifecycle manager. This is essentially CVS for virtual machine images. A version management software to keep track of all your VM images. Check out an image. Work on it. Check it in. A new version gets stored in the repository. And it only stores the differences between the images – so space savings. It auto-extracts info like OS info, patchlevel etc.
  • Coriolis’ was the only live demo. The others were flash demos which looked lame (and had audio problems). Suggestion to all – if you are going to give a flash demo, at least turn off the audio and do the talking yourself. This would involve the audience much better.
  • Abhinav Jawadekar gave a nice introductory talk on the various interesting technologies and trends in storage. It would have been very useful and helpful for someone new to the field. However, in this case, I think it was wasted on an audience most of whom have been doing this for 5+ years. The only new stuff was in the last few slides that were about energy aware storage (aka green storage). (For example, he pointed out that data-center class storage in Pune is very expensive: due to power, cooling, UPS and genset, the operating costs of a 42U rack are $800 to $900 per month.)
  • The panel discussion touched upon a number of topics, not all of them interesting. I did not really capture notes of that.
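
For readers new to single instance storage (SIS), mentioned in the keynote notes above, here is a minimal Python sketch of the idea: identical content is stored once and referenced by its hash. This is my own illustration of the general technique, not how Mimosa (or any product mentioned here) implements it; the class name and file paths are made up.

```python
import hashlib


class SingleInstanceStore:
    """Toy archive that keeps one copy of each distinct piece of content."""

    def __init__(self):
        self.blobs = {}   # content digest -> content bytes
        self.names = {}   # archived item name -> content digest

    def put(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, content)
        self.names[name] = digest
        return digest

    def get(self, name):
        return self.blobs[self.names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
attachment = b"quarterly report " * 100
store.put("alice/inbox/report.doc", attachment)   # hypothetical paths
store.put("bob/inbox/report.doc", attachment)     # same bytes, no extra copy stored
store.put("carol/notes.txt", b"meeting at 4pm")
```

Three items are archived, but only two blobs are stored, because the attachment mailed to two people is kept once.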

Overall, it was an interesting evening. With about 50 people attending, the turnout was a little lower than I expected. I’m not sure what needs to be done in Pune to get people to attend. If you have suggestions, let me know. If you are interested in getting in touch with any of the people mentioned above, let me know, and I can connect you.

CSI-Pune Seminar on Information Lifecycle Management – 29 May

What: Seminar on Information Lifecycle Management. ILM consists of all the technologies required during the lifetime of some data stored in an enterprise: how data comes in, where it is stored, the storage hardware/software and architecture, how it is archived and backed up, retention policies, and deletion policies.

When: Thursday, 29 May 2008, 4pm to 9pm

Where: National Insurance Academy, 25, Balewadi, Baner Road

Fees: Rs. 400 for CSI members, Rs. 500 for others

Registration: register online

Detailed Program:

3:30 pm – 4:00 pm : Registration

4:00 pm – 4:15 pm : Inauguration and release of CSI Newsletter

4:15 pm – 5:15 pm : Keynote address – T. M. Ravi (President and CEO Mimosa Systems)

5:15 pm – 5:45 pm : Tea Break

5:45 pm – 6:45 pm
Demonstration of products – IBM, Zmanda Technologies, Coriolis

6:45 pm – 7:30 pm
Technical talk – Most promising new technologies in Storage and ILM space by Abhinav Jawadekar – Founder, Sound Paradigm, Software Engineering Services.

7:30 pm – 8:15 pm
Panel discussion – Technology Trends in Storage and its correlation with Career Opportunities. Panelists are Surya Narayanan (Symphony), K K George (Zmanda Technologies), Monish Darda (Bladelogic), Bhushan Pandit (Nes technologies), V. Ganesh (Symantec). Moderated by Hemant Joshi (nFactorial Software).

8:15 pm onwards: Dinner

The event is open to everybody, but you have to register online. Fees are Rs. 400 for members, Rs. 500 for non-members.

LifeScience and Healthcare Informatics talk and panel discussion – 21 May

Computer Society of India – Pune Chapter presents:

What: Use of Informatics in LifeSciences and Healthcare; Talk by Abhi Gholap, President and CEO, Optra Systems; Panel discussion with participants from UoP, Jubilant BioSys, TCS, etc.

When: Wednesday, May 21st, 2008, 6:30pm to 8:30pm
Where: Damle Hall, behind Indsearch, off Law College Road
Entry: Free for CSI Members, Rs. 100 for others

Talk by Abhi Gholap, President and CEO, Optra Systems

The life science and healthcare industry has been witnessing innovations and advancements in instrumentation in recent years. This has resulted in an increase in the amount of image data over time. More than 67% of life science & healthcare data is in the form of images. Image acquisition, image analysis, image management, regulatory compliance and image collaboration are some of its dimensions, which need verticalization for Genomics, Proteomics, Pharmacogenomics, Toxicology, Preclinical and Clinical Trials and Patient care. With over half a million bio researchers and a 6-7 year long drug discovery and development lifecycle, the total available imaging tool and informatics market is estimated to be at $5 bn with an 11.9% annual growth rate.

About the Speaker

Abhijeet, a seasoned technopreneur, has over 14 years of experience in the BioMedical and Life Science Imaging software industry. He started his career with the Imaging Group at TCS. Later he worked for leading imaging companies like CedarTech, Siemens Medical Systems and ByzanTech solutions in the United States. In 2002, he founded a digital pathology product company in Silicon Valley, California. After a successful exit in 2006, he founded Optra Systems, a solutions and services company with complete focus on Life Science and Healthcare informatics.

Abhi earned his M.Tech. with specialization in BioMedical Imaging from IIT Mumbai, India. Abhi is known as a thought leader in Optical Imaging software applications while actively working as a member of many institutions. He was recently felicitated by the Rotary Club as recipient of the “Vocational Excellence Award” for his research in BioMedical Imaging. He is an active member of Pune Vyaspeeth and other social forums. He is credited with several research publications in addition to 11 patents.

Panel Discussion

Dr. Deepti Deobagkar, University of Pune

Dr. Prashant Naik, Head Informatics, Jubilant Biosys, Bangalore

Vishal Katariya, LLM (US), LLM (UK), IPR Management

Dr. Raj Srinivasan, Head, Biosciences, TCS

To Register for the seminar please click HERE