TechWeekend #3: Website Performance, Scalability and Availability: Sept 5

Scalability (Source: Domas Mituzas, Wikipedia)
What: TechWeekend featuring “Website Scalability and Performance” by Mukul Kumar, VP Engineering at Pubmatic, and “Website Availability and Recovering from Failures and Disasters” by Sameer Anja, Associate Director at KPMG
When: Saturday, 5th Sept, 4pm
Where: Symbiosis Institute of Computer Studies and Research, Atur Centre, Model Colony. Map.
Registration and Fees: This event is free for all to attend. Please register here.

Website Scalability and Performance – Mukul Kumar

Mukul will talk about the various aspects of what it takes to run a very high traffic website – something that he has a lot of experience with at Pubmatic, the ad optimization service for web publishers, where they serve over a billion requests per month.

Mukul Kumar (mukul.kumar [at] pubmatic [dot] com) is a Co-Founder and VP of Engineering at Pubmatic, where he is responsible for the engineering team. He lives in Pune, India. Mukul was previously the Director of Engineering at PANTA Systems, a high performance computing startup. Before that, he joined VERITAS India as its 13th employee and helped it grow to over 2,000 people as Director of Engineering for the NetBackup group, Veritas’ main product. He has filed for 14 patents in systems software, storage software, and application software, and proudly proclaims his love of π, which he can recite to 60 digits. Mukul is a graduate of IIT Kharagpur with a degree in electrical engineering.

Website Availability and Recovery from Disasters – Sameer Anja

While everyone looks at security and focuses on confidentiality, privacy and integrity, an oft-neglected parameter is availability. While “neglected” may seem like a strong term, the truth is that we overlook basic data on availability and do not even implement simple to-dos that would help remedy the situation. The session aims to identify such simple remedies, look at impacts and the assessment model, and put forward various scenarios and possible solutions. It does not focus on specific products; instead, it looks at how the existing technologies used for web site development can be used to ensure availability. Some principles of disaster recovery will also be covered.

Sameer is a Senior Manager in the IT Advisory practice and has been working with KPMG since January 2007. He has 12+ years of work experience in the areas of Information Security, product design and development, and system and network administration. He has worked on the process and technology areas of Information Security, and on Governance and Compliance areas like SOX, Basel II, ISO 15048, SSE-CMM and Data Privacy, apart from ISO 27001, Identity Management, and Business Continuity design and testing. He has experience working with startups and established setups, has spoken at various conferences and seminars within India and abroad, and is trained for Six Sigma Green Belt.

7 thoughts on “TechWeekend #3: Website Performance, Scalability and Availability: Sept 5”

  1. Hi Mukul,

    It was indeed a very interesting session, particularly the interaction and the discussions around various points of interest. I seem to recollect that you were investigating the use of Hadoop to process the humongous amount of data that you get from the activity (clicks etc.) at the various websites where the ads get displayed. I was wondering whether you have considered using the Amazon Elastic MapReduce offering: http://aws.amazon.com/elasticmapreduce/

    It seems that this requires the data to be in S3. Would loading the data into S3 be prohibitively expensive? Are there any other concerns that you would have about using the cloud for this use case, as opposed to your own hosted servers?

    Regards
    Abhijit

  2. Hi Abhijit – we have considered AWS; however, there were 2 problems: 1) storing in S3 would be too slow for very large data sets, and 2) storing on EBS would be very costly, since EBS is expensive per GB.

    That said, it is possible to run Hadoop on EC2; many people are doing that. The way you would do the hardware configuration is: use a large EC2 instance, and use the local storage that comes with the server. Then you can run HDFS on top of the local storage.

    I hope that helps.

    Thanks,
    Mukul.

  3. Thanks Mukul,

    That certainly makes sense. I had a couple of follow up questions:

    * How do you get your data (the input data) onto the local storage of your large EC2 instances?

    * The EC2 instances are not persistent, i.e. when they go down any local storage disappears. How do you plan for such an eventuality, i.e. some of the EC2 instances that are part of the Hadoop cluster going down?

    Regards
    Abhijit
