Event Report: 7th IndicThreads Software Development Conference
(This is a live blog of the talks being given at the 7th IndicThreads Software Development Conference happening in Pune today and tomorrow. The slides of the individual talks will become available on the conference website tomorrow.)
Keynote Address: Suhas Kelkar: Technology Is Just The Start
The the keynote address was by Suhas Kelkar, CTO (APAC) for BMC Software, also in charge of driving BMC’s Incubator Team globally as well as the overall technical and business strategy.
The main point Suhas made was that technology is a great way to innovate, but technology is not enough. There are many other things beyond technology to look at to build great software products.
Any technology innovator must understand the technology adoption curve that Geoffrey Moore explained in Crossing the Chasm. First you innovate and create a new technology, and the first people to use it are the early adopters. And to keep the early adopters you need to keep adding technology features. But the set of early adopters is small. You cannot get mainstream success based on the early adopters. And most new technology innovations fail at this stage. Getting the mainstream to adopt the technology you need to win them over with non-technology things like better user experience, reliability, low cost, etc. And this is necessary, otherwise you’re technology innovation will be wasted. So you must learn to focus on these things (at the right stage in the life-cycle of the product)
Technology innovation is easy. Feature/function innovation is easy. User experience innovation is hard. Getting reliability without increasing cost is very hard. The best technologists need to work on these problems.
Karanbir Gujral: Grammar of Graphics: A New Approach to Visualization
The world has too much data, but not enough information. Not enough people working on converting raw data to easily consumable, actionable information. Hence, the area of data visualization is very important.
“Grammar of Graphics” is a new approach to visualization. It is not simply a library of different types of chart. Instead, it is a language which can be used to describe chart type, or even easily declare/create new chart types. Think of it as the SQL equivalent of graphics.
The language describes features of charts, not specific names/types. For example, instead of saying “bar chart”, you would describe it as a chart with basic 2D co-ordinates, categorical x numeric, displayed with intervals.
Where is this available?
- Book: Grammar of Graphics by Leland Wilkinson
- Open Source software packages:
- Ggplot2 for R users
- Bokeh for Python users
- Commercial Software
- IBM RAVE
More specifically, the grammar allows specification of a chart using six different properties:
- Element Type:
- point, line, area, interval (bar), polygon, schema, text.
- Any element type can be used with any underlying data
- Any number of dimensions (1D, 2D, etc)
- Clustering and stacking
- Map projections
- Simple Axis
- Nested Axis
- Facet Axis
- Standard sequential/stacking layout
- Graph Layouts (Network, Treelike)
- Custom Layouts
- Map data to graphic attributes
- Color (exterior/interior, gradient)
- Size (width, height, both)
- Splitting data into multiple charts
Ideally, you would like each of these 6 different types of properties to be orthogonal to each other. You should be able to mix and match any type of element type with any type of co-ordinate system, with any aesthetics/style. Thus, specifying the values of these six different properties gives a specific type of chart.
Karanbir gave an example of how a simple bar chart is specified using a language like this. The specification was in JSON. I could not capture it in this live blog, but check out the slides for the actual specification. It is quite interesting.
Having to write out a largish specification in JSON just to get a bar chart doesn’t seem to be that useful, but the real power of this approach becomes apparent when you need to make changes or combine the different properties in new/unique ways. The power of orthogonality and composition become readily apparent, and you are able to do really big and cool changes with simple modifications to the specification. Karanbir’s demo of different visualizations for the same data using simple changes to the spec was very intersting. Again, see the slides for some examples.
Mohan Kumar Muddana: JVM the Polyglot Platform
The Java Virtual Machine (JVM) was initially built for Java. But now it handles many different languages. More specifically, it was built to only handle classes, but now so many different types of language constructs from modern languages newer than Java are handled. Since Java6 (JSR223), JVM has supported scripting (Groovy, Jython, JRuby). Starting with Java7 (JSR292), the JVM has added the ‘invokedynamic’ construct, allowing support for dynamically typed languages, which makes it easier to implement all the features of interpreted languages on the JVM – and not just those that are compiled to bytecode.
Why would you want to run other languages on the JVM? Because you can get the speed and stability of the JVM and combine it with the productivity, or ease-of-use, or the different paradigm of the other languages.
Some interesting JVM based languages to check out:
- Scala: Pure object oriented language with Traits, Case Classes and Pattern Matching, and Type Erasure.
- Clojure: Pure functional language
The slides of the talk contain lots of details and examples of why these languages are cool.
Sushanta Pradhan: Typesafe Stack Software Development on the JVM
The world is moving towards apps that need Scalability, Distributed, Parallel & Concurrent, Dynamic and Agile and Swift.
Concurrent access to data-structures is a big pain when trying to implement parallel programs.
Scala is a functional, concise programming language written on top of the JVM, and inter-operates with Java (i.e. you can very easily access all the Java standard library and any other Java code you might have from Scala code). It has immutable and mutable variables; it has tuples, multiple assignments in a single statements, sensible defaults, and operator overloading.
Scala has the following important features that help with implementation of parallel code:
- Parallel Data-Structures
- Distributed Data-Structures
- Actor Model
- Software Transactional Memory
Akka Middleware is a concurrent, scalable and fault tolerant framework based on the actor model. It is event driven and highly performant. It has a supervision model which establishes relationships between actors, and the supervisor can detect and correct errors of the subordinate model. This is a model of reliability based on the belief that you cannot completely avoid failures, so you should simply let your actor fail, and allow the supervisor to take corrective action.
Akka is written using Scala, but has Java APIs, so you can use Akka without really having to use or know Scala.
The Play framework, gives Ruby-on-Rails style agility in the Java world by adopting the convention-over-configuration techniques from Rails, has seamless integration with Akka, and hot deployment which makes it easier to have agile code updates. Play has an easy, out of the box setup for unit and functional testing, asynchronous HTTP request handling, WebSocket support, support for caching (e.g. memcaching).
Play has progressive stream processing. This is based on the iteratee model, which improves over the iterator model by allowing you to easily work with streaming data in which the collection size is not fixed beforehand. Similarly there are enumerator and enumeratees.
Play is written using Scala, and has integration with Akka, but it can be used without Akka, and it has Java APIs, so you can use Play without really having to use or know either Scala or Akka.
Scala, Akka and Play together are called the TypeSafe stack, and is a modern JVM based software stack for building scalable applications.
Mushtaq Ahmed/Shripad Agashe: Using Play! 2.0 to Build a Website
Important features of Play!
- Powerful Templating: Play! has A heavy focus on type-safety across all the layers. This means that even the templating engine has type-safety built in; thus the templates are statically compiled and check for type safety.
- Routes: Statically compiled reverse routes. Ease of refactoring routes due to compiler support. All references to routes are via reverse routes, no hard-coded routes.
- Non-blocking: End-to-end non-blocking architecture. Thus a small number of threads can handle large numbers of concurrent requests. There are no callbacks.
- Great Developer Workflow: Hot reloading, error display (compile time and runtime errors shown in browser), and in-memory database during development.
The rest of this talk contained a code walk-through of Play! features and an implementation of an insurance website using Play! 2.0.
Arun Tomar: Cloud Automation using Chef
Chef is an open source framework that allows the creation of automated scripts that help with hardware/software infrastructure management and deployment. Previously, app upgrades/deployments were manual processes that took hours if not days. Chef allows creation of scripts (written in Ruby) which fully automate these processes.
Chef is cross-platform, so it supports pretty much everything: Windows, Linux, Sun, Mac, BSD, AIX etc.
A Chef deployment consists of a Chef server, which holds all the configuration information for all your infrastructure (aka recipes), a Chef workstation (i.e. the machine from which you will run the Chef recipes), and the actual servers/machines that will be controlled/configured using Chef (aka Chef nodes). Chef client software needs to be installed on each node. The Chef workstation then takes the Chef script (or recipe) from the Chef server, breaks it up into pieces to be executed on each individual server/machine and sends those instructions to the Chef client on the nodes using ssh.
Each Chef recipe is used to configure some part of a server: e.g. Apache, MySQL, Hadoop, etc. These recipes describe a series of resources that should be in a particular state – packages that should be installed, services that should be running, or files that should be written. Chef makes sure each resource is properly configured, and gives you a safe, flexible, easily-repeatable mechanism for making sure your servers are always running exactly the way you want them to.
Core principles of Chef:
- Idempotence: doing an operation should be idempotent. This ensures that when scripts fail halfway through, you can just re-run the whole script and everything will just work properly and the script will complete cleanly, without doing anything twice.
- Order Matters: the steps in a recipe will get executed in the order they’re written
- You are the Boss: Most infrastructure solutions force you to adapt to their “one-size-fits-all” approaches, even though every infrastructure challenge is different. This is crazy. Instead, Chef allows you to be in full charge of everything and you can configure everything exactly the way you want it – and you have the full power of the Ruby programming language to specify this to any level of detail required.
A big advantage of Chef is that Chef recipes are essentially a self-documenting record of your actual infrastructure. In addition, when you add new servers, Chef automatically discovers information your system. So, for example, when you need to do something on production, you simply ask Chef, which knows what is really happening, and you don’t need to rely on static documentation that might be out of date.
Another advantage of Chef is the extensive library of existing recipes for standard things that people would like to install/control using Chef. For example, consider this:
This line is enough to install apache2 on any machine, irrespective of which OS the machine is running. This is possible because all the hardwork of actual commands and configurations for Apache2 for all the various supported platforms has already been done for you by someone else!
In addition, Chef contains templates (e.g. a template for apache2 ) which contain the various configuration parameters that can be configured by you easily. So if you don’t want to standard apache2 installation, you can “include” a template, over-ride a few parameters, and you’re done.
Here are some examples of what chef can do:
- It can use rpm, or apt-get, or dpkg, or yum, or python-pip, or python-easy_install, or ruby gems, or whatever your package installer happens to be and install packages. It is smart enough to know whether a package is already installed and which version, and will not do unnecessary things.
- It can update configuration files, start and stop services, and install plugins for software packages that can handle plugins.
- It can automatically clone git repositories, make/build/install, handle dependencies, execute scripts.
Shubham Srivastava: NoSQL – Now and the Path Ahead
- Scalability and Performance
- Horizontal scalability is better than vertical scalability
- Hardware is getting cheaper and processing power in increasing
- Less operational complexity compared to RDBMS solutions
- Get automatic sharding as default
- Scale without Hefty Cost
- Commodity Hardware, and free/open source software (think versions, upgrades, maintenance)
- Flexibility in Data Modeling.
- Key-value stores (very powerful…)
- Hierarchical or Graph Data
- Storing values like maps, or maps of maps
- Document databases – with arbitrarily complex objects
The next part of the talk was a detailed look at advantages and disadvantages of a specific type of NoSQL data base for answering specific queries.