Tag Archives: persistent

Turing100 Lecture: Life and work of Judea Pearl – 9 Mar

March 5, 2013 | Events, Featured | databases, Events, fault tolerance, persistent, transactions, turing, turing100 | Navin Kabra

As part of the Turing100 Lecture Series this time, there is a talk on the life and work of 2011 Turing Award recipient Judea Pearl, followed by a “Turing 100” quiz that teams of professionals and students can participate in.

Judea Pearl was given the Turing award for the development of a calculus for probabilistic and causal reasoning. On Saturday, 9 March, Mukund Deshpande, Head of the Business Intelligence and Analytics Competency at Persistent, will talk about this work.

The event is free for everyone to attend. Register here

About the Turing Awards

The Turing awards, named after Alan Turing and given every year, are the highest honor that a computer scientist can earn. The contributions of each Turing award winner are therefore, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent lecture series celebrates the 100th anniversary of Alan Turing’s birth with a lecture every month. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures will cover the fundamentals of computer science, and all of them are very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 9th March. This event is free and open for anybody to attend. Register here

Turing100 Lecture: Life and work of John Backus (Fortran|BNF) – 2 Feb

January 28, 2013 | Events, Featured | databases, Events, fault tolerance, persistent, transactions, turing, turing100 | Navin Kabra

The Turing100 Lecture Series comes back with the 6th session. This time, there are two technical talks, centered around the life and work of 1977 Turing Award recipient Dr. John Backus.

In addition to several other contributions, Dr. Backus is well known for his pioneering work on Fortran and as the inventor of the Backus-Naur Form (BNF), which is widely used as a notation for formal syntax.

On Saturday, 2nd Feb, Abhijit Vichare will talk about the life and work of John Backus.

This will be followed by a session on “Early History of Fortran: The Making of a Wonder” by Prof. Uday Khedkar, Department of CSE, IIT-Bombay.

The event is free for everyone to attend. Register here

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 2nd February. This event is free and open for anybody to attend. Register here

Turing100 Lecture: Life and work of Jim Gray – By Anand Deshpande – 5 Jan

January 4, 2013 | Events, Featured | databases, Events, fault tolerance, persistent, transactions, turing, turing100 | Navin Kabra

The Turing100 Lecture Series comes back with the 6th session. This time, there are two technical talks, centered around the life and work of 1998 Turing Award recipient Dr. Jim Gray.

Jim Gray made seminal contributions to database and transaction processing research and to implementations of real systems at Tandem, IBM, and Microsoft. Among his best known achievements are:

- granular database locking
- two-tier transaction commit semantics
- the “five-minute rule” for allocating storage
- the data cube operator for data warehousing applications
- describing the requirements for reliable transaction processing (memorably called the ACID test) and implementing them in software.

On Saturday, 5th January, Anand Deshpande, CEO of Persistent Systems, will talk about the life and work of Dr. Jim Gray, the recipient of the 1998 Turing Award. This talk is a part of the Turing Awards monthly lecture series that happens at Persistent’s Dewang Mehta Auditorium.

This will be followed by a session on “Software Faults, Failures and Their Mitigation – contributions of Dr. Jim Gray to the area of Fault Tolerant Systems” by Prof. Kishore Trivedi, Duke University, USA.

The event is free for everyone to attend. Register here

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 5th January. This event is free and open for anybody to attend. Register here

Turing100 Event Report: Work of Butler Lampson – Systems

November 24, 2012 | Event Report | Events, persistent, systems, turing, turing100 | Navin Kabra

(This is a live-blog of Neeran Karnik’s talk on Butler Lampson, as part of the Turing100 Lecture Series happening at Persistent. Since it is being typed during the talk, please forgive the typos and bad structuring of the article. This article is also very incomplete.)

Butler Lampson has contributed to a wide range of computer science fields. Here is the Turing Award Citation:

For contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security and document publishing.

The number of different areas of computer science touched there is breathtaking.

Systems Built

Here is just a sampling of some of the work of Lampson that resulted in entire areas of computer hardware and software:

The first personal computer:
- The first personal computer in the world, the Xerox Alto, was conceived in a 1972 memo written by Lampson.
- Important contributions of the Alto:
  - First “personal” computer
  - First computer that used a desktop metaphor
  - First computer to use a mouse-driven graphical interface
- Lampson later worked on the follow-up workstation designs Dorado and Wildflower (research products), which later resulted in a successful commercial product (Star).
The Bravo Editor
- Lampson designed the first WYSIWYG editor in the world in 1974. This shipped with the Xerox Alto. This work can ultimately be seen to have led to the development of Microsoft Word.
The Xerox 9700 and Dover Laser Printers
- The first laser printer was designed in 1969 at Xerox PARC, and Lampson worked on its electronic design.
The SDS 940 Time-sharing system
- The first general-purpose time-sharing system.

And those were just the systems he built.

What about more fundamental contributions to computer science? Here is a list:

The two-phase commit protocol.
- This is the fundamental building block of all transactional processing in databases that are spread out across machines and/or geographies.
The CAL time-sharing system
Programming Languages
- MESA and SPL: for systems programming. Modern threads developed from here
- Euclid: first programming language to use verification
Security:
- Access matrix model, unifying capabilities and ACLs
- Theory of principals speaking for other principals
- Microsoft Palladium
- Scrubbing disk storage
- Research on how economic factors affect security
Networking
- Co-inventor of ethernet!
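
The two-phase commit protocol mentioned in the list above can be sketched in a few lines. This is only a minimal illustration, not Lampson's formulation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and the sketch assumes no timeouts, no durable logging, and a coordinator that never crashes.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant: votes in phase 1, applies the decision in phase 2."""
    name: str
    can_commit: bool = True
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote is enough to abort the transaction on every machine, which is what makes the outcome all-or-nothing across participants.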

How is Systems Research different?

Butler Lampson was one of the few great computer scientists who spent a lot of time in the laboratory with actual hardware, getting his hands dirty. This is not the kind of work normally associated with Turing award winners, but it is the kind of work that has really given us the actual hardware that we use in computer science today.

Some thoughts on why systems programming is different and more difficult:

Systems work means designing and building large computing systems, complex computing systems, or both: computers (from tablets to supercomputers), networks, storage and other hardware, operating systems, programming languages, and other infrastructure software.

Systems design is different from other parts of computer science (e.g. algorithm design) because its external interface is to the real world (and hence it is imprecise, subject to change, and generally underspecified), it has lots of moving parts (i.e. more internal structure and more internal interfaces), module-level design choices have wider implications for the end product, and the measure of success is less clear than in other fields. There is no such thing as an optimal answer, so avoiding terrible designs is more important than finding the best one.

Hints on Computer System Design

This is a paper written by Lampson giving hints on how to build good systems. He calls them hints because, in systems work, there are no infallible rules; there are only hints that guide your thinking. Werner Vogels, CTO of Amazon, who oversees some of the most complex and scalable computing infrastructure in the world, is a fan of this work. He finds these hints very useful, and says that they are more important today because they’ve withstood the test of time.

These hints talk about functionality (what you’re doing), speed (are you doing it quickly enough), and fault-tolerance (and will you keep doing it). In systems, the interface design is the most important part. Do this well and other stuff will follow.

Hints about Functionality/Interface:
- Do one thing at a time, and do it well
  - KISS – Keep It Simple, Stupid
  - Don’t generalize – because generalizations are wrong
  - Don’t overload interfaces with too much work
  - “Perfection is reached not when there is no longer anything to add, but when there is no longer anything to take away” – Antoine de St. Exupery
- An interface should capture the minimum essentials
  - Don’t promise more than you can deliver
- Cost of an interface should be predictable
- Better to have basic and fast functionality, rather than slow and complex
  - Clients who need speed, are happy with the basic+fast interface
  - Clients who need complexity can build it on top of your basics
- Don’t hide the power
  - If a lower layer abstraction is more powerful, a higher layer abstraction should not hide that power. It should expose the power. The abstraction should only conceal undesirable properties
- Keep the basic interfaces stable; parts of the system that will be used by many different modules should not change
- Treat the first system that you’re building as a prototype that you will throw away
- Divide and conquer
  - Reduce the problem to smaller pieces that you can solve more easily
  - Bite off as much as you can handle right now, and then leave the rest for the next iteration.
Hints on Speed
- Split resources in a fixed way, rather than a sharing way
  - Do not spend too much effort in trying to share resources
  - Overhead of multiplexing usually not worth it
  - Fixed allocation makes things faster, and more predictable
- Cache Answers to Expensive Computations
- Dynamic Translation
  - Use different representations for use of some structure vs. implementation of the same, and then dynamically translate between them.
  - “All problems in computer science can be solved by adding a level of indirection”
- Use hints
  - i.e. things that don’t have to be 100% accurate, but can improve performance if they are correct
  - For example, the routing tables in internet packet routing. They can be incorrect, out-of-date. In most cases, they work, and give great performance. Rarely, they don’t work, and you can recover.
- When in doubt, use brute force
  - Hardware is cheap
  - Brute force allows for cheaper faster implementation
  - e.g. chess-playing computers that use brute force have defeated grandmasters, while complex algorithms trying to “understand” chess have not.
- Compute in the background whenever possible
  - e.g. if a webpage update results in an email being sent, the email sending should happen in the background.
- Use batch processing when possible
- Safety First
  - Correctness is more important than speed
  - Being clever and optimal is over-rated
  - A general purpose system cannot really optimize, because it doesn’t really know the use cases
  - Hardware is cheap
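
The “use hints” idea from the list above can be illustrated with a small sketch: a remembered answer that may be stale, is cheap to check, and falls back to an authoritative lookup when it is wrong. The `Directory` and `Client` classes and the `servers` layout are hypothetical, invented purely for illustration; they are not taken from Lampson's paper.

```python
class Directory:
    """Hypothetical authoritative (and slow) service mapping names to server ids."""

    def __init__(self, table):
        self.table = dict(table)
        self.lookups = 0

    def lookup(self, name):
        self.lookups += 1  # count how often the expensive path is taken
        return self.table[name]


class Client:
    def __init__(self, directory, servers):
        self.directory = directory
        self.servers = servers  # server id -> set of names that server currently holds
        self.hints = {}         # name -> server id; may be out of date

    def find(self, name):
        hinted = self.hints.get(name)
        # Fast path: use the hint only after a cheap check against the truth.
        if hinted is not None and name in self.servers[hinted]:
            return hinted
        # Slow path: hint missing or wrong, so ask the authority and refresh the hint.
        actual = self.directory.lookup(name)
        self.hints[name] = actual
        return actual
```

A wrong hint costs one extra check and one slow lookup, but never a wrong answer, which is the property that separates a hint from a cache that must be kept consistent.
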
Hints for Fault Tolerance
- End-to-End Error Recovery
  - “Error detection and recovery at the application level is necessary. Error detection and recovery at lower layers is not necessary, and should only be done for performance reasons.”
  - Saltzer et al., classic 1984 paper
  - Example: transferring a file from one computer to another involves lots and lots of steps (the file moves from storage to memory to network) and lots of sub-tasks (transferring chunks of the file); any of them can fail, so only a check at the application level can confirm that the whole file arrived intact
- Log updates to record the truth about an object’s state
  - Current state of the object in memory is a hint
  - The true state of the object is what can be recovered from the storage and logs
- Actions should be atomic or restartable
  - Either an action is all-or-nothing, or if there is a failure in between the action should be able to restart from where it left off
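
The “log updates” hint above can be sketched as follows: every update is appended to a log before it is applied in memory, and the state is rebuilt by replaying the log. `LoggedStore` and its JSON-lines file format are assumptions made for this illustration, and the sketch does not handle a partially written final record.

```python
import json
import os


class LoggedStore:
    """Hypothetical key-value store: the log file is the truth, memory is only a hint."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from whatever reached the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                record = json.loads(line)
                self.state[record["key"]] = record["value"]

    def put(self, key, value):
        # Record the update durably first, then apply it in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.state[key] = value
```

If the process dies after `put` returns, a new `LoggedStore` opened on the same path recovers the same state, because nothing is considered done until it is in the log.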

Lampson himself suggests that reading the whole hints paper at once might be tiresome, so it is better to read it in small doses at bedtime. And he points out that he himself has ignored most of these hints at some point, but has always regretted it.

Turing100 Lecture: Butler Lampson – Systems, Security, Verification and more – Nov 24

November 22, 2012 | Events, Featured | Events, functional programming, languages, persistent, programming, turing, turing100 | Navin Kabra

In 1992, Butler Lampson received the Turing award for his contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security and document publishing.

On Saturday, 24th November, Neeran Karnik, Senior Architect at BMC Software, will give a talk about Butler Lampson’s work. This talk is a part of the Turing Awards monthly lecture series that happens at Persistent’s Dewang Mehta Auditorium.

This will be followed by a session on “Experience Sharing – Systems design and development Projects in India”. The speakers include Dr. Basant Rajan, CEO of Coriolis Software (previously CTO of Symantec India), and Abhay Ghaisas, Product Development Architect at BMC Software.

The event is free for everyone to attend. Register here

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 24th November. This event is free and open for anybody to attend. Register here

BioCrats – Life Sciences & Software Conference – 7 Dec

November 21, 2012 | Events | Events, life sciences, persistent | Navin Kabra

BioCrats is a Persistent Systems initiative that aims to build a vibrant scientific community in the field of life sciences by connecting people, sharing, and moving forward through the exchange of ideas.

‘BioCrats Connect’ Convention is a half day interactive session, bringing together leading industrialists, scholars and academicians. It’s a platform to interact and discuss science, research and development activities and industry practices with a focus on promoting innovation and research. BioCrats Connect also provides a unique venue for advancing the academic debate, offering a wide perspective and a variety of networking opportunities for researchers. For the 7th ‘BioCrats Connect’ Convention, Venture Center, NCL, Pune is also a collaborating partner.

BioCrats will be held on Friday, 7th December 2012, from 2:00 pm to 6:00 pm, at Dewang Mehta Auditorium, Persistent, Pune.

Check this page for details of the agenda, schedule etc.

Fees and Registration

This event is free and open for anybody to attend. Please register here

Turing100 Lecture: Robin Milner and Polymorphic Type Inference in Programming Languages – Oct 6

October 3, 2012 | Events, Featured | Events, functional programming, languages, persistent, programming, turing, turing100 | Navin Kabra

Robin Milner received the Turing award in 1991 for three major contributions to computer science:

- In the area of automated theorem proving – He developed LCF, the first theoretically sound yet practical tool for machine assisted proof construction
- In the area of programming language design – He developed ML, the first language to use polymorphic type inference along with a type-safe exception-handling mechanism, something that underlies some of the most interesting new programming languages that are being developed today, and
- In the area of concurrency – He developed CCS, a general theory of concurrency.

On 6th October, Navin Kabra (yes, that’s me), will give a talk about Robin Milner’s work. This talk is a part of the Turing Awards lecture series that happens at Persistent’s Dewang Mehta Auditorium at 2pm on the first Saturday of every month this year.

This will be followed by a panel discussion on “Should every programmer learn functional programming”. The panelists include Dhananjay Nene, Chief Architect at Vayana Software; Prof. Raju Pandey of the University of California-Davis; Rustom Modi, who has been teaching functional programming at the University of Pune for over 15 years and is a founder of i-Magus, which delivers training in functional programming and other related technologies; Kedar Swadi, CTO at AlgoAnalytics; and others. For more details of the panel discussion see this article

The event is free for everyone to attend. Register here

Abstract of the Talk

In this talk, I will give a brief overview of Robin Milner’s career, followed by a technical dive into his work. I will briefly cover his work on automated theorem proving and LCF, which served as the motivation for the development of ML, the programming language intended to be used for automated theorem proving. ML ended up having a huge impact on the design of modern programming languages and its influence can be seen in important modern languages like Microsoft’s F#, Haskell, and the JVM-based Scala. The bulk of my talk will cover the design of ML, with a specific focus on the polymorphic type inference system used in ML. Type inference is an important aspect of a lot of modern programming languages, and can be found, for example, in Google’s Go Language, Perl6, Visual Basic 9.0 onwards, C# version 3.0 onwards.
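
To give a flavour of what polymorphic type inference involves, here is a toy sketch of unification-based inference with let-polymorphism for a tiny lambda calculus. It is written in Python purely for illustration and is not ML's actual implementation; the tuple encoding of expressions and types and every function name (`infer`, `unify`, `type_of`, and so on) are assumptions of this sketch.

```python
import itertools

INT, BOOL = ("con", "int"), ("con", "bool")
_ids = itertools.count()


class InferenceError(Exception):
    pass


def new_var():
    return ("var", next(_ids))


def fn(arg, res):
    return ("fn", arg, res)


def resolve(t, subst):
    # Apply the substitution found so far, all the way down.
    while t[0] == "var" and t in subst:
        t = subst[t]
    return fn(resolve(t[1], subst), resolve(t[2], subst)) if t[0] == "fn" else t


def free_vars(t):
    if t[0] == "var":
        return {t}
    return free_vars(t[1]) | free_vars(t[2]) if t[0] == "fn" else set()


def rename(t, mapping):
    if t[0] == "var":
        return mapping.get(t, t)
    return fn(rename(t[1], mapping), rename(t[2], mapping)) if t[0] == "fn" else t


def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if a[0] == "var":
        if a in free_vars(b):
            raise InferenceError("infinite type")
        subst[a] = b
    elif b[0] == "var":
        unify(b, a, subst)
    elif a[0] == "fn" and b[0] == "fn":
        unify(a[1], b[1], subst)
        unify(a[2], b[2], subst)
    else:
        raise InferenceError(f"cannot unify {a} with {b}")


def infer(e, env, subst):
    kind = e[0]
    if kind == "int":
        return INT
    if kind == "bool":
        return BOOL
    if kind == "name":
        quantified, t = env[e[1]]
        # Each use of a polymorphic name gets its own fresh copy of the type.
        return rename(t, {q: new_var() for q in quantified})
    if kind == "lam":  # ("lam", param, body)
        param = new_var()
        return fn(param, infer(e[2], {**env, e[1]: ((), param)}, subst))
    if kind == "app":  # ("app", function, argument)
        f, x, result = infer(e[1], env, subst), infer(e[2], env, subst), new_var()
        unify(f, fn(x, result), subst)
        return result
    if kind == "let":  # ("let", name, bound, body)
        t = resolve(infer(e[2], env, subst), subst)
        in_env = set()
        for quantified, s in env.values():
            in_env |= free_vars(resolve(s, subst)) - set(quantified)
        # Generalize: variables not pinned down by the environment become polymorphic.
        scheme = (tuple(free_vars(t) - in_env), t)
        return infer(e[3], {**env, e[1]: scheme}, subst)
    raise ValueError(f"unknown expression {e!r}")


def type_of(e):
    subst = {}
    return resolve(infer(e, {}, subst), subst)
```

With this sketch, an identity function bound by `let` can be applied to both a boolean and an integer in the same body, whereas the same function passed in as a lambda parameter cannot: only `let`-bound names are generalized, and that is the polymorphic part of the inference.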

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 6th October. This event is free and open for anybody to attend. Register here

Event Report: The Work and Impact of Bob Kahn and Vint Cerf

September 8, 2012 | Event Report, Featured | Events, internet, networking, persistent, tcpip, turing100 | Navin Kabra

(This is a liveblog of the Turing100@Persistent Lecture on Bob Kahn and Vint Cerf by R. Venkateswaran, CTO of Persistent Systems. Since it is being typed as the event is happening, it is not really well structured, but should rather be viewed as a collection of bullet points of interesting things said during the talk.)

Vint Cerf and Bob Kahn

Vint Cerf: Widely known as the father of the internet. He is President of the ACM and Chief Internet Evangelist at Google, has served as Chairman of ICANN, and has held many other influential positions. In addition to the Turing Award, he received the Presidential Medal of Freedom in 2005 and was elected to the Internet Hall of Fame in 2012.

Bob Kahn: Worked at AT&T Bell Labs and MIT. Then, while working at BBN, he got involved with DARPA and Vint Cerf; together they worked on packet-switching networks and invented IP and TCP.

The birth of the internet: TCP and IP. 70s and 80s.

The Internet:
- The first 20 years:
  - Trusted network
  - Defense, Research and Academic network
  - Non-commercial
  - Popular apps: email, ftp, telnet
- Next 20 years:
  - Commercial use
  - Multiple levels of ownership – increased distrust and security concerns
  - Wide range of apps: email, WWW, etc
What did Vint Cerf and Bob Kahn do?
- The problem:
  - There were many packet switched networks at that time
  - But very small, limited and self contained
  - The different networks did not talk to each other
  - Vint Cerf and Bob Kahn worked on interconnecting these networks
- The approach
  - Wanted a very simple, and reliable interface
  - Non-proprietary solution. Standardized, non-patented, “open”
  - Each network talked its own protocol, so they wanted a protocol neutral mechanism of connecting the networks.
  - Each network had its own addressing scheme, so they had to invent a universal addressing scheme.
  - Packets (information slices) forwarded from one host to another via the “internetwork”
  - Packets sent along different routes, no guarantees of in-order delivery. Actually no guarantee of delivery
  - Packets have sequence numbers, so end point needs to reassemble them in order
- The protocol
  - A “process header” identifies which process on the end host should be delivered the packets. This is today called the “port”
  - Retransmissions to ensure reliable delivery. And duplicate detection.
  - Flow control – to limit number of un-acknowledged packets, prevent bandwidth hogging
  - A conceptual “connection” created between the end processes (TCP), but the actual network (IP) does not know or understand this
  - Mechanism to set up and tear down the “connection” – the three-way handshake
  - These are the main contributions of their seminal paper
- The Layered Network Architecture
  - Paper in 1974 defining a 4 layered network model based on TCP/IP.
  - This later became the basis of the 7 layer network architecture
- The Internet Protocol
  - Packet-switched datagram network
  - Is the glue between the physical network and the logical higher layers
  - Key ideas:
    - Network is very simple
    - Just route the packets
    - Robust and scalable
    - Network does not guarantee anything other than best effort
      - No SLA, no guarantee of delivery, no guarantee of packet ordering
    - Dumb network, smart end-host
    - Very different from the existing major networks of that time (the “circuit-switched” telephone networks)
    - No per-connection state maintained at any node of the network
  - Advantages
    - Can accommodate many different types of protocols and technologies
    - Very scalable
- The Transport Layer
  - UDP
    - Simplest higher-level protocol
    - Unreliable, datagram-based protocol
    - Detects errors, but does no error correction
    - No reliability guarantees
    - Great for applications like audio/video (which are not too affected by packet losses) or DNS (short transactions)
  - TCP
    - Reliable service on top of the unreliable underlying network
    - Connection oriented, ordered-stream based, with congestion and flow control, bi-directional
    - State only maintained at the end hosts, not at the intermediate hosts
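
The end-host behaviour described above (sequence numbers, reassembly in order, duplicate detection, acknowledgements) can be sketched roughly as follows. This is a simplification under stated assumptions: the `Receiver` class is hypothetical, and it numbers whole segments, whereas real TCP numbers bytes and also handles windows, timers and checksums.

```python
class Receiver:
    """Hypothetical receiver that reorders segments by sequence number."""

    def __init__(self):
        self.next_seq = 0    # next sequence number the application is waiting for
        self.pending = {}    # out-of-order segments held back, keyed by sequence number
        self.delivered = []  # in-order data handed to the application

    def receive(self, seq, data):
        # Ignore duplicates: already delivered, or already buffered.
        if seq >= self.next_seq and seq not in self.pending:
            self.pending[seq] = data
        # Deliver every segment that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        # Cumulative acknowledgement: "I have everything before next_seq".
        return self.next_seq
```

The network underneath can drop, duplicate or reorder segments freely; all of the bookkeeping lives in the end host, which is the “dumb network, smart end-host” split in miniature.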

Internet 2.0 – Commercialization

The birth of the world wide web: late 80s early 90s
- Tim Berners-Lee came up with the idea of the World Wide Web
- 1993: Mosaic, the first graphical web browser
- First Commercial ISP (Internet Service Provider) – Dial up internet
- Bandwidth doubling every 6 months
- Push for multi-media apps
Push for higher bandwidth and rich apps
- Net apps (like VoIP, streaming video) demand higher bandwidth
- Higher bandwidth enables other new applications
- Apps: email, email with attachments, streaming video, intranets, e-commerce, ERP, Voice over Internet, Interactive Video Conferencing
Dumb Network no longer works
- Single, dumb network cannot handle all these different applications
- Next Generation Networks evolved
- Single, packet-switched network for data, voice and video
- But with different levels of QoS guarantees for different services
Clash of Network Philosophies: BellHeads vs NetHeads (mid-90s)
- Two major approaches: the BellHeads (circuit switched Telephone background), and the NetHeads (from the IP background)
- BellHeads philosophy: network is smart, endpoints are dumb; closed, proprietary communities; expect payment for service; per-minute charges; Control the evolution of the network; want strong regulations
- NetHeads philosophy: network is dumb, endpoints are smart; open community; expect cheap or free services; no per-minute charges; want network to evolve organically without regulations.
- These two worlds were merging, and there were lots of clashes
- BellHead network example: Asynchronous Transfer Mode (ATM) network
  - Fixed sized packets over a connection oriented network
  - Circuit setup from source to destination; all packets use same route
  - Low per-packet processing at each intermediate node
  - Much higher speeds than TCP/IP (10Gbps)
  - A major challenge for the NetHeads
- Problems for NetHeads
  - To support 10Gbps and above, each packet needs to be processed in about 30ns or less (a 40-byte packet is 320 bits, which takes 32ns to arrive at 10Gbps), which is very difficult to do because of all the processing needed (reduce TTL, look up destination address, manipulate headers, etc)
  - As sizes of networks increased, sizes of lookup tables increased
  - Almost ready to concede defeat
- IP Switching: Breakthrough for NetHeads
  - Use IP routing on top of ATM hardware
  - Switch to ATM circuit switching (and bypass the routing layer) if a long-running connection detected.
  - Late 90s, all IP networking companies started implementing variations on this concept
- MPLS: Multi-Protocol Label Switching
  - Standard developed by IP networking companies
  - Insert a layer between the link layer and IP (considered layer 2.5)
  - Separates packet forwarding from packet routing
  - Edges of the network do the full IP routing
  - Internal nodes only forward packets, and don’t do full routing
  - Separate forwarding information from routing information, and put forwarding info in an extra header (MPLS label – layer 2.5)
  - MPLS Protocol (mid-97)
    - First node (edge; ingress LSR) determines path, inserts MPLS label header
    - Internal nodes only look at the MPLS label and forward appropriately, without doing any routing and without looking at the IP packet
    - Last node (edge; egress LSR) removes the MPLS label
    - Label switching at intermediate nodes can be implemented in hardware; significant reduction in total latency
  - MPLS is now the basis of most internet networking
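
The split between routing at the edge and label forwarding in the core, as described in the MPLS notes above, can be sketched like this. The tables, node names and label values are all made up for illustration, and the sketch ignores label distribution, TTL handling and label stacks.

```python
import ipaddress

# Hypothetical ingress table: destination prefix -> (first label, next hop).
INGRESS_FEC = {
    ipaddress.ip_network("10.1.0.0/16"): (100, "lsr-a"),
    ipaddress.ip_network("10.2.0.0/16"): (200, "lsr-a"),
}

# Hypothetical per-node label tables: incoming label -> (outgoing label, next hop).
# An outgoing label of None means "pop the label" (the packet leaves the MPLS core).
LABEL_TABLES = {
    "lsr-a": {100: (110, "lsr-b"), 200: (210, "lsr-b")},
    "lsr-b": {110: (None, "egress-1"), 210: (None, "egress-2")},
}


def ingress(dst):
    # Edge router: the only place where the IP destination is inspected.
    addr = ipaddress.ip_address(dst)
    matches = [net for net in INGRESS_FEC if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return INGRESS_FEC[best]


def forward(dst):
    label, node = ingress(dst)
    path = ["ingress"]
    while label is not None:
        path.append(node)
        # Interior node: one exact-match lookup on the label, no IP routing.
        label, node = LABEL_TABLES[node][label]
    path.append(node)
    return path
```

Here `forward("10.1.2.3")` walks the packet through `lsr-a` and `lsr-b` using only dictionary lookups on the label; an exact-match lookup of this kind is what makes the interior nodes easy to implement in hardware.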

Internet 3.0: The Future

End of the network centric viewpoint. (Note: These are futuristic predictions, not facts. But, for students, there should be lots of good project topics here.)

Problems with today’s internet
- Support for mobility is pretty bad with TCP/IP.
- Security: viruses, spam, bots, DDOS attacks, hacks
  - Internet was designed for co-operative use; not ideal for today’s climate
- Multi-homing not well supported by TCP/IP
  - Change in IP address results in service disruption
  - What if you change your ISP, your machine, etc?
  - Cannot be done seamlessly
- Network is very machine/IP-centric (“Where”)
  - What is needed are People-centric networks (“Who”) and content centric (“What”)
  - IP address ties together identity and location; this is neither necessary, nor desirable
Three areas of future research:
- Delay Tolerant Network (DTN) Architecture
  - Whenever end-to-end delay is more than a few 100 milliseconds, various things start breaking in today’s networks
  - DTNs are characterized by:
    - Things that are not always connected to the network. For example, sensor networks, gadgets, remote locations. Another Example: remote villages in Africa have a bus visiting them periodically, and that gives them internet access for a limited time every day.
    - Extremely Long Delays
    - Asymmetric Data Rates
    - High Error Rates
  - Needs a store-and-forward network
- Content-centric Networks
  - Instead of everything being based on IP-address, how about giving unique identifiers to chunks of content, and define a networking protocol based on this
  - Strategy: let the network figure out where the content is and how to deliver it
  - Security: the content carries the authorization info, and unauthorized access is prevented
- Software Defined Networks
  - Virtualizing the Network
  - Search the net for: “OpenFlow”
  - The hardware router only does packet forwarding, but end applications can update the routing tables of the router using the OpenFlow protocol. The app has an OpenFlow controller that sends updates to the OpenFlow agent on the hardware router.
  - In the hardware/OS world, virtualization (VMware, Xen, VirtualBox) is slowly taking over; OpenFlow is a similar idea for network hardware
  - Oracle and VMware have had major acquisitions in this space recently
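
The content-centric idea in the list above, where a chunk is named by what it is rather than by the address of the machine holding it, can be sketched with hash-based identifiers. This is only one possible naming scheme, assumed here for illustration; `content_id` and `ContentStore` are hypothetical names, and the sketch shows integrity checking only, not the authorization mechanism mentioned in the talk.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Name the chunk by what it contains, not by where it lives.
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical node cache: any node holding a chunk can serve it."""

    def __init__(self):
        self.chunks = {}

    def publish(self, data: bytes) -> str:
        cid = content_id(data)
        self.chunks[cid] = data
        return cid

    def fetch(self, cid: str) -> bytes:
        data = self.chunks[cid]
        # The requester can verify the chunk against its name, whoever served it.
        if content_id(data) != cid:
            raise ValueError("chunk does not match its identifier")
        return data
```

Because the identifier is derived from the bytes themselves, a requester does not need to trust, or even know, which node answered.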

Turing100 Lecture: Vint Cerf + Bob Kahn – “Fathers of the Internet” – 8 Sept

September 5, 2012 | Events, Featured | Events, internet, networking, persistent, turing100 | Navin Kabra

Vinton Cerf and Robert Kahn invented TCP and IP, the two protocols at the heart of the internet, and are hence considered the “Fathers of the Internet”. For this and other fundamental contributions, they were awarded the Turing award in 2004.

On 8th September, R. Venkateswaran, CTO of Persistent Systems, will give a talk on the life and work of Vint Cerf and Bob Kahn as a part of the Turing 100 Lecture Series organized by Persistent in Pune on the first Saturday of every month (although this month it was shifted to the second Saturday).

In addition, this Saturday’s event will also feature a talk on “Net Neutrality: The Supply and Demand Side Perspective” by Dr. V Sridhar, a research fellow with Sasken.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 8th September. This event is free and open for anybody to attend. Register here

Event Report: Sham Navathe on E.F. Codd

August 4, 2012 | Event Report, Featured | databases, Events, persistent, turing100 | Navin Kabra

(This is a liveblog of Sham Navathe’s lecture on E.F. Codd as part of the Turing 100 @ Persistent lecture series.)

Sham Navathe does not really need an introduction – since he is famous for his book “Fundamentals of Database Systems” written by with Ramez Elmasri, which is prescribed for undergraduate database courses all over the world. His full background can be looked up in his Wikipedia page, but it is worth mentioning that Navathe is a Punekar, being a board topper from Pune in his HSC. He regularly visits Pune since he has family here.

Today’s lecture is about E.F. Codd, a British mathematician, logician, and analyst, who was given the Turing award in 1981 for his invention of the relational database model. He is one of the 3 people to have received a Turing award for work in databases (the other two being Charlie Bachman in 1973, for introducing the concept of data structures and the network database model, and Jim Gray in 1998, for his work on database transaction processing research and technical leadership in system implementation).

Ted Codd studied at Oxford, initially studying Chemistry, before doing a stint with the Royal Air Force and then getting a degree in Maths. He later emigrated to the US, worked at IBM, did a PhD at the University of Michigan, and finally went back to IBM. At that time, he led the development of the world’s first “multiprogramming system” – sort of an operating system.

Codd quit IBM in 1984 because he was not happy with the “incomplete implementation of the relational model.” He believed that SQL was only a “convenient” and “informal” representation of the relational model. He published rules that any system must follow before it could be called a relational database management system, and complained that most commercial systems were not really relational in that sense – and that some were simply a thin pseudo-relational layer on top of older technology.

Invention of the Relational Model

In 1963-64, IBM developed the IMS database management system based on the hierarchical model. In 1964-65 Honeywell developed IDS, based on the network model. In 1968, Dave Childs of Michigan first proposed a set-oriented database management system. In 1969 Codd published “The derivability, redundancy, and consistency of relations stored in large databases” (IBM research report, RJ-599, 1969). This was the work that led to the seminal paper, “A Relational Model for Large Shared Data Banks” (CACM, 13:6, June 1970). Another classic paper is “Extending the Relational Model to capture more meaning” (ACM TODS, 4:4, Dec 1979), which introduced what is called the RM/T model. Codd also coined the term OLAP (Online Analytical Processing).

After Codd’s proposal of the relational model, IBM was initially reluctant to commercialize the idea. Instead, Michael Stonebraker of UC-Berkeley, along with his PhD students, created INGRES, the first fully relational system. INGRES ultimately became the Postgres database, which is one of the leading open source databases in the world today. In the meantime, Relational Software Inc. brought another relational database product to the market. This ultimately became Oracle. After this, IBM invested heavily in System R, which developed the relational DBMS ideas fully. Codd was involved in the development of System R – and most of the fundamental ideas and algorithms underlying modern RDBMS today are heavily influenced by System R.

Interesting RDBMS developments after Codd’s paper:

1975: PhD students in Berkeley develop an RDBMS
1976: System R, the first relational prototype system from IBM
1979: First proposal for a cost based optimizer for database queries
1981: Transactions (by Jim Gray)
1981: SQL/DS, IBM’s first commercial RDBMS

Two main motivations for the relational model:

- In DBMS before RDBMS, there was a heavy dependence of the program (and programmer) on the way the data is modeled, stored and navigated:
  - Ordering dependence
  - Indexing dependence
  - Access path dependence
  All of this was hardcoded in the program, and Codd wanted to simplify database programming by removing these dependencies.
- Loss of programmer productivity due to manual optimization.

Codd’s fundamental insight was that freeing up the application programmer from knowing about the layout of the data on disk would lead to huge improvements in productivity. For example, in the network or hierarchical models, a data model in which a Student has a link to the Department that he is enrolled in, is very different from a model in which each Department links to all the students that are enrolled there. Depending upon which model is used, all application programs would be different, and switching from one model to another would be difficult later on. Instead, Codd proposed the relational model which would store this as the Student relation, the Department relation, and finally the Enrolment relation that connects Student and Department.
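To make the Student / Department / Enrolment example concrete, here is a minimal Python sketch (not from the lecture). The sample rows, column layouts and function names are all made up for illustration. The point is only that neither relation embeds a link to the other, so both "which department is this student in?" and "which students are in this department?" are answered from the same three relations.

```python
# Hypothetical sample data; names, IDs and column layouts are illustrative only.
student = {(1, "Asha"), (2, "Ravi")}            # (student_id, name)
department = {(10, "Physics"), (20, "Maths")}   # (dept_id, dept_name)
enrolment = {(1, 10), (2, 20)}                  # (student_id, dept_id)


def departments_of(student_id):
    """Names of the departments a given student is enrolled in."""
    dept_ids = {d for (s, d) in enrolment if s == student_id}
    return {name for (d, name) in department if d in dept_ids}


def students_in(dept_id):
    """Names of the students enrolled in a given department."""
    student_ids = {s for (s, d) in enrolment if d == dept_id}
    return {name for (s, name) in student if s in student_ids}
```

Neither function depends on how the data is laid out or navigated. In a hierarchical or network design, one of the two questions would typically follow stored links and the other would need a scan or a redesign.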

The Relational Data Model

A relation is simply an unordered set of tuples. On these relations, the following operations are permitted:

Permutation
Projection
Join
Composition

Of course, other operators from set theory can be applied to relations, but then the result will not be a relation. However, the operations given above take relations and the results are also relations. Thus, all the relational operators can again be applied to the results of these operations.
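The four operations can be sketched in a few lines of Python, treating a relation as a set of tuples with positional columns. This is only an illustration under simplifying assumptions: join and composition are restricted to binary relations, the join shown is the natural join on the shared middle column, and the function names and sample data are invented here.

```python
def permutation(rel, order):
    """Reorder the columns of every tuple according to `order`."""
    return {tuple(t[i] for i in order) for t in rel}


def projection(rel, columns):
    """Keep only the listed columns; duplicates vanish because the result is a set."""
    return {tuple(t[i] for i in columns) for t in rel}


def join(r, s):
    """Natural join of binary relations: match r's last column with s's first."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def composition(r, s):
    """Join the two relations, then project away the shared middle column."""
    return projection(join(r, s), (0, 2))


# Hypothetical sample relations.
supplies = {("s1", "p1"), ("s2", "p1"), ("s2", "p2")}   # (supplier, part)
used_in = {("p1", "j1"), ("p2", "j1"), ("p2", "j2")}    # (part, project)

supplier_project = composition(supplies, used_in)        # (supplier, project)
```

Every function takes sets of tuples and returns a set of tuples, which is the closure property described above: the output of `composition` could be fed straight back into `projection` or `join`.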

He defined 3 types of relations:

Expressible: is any relation that can be expressed in the data model, using the data access language
Named: is any relation that has been given a specific name (i.e. is listed in the schema)
Stored: is a relation that is physically stored on disk

He also talked about 3 important properties of relations:

Derivability: A relation is derivable if it can be expressed in terms of the data access language (i.e. can be expressed as a sequence of relational operations)
Redundancy: A set of relations is called strongly redundant if one of the relations can be derived from the other relations. i.e. it is possible to write a relational operation on some of the relations of the set whose result is the same as one of the other relations. A set of relations is weakly redundant if there is a relation in that set which has a projection that is derivable from the other relations. Good database design entails that strongly redundant sets of relations should not be used because of problems with inconsistency. However, weakly redundant relations are OK, and used for performance purposes. (Materialized views.)
Consistency / Inconsistency: Codd allowed the definition of constraints governing the data in a set of relations, and a database is said to be consistent if all the data in the database satisfies those constraints, and is said to be inconsistent if not.
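A small Python sketch (again, not from the lecture) shows why strongly redundant sets of relations invite inconsistency. The relation names, the sample data and the single constraint are assumptions made for illustration: `supplier_project` is stored even though it is derivable by composing the other two relations, and the constraint says the stored copy must match the derived one.

```python
def compose(r, s):
    """Composition of two binary relations on their shared middle column."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def is_consistent(db):
    """Hypothetical constraint: the stored supplier_project relation must
    equal the relation derived from supplies and used_in."""
    return db["supplier_project"] == compose(db["supplies"], db["used_in"])


# Hypothetical database in which supplier_project is strongly redundant.
db = {
    "supplies": {("s1", "p1"), ("s2", "p2")},
    "used_in": {("p1", "j1"), ("p2", "j1"), ("p2", "j2")},
    "supplier_project": {("s1", "j1"), ("s2", "j1"), ("s2", "j2")},
}

consistent_before = is_consistent(db)

# Update one stored relation without touching the redundant copy.
db["supplies"] = db["supplies"] | {("s3", "p2")}

consistent_after = is_consistent(db)
```

After the update, the derived relation contains tuples for `s3` that the stored copy lacks, so the check fails. A system that keeps such a copy for performance has to refresh it on every relevant update.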

In the years that followed, IBM published a series of research reports on normalization of databases.

Turing Award Lecture

His Turing Award lecture was titled “Relational Database: A Practical Foundation for Productivity”. His thoughts at that time:

Put users in direct touch with databases
Increase productivity of DP professionals in developing applications
Concerned that the term “relational” was being misused

He points out that in the relational data model, data can be addressed symbolically, as “relation name, primary key value, attribute name”. This is much better than embedding links, or positional addressing (X(i, j)).
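The difference between the two addressing styles can be sketched in Python. The nested-dictionary layout, the `lookup` helper and the sample data are assumptions made for this illustration, not how any real RDBMS stores data.

```python
# Symbolic addressing: relation name -> primary key value -> attribute name.
# Hypothetical sample data.
database = {
    "student": {
        1: {"name": "Asha", "year": 2},
        2: {"name": "Ravi", "year": 1},
    },
}


def lookup(relation_name, primary_key, attribute_name):
    """Fetch one value using only names and a key, never a position."""
    return database[relation_name][primary_key][attribute_name]


# Positional addressing, X(i, j) style, for contrast: the caller must know
# that Ravi is in row 1 and that names are in column 1.
grid = [
    [1, "Asha", 2],
    [2, "Ravi", 1],
]

by_name = lookup("student", 2, "name")
by_position = grid[1][1]
```

Both expressions find the same value here, but the positional one silently breaks if rows are reordered or a column is inserted, while the symbolic one keeps working.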

The relational data model encompasses structure, manipulation and integrity of data. Hence, it is a complete model, because all 3 aspects are important for data management.

Characteristics of relational systems:

MUST have a data sub-language that allows users to query the data using SELECT, PROJECT and JOIN operators
MUST NOT have user visible navigation links between relations
MUST NOT convey any information in the way tuples are ordered

He was worried that relational systems might not be able to match the performance of non-relational systems. He talked about:

performance oriented data structures
efficient algorithms for converting user requests into optimal code

As future work, he mentioned the following:

1. Domains and primary keys
2. Updating join-type views
3. Outer-joins
4. Catalogs
5. Design aids at logical and physical level
6. Location and replication transparency in distributed databases
7. Capture meaning of data
8. Improved treatment of missing, null and inapplicable values
9. Heterogeneous data

This was a remarkably prescient list. In the 30 years since this talk, most of this has actually happened either in commercial databases or in research labs. We have pretty much achieved #1 to #6, while #7 to #9 have seen a lot of research work but not wide commercial acceptance yet.

Concluding Thoughts

The relational model is a firm foundation for data management. Nothing else compares.
On this foundation we were able to tackle difficult problems in the areas of design, derivability, redundancy, consistency, replication as well as language issues. All of these would have been very difficult otherwise.
Proponents of NoSQL databases as well as map-reduce/hadoop type of systems need to keep in mind that large data management cannot really be done in an ad hoc manner.
Codd’s RM/T model was an attempt to capture metadata management, but fell short of what was needed.

Audience Questions

Q: Why did object-oriented databases not catch on?

A: There was a lack of understanding amongst the wider community as to the best way of using object-oriented ideas for data management. OODBMS companies were not really able to educate the wider community, and hence failed. Another problem is that object-oriented DBMSs made the data model complex, but there were no corresponding developments in the query language and optimization tools.

Q: When the relational model was developed, did they constrain themselves due to the hardware limitations of those days?

A: Codd did mention that when deciding on a set of operations for the relational model, one consideration was ‘Which of these operations can be implemented on today’s hardware’. On the other hand, there were lots of companies in the 80s which tried to implement specialized hardware for relational/DBMS. However, none of those companies really succeeded.

In the future, it is very unlikely that improvements in hardware and processing power will lead anyone to develop a new data model. However, new operators and new ways of parallelizing them will certainly be developed.

Q: There are areas of data management like incomplete, inexact data; common sense understanding of data; deduction and inferencing capabilities. These are not really handled by today’s DBMSs. How will this be handled in the future?

A: There have been many interesting and elegant systems proposed for these areas, but none have seen commercial success. So we’ll have to wait a while for any of these to happen.
