Tag Archives: turing100

Turing100 Lecture: Life and work of Jim Gray – By Anand Deshpande – 5 Jan

The Turing100 Lecture Series returns with its 6th session. This time, there are two technical talks, centered around the life and work of the 1998 Turing Award recipient, Dr. Jim Gray.

Jim Gray made seminal contributions to database and transaction processing research and to the implementation of real systems at Tandem, IBM, and Microsoft. Among his best-known achievements are

  • granular database locking
  • two-phase transaction commit semantics
  • the “five-minute rule” for allocating storage (a small worked example follows this list)
  • the data cube operator for data warehousing applications
  • defining the requirements for reliable transaction processing (memorably called the ACID test) and implementing them in software.
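
As an aside, the five-minute rule can be stated as a simple formula: keep a page in RAM if it will be re-accessed within BreakEvenInterval seconds, where BreakEvenInterval = (pages per MB of RAM / accesses per second per disk) × (price of a disk drive / price of a MB of RAM). Below is a back-of-the-envelope sketch in Python; the hardware prices and rates are purely illustrative assumptions (loosely in the spirit of the rule's late-1990s revisit), not Gray's published figures:

    # Back-of-the-envelope "five-minute rule" calculation.
    # All numbers below are illustrative assumptions, not Gray's published figures.
    pages_per_mb_of_ram = (1024 * 1024) // 8192   # assume 8 KB pages -> 128 pages/MB
    accesses_per_second_per_disk = 64             # assumed random I/Os per second
    price_per_disk_drive = 2000.0                 # assumed dollars per disk
    price_per_mb_of_ram = 15.0                    # assumed dollars per MB of RAM

    # A page is worth keeping in memory if it is re-read at least once every
    # break_even_interval seconds; otherwise re-reading it from disk is cheaper.
    break_even_interval = (pages_per_mb_of_ram / accesses_per_second_per_disk) * \
                          (price_per_disk_drive / price_per_mb_of_ram)

    print(f"break-even interval: {break_even_interval:.0f} seconds "
          f"(about {break_even_interval / 60:.1f} minutes)")

With these assumed numbers the break-even interval comes out to roughly four to five minutes, which is where the rule gets its name; with different price ratios the interval shifts accordingly.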

On Saturday, 5th January, Anand Deshpande, CEO of Persistent Systems, will talk about the life and work of Dr. Jim Gray, the recipient of the 1998 Turing Award. This talk is a part of the Turing Awards monthly lecture series that happens at Persistent’s Dewang Mehta Auditorium.

This will be followed by a session on “Software Faults, Failures and Their Mitigation – contributions of Dr. Jim Gray to the area of Fault Tolerant Systems” by Prof. Kishore Trivedi, Duke University, USA.

The event is free for everyone to attend. Register here

About the Turing Awards

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn. The contributions of each Turing Award winner are, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent lecture series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 5th January. This event is free and open for anybody to attend. Register here

Turing100 Event Report: Work of Butler Lampson – Systems

(This is a live-blog of Neeran Karnik‘s talk on Butler Lampson, as part of the Turing100 Lecture Series happening at Persistent. Since it is being typed during the talk, please forgive the typos and bad structuring of the article. This article is also very incomplete.)

Butler Lampson has made contributions across a wide range of computer science fields. Here is his Turing Award citation:

For contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security and document publishing.

The number of different areas of computer science touched there is breathtaking.

Systems Built

Here is just a sampling of some of the work of Lampson that resulted in entire areas of computer hardware and software:

  • The first personal computer:
    • The first personal computer in the world, the Xerox Alto, was conceived in a 1972 memo written by Lampson.
    • Important contributions of the Alto:
      • First “personal” computer
      • First computer that used a desktop metaphor
      • First computer to use a mouse-driven graphical interface
    • Lampson later worked on the follow-up workstation designs Dorado and Wildflower (research prototypes), which later resulted in a successful commercial product (the Xerox Star).
  • The Bravo Editor
    • Lampson designed the first WYSIWYG editor in the world in 1974. It shipped with the Xerox Alto, and this work can ultimately be seen as having led to the development of Microsoft Word.
  • The Xerox 9700 and Dover Laser Printers
    • The first laser printer was designed in 1969 at Xerox PARC, and Lampson worked on its electronic design.
  • The SDS 940 Time-sharing system
    • One of the first general-purpose time-sharing systems.

And those were just the systems he built.

What about more fundamental contributions to computer science? Here is a list:

  • The two-phase commit protocol (a toy simulation appears after this list).
    • This is the fundamental building block of all transactional processing in databases that are spread out across machines and/or geographies.
  • The CAL time-sharing system
  • Programming Languages
    • Mesa and SPL: languages for systems programming; modern threads developed from this work
    • Euclid: the first programming language designed with program verification in mind
  • Security:
    • Access matrix model, unifying capabilities and ACLs
    • Theory of principals speaking for other principals
    • Microsoft Palladium
    • Scrubbing disk storage
    • Research on how economic factors affect security
  • Networking
    • Co-inventor of Ethernet!
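
Since two-phase commit appears both here and in the Jim Gray sessions above, here is a minimal, in-memory sketch of the idea (a toy simulation of my own, not how any particular database implements it): a coordinator first asks every participant to prepare; only if all of them vote yes does it tell everyone to commit, otherwise everyone aborts.

    # Toy two-phase commit: one coordinator, several participants.
    # Illustrative sketch only; real systems add write-ahead logging,
    # timeouts and crash recovery on top of this basic message flow.

    class Participant:
        def __init__(self, name, will_vote_yes=True):
            self.name = name
            self.will_vote_yes = will_vote_yes
            self.state = "init"

        def prepare(self):
            # Phase 1: the participant gets ready to commit, then votes.
            self.state = "prepared" if self.will_vote_yes else "aborted"
            return self.will_vote_yes

        def commit(self):
            self.state = "committed"

        def abort(self):
            self.state = "aborted"

    def two_phase_commit(participants):
        votes = [p.prepare() for p in participants]      # phase 1: voting
        if all(votes):                                   # phase 2: decision
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.abort()
        return "aborted"

    print(two_phase_commit([Participant("db1"), Participant("db2")]))      # committed
    print(two_phase_commit([Participant("db1"),
                            Participant("db2", will_vote_yes=False)]))     # aborted

The point is the all-or-nothing decision: no participant commits unless every participant has first promised that it can.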

How is Systems Research different?

Butler Lampson was one of the few great computer scientists who spent a lot of time in the laboratory with actual hardware, getting his hands dirty. This is not the kind of work normally associated with Turing Award winners, but it is the kind of work that has really given us the actual hardware that we use in computer science today.

Some thoughts on why systems programming is different and more difficult:

Systems work means designing and building large computing systems, or complex computing systems, or both: computers (from tablets to supercomputers), networks, storage and other hardware, operating systems, programming languages, and other infrastructure software.

Systems design is different from other parts of computer science (e.g. algorithm design) because its external interface is to the real world (and hence is imprecise, subject to change, and generally underspecified); there are lots of moving parts (i.e. more internal structure and more internal interfaces); module-level design choices have wider implications on the end product; and the measure of success is less clear than in other fields. There is no such thing as an optimal answer, so avoiding terrible designs is more important than finding the most optimal one.

Hints on Computer System Design

This is a paper written by Lampson giving hints on how to build good systems. He calls them hints because in systems work there are no infallible rules, only hints that guide your thinking. Werner Vogels, CTO of Amazon, who oversees some of the most complex and scalable computing infrastructure in the world, is a fan of this work. He finds these hints very useful, and says that they are even more important today because they have withstood the test of time.

These hints talk about functionality (what you’re doing), speed (are you doing it quickly enough), and fault-tolerance (will you keep doing it). In systems, interface design is the most important part: do this well and the other stuff will follow.

  • Hints about Functionality/Interface:

    • Do one thing at a time, and do it well
      • KISS – Keep It Simple, Stupid
      • Don’t generalize – because generalizations are wrong
      • Don’t overload interfaces with too much work
      • “Perfection is reached not when there is no longer anything to add, but when there is no longer anything to take away” – Antoine de St. Exupery
    • An interface should capture the minimum essentials
      • Don’t promise more than you can deliver
    • Cost of an interface should be predictable
    • Better to have basic and fast functionality, rather than slow and complex
      • Clients who need speed are happy with the basic+fast interface
      • Clients who need complexity can build it on top of your basics
    • Don’t hide the power
      • If a lower-layer abstraction is more powerful, a higher-layer abstraction should not hide that power; it should expose it. The abstraction should only conceal undesirable properties.
    • Keep the basic interfaces stable; parts of the system that will be used by many different modules should not change
    • Treat the first system that you’re building as a prototype that you will throw away
    • Divide and conquer
      • Reduce the problem to smaller pieces that you can solve more easily
      • Bite off as much as you can handle right now, and leave the rest for the next iteration.
  • Hints on Speed

    • Split resources in a fixed way, rather than sharing them
      • Do not spend too much effort in trying to share resources
      • Overhead of multiplexing usually not worth it
      • Fixed allocation makes things faster and more predictable
    • Cache Answers to Expensive Computations
    • Dynamic Translation
      • Use different representations for use of some structure vs. implementation of the same, and then dynamically translate between them.
      • “All problems in computer science can be solved by adding a level of indirection”
    • Use hints
      • i.e. things that don’t have to be 100% accurate, but can improve performance if they are correct
      • For example, the routing tables in internet packet routing: they can be incorrect or out-of-date, but in most cases they work and give great performance; rarely they don’t work, and then you can recover. (A toy sketch of this idea follows this list.)
    • When in doubt, use brute force
      • Hardware is cheap
      • Brute force allows for cheaper faster implementation
      • e.g. chess-playing computers that use brute force have defeated grandmasters, while complex algorithms that try to “understand” chess have not done as well.
    • Compute in the background whenever possible
      • e.g. if a webpage update results in an email being sent, the email sending should happen in the background.
    • Use batch processing when possible
    • Safety First
      • Correctness is more important than speed
      • Being clever and optimal is over-rated
      • A general purpose system cannot really optimize, because it doesn’t really know the use cases
      • Hardware is cheap
  • Hints for Fault Tolerance
    • End-to-End Error Recovery
      • “Error detection and recovery at the application level is necessary. Error detection and recovery at lower layers is not necessary, and should only be done for performance reasons.”
      • Saltzer et al., the classic 1984 paper on end-to-end arguments
      • Example: when transferring a file from one computer to another, there are lots of steps (the file moves from storage to memory to the network and back) and lots of sub-tasks (transferring individual chunks of the file); only an end-to-end check, such as comparing a checksum of the whole file at the application level, tells you whether the transfer really succeeded.
    • Log updates to record the truth about an object’s state
      • Current state of the object in memory is a hint
      • The true state of the object is what can be recovered from the storage and logs
    • Actions should be atomic or restartable
      • Either an action is all-or-nothing, or, if there is a failure in the middle, the action should be able to restart from where it left off
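
To make the “use hints” and “cache answers to expensive computations” ideas concrete, here is a tiny sketch (my own illustration, not from the paper): a cached next hop is used as a hint for routing; if the hint turns out to be unusable, we fall back to a slow, authoritative lookup and refresh the hint.

    # "Use hints": a possibly-stale cache speeds up the common case, while a
    # slow but authoritative lookup keeps the system correct. Illustrative only.
    import random

    AUTHORITATIVE_ROUTES = {"10.0.0.0/8": "router-A", "192.168.0.0/16": "router-B"}
    hint_cache = {}   # prefix -> next hop; may be stale, never trusted blindly

    def slow_lookup(prefix):
        # Stand-in for an expensive, always-correct computation.
        return AUTHORITATIVE_ROUTES[prefix]

    def link_is_up(next_hop):
        # Cheap validity check for the hint (here: a simulated link state).
        return random.random() > 0.1

    def next_hop(prefix):
        hop = hint_cache.get(prefix)
        if hop is not None and link_is_up(hop):
            return hop                      # fast path: the hint was good
        hop = slow_lookup(prefix)           # slow path: recompute the truth
        hint_cache[prefix] = hop            # refresh the hint for next time
        return hop

    for _ in range(5):
        print(next_hop("10.0.0.0/8"))

The hint is allowed to be wrong because there is always a cheap way to detect that and a correct (if slower) way to recover.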

Lampson himself suggests that reading the whole hints paper at once might be tiresome, so it is better to read it in small doses at bedtime. And he points out that he himself has ignored most of these hints at one time or another, but has always regretted it.

Turing100 Lecture: Butler Lampson – Systems, Security, Verification and more – Nov 24

In 1992, Butler Lampson received the Turing Award for his contributions to the development of distributed, personal computing environments and the technology for their implementation: workstations, networks, operating systems, programming systems, displays, security and document publishing.

On Saturday, 24th November, Neeran Karnik, Senior Architect at BMC Software, will give a talk about Butler Lampson’s work. This talk is a part of the Turing Awards monthly lecture series that happens at Persistent’s Dewang Mehta Auditorium.

This will be followed by a session on “Experience Sharing – Systems design and development Projects in India”. The speakers include Dr. Basant Rajan, CEO of Coriolis Software (previously CTO of Symantec India), and Abhay Ghaisas, Product Development Architect at BMC Software.

The event is free for everyone to attend. Register here

About the Turing Awards

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn. The contributions of each Turing Award winner are, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent lecture series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 24th November. This event is free and open for anybody to attend. Register here

Turing100 Lecture: Robin Milner and Polymorphic Type Inference in Programming Languages – Oct 6

Robin Milner received the Turing award in 1991 for three major contributions to computer science:

  • In the area of automated theorem proving – He developed LCF, the first theoretically sound yet practical tool for machine assisted proof construction
  • In the area of programming language design – He developed ML, the first language to use polymorphic type inference along with a type-safe exception-handling mechanism, something that underlies some of the most interesting new programming languages that are being developed today, and
  • In the area of concurrency – He developed CCS, a general theory of concurrency.

On 6th October, Navin Kabra (yes, that’s me), will give a talk about Robin Milner’s work. This talk is a part of the Turing Awards lecture series that happens at Persistent’s Dewang Mehta Auditorium at 2pm on the first Saturday of every month this year.

This will be followed by a panel discussion on “Should every programmer learn functional programming?”. The panelists include Dhananjay Nene, Chief Architect at Vayana Software; Prof. Raju Pandey of the University of California, Davis; Rustom Modi, who has been teaching functional programming at the University of Pune for over 15 years and is a founder of i-Magus, which delivers training in functional programming and other related technologies; Kedar Swadi, CTO at AlgoAnalytics; and others. For more details of the panel discussion, see this article.

The event is free for everyone to attend. Register here

Abstract of the Talk

In this talk, I will give a brief overview of Robin Milner’s career, followed by a technical dive into his work. I will briefly cover his work on automated theorem proving and LCF, which served as the motivation for the development of ML, a programming language originally intended for automated theorem proving. ML ended up having a huge impact on the design of modern programming languages, and its influence can be seen in important modern languages like Microsoft’s F#, Haskell, and the JVM-based Scala. The bulk of my talk will cover the design of ML, with a specific focus on the polymorphic type inference system used in ML. Type inference is an important aspect of many modern programming languages, and can be found, for example, in Google’s Go language, Perl 6, Visual Basic (9.0 onwards), and C# (3.0 onwards).
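
To give a flavour of what unification-based type inference involves, here is a very small sketch in Python for a toy expression language. It is only a teaching toy and my own simplification: it handles integer literals, variables, lambdas and application, and it deliberately omits let-polymorphism and the occurs check that Milner's full algorithm requires.

    # Minimal unification-based type inference for a toy language.
    # Illustrative sketch only: no let-polymorphism, no occurs check.
    import itertools

    _fresh = itertools.count()

    def new_var():
        return ("var", next(_fresh))           # a fresh type variable

    def prune(t, subst):
        # Follow substitution chains so we always see the representative type.
        while t[0] == "var" and t in subst:
            t = subst[t]
        return t

    def unify(a, b, subst):
        a, b = prune(a, subst), prune(b, subst)
        if a == b:
            return
        if a[0] == "var":
            subst[a] = b
        elif b[0] == "var":
            subst[b] = a
        elif a[0] == "fun" and b[0] == "fun":  # ("fun", arg_type, result_type)
            unify(a[1], b[1], subst)
            unify(a[2], b[2], subst)
        else:
            raise TypeError(f"cannot unify {a} with {b}")

    def infer(expr, env, subst):
        kind = expr[0]
        if kind == "int":                      # ("int", 42)
            return ("Int",)
        if kind == "name":                     # ("name", "x")
            return env[expr[1]]
        if kind == "lam":                      # ("lam", "x", body)
            param = new_var()
            body = infer(expr[2], {**env, expr[1]: param}, subst)
            return ("fun", param, body)
        if kind == "app":                      # ("app", f, arg)
            fun_t = infer(expr[1], env, subst)
            arg_t = infer(expr[2], env, subst)
            result = new_var()
            unify(fun_t, ("fun", arg_t, result), subst)
            return prune(result, subst)
        raise ValueError(f"unknown expression: {expr}")

    # (\x -> x) 42  should be inferred to have type Int, with no annotations.
    subst = {}
    identity_applied = ("app", ("lam", "x", ("name", "x")), ("int", 42))
    print(infer(identity_applied, {}, subst))   # ('Int',)

The key idea, due to Milner (building on Hindley), is that the types are never written by the programmer: fresh type variables are introduced and then constrained by unification until the most general type falls out.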

About the Turing Awards

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn. The contributions of each Turing Award winner are, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent lecture series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 6th October. This event is free and open for anybody to attend. Register here

The Net Neutrality Debate – A Supply/Demand Perspective – V. Sridhar, Sasken

(This is a liveblog of a lecture on Network Neutrality by V. Sridhar, a Fellow at Sasken. This talk was delivered as a part of the Turing100@Persistent Lecture Series in Pune. Since it is being typed as the event is happening, it is not really well structured, but should rather be viewed as a collection of bullet points of interesting things said during the talk. For more information about Dr. Sridhar, see his website)

The Problem of Net Neutrality

The principle of “Net Neutrality” states that all traffic on the internet should be treated equally. Thus, the principle states that network service providers (i.e. the telecom companies) should not be allowed to discriminate (i.e. limit or disallow) on network connections and speeds based on the type of traffic. Thus, for example, under net neutrality, a telecom should not be allowed to disallow BitTorrent Downloads, or limit bandwidth for Skype or Video streaming, or provide higher speeds and better quality of service guarantees for just traffic generated by iPhones or US-based companies.

Telecom companies are trying to introduce systems by which different levels of service are provided for different types of traffic, because, they argue that network neutrality is not economically viable.

The Demand for Network Services

  • Mobile broadband and 3G traffic is increasing exponentially
    • Even in India! In the last 7 months there has been 78% growth in 3G traffic, and 47% growth in 2G. India loves mobile broadband
    • Users are getting hooked to 3G. An average 3G user consumes 4 times more data than a 2G user. 3G is an acceptable alternative to wired broadband
    • Mobile data is growing fastest in smaller towns and villages (category B & C circles)
  • Video, voice, and streaming data are taking up huge chunks of bandwidth

NetHeads vs BellHeads

There are two major approaches to the network: the traditional telephone providers who come from a circuit-switched telephone background (the BellHeads), and the people who come from the packet-switched internet protocol background (the NetHeads). The BellHeads believe that the network is smart and endpoints are dumb; they believe in closed, proprietary networks; they expect payment for each service, often with per-minute charges; they want to control the evolution of the network and everything about it; and they want strong regulations. The NetHeads’ philosophy is that the network is dumb and endpoints are smart, so users should take all the decisions; they believe in an open community; they expect cheap or free services, with no per-minute charges; and they want the network to evolve organically, without regulations.

To a large extent, the NetHeads are for net neutrality and the BellHeads are in favor of abolishing net neutrality in favor of carefully controlled tiered traffic.

The Supply Side

Land-line penetration is decreasing. On the other hand, mobile penetration continues to increase and is showing no signs of saturation. Fixed-line is losing its relevance, especially in emerging countries like India. This means that an increasing chunk of internet bandwidth is going to be consumed by mobile devices.

LTE (Long Term Evolution) is the fastest-growing mobile network technology ever. 300+ different operators all over the world are investing in LTE. It will come to India soon.

Mobile technologies are improving, and individual devices will soon be capable of handling 1Gbps data connections. This means that the capacity of the core network will have to go up to provide the speeds that devices are capable of consuming. The NetHeads are making good progress here and are able to provide high capacities in the core networks.

The problem is that the mobile spectrum is a scarce resource, and will soon become the bottleneck. The other problem is that chunks of the spectrum have to be exclusively allocated to individual operators. And then that operator has to operate just within that chunk.

The Problem of the Commons

When people have shared, unlimited access to a common resource, then each will consume the resource without recognizing that this results in costs for everyone else. When the total amount that everybody would like to consume goes above what is totally available, everybody suffers. This is a problem which will affect the mobile spectrum. The spectrum gets congested, and bandwidth suffers.

How to solve the congestion problem?

  • Congestion pricing. For example, cheaper access after 9pm is an instance of congestion pricing – an attempt to convince some of the users to consume resources when they’re less congested.
  • During periods of congestion, bandwidth is scarce and hence should have high prices. On the other hand, when the network is not congested, then the additional cost of supporting an additional user’s downloads is minimal, hence the user should be given free or very cheap access.

The Net Neutrality Debate

Proponents of net neutrality believe that the maximum good of the maximum number of people will result if network service providers do not discriminate amongst their customers.

No discrimination means:

  • No blocking of content based on its source, ownership or destination
  • No slowing down or speeding up of content based on source, ownership or destination

Examples of discrimination:

  • In 2005, Madison River Communications (an ISP) blocked all Vonage VoIP phone traffic
  • In 2007, Comcast in the US, restricted some P2P applications (like BitTorrent)
  • In 2009, AT&T put restrictions on what iPhone apps can run on its network
    • Disallowed SlingPlayer (IP-based video broadcast) over its 3G network
    • Skype was not allowed to run over AT&T’s 3G network

The case for net neutrality:

  • Innovation: Operators/ISPs can kill innovative and disruptive apps if they’re allowed to discriminate
  • Competition: Operators/ISPs can kill competition by selectively disallowing certain applications. For example, if AT&T slows down Google Search, but speeds up Bing Search, this can cause Google Search to die.
  • Consumers: Operators/ISPs will have a strong grip on the consumers and other players will not get easy access to them. This will hurt the consumers in the long run.

The case against net neutrality:

  • Capacity is finite. Especially in the case of mobile broadband (because the spectrum is limited)
  • If there is no prioritization, a few apps will consume too much bandwidth and hurt everybody; and also it reduces the service provider’s motivation to increase bandwidth
  • Prioritization, and higher pricing for specific apps can be used to pay for new innovations in future network capacity increases

Broadband is a two-sided market:

  • Applications and broadband form a two-sided market.
    • Both, applications and bandwidth are needed by consumers
    • Without applications, users will not consume the bandwidth, because they have nothing interesting to do
    • Without bandwidth, users will not use applications, because they’ll be too slow
    • Hence both have to be promoted simultaneously
  • How should a two-sided market be handled?
    • Usually, one side needs to be subsidized so it can grow and help the other grow
    • i.e. somebody needs to break this cycle and grow one side of the market, so that the other side can then grow
    • For example, Google (an app/content provider) is buying fiber and providing 1Gbps connection in Kansas for $70 per month. Thus Google is subsidizing the bandwidth increase, and hopes that the users and apps will increase in proportion.
  • Regulatory and Policy implications
    • Two ways to handle this:
      • Ex Ante: come up with regulations and policies before problems occur
        • Because lawsuits are expensive
        • US is trying to do this – they have exempted mobile providers from net neutrality principles
        • Netherlands has passed net neutrality regulations – first country in the world. Mobile operators are not allowed to disallow or discriminate against services like Skype
        • Rest of Europe: public consultations going on
      • Ex Post: Let the problems occur and then figure out how to deal with them
  • Net Neutrality and India
    • No mention of net neutrality in the NTP (National Telecom Policy 2012)
    • Fair Usage Policy (FUP)
      • Is against net neutrality (maybe)
      • It discriminates against users, but does not discriminate against applications
      • But it is indirect discrimination against applications – because users who use BitTorrent and other bandwidth heavy applications will be more affected by FUP
      • Affects innovation – because users are discouraged from using innovative, bandwidth heavy applications

Event Report: The Work and Impact of Bob Kahn and Vint Cerf

(This is a liveblog of the Turing100@Persistent Lecture on Bob Kahn and Vint Cerf by R. Venkateswaran, CTO of Persistent Systems. Since it is being typed as the event is happening, it is not really well structured, but should rather be viewed as a collection of bullet points of interesting things said during the talk.)

Vint Cerf and Bob Kahn

Vint Cerf: Widely known as the father of the internet. He is President of the ACM, Chief Internet Evangelist at Google, Chairman of the ICANN and many other influential positions. In addition to the Turing Award, he has also received the Presidential Medal of Freedom in 2005 and was elected to the Internet Hall of Fame in 2012.

Bob Kahn: Worked at AT&T Bell Labs and MIT; then, while working at BBN, he got involved with DARPA and Vint Cerf, and together they worked on packet-switched networks and invented TCP and IP.

The birth of the internet: TCP and IP. 70s and 80s.

  • The Internet:

    • The first 20 years:
      • Trusted network
      • Defense, Research and Academic network
      • Non-commercial
      • Popular apps: email, ftp, telnet
    • Next 20 years:
      • Commercial use
      • Multiple levels of ownership – increased distrust and security concerns
      • Wide range of apps: email, WWW, etc
  • What did Vint Cerf and Bob Kahn do?

    • The problem:
      • There were many packet switched networks at that time
      • But very small, limited and self contained
      • The different networks did not talk to each other
      • Vint Cerf and Bob Kahn worked on interconnecting these networks
    • The approach

      • Wanted a very simple, and reliable interface
      • Non-proprietary solution. Standardized, non-patented, “open”
      • Each network talked its own protocol, so they wanted a protocol neutral mechanism of connecting the networks.
      • Each network had its own addressing scheme, so they had to invent a universal addressing scheme.
      • Packets (information slices) forwarded from one host to another via the “internetwork”
      • Packets sent along different routes, no guarantees of in-order delivery. Actually no guarantee of delivery
      • Packets have sequence numbers, so end point needs to reassemble them in order
    • The protocol

      • A “process header” identifies which process on the end host the packets should be delivered to. This is today called the “port”.
      • Retransmissions to ensure reliable delivery. And duplicate detection.
      • Flow control – to limit number of un-acknowledged packets, prevent bandwidth hogging
      • A conceptual “connection” created between the end processes (TCP), but the actual network (IP) does not know or understand this
      • Mechanism to set up and tear down the “connection” – the three-way handshake
      • These are the main contributions of their seminal paper (a minimal socket-level sketch follows this list)
    • The Layered Network Architecture
      • Paper in 1974 defining a 4 layered network model based on TCP/IP.
      • This later became the basis of the 7 layer network architecture
    • The Internet Protocol
    • Packet-switched datagram network
    • Is the glue between the physical network and the logical higher layers
    • Key ideas:
      • Network is very simple
      • Just route the packets
      • Robust and scalable
      • Network does not guarantee anything other than best effort
        • No SLA, no guarantee of delivery, no guarantee of packet ordering
      • Dumb network, smart end-host
      • Very different from the existing, major networks of that time (the “circuit-switched” telephone networks of that time)
      • No state maintained at any node of the network
    • Advantages
      • Can accommodate many different types of protocols and technologies
      • Very scalable
    • The Transport Layer
    • UDP
      • The simplest higher-level protocol
      • Unreliable, datagram-based protocol
      • Detects errors, but does no error correction
      • No reliability guarantees
      • Great for applications like audio/video (which are not too affected by packet losses) or DNS (short transactions)
    • TCP
      • Reliable service on top of the unreliable underlying network
      • Connection oriented, ordered-stream based, with congestion and flow control, bi-directional
      • State only maintained at the end hosts, not at the intermediate hosts
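
To connect these ideas to something runnable, here is a small sketch using Python’s standard socket API (my own illustration): the port number plays the role of the “process header” described above, and TCP gives a connection-oriented, reliable, ordered byte stream, with the operating system performing the three-way handshake during connect/accept.

    # Minimal TCP echo server and client on top of the OS's TCP/IP stack.
    # Illustrative sketch only; error handling and concurrency are omitted.
    import socket
    import threading

    HOST, PORT = "127.0.0.1", 50007        # the port identifies the receiving process
    ready = threading.Event()

    def echo_server():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen(1)
            ready.set()                    # tell the client we are listening
            conn, _addr = srv.accept()     # completes the three-way handshake
            with conn:
                data = conn.recv(1024)     # reliable, in-order byte stream
                conn.sendall(data)

    server = threading.Thread(target=echo_server)
    server.start()
    ready.wait()

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))          # kernel performs SYN / SYN-ACK / ACK
        cli.sendall(b"hello, internet")
        print(cli.recv(1024))              # b'hello, internet'

    server.join()

Swapping SOCK_STREAM for SOCK_DGRAM (UDP) would drop the connection setup and the reliability and ordering guarantees, which is exactly the trade-off between the two transport protocols described above.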

Internet 2.0 – Commercialization

  • The birth of the world wide web: late 80s early 90s
    • Tim Berners-Lee came up with the idea of the World Wide Web
    • 1993: Mosaic, the first graphical web browser
    • First Commercial ISP (Internet Service Provider) – Dial up internet
    • Bandwidth doubling every 6 months
    • Push for multi-media apps
  • Push for higher bandwidth and rich apps
    • Net apps (like VoIP, streaming video) demand higher bandwidth
    • Higher bandwidth enables other new applications
    • Apps: email, email with attachments, streaming video, intranets, e-commerce, ERP, Voice over Internet, Interactive Video Conferencing
  • Dumb Network no longer works
    • Single, dumb network cannot handle all these different applications
    • Next Generation Networks evolved
    • Single, packet-switched network for data, voice and video
    • But with different levels of QoS guarantees for different services
  • Clash of Network Philosophies: BellHeads vs NetHeads (mid-90s)
    • Two major approaches: the BellHeads (circuit switched Telephone background), and the NetHeads (from the IP background)
    • BellHeads philosophy: network is smart, endpoints are dumb; closed, proprietary communities; expect payment for service; per-minute charges; Control the evolution of the network; want strong regulations
    • NetHeads philosophy: network is dumb, endpoints are smart; open community; expect cheap or free services; no per-minute charges; want network to evolve organically without regulations.
    • These two worlds were merging, and there were lots of clashes
    • BellHead network example: Asynchronous Transfer Mode (ATM) network
      • Fixed sized packets over a connection oriented network
      • Circuit setup from source to destination; all packets use same route
      • Low per-packet processing at each intermediate node
      • Much higher speeds than TCP/IP (10Gbps)
      • A major challenge for the NetHeads
    • Problems for NetHeads
      • To support 10Gbps and above, each packet needs to be processed in less than 30ns, which is very difficult to do because of all the processing needed (reduce TTL, lookup destination address, manipulate headers, etc)
      • As sizes of networks increased, sizes of lookup tables increased
      • The NetHeads were almost ready to concede defeat
    • IP Switching: Breakthrough for NetHeads
      • Use IP routing on top of ATM hardware
      • Switch to ATM circuit switching (and bypass the routing layer) if a long-running connection is detected.
      • Late 90s, all IP networking companies started implementing variations on this concept
    • MPLS: Multi-Protocol Label Switching
      • Standard developed by IP networking companies
      • Insert a layer between the link layer and IP (often called layer 2.5)
      • Separates packet forwarding from packet routing
      • Edges of the network do the full IP routing
      • Internal nodes only forward packets, and don’t do full routes
      • Separate forwarding information from routing information, and put forwarding info in an extra header (MPLS label – layer 2.5)
      • MPLS Protocol (mid-97)
        • First node (edge; ingress LSR) determines path, inserts MPLS label header
        • Internal nodes only look at the MPLS label and forward appropriately, without doing any routing and without looking at the IP packet
        • Last node (edge; egress LSR) removes the MPLS label
        • Label switching at intermediate nodes can be implemented in hardware; significant reduction in total latency
      • MPLS is now the basis of most core internet networking (a toy label-switching sketch follows this list)
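
As a rough illustration of the forwarding/routing split (a toy model of my own, not the real MPLS machinery with label distribution protocols, label stacks and TTL handling), the sketch below has an ingress node that does one full routing lookup and pushes a label, core nodes that forward by label lookup alone, and an egress step that pops the label.

    # Toy MPLS-style label switching. Illustrative sketch only.

    # Ingress LSR: does the one full routing decision and pushes a label.
    ROUTING_TABLE = {"10.1.5.9": 17, "10.2.7.3": 18}     # destination -> label (FEC)

    # Core LSRs: forward purely by label lookup (swap label, choose next hop).
    LABEL_TABLES = {
        "P1": {17: ("P2", 21), 18: ("P2", 22)},
        "P2": {21: ("egress", None), 22: ("egress", None)},  # None -> pop the label
    }

    def ingress_push(ip_packet):
        label = ROUTING_TABLE[ip_packet["dst"]]          # full IP routing, done once
        return {"label": label, "ip": ip_packet}

    def core_forward(node, labeled_packet):
        next_hop, new_label = LABEL_TABLES[node][labeled_packet["label"]]
        if new_label is None:                            # egress: pop the label
            return next_hop, labeled_packet["ip"]
        labeled_packet["label"] = new_label              # label swap, no IP lookup
        return next_hop, labeled_packet

    packet = {"dst": "10.1.5.9", "data": "hello"}
    labeled = ingress_push(packet)
    hop, labeled = core_forward("P1", labeled)           # P1 -> P2, label 17 -> 21
    hop, restored = core_forward("P2", labeled)          # P2 pops the label at egress
    print(hop, restored)

The interior lookup is a single dictionary access on a small label, which is the property that makes hardware implementation (and the latency reduction mentioned above) possible.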

Internet 3.0: The Future

End of the network centric viewpoint. (Note: These are futuristic predictions, not facts. But, for students, there should be lots of good project topics here.)

  • Problems with today’s internet
    • Support for mobility is pretty bad with TCP/IP.
    • Security: viruses, spams, bots, DDOS attacks, hacks
      • Internet was designed for co-operative use; not ideal for today’s climate
    • Multi-homing not well supported by TCP/IP
      • Change in IP address results in service disruption
      • What if you change your ISP, your machine, etc?
      • Cannot be done seamlessly
    • Network is very machine/ip centric (“Where”)
      • What is needed are People-centric networks (“Who”) and content centric (“What”)
      • IP address ties together identity and location; this is neither necessary, nor desirable
  • Three areas of future research:
    • Delay Tolerant Network (DTN) Architecture
      • Whenever the end-to-end delay is more than a few hundred milliseconds, various things start breaking in today’s networks
      • DTN’s characterized by:
        • Things that are not always connected to the network. For example, sensor networks, gadgets, remote locations. Another Example: remote villages in Africa have a bus visiting them periodically, and that gives them internet access for a limited time every day.
        • Extremely Long Delays
        • Asymmetric Data Rates
        • High Error Rates
      • Needs a store-and-forward network
    • Content-centric Networks
      • Instead of everything being based on IP-address, how about giving unique identifiers to chunks of content, and define a networking protocol based on this
      • Strategy: let the network figure out where the content is and how to deliver it
      • Security: the content carries the authorization info, and unauthorized access is prevented
    • Software Defined Networks
      • Virtualizing the Network
      • Search the net for: “OpenFlow”
      • The hardware router only does packet forwarding, but end applications can update the routing tables of the router using the OpenFlow protocol. The app has an OpenFlow controller that sends updates to the OpenFlow agent on the hardware router.
      • In the hardware/OS world, virtualization technologies (VMware, Xen, VirtualBox) are slowly taking over; OpenFlow is a similar idea for network hardware
      • Oracle and VMware have had major acquisitions in this space recently

Turing100 Lecture: Vint Cerf + Bob Kahn – “Fathers of the Internet” – 8 Sept

Vinton Cerf and Robert Kahn invented TCP and IP, the two protocols at the heart of the internet, and are hence considered the “Fathers of the Internet”. For this and other fundamental contributions, they were awarded the Turing award in 2004.

On 8th September, R. Venkateswaran, CTO of Persistent Systems, will give a talk on the life and work of Vint Cerf and Bob Kahn as a part of the Turing 100 Lecture Series organized by Persistent in Pune on the first Saturday of every month (although this month it was shifted to the second Saturday).

In addition, this Saturday’s event will also feature a talk on “Net Neutrality: The Supply and Demand Side Perspective” by Dr. V Sridhar, a research fellow with Sasken.

About the Turing Awards

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn. The contributions of each Turing Award winner are, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent Lecture Series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 8th September. This event is free and open for anybody to attend. Register here

Event Report: Sham Navathe on E.F. Codd

(This is a liveblog of Sham Navathe’s lecture on E.F. Codd as part of the Turing 100 @ Persistent lecture series.)

Sham Navathe does not really need an introduction, since he is famous for his book “Fundamentals of Database Systems”, written with Ramez Elmasri, which is prescribed for undergraduate database courses all over the world. His full background can be looked up on his Wikipedia page, but it is worth mentioning that Navathe is a Punekar, having been a board topper from Pune in his HSC. He regularly visits Pune since he has family here.

Today’s lecture is about E.F. Codd, a British mathematician, logician, and analyst, who was given the Turing Award in 1981 for his invention of the relational database model. He is one of the 3 people to have received a Turing Award for work in databases (the other two being Charlie Bachman in 1973, for introducing the concept of data structures and the network database model, and Jim Gray in 1998, for his work on database transaction processing research and technical leadership in system implementation).

Ted Codd studied at Oxford, initially studying Chemistry, before doing a stint with the Royal Air Force and then getting a degree in Maths. He later emigrated to the US, worked at IBM, did a PhD from the University of Michigan, and finally went back to IBM. At that time, he led the development of the world’s first “multiprogramming system” – a sort of operating system.

Codd quit IBM in 1984 because he was not happy with the “incomplete implementation of the relational model.” He believed that SQL was only a “convenient” and “informal” representation of the relational model. He published rules that any system must follow before it could be called a relational database management system, and complained that most commercial systems were not really relational in that sense – some were simply a thin pseudo-relational layer on top of older technology.

Invention of the Relational Model

In 1963-64, IBM developed the IMS database management system based on the hierarchical model. In 1964-65, Honeywell developed IDS, based on the network model. In 1968, Dave Childs of Michigan first proposed a set-oriented database management system. In 1969, Codd published “The derivability, redundancy, and consistency of relations stored in large databases” (IBM Research Report RJ-599, 1969). This was the work that led to the seminal paper, “A Relational Model of Data for Large Shared Data Banks” (CACM, 13:6, June 1970). Another classic paper is “Extending the Relational Model to Capture More Meaning” (ACM TODS, 4:4, Dec 1979), which describes the RM/T model. He also coined the term OLAP (Online Analytical Processing).

After Codd’s proposal of the relational model, IBM was initially reluctant to commercialize the idea. Instead, Michael Stonebraker of UC Berkeley, along with PhD students, created INGRES, the first fully relational system. INGRES ultimately became the Postgres database, which is one of the leading open source databases in the world today. In the meantime, Relational Software Inc. brought another relational database product to the market; this ultimately became Oracle. After this, IBM invested heavily in System R, which developed the relational DBMS ideas fully. Codd was involved in the development of System R, and most of the fundamental ideas and algorithms underlying modern RDBMSs today are heavily influenced by System R.

Interesting RDBMS developments after Codd’s paper:

  • 1975: PhD students in Berkeley develop an RDBMS
  • 1976: System R first relational prototype system from IBM
  • 1979: First proposal for a cost based optimizer for database queries
  • 1981: Transactions (by Jim Gray)
  • 1981: SQL/DS, the first commercial RDBMS

Two main motivations for the relational model:

  • In DBMSs before the RDBMS, there was a heavy dependence of the program (and the programmer) on the way the data was modeled, stored and navigated:
    • Ordering dependence
    • Indexing dependence
    • Access path dependence
    All of this was hardcoded in the program, and Codd wanted to simplify database programming by removing these dependencies.
  • Loss of programmer productivity due to manual optimization.

Codd’s fundamental insight was that freeing up the application programmer from knowing about the layout of the data on disk would lead to huge improvements in productivity. For example, in the network or hierarchical models, a data model in which a Student has a link to the Department that he is enrolled in, is very different from a model in which each Department links to all the students that are enrolled there. Depending upon which model is used, all application programs would be different, and switching from one model to another would be difficult later on. Instead, Codd proposed the relational model which would store this as the Student relation, the Department relation, and finally the Enrolment relation that connects Student and Department.

The Relational Data Model

A relation is simply an unordered set of tuples. On these relations, the following operations are permitted:

  • Permutation
  • Projection
  • Join
  • Composition

Of course, other operators from set theory can also be applied to relations, but then the result will not necessarily be a relation. The operations given above, however, take relations and produce relations, so the relational operators can be applied again to their results (a small code illustration of this closure property follows).
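
To make this concrete, here is a small sketch of my own (not Codd's notation) that models relations as sets of tuples with named attributes, using the Student/Department/Enrolment example from earlier; projection and natural join both map relations to relations, so they compose freely.

    # Relations as sets of frozenset-of-(attribute, value) tuples; projection and
    # natural join return relations, so operators compose. Illustrative sketch only.

    def relation(rows):
        return {frozenset(row.items()) for row in rows}

    def project(rel, attrs):
        return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}

    def natural_join(r, s):
        out = set()
        for row_r in r:
            for row_s in s:
                merged = dict(row_r)
                ok = True
                for a, v in row_s:
                    if a in merged and merged[a] != v:
                        ok = False          # shared attribute disagrees: no match
                        break
                    merged[a] = v
                if ok:
                    out.add(frozenset(merged.items()))
        return out

    student = relation([{"sid": 1, "sname": "Asha"}, {"sid": 2, "sname": "Ravi"}])
    department = relation([{"dept": "CS"}, {"dept": "Math"}])
    enrolment = relation([{"sid": 1, "dept": "CS"}, {"sid": 2, "dept": "Math"}])

    # Join Student with Enrolment and Department, then project: each intermediate
    # result is again a relation, so the operators can be chained.
    who_studies_what = project(
        natural_join(natural_join(student, enrolment), department),
        {"sname", "dept"})
    for row in who_studies_what:
        print(dict(row))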

He defined 3 types of relations:

  • Expressible: is any relation that can be expressed in the data model, using the data access language
  • Named: is any relation that has been given a specific name (i.e. is listed in the schema)
  • Stored: is a relation that is physically stored on disk

He also talked about 3 important properties of relations:

  • Derivability: A relation is derivable if it can be expressed in terms of the data access language (i.e. can be expressed as a sequence of relational operations)
  • Redundancy: A set of relations is called strongly redundant if one of the relations can be derived from the other relations. i.e. it is possible to write a relational operation on some of the relations of the set whose result is the same as one of the other relations. A set of relations is weakly redundant if there is a relation in that set which has a projection that is derivable from the other relations. Good database design entails that strongly redundant sets of relations should not be used because of problems with inconsistency. However, weakly redundant relations are OK, and used for performance purposes. (Materialized views.)
  • Consistency / Inconsistency: Codd allowed the definition of constraints governing the data in a set of relations, and a database is said to be consistent if all the data in the database satisfies those constraints, and is said to be inconsistent if not.

In the years that followed, a bunch of IBM research reports on normalization of databases followed.

Turing Award Lecture

His talk is titled: “Relational Databases: A Practical Foundation for Productivity”. His thoughts at that time:

  • Put users in direct touch with databases
  • Increase productivity of DP professionals in developing applications
  • Concerned that the term “relational” was being misused

He points out that in the relational data model, data can be addressed symbolically, as (relation name, primary key value, attribute name). This is much better than embedding links or using positional addressing (X(i, j)).
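
As a tiny illustration of the difference (my own sketch, with made-up data): positional addressing depends on how the data happens to be laid out, while symbolic addressing names the relation, the primary key value and the attribute.

    # Positional vs. symbolic addressing of the same datum. Illustrative sketch.

    # Positional: the meaning of table[1][2] depends on row/column order.
    table = [["E1", "Asha", "CS"],
             ["E2", "Ravi", "Math"]]
    print(table[1][2])                    # "Math", but only if you know the layout

    # Symbolic: (relation name, primary key value, attribute name).
    employee = {"E1": {"name": "Asha", "dept": "CS"},
                "E2": {"name": "Ravi", "dept": "Math"}}
    print(employee["E2"]["dept"])         # "Math", independent of physical layout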

The relational data model encompasses structure, manipulation and integrity of data. Hence, it is a complete model, because all 3 aspects are important for data management.

Characteristics of relational systems:

  • MUST have a data sub-language that allows users to query the data using SELECT, PROJECT and JOIN operators
  • MUST NOT have user visible navigation links between relations
  • MUST NOT convey any information in the way tuples are ordered

He was worried that relational systems might not be able to give performance as good as that of non-relational systems. He talked about:

  • performance oriented data structures
  • efficient algorithms for converting user requests into optimal code

In future work, he mentioned the following

  1. Domains and primary keys
  2. Updating join-type views
  3. Outer-joins
  4. Catalogs
  5. Design aids at logical and physical level
  6. Location and replication transparency in distributed databases
  7. Capture meaning of data
  8. Improved treatment of missing, null and inapplicable values
  9. Heterogeneous data

This was a remarkably prescient list. In the 30 years since this talk, most of this has actually happened either in commercial databases or in research labs. We have pretty much achieved #1 to #6, while #7 to #9 have seen a lot of research work but not wide commercial acceptance yet.

Concluding Thoughts

  • Relational model is a firm foundation for data management. Nothing else compares.
  • On this foundation we were able to tackle difficult problems in the areas of design, derivability, redundancy, consistency, replication as well as language issues. All of these would have been very difficult otherwise
  • Proponents of NoSQL databases, as well as of map-reduce/Hadoop type systems, need to keep in mind that large-scale data management cannot really be done in an ad hoc manner.
  • Codd’s RM/T model was an attempt to capture metadata management, but fell short of what was needed.

Audience Questions

Q: Why did object-oriented databases not catch on?

A: There was a lack of understanding amongst the wider community as to the best way of using object-oriented ideas for data management. OODBMS companies were not able to educate the wider community, and hence failed. Another problem is that object-oriented DBMS systems made the data model more complex, but there were no corresponding developments in the query language and optimization tools.

Q: When the relational model was developed, did they constrain themselves due to the hardware limitations of those days?

A: Codd did mention that when deciding on a set of operations for the relational model, one consideration was ‘Which of these operations can be implemented on today’s hardware’. On the other hand, there were lots of companies in the 80s which tried to implement specialized hardware for relational/DBMS. However, none of those companies really succeeded.

In the future, it is very unlikely that anyone will develop a new data model with improvements in hardware and processing power. However, new operators and new ways of parallelizing them will certainly be developed.

Q: There are areas of data management like incomplete or inexact data, common-sense understanding of data, and deduction and inferencing capabilities. These are not really handled by today’s DBMS systems. How will this be handled in the future?

A: There have been many interesting and elegant systems proposed for these areas, but none have seen commercial success. So we’ll have to wait a while for any of these to happen.


Lecture on Turing Award Winner Ted Codd (Databases) by Sham Navathe – 4 Aug

Ted Codd was awarded the Turing Award in 1981 for “his fundamental and continuing contributions to the theory and practice of database management systems.” A simpler way to put it would be that Codd was given the award for inventing relational databases (RDBMS).

On 4th August, Prof. Sham Navathe, of Georgia Tech University, who is visiting Pune, will talk about Ted Codd’s work. This talk is a part of the Turing Awards lecture series that happens at Persistent’s Dewang Mehta Auditorium at 2pm on the first Saturday of every month this year.

About the Turing Awards

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn. The contributions of each Turing Award winner are, arguably, among the most important topics in computer science.

About Turing 100 @ Persistent Lecture Series

This year, the Turing 100 @ Persistent lecture series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

Fees and Registration

This is a free event. Anyone can attend.

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 4th August. This event is free and open for anybody to attend. Register here

Event: Turing’s Theory of Computing – Turing 100 @ Persistent – July 7

The Turing Award, named after Alan Turing and given every year, is the highest honor that a computer scientist can earn, and the contributions of each Turing Award winner are, arguably, among the most important topics in computer science. This year, the Turing 100 @ Persistent lecture series will celebrate the 100th anniversary of Alan Turing’s birth with a monthly lecture. Each lecture will be presented by an eminent personality from the computer science / technology community in India, and will cover the work done by one Turing Award winner.

The lecture series will feature talks on Ted Codd (Relational Databases), Vint Cerf and Robert Kahn (Internet), Ken Thompson and Dennis Ritchie (Unix), Jim Gray, Barbara Liskov, and others. Full schedule is here

This is a lecture series that anyone in the field of computer science must attend. These lectures cover the fundamentals of computer science, and all of them are still very relevant today.

This lecture series kicks off this Saturday with a talk on the work of Turing himself – Turing’s Theory of Computing, by Vivek Kulkarni, Principal Architect at Persistent Systems. The full schedule of the event is:

  • Welcome Address: Dr. Anand Deshpande, CEO Persistent Systems
  • Keynote: Dr. Mathai Joseph, Advisor TCS
  • Media Presentation: Life of Alan Turing
  • Turing’s Theory of Computation: Vivek Kulkarni, Principal Architect Persistent Systems

The event will be at Dewang Mehta Auditorium, Persistent Systems, SB Road, from 2pm to 5pm on Saturday 7th July. This event is free and open for anybody to attend. Register here