ADT vs OOP

The general concept of data abstraction refers to any mechanism for hiding the implementation details of data.

The concept of data abstraction has existed long before the term data abstraction came into existence. In mathematics, there is a long history of abstract representations for data.

As a simple example, consider the representation of integer sets. Two standard approaches to describe sets abstractly are as an ‘algebra’ or as a ‘characteristic function’.

- An algebra has a collection of abstract values, and operations to manipulate the values.
- The characteristic function for a collection maps a domain of values to a boolean value, which indicates whether or not the value is included in the set.

These two traditions in mathematics correspond closely to the two forms of data abstraction in programming: algebras relate to ADT, while characteristic functions are a form of Object.

An abstract data type (ADT) has a public name, a hidden representation, and operations to create, combine, and observe values of the abstraction. The familiar built-in types in most languages, for example the integer and boolean data types in Algol, Pascal, ML, Java and Haskell, are abstract data types. While objects and ADTs are fundamentally different, they are both forms of data abstraction.

An essential observation is that ‘object interfaces’ do not use ‘type abstraction’: there is no type whose name is known but representation is hidden. Instead, ‘objects’ use ‘procedural abstraction’ to hide behavior. This difference has significant consequences for use of the two forms of data abstraction.

‘Object interfaces’ are essentially higher-order types, in the same sense that passing functions as values is higher-order. Any time an ‘object’ is passed as a value, or returned as a value, the object-oriented program is passing functions as values and returning functions as values. The fact that the functions are collected into records and called ‘methods’ is irrelevant. As a result, the typical object-oriented program makes far more use of higher-order values than many functional programs.

- ADT have a private, protected representation type that prohibits tampering or extension.
- Objects have behavioral interfaces which allow definition of new implementations at any time.

One problem with ‘object interfaces’ is that efficiency considerations often allow implementation issues to influence the design of interfaces. Adding ‘public methods’ that inspect the hidden representation can significantly improve efficiency. But it also restricts the flexibility and extensibility of the resulting interface. On the other hand OOP is designed to be as flexible as possible (And it is almost as if it were designed to be as difficult to verify as possible).

An object is a value exporting a procedural interface to data or behavior. Objects use procedural abstraction for informa- tion hiding, not type abstraction. Object and and their types are often recursive. Objects provide a simple and powerful form of data abstraction. They can be understood as clo- sures, first-class modules, records of functions, or processes. Objects can also be used for procedural abstraction. Unlike abstract data types, many people find objects to be deeply disturbing. They are fundamentally higher-order, unlike abstract data types. With an object, you are never quite certain what it is going to do: What method is being called? What kind of object is it really? On the other hand, many people find objects to be deeply appealing in their simplicity and flexibility. They do not require complex type systems. Inheritance allows recursive values to be extended in powerful ways. The fact that objects are autognostic, so that they can only know themselves, is also confusing. On the one hand, it interferes with desirable optimizations that require inspection of multiple representations. One solution is to expose representational details in the object’s interface, which limits flexibility. The benefits of autognosis are often subtle and only realized as a system grows and evolves.

One of the most significant differences between abstract data types and objects is that objects can be used to define data abstractions in a dynamically typed language. Objects do not depend upon a static type system; all they need is some form of first-class functions or processes. Abstract data types depend upon a static type system to enforce type abstraction. It is not an accident that dynamic languages use objects instead of user-defined abstract data types. Dynamic languages typically support built-in abstract data types for primitive types; the type abstraction here is enforced by the runtime system. Type systems only enforce structural properties of programs; they do not ensure conformance to a specification. But with ADTs, the type system can ensure that if the ADT implementation is correct, then all programs based on it will operate correctly. The type system prevents outside clients from tampering with the implementation. Pure object interfaces allow any structurally compatible implementation, thus the type system does not prohibit bad implementations from being used.

In the 1970s, as work began on understanding data abstraction, Reynolds published a prophetic paper that identified the key differences between objects and abstract data types, although he did not realize he was describing objects. Reynolds noticed that abstract data types facilitate adding new operations, while “procedural data values” (objects) facilitate adding new representations. Since then, this duality has been independently discovered at least three times. This duality has practical implications for programming. Abstract data types define operations that collect together the behaviors for a given action. Objects organize the matrix the other way, collecting together all the actions associated with a given representation. It is easier to add new operations in an ADT, and new representations using objects. Object-oriented programs can use inheritance to add new operations.

Wadler later gave the problem a catchy name, the “Expression Problem”, based on the well-known canonical example of a data abstraction for expressions with operations to print, evaluate, or perform other actions. The extensibility problem has been solved in numerous ways, and it still inspires new work on extensibility of data abstractions. Multimethods are another approach to this problem. More complex variations, involving integration of independent extensions, have still not been completely resolved.

Modern object-oriented languages support a mixture of object-oriented and ADT functionality, allowing programmers to choose ADT style for specific situations. In modern object-oriented languages, the issue boils down to whether or not classes are used as types. In a pure object-oriented style, classes are only used to construct objects, and interfaces are used for types. When classes are used as types, the programmer is implicitly choosing to use a form of abstract data type. The decision affects how easy it is for the program to be extended and maintained over time, and also how easy it is to optimize complex operations. Understanding the fundamental differences between objects and ADTs can help in choosing to use them wisely.