Datatypes - deftype, defrecord and reify


Motivation


Clojure is written in terms of abstractions. There are abstractions for sequences, collections, callability, etc. In addition, Clojure supplies many implementations of these abstractions. The abstractions are specified by host interfaces, and the implementations by host classes. While this was sufficient for bootstrapping the language, it left Clojure without similar abstraction and low-level implementation facilities. The protocols and datatypes features add powerful and flexible mechanisms for abstraction and data structure definition with no compromises vs the facilities of the host platform.

Basics


The datatype features - deftype , defrecord and reify , provide the mechanism for defining implementations of abstractions, and in the case of reify, instances of those implementations. The abstractions themselves are defined by either protocols or interfaces. A datatype provides a host type, (named in the case of deftype and defrecord, anonymous in the case of reify), with some structure (explicit fields in the case of deftype and defrecord, implicit closure in the case of reify), and optional in-type implementations of abstraction methods. They support, in a relatively clean manner, access to the highest-performance primitive representation and polymorphism mechanisms of the host. N.B. that they are not merely host-in-parens constructs. They support only a circumscribed subset of the host facilities, often with more dynamism than the host itself. The intent is that, unless interop forces one to go beyond their circumscribed scope, one need not leave Clojure to get the highest-performing data structures possible on the platform.

deftype and defrecord


deftype and defrecord dynamically generate compiled bytecode for a named class with a set of given fields, and, optionally, methods for one or more protocols and/or interfaces. They are suitable for dynamic and interactive development, need not be AOT compiled, and can be re-evaluated in the course of a single session. They are similar to defstruct in generating data structures with named fields, but differ from defstruct in that:

  • They generate a unique class, with fields corresponding to the given names.
  • the resulting class has a proper type, unlike conventions for encoding type for structs in metadata
  • because they generate a named class, it has an accessible constructor
  • fields can have type hints, and can be primitive
    • note that currently a type hint of a non-primitive type will not be used to constrain the field type nor the constructor arg, but will be used to optimize its use in the class methods
    • constraining the field type and constructor arg is planned
  • a deftype/defrecord can implement one or more protocols and/or interfaces
  • deftype/defrecord can be written with a special reader syntax #my.thing[1 2 3] where:
    • each element in the vector form is passed to the deftype/defrecord's constructor un-evaluated
    • the deftype/defrecord name must be fully qualified
    • only available in Clojure versions later than 1.3
  • when a deftype/defrecord Foo is defined a corresponding function ->Foo is defined that passes its arguments to the constructor (versions 1.3 and later only)

deftype and defrecord differ in the following ways:

  • deftype provides no functionality not specified by the user, other than a constructor
  • defrecord provides a complete implementation of a persistent map, including:
    • value-based equality and hashCode
    • metadata support
    • associative support
    • keyword accessors for fields
    • extensible fields (you can assoc keys not supplied with the defrecord definition)
    • etc
  • deftype supports mutable fields, defrecord does not
  • defrecord support an additional reader form of #my.record{:a 1, :b 2} taking a map that initializes a defrecord according to:
    • the defrecord name must be fully qualified
    • the elements in the map are un-evaluated
    • existing defrecord fields take the keyed values
    • defrecord fields without keyed values in the literal map are initialized to nil
    • additional keyed values are allowed and added to the defrecord
    • only available in Clojure versions later than 1.3
  • when a defrecord Bar is defined a corresponding function map->Bar is defined that takes a map and initializes a new record instance with its contents (versions 1.3 and later only)

Why have both deftype and defrecord?


It ends up that classes in most OO programs fall into two distinct categories: those classes that are artifacts of the implementation/programming domain, e.g. String or collection classes, or Clojure's reference types; and classes that represent application domain information, e.g. Employee, PurchaseOrder etc. It has always been an unfortunate characteristic of using classes for application domain information that it resulted in information being hidden behind class-specific micro-languages, e.g. even the seemingly harmless employee.getName() is a custom interface to data. Putting information in such classes is a problem, much like having every book being written in a different language would be a problem. You can no longer take a generic approach to information processing. This results in an explosion of needless specificity, and a dearth of reuse.

This is why Clojure has always encouraged putting such information in maps, and that advice doesn't change with datatypes. By using defrecord you get generically manipulable information, plus the added benefits of type-driven polymorphism, and the structural efficiencies of fields. OTOH, it makes no sense for a datatype that defines a collection like vector to have a default implementation of map, thus deftype is suitable for defining such programming constructs.

Overall, records will be better than structmaps for all information-bearing purposes, and you should move such structmaps to defrecord. It is unlikely much code was trying to use structmaps for programming constructs, but if so, you will find deftype much more suitable.

AOT-compiled deftype/defrecord may be suitable for some of the use cases of gen-class, where their limitations are not prohibitive. In those cases, they will have better performance than gen-class.

Datatypes and protocols are opinionated


While datatypes and protocols have well-defined relationships with host constructs, and make for a great way to expose Clojure functionality to Java programs, they are not primarily interop constructs. That is, they make no effort to completely mimic or adapt to all of the OO mechanisms of the host. In particular, they reflect the following opinions:

  • Concrete derivation is bad
    • you cannot derive datatypes from concrete classes, only interfaces
  • You should always program to protocols or interfaces
    • datatypes cannot expose methods not in their protocols or interfaces
  • Immutability should be the default
    • and is the only option for records
  • Encapsulation of information is folly
    • fields are public, use protocols/interfaces to avoid dependencies
  • Tying polymorphism to inheritance is bad
    • protocols free you from that

If you use datatypes and protocols you will have a clean, interface-based API to offer your Java consumers. If you are dealing with a clean, interface-based Java API, datatypes and protocols can be used to interoperate with and extend it. If you have a 'bad' Java API, you will have to use gen-class. Only in this way can the programming constructs you use to design and implement your Clojure programs be free of the incidental complexities of OO.

reify


While deftype and defrecord define named types, reify defines both an anonymous type and creates an instance of that type. The use case is where you need a one-off implementation of one or more protocols or interfaces and would like to take advantage of the local context. In this respect it is use case similar to proxy, or anonymous inner classes in Java.

The method bodies of reify are lexical closures, and can refer to the surrounding local scope. reify differs from proxy in that:

  • Only protocols or interfaces are supported, no concrete superclass.
  • The method bodies are true methods of the resulting class, not external fns.
  • Invocation of methods on the instance is direct, not using map lookup.
  • No support for dynamic swapping of methods in the method map.

The result is better performance than proxy, both in construction and invocation. reify is preferable to proxy in all cases where its constraints are not prohibitive.
Logo & site design by Tom Hickey.