Reducers

Reducers provide an alternative approach to using sequences to manipulate standard Clojure collections. Sequence functions are typically applied lazily, in order, on a single thread, creating intermediate results along the way. However, many sequence functions (like map and filter) could conceptually be applied in parallel, yielding code that gets faster automatically as machines gain more cores. For more details on the rationale for reducers, see the original blog posts.

A reducer is the combination of a reducible collection (a collection that knows how to reduce itself) with a reducing function (the "recipe" for what needs to be done during the reduction). The standard sequence operations are replaced with new versions that do not perform the operation but merely transform the reducing function. Execution of the operations is deferred until the final reduction is performed. This removes the intermediate results and lazy evaluation seen with sequences.
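
For example (a minimal sketch; the name incs exists only for this example), creating a reducer performs no work until a reduction is executed:

(require '[clojure.core.reducers :as r])

;; r/map does no mapping here; it returns a reducer pairing [1 2 3]
;; with a transformed reducing function:
(def incs (r/map inc [1 2 3]))

;; The work happens only when the reduction is performed:
(r/reduce + incs)
;=> 9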

Additionally, some collections (persistent vectors and maps) are foldable. The fold operation on a reducer executes the reduction in parallel by:

  1. Partitioning the reducible collection at a specified granularity (default = 512 elements)

  2. Applying reduce to each partition

  3. Recursively combining the results of those reductions using Java's fork/join framework.

If a collection does not support folding, it will fall back to non-parallel reduce instead.
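
For example (illustrative; both calls return the same result, but only the vector is a candidate for parallel execution):

;; Persistent vectors are foldable, so this reduction may run in parallel:
(r/fold + (vec (range 10)))
;=> 45

;; A range is reducible but not foldable, so fold falls back to a
;; sequential reduce:
(r/fold + (range 10))
;=> 45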

reduce and fold

The clojure.core.reducers namespace (aliased here as r) provides an alternate r/reduce function.

(r/reduce f coll)
(r/reduce f init coll)

The reducers version differs in that:

  • Map colls are reduced with reduce-kv

  • When init is not provided, f is invoked with no arguments to produce an identity value

    • Note: f may be invoked multiple times to provide the identity value
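
For example (an illustrative sketch):

;; Map colls are reduced with reduce-kv, so f receives the accumulator,
;; a key, and a value:
(r/reduce (fn [acc k v] (+ acc v)) 0 {:a 1 :b 2 :c 3})
;=> 6

;; With no init, (+) is invoked with no arguments to supply the
;; identity value 0:
(r/reduce + [1 2 3 4])
;=> 10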

In general, most users will not call r/reduce directly and should instead prefer r/fold, which implements parallel reduce and combine. However, r/reduce can be useful for executing an eager reduce with fewer intermediate results.

(r/fold reducef coll)
(r/fold combinef reducef coll)
(r/fold n combinef reducef coll)

r/fold takes a reducible collection and partitions it into groups of approximately n (default 512) elements. Each group is reduced using the reducef function, with a seed value obtained by calling (combinef) with no arguments. The results of those partition reductions are then reduced with the combinef function (which defaults to reducef). combinef must be associative, and when called with no arguments it must produce its identity element; it may be called multiple times. Operations may be performed in parallel, and results preserve the order of the input collection.
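
For example (the partition size 1024 here is arbitrary, chosen only for illustration):

;; Partition the vector into groups of ~1024 elements; (+) seeds each
;; partition with its identity 0, and + combines the partial sums:
(r/fold 1024 + + (vec (range 100000)))
;=> 4999950000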

The following functions (analogous to the sequence versions) create reducers from a reducible or foldable collection: r/map, r/mapcat, r/filter, r/remove, r/flatten, r/take-while, r/take, and r/drop. None of these functions actually transforms the source collection. To produce an accumulated result, you must use r/reduce or r/fold. To produce an output collection, use clojure.core/into to choose the collection type, or the provided r/foldcat to produce a collection that is reducible, foldable, seqable, and counted.

Using Reducers

Use fold to sum with +:

(require '[clojure.core.reducers :as r])
(r/fold + (r/filter even? (r/map inc [1 1 1 2])))
;=> 6

Use into to produce a final collection:

(into [] (r/filter even? (r/map inc (range 100000))))

Or r/foldcat:

(r/foldcat (r/filter even? (r/map inc (range 100000))))
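
The collection returned by r/foldcat is counted and seqable (a small illustration; the name evens is just for this example):

(def evens (r/foldcat (r/filter even? (r/map inc (range 100000)))))
(count evens)
;=> 50000
(first evens)
;=> 2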

Specify a reduce function and a combine function with fold:

(require 'clojure.string)

;; Reducing function: accumulates a count for each word in a map.
(defn count-words
  ([] {})
  ([freqs word]
   (assoc freqs word (inc (get freqs word 0)))))

;; Combining function: merges the per-partition count maps.
;; Called with no arguments, it supplies the identity value {}.
(defn merge-counts
  ([] {})
  ([& m] (apply merge-with + m)))

(defn word-frequency [text]
  (r/fold merge-counts count-words (clojure.string/split text #"\s+")))
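
For example (a small illustration; with a short input the fold simply runs as a single sequential reduce):

(word-frequency "to be or not to be")
;=> {"to" 2, "be" 2, "or" 1, "not" 1}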

When to use

Use the reducer form of these operations for:

  • Efficient eager application of a multi-step transformation

  • Avoiding dangling I/O resource issues (a risk with lazy seqs)

Use fold when:

  • Source data can be generated and held in memory

  • Work to be performed is computation (not I/O or blocking)

  • Number of data items or work to be done is "large"