Clojure
Frequently Asked Questions

These questions and answers are adapted from mailing lists and other Clojure community forums.

Reader and Syntax

Keywords are cached and interned. This means that a keyword is reused (reducing memory) everywhere in your program and that checks for equality really become checks for identity (which are fast). Additionally, keywords are invokable: a keyword looks itself up in a map, which enables the common pattern of extracting a particular field from a collection of maps.
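
As a small illustration of the points above, the sketch below uses a made-up `people` collection (the data and var names are assumptions for the example only) to show keyword lookup and interning.

```clojure
;; Hypothetical sample data, for illustration only.
(def people [{:name "Ada" :lang "Clojure"}
             {:name "Grace" :lang "COBOL"}])

;; A keyword invoked as a function looks itself up in the map.
(def first-name (:name (first people)))   ;; => "Ada"

;; So a keyword can be passed directly to map to extract one field.
(def names (map :name people))            ;; => ("Ada" "Grace")

;; Keywords with the same name are the same interned object,
;; so equality is an identity check.
(def same-object? (identical? :name (keyword "name")))   ;; => true
```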

The keyword function can be used to programmatically create keywords based on user data or other sources of input. Similarly, the namespace and name functions can be used to pull a keyword back apart into components. It is common for programs to use this functionality to create keywords to be used as identifiers or map keys without ever printing and reading that data back.

Because of this use case (and also for general performance), no validation check is performed on the inputs to keyword (or symbol), which makes it possible to create keywords that, when printed, cannot be read back into a keyword (due to spaces or other non-allowed characters). If this is important to you, you should validate the keyword inputs first, before creating the keyword.
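
A minimal sketch of this behavior follows. The strings used are arbitrary examples; the point is only that `keyword` accepts input that the reader would not produce.

```clojure
;; Build a keyword from runtime strings (namespace and name).
(def k (keyword "user" "id"))        ;; => :user/id
(def k-ns (namespace k))             ;; => "user"
(def k-name (name k))                ;; => "id"

;; No validation is performed: this keyword contains a space.
(def odd-k (keyword "first name"))
(def printed (pr-str odd-k))         ;; => ":first name"

;; Reading the printed form back stops at the space, so the result
;; is a different keyword from the one that was printed.
(def round-tripped (read-string printed))   ;; => :first
(def survives? (= odd-k round-tripped))     ;; => false
```

If round-tripping matters, one option is to check the input against whatever character set your program considers acceptable before calling `keyword`.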

The reader takes text (Clojure source) and returns Clojure data, which is subsequently compiled and evaluated. Reader macros tell the Clojure reader how to read something that is not a typical s-expression (examples are things like quoting ' and anonymous functions #()). Reader macros can be used to define entirely new syntaxes read by the reader (for example: JSON, XML, or other formats) - this is a more powerful syntactic capability than regular macros (which come into play later at compile time).

However, unlike some other Lisps (Common Lisp, for example), Clojure does not allow the user to extend this set of reader macros. This avoids the possibility of creating code that another user cannot read (because they do not have the proper reader macros). Clojure gives back some of the power of reader macros with tagged literals, allowing you to create generically readable data that is still extensible.
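
The following sketch shows tagged literals being read with `clojure.edn`. The `my/point` tag and the `read-point` function are hypothetical names invented for this example; `#inst` is one of the built-in tags.

```clojure
(require '[clojure.edn :as edn])

;; Built-in tagged literals are readable without any extra setup.
(def t (edn/read-string "#inst \"2016-03-01T00:00:00.000-00:00\""))
(def date? (instance? java.util.Date t))   ;; => true

;; A hypothetical application-defined tag, supplied via :readers.
(defn read-point [[x y]] {:x x :y y})

(def p (edn/read-string {:readers {'my/point read-point}}
                        "#my/point [1 2]"))
;; p => {:x 1, :y 2}

;; A program that does not know the tag can still consume the data
;; by supplying a :default handler taking the tag and the value.
(def unknown (edn/read-string {:default (fn [tag value] [tag value])}
                              "#my/point [1 2]"))
;; unknown => [my/point [1 2]]
```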

_ has no special meaning in Clojure as a symbol. However, it is a convention to use _ (or a leading _) to denote a binding that will not be used in the expression.

(defn get-x [point]
  (let [[x _] point]   ;; y-value of point is unused, so mark it with _
    x))

#() always expands to include parens around the expression you give it, so #([%]) yields (fn [x] ([x])), which fails when the vector is invoked with no arguments. Instead, use the vector function, as in #(vector %), or just vector, which is the function being described.
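
A short sketch of the working alternatives, together with the failing (fn [x] ([x])) form wrapped in a try so the example can run to completion:

```clojure
;; #(vector %) and plain vector both wrap each element in a vector.
(def a (map #(vector %) [1 2 3]))   ;; => ([1] [2] [3])
(def b (map vector [1 2 3]))        ;; => ([1] [2] [3])

;; The problematic form invokes the one-element vector as a function
;; with no arguments, which throws an arity exception.
(def failed?
  (try
    ((fn [x] ([x])) 1)
    false
    (catch clojure.lang.ArityException _ true)))   ;; => true
```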

Collections, Sequences, and Transducers

Most Clojure data structure operations, including conj (conjoin), are designed to give the user a performance expectation. With conj, the expectation is that insertion should happen at the place where this operation is efficient. Lists (as linked lists) can make a constant time insertion only at the front. Vectors (indexed) are designed to expand at the back. As the user, you should consider this when you choose which data structure to use. In Clojure, vectors are used with much greater frequency.

If your goal is specifically to "add to the front of the collection", then the appropriate function to use is cons, which will always add to the front. Note however that this will produce a sequence, not an instance of the original collection type.
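
The difference can be seen at the REPL with a few small values (chosen arbitrarily for this sketch):

```clojure
;; conj adds where insertion is efficient for each collection type.
(def on-list (conj '(1 2 3) 0))     ;; => (0 1 2 3), front of a list
(def on-vector (conj [1 2 3] 4))    ;; => [1 2 3 4], back of a vector

;; cons always adds to the front, but returns a sequence,
;; not a vector.
(def consed (cons 0 [1 2 3]))       ;; => (0 1 2 3)
(def still-vector? (vector? consed))   ;; => false
(def is-seq? (seq? consed))            ;; => true
```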

Generally you should divide the Clojure core functions into these two categories:

  • Data structure functions - take a data structure and return a modified version of that data structure (conj, disj, assoc, dissoc, etc). These functions always take the data structure first.

  • Sequence functions - take a "seqable" and return a seqable. [Generally we try to avoid committing to the return values actually being an instance of ISeq - this allows for performance optimizations in some cases.] Examples are map, filter, remove, etc. All of these functions take the seqable last.

It sounds like you are using the latter but expecting the semantics of the former (which is a common issue for new Clojurists!). If you want to apply sequence functions but have more control over the output data structure, there are a number of ways to do that.

  1. Use data-structure equivalents like mapv or filterv, etc - this is a very limited set that lets you perform these ops but return a data structure rather than a seqable. (mapv inc (filterv odd? [1 2 3]))

  2. Pour the results of your sequence transformations back into a data structure with into: (into [] (map inc (filter odd? [1 2 3])))

  3. Use transducers (likely with into) - this has much the same effect as #2, but combinations of transformations can be applied more efficiently without creating any sequences - only the final result is built: (into [] (comp (filter odd?) (map inc)) [1 2 3]). As you work with larger sequences or more transformations, this makes a significant difference in performance.

Note that all of these are eager transformations - they produce the output vector when you invoke them. The original sequence version (map inc (filter odd? [1 2 3])) is lazy and will only produce values as needed (with chunking under the hood for greater performance). Neither of these is right or wrong, but they are both useful in different circumstances.
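
Putting the three eager options next to the lazy original makes the difference in return types visible. The var names here are only for the sketch.

```clojure
(def data [1 2 3])

;; 1. Data-structure equivalents
(def v1 (mapv inc (filterv odd? data)))

;; 2. Sequence functions poured back into a vector
(def v2 (into [] (map inc (filter odd? data))))

;; 3. Transducers with into
(def v3 (into [] (comp (filter odd?) (map inc)) data))

;; The lazy sequence version
(def lazy-result (map inc (filter odd? data)))

;; All four hold the same values: 2 and 4.
(def all-equal? (= [2 4] v1 v2 v3 lazy-result))   ;; => true

;; Only the first three are vectors; the last is a lazy sequence.
(def vectors? (every? vector? [v1 v2 v3]))        ;; => true
(def lazy? (instance? clojure.lang.LazySeq lazy-result))   ;; => true
```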

When performing a series of transformations, sequences will create an intermediate (cached) sequence between each transformation. Transducers create a single compound transformation that is executed in one eager pass over the input. These are different models, which are both useful.

Performance benefits of transducers:

  • Source collection iteration - when used on reducible inputs (collections and other things), transducers avoid creating an unnecessary input collection sequence - helps memory and time.

  • Intermediate sequences and cached values - as the transformation happens in a single pass, you remove all intermediate sequence and cached value creation - again, helps memory and time. The combination of the prior item and this one will start to win big as the size of the input collection or number of transformations goes up (but for small numbers of either, chunked sequences can be surprisingly fast and will compete).

Design / usage benefits of transducers:

  • Transformation composition - some use cases will have a cleaner design if they separate transformation composition from transformation application. Transducers support this.

  • Eagerness - transducers are great for cases where eagerly processing a transformation (and potentially encountering any errors) is more important than laziness.

  • Resource control - because you have more control over when the input collection is traversed, you also know when processing is complete. It’s thus easier to release or clean up input resources because you know when that happens.
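
To illustrate separating transformation composition from application, the sketch below defines one transducer (named `xf` here purely by convention) and applies it in several contexts.

```clojure
;; Compose the transformation once, independent of any input.
(def xf (comp (filter odd?) (map inc)))

;; Apply it eagerly into different output collections.
(def as-vector (into [] xf [1 2 3]))        ;; => [2 4]
(def as-set (into #{} xf [1 2 3]))          ;; => #{2 4}

;; Apply it while reducing, without building a collection at all.
(def total (transduce xf + 0 [1 2 3]))      ;; => 6

;; Apply it to produce a sequence.
(def as-seq (sequence xf [1 2 3]))          ;; => (2 4)
```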

Performance benefits of sequences:

  • Laziness - if you will only need some of the outputs (for example a user is deciding how many to use), then lazy sequences can often be more efficient in deferring processing. In particular, sequences can be lazy with intermediate results, but transducers use a pull model that will eagerly produce all intermediate values.

  • Infinite streams - because transducers are typically eagerly consumed, they don’t match well with infinite streams of values.

Design benefits of sequences:

  • Consumer control - returning a seq from an API lets you combine input + transformation into something that gives the consumer control. Transducers don’t work as well for this (but will work better for cases where input and transformation are separated).
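
A brief sketch of the laziness and consumer-control points: the function below (a hypothetical API, named only for this example) returns a lazy sequence over an infinite input, and the caller decides how much of it to realize.

```clojure
;; Hypothetical API function returning a lazy sequence over an
;; infinite input. Nothing is computed until a consumer asks.
(defn evens-plus-one []
  (map inc (filter even? (range))))

;; The consumer controls how many values are realized.
(def first-three (take 3 (evens-plus-one)))   ;; => (1 3 5)

;; By contrast, an eager application over the same infinite input,
;; such as (into [] (map inc) (range)), would never return.
```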

State and Concurrency

Each of these really addresses a different use case.

  • Reducers are best for fine-grained data parallelism when computing a transformation over existing in-memory data (in a map or vector). Generally they work best when you have thousands of small data items to compute over and many cores to do the work, that is, anything described as "embarrassingly parallel".

  • Futures are best for pushing work onto a background thread and picking it up later (or for doing I/O waits in parallel). They are better for big chunky tasks (go fetch a bunch of data in the background).

  • core.async is primarily used to organize the subsystems or internal structure of your application. It has channels (queues) to convey values from one "subprocess" (go block) to another. So you’re really getting concurrency and architectural benefits in how you break up your program. The killer feature you can really only get in core.async is the ability to wait on I/O events from multiple channels for the first response on any of them (via alt! / alts!). Promises can also be used to convey single values between independent threads/subprocesses but they are single delivery only.

  • Tools like pmap, java.util queues and executors, and libraries like claypoole are doing coarse-level "task" concurrency. There is some overlap with core.async here which has a very useful transducer-friendly pipeline functionality.
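
The sketch below touches the options above that ship with Clojure itself (reducers, futures, promises, and pmap). core.async is a separate library and is left out so the example has no extra dependencies. The workloads are toy values chosen for illustration.

```clojure
(require '[clojure.core.reducers :as r])

;; Reducers: parallel fold over an in-memory vector.
(def total (r/fold + (r/map inc (vec (range 1000)))))   ;; => 500500

;; Future: run a task on a background thread, deref to wait for it.
(def answer (future (Thread/sleep 50) 42))
(def answer-value @answer)                               ;; => 42

;; Promise: a single-delivery value handed between threads.
(def p (promise))
(future (deliver p :done))
(def delivered @p)                                       ;; => :done

;; pmap: coarse-grained parallel map (doall forces the lazy result).
(def squares (doall (pmap #(* % %) [1 2 3 4])))          ;; => (1 4 9 16)
```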

This is most commonly asked in the context of programs that use future, pmap, agent send or send-off, or other functions that invoke those functions. When a program like this finishes, there will be a 60 second pause before exit. To fix this problem, call shutdown-agents as the program exits.

Clojure uses two internal thread pools to service futures and agent function executions. Both pools use non-daemon threads and the JVM will not exit while any non-daemon thread is alive. In particular, the pool that services futures and agent send-off calls uses an Executor cached thread pool with a 60 second timeout. In the scenario above, the program will wait until the background threads have completed their work and the threads expire before it can exit.
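
One way to arrange this, sketched under the assumption of a conventional `-main` entry point (the `slow-sum` function is a hypothetical stand-in for real background work):

```clojure
(defn slow-sum
  "Hypothetical task that runs on a background thread via future."
  [xs]
  @(future (reduce + xs)))

(defn -main [& _args]
  (try
    (println "sum:" (slow-sum (range 10)))
    (finally
      ;; Shut down the thread pools backing futures and agents so
      ;; the JVM does not wait for idle pool threads to expire.
      (shutdown-agents))))
```

Placing the call in a finally block is a design choice for the sketch: it runs even if the main body throws.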

If reads were included by default, then STM would be slower (as more transactions would require serializability). However, in many cases, reads do not need to be included. Thus, users can choose to accept the performance penalty when it is necessary and get faster performance when it is not.
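
For the cases where a read does need to be included, `ensure` can be used inside the transaction. The sketch below uses hypothetical refs `balance` and `limit` and a made-up `deposit!` function.

```clojure
(def balance (ref 10))
(def limit (ref 100))

(defn deposit!
  "Adds amount to balance unless that would exceed limit.
   Returns the new balance, or nil if the deposit was refused."
  [amount]
  (dosync
    ;; ensure opts this read of limit into the transaction, so
    ;; another transaction cannot change limit before this commits.
    (when (<= (+ @balance amount) (ensure limit))
      (alter balance + amount))))

(def first-result (deposit! 50))    ;; => 60
(def second-result (deposit! 50))   ;; => nil, 110 would exceed 100
```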

Namespaces

No (although that is typical). One namespace can be split across multiple files by using load to load secondary files and in-ns in those files to retain the namespace (clojure.core is defined in this way). Also, it is possible to declare multiple namespaces in a single file (although this is very unusual).

ns is a macro that does a number of things:

  • creates a new internal Namespace object (if it does not yet exist)

  • makes that namespace the new current namespace (ns)

  • auto-refers all vars from clojure.core and imports all classes from java.lang

  • requires/refers other namespaces and vars as specified

  • (and other optional things)

ns does not return a function or anything invokable as you suggest.

While ns is typically placed at the top of a clj file, it is actually just a normal macro and can be invoked at the repl just the same. It could also be used more than once in a single file (although this would be surprising to most clj programmers and would likely not work as desired in AOT).
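
A small sketch of these effects, using a hypothetical namespace name `faq.demo`:

```clojure
;; Creates faq.demo if needed, makes it current, refers clojure.core,
;; imports java.lang, and aliases clojure.string as str.
(ns faq.demo
  (:require [clojure.string :as str]))

;; The current namespace is now faq.demo.
(def current (ns-name *ns*))            ;; => faq.demo

;; The alias set up by :require is available.
(def shouted (str/upper-case "hello"))  ;; => "HELLO"

;; clojure.core vars (inc) and java.lang classes (Integer)
;; resolve without qualification.
(def bumped (inc 1))                    ;; => 2
(def max-int Integer/MAX_VALUE)         ;; => 2147483647
```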

Compiler

Anything that has been direct linked will not see redefinitions to vars. For example, if you redefine something in clojure.core, other parts of core that use that var will not see the redefinition (however anything that you newly compile at the REPL will). In practice, this is not typically a problem.

For parts of your own app, you may wish to only enable direct linking when you build and deploy for production, rather than using it when you are developing at the REPL. Or you may need to mark parts of your app with ^:redef if you want to always allow redefinition or ^:dynamic for dynamic vars.
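
The two kinds of metadata can be sketched as follows. The function and var names are hypothetical, and the redefinition shown behaves the same way whether or not direct linking is enabled, because `greeting` is marked ^:redef.

```clojure
;; ^:redef asks the compiler not to direct link calls to this var,
;; so callers keep seeing later redefinitions.
(defn ^:redef greeting [] "hello")

(defn greet [] (str (greeting) ", world"))

;; Redefine greeting; greet picks up the new definition.
(defn ^:redef greeting [] "hi")
(def greeted (greet))                   ;; => "hi, world"

;; ^:dynamic vars can be rebound per thread with binding.
(def ^:dynamic *level* :info)

(defn current-level [] *level*)

(def rebound (binding [*level* :debug] (current-level)))   ;; => :debug
(def default-level (current-level))                         ;; => :info
```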

Java and Interop

Use a $ to separate outer from inner class name. For example: java.util.Map$Entry is the Entry inner class inside Map.

Primitive types can be found as the static TYPE field on the boxed class, for example: Integer/TYPE.
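
Both forms can be tried directly. This sketch uses only JDK classes; the var names are arbitrary.

```clojure
;; Map$Entry names the Entry interface nested inside java.util.Map.
(def entry (first {:a 1}))
(def entry? (instance? java.util.Map$Entry entry))        ;; => true

;; The same $ syntax works when constructing a nested class.
(def simple (java.util.AbstractMap$SimpleEntry. "k" 1))
(def simple-key (.getKey simple))                         ;; => "k"

;; Integer/TYPE is the Class object for the primitive int.
(def int-name (.getName Integer/TYPE))                    ;; => "int"
(def matches-array?
  (= Integer/TYPE (.getComponentType (class (int-array 0)))))   ;; => true
```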

Java treats a trailing varargs parameter as an array, so a varargs method can be invoked from Clojure by passing an array explicitly. The general form is:

(.method object fixed-args… (into-array type variable-args…))

Example:

;; asList takes an Object vararg parameter
(java.util.Arrays/asList (object-array [0 1 2]))

;; format takes one fixed parameter and a varargs
(String/format "%s %s, %s" (object-array ["March" 1 2016]))
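
When the varargs parameter has a type other than Object, the array needs a matching element type. As a further sketch, `java.nio.file.Paths/get` takes one String followed by a String varargs; the helper `name-count` is a hypothetical function added only to inspect the result.

```clojure
(import '(java.nio.file Path Paths))

;; Hypothetical helper; the type hint avoids a reflective call.
(defn name-count [^Path path]
  (.getNameCount path))

;; Paths/get takes a first String and then a String varargs,
;; so the variable part is passed as a String array.
(def p (Paths/get "tmp" (into-array String ["a" "b.txt"])))
(def p-count (name-count p))    ;; => 3

;; With no variable arguments, an empty array is still required.
(def q (Paths/get "tmp" (into-array String [])))
(def q-count (name-count q))    ;; => 1
```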

Design and Use

Because of Clojure's focus on immutable data, there is generally not a high value placed on data encapsulation. Because data is immutable, there is no need to worry about someone else modifying a value. Likewise, because Clojure data is designed to be manipulated directly, there is significant value in providing direct access to data, rather than wrapping it in APIs.

All Clojure vars are globally available so again there is not much in the way of encapsulation of functions within namespaces. However, the ability to mark vars private (either using defn- for functions or def with ^:private for values) is a convenience for a developer to indicate which parts of an API should be considered public for use vs part of the implementation.
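
A minimal sketch of both ways of marking a var private (all names here are hypothetical):

```clojure
(require '[clojure.string :as str])

;; A private value, marked with ^:private on def.
(def ^:private default-greeting "hello")

;; A private function, defined with defn-.
(defn- normalize [s]
  (str/trim s))

;; The public part of this small API.
(defn welcome [who]
  (str default-greeting ", " (normalize who)))

(def message (welcome "  world "))   ;; => "hello, world"

;; Privacy is recorded as metadata on the var.
(def helper-private? (:private (meta #'normalize)))   ;; => true
(def api-private? (:private (meta #'welcome)))        ;; => nil
```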

Original author: Alex Miller