Clojure

Frequently Asked Questions

These questions and answers are adapted from mailing lists and other Clojure community forums.

Reader and Syntax

Keywords are cached and interned. This means that a keyword is reused (reducing memory) everywhere in your program and that checks for equality really become checks for identity (which are fast). Additionally, keywords are invokable to look themselves up in a map and thus this enables the common pattern of extracting a particular field from a collection of maps possible.

The reader page defines keywords as "like symbols" and symbols "begin with a non-numeric character" so the original intent was that :1 would be invalid. In fact, the only reason it is readable at all is due to a bug in the keyword regex. This bug was fixed in a 1.6 alpha but we quickly discovered that these keywords were in use in many actual projects. To avoid breaking existing working code, the change was rolled back and this form will continue to be supported. (There are still some open issues to clarify this in code and/or docs.)

Note that namespaced keywords with names starting with numbers have never been readable or valid, ie :foo/1. However, auto-resolved keywords like ::1 can be read but not round-tripped from print to read.

In general, it’s best to avoid using keywords that start with a number unless it is in a narrow and controlled scope.

The keyword function can be used to programmatically create keywords based on user data or other sources of input. Similarly, the namespace and name functions can be used to pull a keyword back apart into components. It is common for programs to use this functionality to create keywords to be used as identifiers or map keys without ever printing and reading that data back.

Because of this use case (and also for general performance), no validation check is performed on the inputs to keyword (or symbol), which makes it possible to create keywords that, when printed, cannot be read back into a keyword (due to spaces or other non-allowed characters). If this is important to you, you should validate the keyword inputs first, before creating the keyword.

The reader takes text (Clojure source) and returns Clojure data, which is subsequently compiled and evaluated. Reader macros tell the Clojure reader how to read something that is not a typical s-expression (examples are things like quoting ' and anonymous functions #()). Reader macros can be used to define entirely new syntaxes read by the reader (for example: JSON, XML, or other formats) - this is a more powerful syntactic capability than regular macros (which come into play later at compile time).

However, unlike Lisp, Clojure does not allow the user to extend this set of reader macros. This avoids the possibility of creating code that another user cannot read (because they do not have the proper reader macros). Clojure gives back some of the power of reader macros with tagged literals, allowing you to create generically readable data, that is still extensible.

Also see the History of Clojure paper section about this (search for "reader macros").

_ has no special meaning in Clojure as a symbol. However, it is a convention to use _ (or a leading _) to denote a binding that will not be used in the expression. A common case for this is skipping unneeded values in sequential destructuring:

(defn get-y [point]
  (let [[_ y] point]   ;; x-value of point is unused, so mark it with _
    y))

? and ! have no special meaning in Clojure as part of a function name. However, it is a convention to use a trailing ? to denote a predicate function and a trailing ! to denote a function with side-effects. More specifically, a trailing ? indicates that a predicate strictly returns a boolean result (true or false). A trailing ! was originally intended to indicate that a function has side-effects that would make it unsafe to use inside a ref transaction (in the Software Transactional Memory sense). In the wild, the use of ! is less consistent and is sometimes used in a broader way to indicate any sort of side-effecting behavior.

#() always expands to include parens around the expression you give it, thus in this case it yields (fn [x] ([x])) which fails when the vector is invoked. Instead, use the vector function #(vector %) or just vector, which is the function being described.

Collections, Sequences, and Transducers

Most Clojure data structure operations, including conj (conjoin), are designed to give the user a performance expectation. With conj, the expectation is that insertion should happen at the place where this operation is efficient. Lists (as linked lists) can make a constant time insertion only at the front. Vectors (indexed) are designed to expand at the back. As the user, you should consider this when you choose which data structure to use. In Clojure, vectors are used with much greater frequency.

If your goal is specifically to "add to the front of the collection", then the appropriate function to use is cons, which will always add to the front. Note however that this will produce a sequence, not an instance of the original collection type.

Generally you should divide the Clojure core functions into these two categories:

  • Data structure functions - take a data structure and return a modified versions of that data structure (conj, disj, assoc, dissoc, etc). These functions always take the data structure first.

  • Sequence functions - take a "seqable" and return a seqable. [Generally we try to avoid committing to the return values actually being an instance of ISeq - this allows for performance optimizations in some cases.] Examples are map, filter, remove, etc. All of these functions take the seqable last.

It sounds like you are using the latter but expecting the semantics of the former (which is a common issue for new Clojurists!). If you want to apply sequence functions but have more control over the output data structure, there are a number of ways to do that.

  1. Use data-structure equivalents like mapv or filterv, etc - this is a very limited set that lets you perform these ops but return a data structure rather than a seqable. (mapv inc (filterv odd? [1 2 3]))

  2. Pour the results of your sequence transformations back into a data structure with into: (into [] (map inc (filter odd? [1 2 3])))

  3. Use transducers (likely with into) - this has much the same effect as #2, but combinations of transformations can be applied more efficiently without creating any sequences - only the final result is built: (into [] (comp (filter odd?) (map inc)) [1 2 3]). As you work with larger sequences or more transformations, this makes a significant difference in performance.

Note that all of these are eager transformations - they produce the output vector when you invoke them. The original sequence version (map inc (filter odd? [1 2 3])) is lazy and will only produce values as needed (with chunking under the hood for greater performance). Neither of these is right or wrong, but they are both useful in different circumstances.

Primary collection operands come first. That way one can write -> and its ilk, and their position is independent of whether or not they have variable arity parameters. There is a tradition of this in OO languages and Common Lisp (slot-value, aref, elt).

One way to think about sequences is that they are read from the left, and fed from the right:

<- [1 2 3 4]

Most of the sequence functions consume and produce sequences. So one way to visualize that is as a chain:

map <- filter <- [1 2 3 4]

and one way to think about many of the seq functions is that they are parameterized in some way:

(map f) <- (filter pred) <- [1 2 3 4]

So, sequence functions take their source(s) last, and any other parameters before them, and partial allows for direct parameterization as above. There is a tradition of this in functional languages and Lisps.

Note that this is not the same as taking the primary operand last. Some sequence functions have more than one source (concat, interleave). When sequence functions are variadic, it is usually in their sources.

Adapted from comments by Rich Hickey.

When performing a series of transformations, sequences will create an intermediate (cached) sequence between each transformation. Transducers create a single compound transformation that is executed in one eager pass over the input. These are different models, which are both useful.

Performance benefits of transducers:

  • Source collection iteration - when used on reducible inputs (collections and other things), avoid creating an unnecessary input collection sequence - helps memory and time.

  • Intermediate sequences and cached values - as the transformation happens in a single pass, you remove all intermediate sequence and cached value creation - again, helps memory and time. The combination of the prior item and this one will start to win big as the size of the input collection or number of transformations goes up (but for small numbers of either, chunked sequences can be surprisingly fast and will compete).

Design / usage benefits of transducers:

  • Transformation composition - some use cases will have a cleaner design if they separate transformation composition from transformation application. Transducers support this.

  • Eagerness - transducers are great for cases where eagerly processing a transformation (and potentially encountering any errors) is more important than laziness

  • Resource control - because you have more control over when the input collection is traversed, you also know when processing is complete. It’s thus easier to release or clean up input resources because you know when that happens.

Performance benefits of sequences:

  • Laziness - if you will only need some of the outputs (for example a user is deciding how many to use), then lazy sequences can often be more efficient in deferring processing. In particular, sequences can be lazy with intermediate results, but transducers use a pull model that will eagerly produce all intermediate values.

  • Infinite streams - because transducers are typically eagerly consumed, they don’t match well with infinite streams of values

Design benefits of sequences:

  • Consumer control - returning a seq from an API lets you combine input + transformation into something that gives the consumer control. Transducers don’t work as well for this (but will work better for cases where input and transformation are separated).

Core functions

At one point, metadata was more cumbersome to use than now (the syntax for a private defn was #^{:private true}), and defn- seemed worth creating as an "easy" version. The metadata support improved and became "stackable" which allowed easier composition of independent metadata. Rather than create private variants of all the def forms, it is simply preferred to use ^:private metadata when needed on def or other def forms..

When partial (or other higher-order function combinators like comp, juxt, etc) is used, any vars referenced are evaluated to function objects before partial is invoked, and thus it captures the value of any function vars referenced, not the var itself. For example: (partial my-fn 100) evaluates my-fn to the current function value of #'my-fn, then invokes partial with it. If the my-fn var is rebound in the REPL, the prior partial function will not "see" those changes, because it only has the function, not the var.

If you are finding this to be a problem in interactive development, you can insert a layer of indirection. One option is to use the var reference #'myfn instead or you can use a separate fn or defn to re-include var dereferencing. Alternately, you can use a fn or anonymous function literal in place of the partial.

In general, this is not an issue in a running app (because vars are not typically getting re-bound), but can occur in interactive REPL development.

Spec

spec is in alpha to indicate that the API may still change. spec was broken out of Clojure core so that spec can be updated independently from the main Clojure version. At some point spec’s API will be considered stable and at that point the alpha will be removed. The next version of spec is being developed at alpha.spec.

There is no single right answer to this question. For data specs, it is often useful to put them in their own namespace, which may or may not match the qualifier used in the data specs. Matching the qualifier to the namespace allows the use of auto-resolved keywords both within the specs and in aliases in other namespaces, but also entwines them, making refactoring more complicated.

For function specs, most people either put them immediately before or after the function they apply to, or in a separate namespace that can optionally be required when needed (for testing or validation). In the latter case, Clojure core has followed the pattern of using foo.bar.specs to hold function specs for the functions in foo.bar.

Regex ops (cat, alt, *, +, ?, etc) always describe the elements in a sequential collection. They are not, by themselves, specs. When used in a spec context they are coerced into specs. Nested regex ops combine to form a single regex spec over the same sequential collection.

To validate a nested collection, use s/spec to wrap the inner regex, forcing a spec boundary between regex ops.

Instrument is intended to verify that a function is being invoked according to its args spec. That is, is the function being called correctly? This functionality should be used during development.

Checking whether a function operates correctly is a test-time activity and this should be checked with the check function which will actually invoke the function with generated args and verify the ret and fn specs on each invocation.

Yes, set the Java system property -Dclojure.spec.skip-macros=true and no macro specs will be checked during macroexpansion.

Spec’s general philosophy is one of "open" specs where maps can contain additional keys beyond what is specified as required or optional in an s/keys spec. One way to accomplish a constrained key set is to s/and an additional constraint:

(s/def ::auth
  (s/and
    (s/keys :req [::user ::password])
    #(every? #{::user ::password} (keys %))))

Currently, no. This is under consideration for the next version of spec.

State and Concurrency

Each of these really addresses a different use case.

  • Reducers are best for fine-grained data parallelism when computing a transformation over existing in-memory data (in a map or vector). Generally it’s best when you have thousands of small data items to compute over and many cores to do the work. Anything described as "embarrassingly parallel".

  • Futures are best for pushing work onto a background thread and picking it up later (or for doing I/O waits in parallel). It’s better for big chunky tasks (go fetch a bunch of data in the background).

  • core.async is primarily used to organize the subsystems or internal structure of your application. It has channels (queues) to convey values from one "subprocess" (go block) to another. So you’re really getting concurrency and architectural benefits in how you break up your program. The killer feature you can really only get in core.async is the ability to wait on I/O events from multiple channels for the first response on any of them (via alt/alts). Promises can also be used to convey single values between independent threads/subprocesses but they are single delivery only.

  • Tools like pmap, java.util queues and executors, and libraries like claypoole are doing coarse-level "task" concurrency. There is some overlap with core.async here which has a very useful transducer-friendly pipeline functionality.

This is most commonly asked in the context of programs that use future, pmap, agent-send, or other functions that invoke those functions. When a program like this finishes, there will be a 60 second pause before exit. To fix this problem, call shutdown-agents as the program exits.

Clojure uses two internal thread pools to service futures and agent function executions. Both pools use non-daemon threads and the JVM will not exit while any non-daemon thread is alive. In particular, the pool that services futures and agent send-off calls uses an Executor cached thread pool with a 60 second timeout. In the scenario above, the program will wait until the background threads have completed their work and the threads expire before it can exit.

If reads were included by default, then STM would be slower (as more transactions would require serializability). However, in many cases, reads do not need to be included. Thus, users can choose to accept the performance penalty when it is necessary and get faster performance when it is not.

Namespaces

No (although that is typical). One namespace can be split across multiple files by using load to load secondary files and in-ns in those files to retain the namespace (clojure.core is defined in this way). Also, it is possible to declare multiple namespaces in a single file (although this is very unusual).

ns is a macro that does a number of things:

  • creates a new internal Namespace object (if it does not yet exist)

  • makes that namespace the new current namespace (*ns*)

  • auto-refers all vars from clojure.core and imports all classes from java.lang

  • requires/refers other namespaces and vars as specified

  • (and other optional things)

ns does not return a function or anything invokable as you suggest.

While ns is typically placed at the top of a clj file, it is actually just a normal macro and can be invoked at the repl just the same. It could also be used more than once in a single file (although this would be surprising to most clj programmers and would likely not work as desired in AOT).

Compiler

Anything that has been direct linked will not see redefinitions to vars. For example, if you redefine something in clojure.core, other parts of core that use that var will not see the redefinition (however anything that you newly compile at the REPL will). In practice, this is not typically a problem.

For parts of your own app, you may wish to only enable direct linking when you build and deploy for production, rather than using it when you developing at the REPL. Or you may need to mark parts of your app with ^:redef if you want to always allow redefinition or ^:dynamic for dynamic vars.

Java and Interop

Use a $ to separate outer from inner class name. For example: java.util.Map$Entry is the Entry inner class inside Map.

Primitive types can be found as the static TYPE field on the boxed class, for example: Integer/TYPE.

Java treats a trailing varargs parameter as an array and it can be invoked from Clojure by passing an explicit array.

Examples:

;; Invoke static Arrays.asList(T... a)
(java.util.Arrays/asList (object-array [0 1 2]))

;; Invoke static String.format(String format, Object... args)
(String/format "%s %s, %s" (object-array ["March" 1 2016]))

;; For a primitive vararg, use the appropriate primitive array constructor
;; Invoke put(int row, int col, double... data)
(.put o 1 1 (double-array [2.0]))

;; Passing at least an empty array is required if there are no varargs
(.put o 1 1 (double-array []))

;; into-array can be used to create an empty typed array
;; Invoke getMethod(String name, Class... parameterTypes) on a Class instance
(.getMethod String "getBytes" (into-array Class []))

Java 9 added a module system, allowing code to be partitioned into modules where code outside a module cannot invoke code inside the module unless it has been exported by the module. One of the areas affected by this change in Java is reflective access. Clojure uses reflection when it encounters a Java interop call without sufficient type information about the target object or the function arguments. For example:

(def fac (javax.xml.stream.XMLInputFactory/newInstance))
(.createXMLStreamReader fac (java.io.StringReader. ""))

Here fac is an instance of com.sun.xml.internal.stream.XMLInputFactoryImpl, which is an extension of javax.xml.stream.XMLInputFactory. In the java.xml module, javax.xml.stream is an exported package, but the XMLInputFactoryImpl is an internal implementation of the public abstract class in that package. The invocation of createXMLStreamReader here will be reflective and the Reflector will attempt to invoke the method based on the implementation class, which is not accessible outside the module, yielding:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by clojure.lang.Reflector (file:/.m2/repository/org/clojure/clojure/1.10.0/clojure-1.10.0.jar) to method com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(java.io.Reader)
WARNING: Please consider reporting this to the maintainers of clojure.lang.Reflector
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

The first thing to note here is that this is a warning. Java 9 through all current releases will permit the call to be made and the code will continue to work.

There are several potential workarounds:

  • Perhaps the best is to provide type hints to the exported types so the call is no longer reflective:

(.createXMLStreamReader ^javax.xml.stream.XMLInputFactory fac (java.io.StringReader. ""))
  • As of Clojure 1.10, turn off illegal access with --illegal-access=deny. The Java reflection system will then provide the necessary feedback to Clojure to detect that calling through the inaccessible class is not an option. Clojure will find the public invocation path instead and no warning will be issued.

  • Use JVM module system flags (--add-exports etc ) to forcibly export the internal packages to avoid the warning. This is not recommended.

If it is difficult to tell from the warning where the reflection is occurring, it may help to add the flag:

--illegal-access=debug

For example, via the Clojure CLI, using the -J option (or as part of :jvm-opts under an alias in deps.edn):

clj -J--illegal-access=debug

Design and Use

Because of its focus on immutable data, there is generally not a high value placed on data encapsulation. Because data is immutable, there is no need to worry about someone else modifying a value. Likewise, because Clojure data is designed to be manipulated directly, there is significant value in providing direct access to data, rather than wrapping it in APIs.

All Clojure vars are globally available so again there is not much in the way of encapsulation of functions within namespaces. However, the ability to mark vars private (either using defn- for functions or def with ^:private for values) is a convenience for a developer to indicate which parts of an API should be considered public for use vs part of the implementation.

Deps and CLI

No. The Clojure CLI is focused on a) building classpaths and b) launching clojure programs. It does not (and will not) create artifacts, deploy artifacts, etc, although they may facilitate these actions through tools and libraries.

tools.deps aims to provide programmatic building blocks for dependency resolution and classpath construction. clj/clojure wraps these into a command-line form that can be used to run Clojure programs. You can compose these pieces to do many other things.

No. Other tools exist to do this now or could be added on top of the existing functionality but this was not part of the initial goal.

If you don’t need any extra dependencies, just put #!/usr/bin/env clojure as the first line. Note that clojure won’t automatically call a -main function, so be sure your file does more than just define functions. You can find command-line arguments in *command-line-args*.

If you do need extra dependencies, try the following, courtesy Dominic Monroe, substituting whatever deps you need in place of funcool/tubax:

#!/bin/sh

"exec" "clojure" "-Sdeps" '{:deps {funcool/tubax {:mvn/version "0.2.0"}}}' "$0" "$@"

;; Clojure code goes here.

Contributing

It boils down to two reasons:

  1. To protect Clojure from future legal challenges that might discourage businesses from adopting it.

  2. To enable Clojure to be relicensed under a different open-source license if that would be advantageous.

Signing the Contributor Agreement grants Rich Hickey joint ownership of your contributions. In exchange, Rich Hickey guarantees that Clojure will always be available under an open-source license approved by either the Free Software Foundation or the Open Source Initiative.

This is a quirk of Adobe EchoSign specific to users whose email account is already associated with an Adobe EchoSign account. In those cases, EchoSign will use the company name from your existing profile in the subject line rather than the individual name that was signed on the form. Don’t worry! This has no effect - the agreement is as signed and attached in the email.

Rich Hickey prefers to evaluate patches attached to JIRA tickets. This is not to make it more difficult for contributors, or for legal reasons, but because of workflow preferences. See the development page for more details.

Link to Oct 2012 Clojure Google group message from Rich Hickey on this topic.

Future ideas

Frequently people ask for a "native" version of Clojure, ie one that does not rely on the JVM. ClojureScript self-hosting is one current path but probably only useful for a subset of use cases. The GraalVM project can be used to create a standalone binary executable. Native images produced with Graal start extremely fast but may have fewer opportunities to optimize performance than the full JVM.

However, neither of these is likely what people are envisioning when they ask for a "native version of Clojure", which is a version of the language that is not JVM-hosted and compiles directly to a native executable, probably via something like LLVM. Clojure leverages an enormous amount of performance, portability, and functionality from the JVM and relies heavily on things like a world-class garbage collector. Building a "Clojure native" would require a large amount of work to make a version of Clojure that was slower (probably much slower), less portable, and with significantly less functionality (as the Clojure library relies heavily on the JDK). The Clojure core team has no plans to work on this but it would be an amazing learning project for anyone and we encourage you to go for it!

Original author: Alex Miller