user> (= 2 (+ 1 1))
true
user> (= (str "fo" "od") "food")
true
This document discusses the concept of equality in Clojure, including the functions =
, ==
, and identical?
, and how they differ from Java’s equals
method. It also has some description of Clojure’s hash
, and how it differs from Java’s hashCode
. The beginning of this guide provides a summary of the most important information for quick reference followed by a much more extensive review of the details.
Information in this guide describes the behavior of Clojure 1.10.0 unless noted otherwise.
Clojure’s =
is true when comparing immutable values that represent
the same value, or when comparing mutable objects that are the
identical object. As a convenience, =
also returns true when used
to compare Java collections against each other, or against Clojure’s
immutable collections, if their contents are equal. However, there
are important caveats if you use non-Clojure collections.
Clojure’s =
is true when called with two immutable scalar values, if:
Both arguments are nil
, true
, false
, the same character, or
the same string (i.e. the same sequence of characters).
Both arguments are symbols, or both keywords, with equal namespaces and names.
Both arguments are numbers in the same 'category', and numerically the same, where category is one of:
integer or ratio
floating point (float or double)
Clojure’s =
is true when called with two collections, if:
Both arguments are sequential (sequences, lists, vectors, queues,
or Java collections implementing java.util.List
) with =
elements
in the same order.
Both arguments are sets (including Java sets implementing
java.util.Set
), with =
elements, ignoring order.
Both arguments are maps (including Java maps implementing
java.util.Map
), with =
keys and values, ignoring entry order.
Both arguments are records created with defrecord
, with =
keys
and values, ignoring order, and they have the same type. =
returns false
when comparing a record to a map, regardless of
their keys and values, because they do not have the same type.
Clojure’s =
is true when called with two mutable Clojure objects,
i.e. vars, refs, atoms, or agents, or with two "pending" Clojure
objects, i.e. futures, promises, or delays, if:
Both arguments are the identical object, i.e. (identical? x y)
is true.
For all other types:
Both arguments are the same type defined with deftype
. The type’s
equiv
method is called and its return value becomes the value of
(= x y)
.
For other types, Java’s x.equals(y)
is true.
Clojure’s ==
is intended specifically for numerical values:
==
can be used with numbers across different number categories (such as integer 0
and floating point 0.0
).
If any value being compared is not a number, an exception is thrown.
If you call =
or ==
with more than two arguments, the result will
be true when all consecutive pairs are =
or ==
. hash
is
consistent with =
, with the exceptions given below.
Exceptions, or possible surprises:
When using non-Clojure collections in a Clojure hash-based collection (as map keys, or set elements), it will not appear equal to a similar collection with Clojure counterparts, due to the difference in hashing behavior. (see Equality and hash and CLJ-1372)
When comparing collections with =
, numbers within the collections
are also compared with =
, so the three numeric categories above
are significant.
'Not a Number' values ##NaN
, Float/NaN
, and Double/NaN
are not
=
or ==
to anything, not even themselves.
Recommendation: Avoid including ##NaN
inside of Clojure data
structures where you want to compare them to each other using =
,
and sometimes get true
as the result.
0.0 is =
to -0.0
Clojure regex’s, e.g. #"a.*bc"
, are implemented using Java
java.util.regex.Pattern
objects, and Java’s equals
on two
Pattern
objects returns (identical? re1 re2)
, even though they
are documented as immutable objects. Thus (= #"abc" #"abc")
returns false, and =
only returns true if two regex’s happen to be
the same identical object in memory. Recommendation: Avoid using
regex instances inside of Clojure data structures where you want to
compare them to each other using =
, and get true
as the result
even if the regex instances are not identical objects. If you feel
the need to, consider converting them to strings first, e.g. (str
#"abc")
→ "abc"
(see
CLJ-1182)
Clojure persistent queues are never =
to Java collections
implementing java.util.List
, not even if they have =
elements in
the same order (see
CLJ-1059)
Using =
to compare a sorted map with another map, where compare
throws an exception when comparing their keys to each other because
they have different types (e.g. keywords vs. numbers), will in some
cases throw an exception (see
CLJ-2325)
In most cases, hash
is consistent with =
, meaning: if (= x y)
,
then (= (hash x) (hash y))
. For any values or objects where this
does not hold, Clojure hash-based collections will not be able to find
or remove those items correctly, i.e. for hash-based sets with those
items as elements, or hash-based maps with those items as keys.
hash
is consistent with =
for numbers, except for special float
and double values. Recommendation: Convert floats to doubles with
(double x)
to avoid this issue.
hash
is not consistent with =
for immutable Clojure collections
and their non-Clojure counterparts. See the
Equality and hash
section for more details. Recommendation: Convert
non-Clojure collections to their Clojure immutable counterparts
before including them in other Clojure data structures.
Historical: hash
was not consistent with =
for objects with class VecSeq
,
returned from calls like (seq (vector-of :int 0 1 2))
until fixed in Clojure 1.10.2 (see
CLJ-1364)
Equality in Clojure is most often tested using =
.
user> (= 2 (+ 1 1))
true
user> (= (str "fo" "od") "food")
true
Unlike Java’s equals
method, Clojure’s =
returns true for many
values that do not have the same type as each other.
user> (= (float 314.0) (double 314.0))
true
user> (= 3 3N)
true
=
does not always return true when two numbers have the same
numeric value.
user> (= 2 2.0)
false
If you want to test for numeric equality across different numeric categories, use ==
. See the section Numbers below for details.
Sequential collections (sequences, vectors, lists, and queues) with equal elements in the same order are equal:
user> (range 3)
(0 1 2)
user> (= [0 1 2] (range 3))
true
user> (= [0 1 2] '(0 1 2))
true
;; not = because different order
user> (= [0 1 2] [0 2 1])
false
;; not = because different number of elements
user> (= [0 1] [0 1 2])
false
;; not = because 2 and 2.0 are not =
user> (= '(0 1 2) '(0 1 2.0))
false
Two sets are equal if they have equal elements. Sets are normally unordered but even with sorted sets, the sort order is not considered when comparing for equality.
user> (def s1 #{1999 2001 3001})
#'user/s1
user> s1
#{2001 1999 3001}
user> (def s2 (sorted-set 1999 2001 3001))
#'user/s2
user> s2
#{1999 2001 3001}
user> (= s1 s2)
true
Two maps are equal if they have the same set of keys, and each key maps to equal values in each map. As with sets, maps are unordered and the sort order is not considered for sorted maps.
user> (def m1 (sorted-map-by > 3 -7 5 10 15 20))
#'user/m1
user> (def m2 {3 -7, 5 10, 15 20})
#'user/m2
user> m1
{15 20, 5 10, 3 -7}
user> m2
{3 -7, 5 10, 15 20}
user> (= m1 m2)
true
Note that while vectors are indexed and possess some map-like qualities, maps
and vectors never compare as =
in Clojure:
user> (def v1 ["a" "b" "c"])
#'user/v1
user> (def m1 {0 "a" 1 "b" 2 "c"})
#'user/m1
user> (v1 0)
"a"
user> (m1 0)
"a"
user> (= v1 m1)
false
Any metadata associated with Clojure collections is ignored when comparing them.
user> (def s1 (with-meta #{1 2 3} {:key1 "set 1"}))
#'user/s1
user> (def s2 (with-meta #{1 2 3} {:key1 "set 2 here"}))
#'user/s2
user> (binding [*print-meta* true] (pr-str s1))
"^{:key1 \"set 1\"} #{1 2 3}"
user> (binding [*print-meta* true] (pr-str s2))
"^{:key1 \"set 2 here\"} #{1 2 3}"
user> (= s1 s2)
true
user> (= (meta s1) (meta s2))
false
Records created with defrecord
in many ways behave similarly to
Clojure maps. However, they are only =
to other records of the same
type, and only then if they have the same keys and the same values.
They are never equal to maps, even if they have the same keys and
values.
When you define a Clojure record, you are doing so in order to create a distinct type that can be distinguished from other types — you want each type to have its own behavior with Clojure protocols and multimethods.
user=> (defrecord MyRec1 [a b])
user.MyRec1
user=> (def r1 (->MyRec1 1 2))
#'user/r1
user=> r1
#user.MyRec1{:a 1, :b 2}
user=> (defrecord MyRec2 [a b])
user.MyRec2
user=> (def r2 (->MyRec2 1 2))
#'user/r2
user=> r2
#user.MyRec2{:a 1, :b 2}
user=> (def m1 {:a 1 :b 2})
#'user/m1
user=> (= r1 r2)
false ; r1 and r2 have different types
user=> (= r1 m1)
false ; r1 and m1 have different types
user=> (into {} r1)
{:a 1, :b 2} ; this is one way to "convert" a record to a map
user=> (= (into {} r1) m1)
true ; the resulting map is = to m1
Clojure =
behaves the same as Java’s equals
for all types except
numbers and Clojure collections.
Booleans and characters are straightforward in their equality.
Strings are straightforward, too, except in some cases involving
Unicode where strings that consist of different sequences of Unicode
characters can look the same when displayed, and in some applications
should be treated as equal even though =
returns false. See
"Normalization" on the Wikipedia page on
Unicode equivalence if
you are interested. There are libraries like
ICU (International Components for Unicode for Java)
that can help if you need to do this.
Two symbols are equal if they have the same namespace and symbol name.
Two keywords are equal given the same conditions. Clojure makes
equality testing for keywords particularly quick (a simple pointer
comparison). It achieves this by its intern
method of the Keyword
class guaranteeing that all keywords with the same namespace and name
will return the same keyword object.
Java equals
is only true for two numbers if the types and numeric
values are the same. Thus equals
is false even for Integer 1 and
Long 1, because they have different types. Exception: Java equals
is also false for two BigDecimal values that are numerically equal if
they have different scales, e.g. 1.50M and 1.500M are not equal. This
behavior is documented for BigDecimal method
equals
.
Clojure =
is true if the 'category' and numeric values are the same.
Category is one of:
integer or ratios, where integer includes all Java integer types such as Byte
, Short
, Integer
, Long
, BigInteger
, and clojure.lang.BigInt
, and ratios are represented with the Java type named clojure.lang.Ratio
.
floating point: Float
and Double
decimal: BigDecimal
So (= (int 1) (long 1))
is true because they are in the same integer
category, but (= 1 1.0)
is false because they are in different
categories (integer vs. floating). While integers and ratios are
separate types in the Clojure implementation, for the purposes of =
they are effectively in the same category. The results of arithmetic
operations on ratios are auto-converted to integers if they are whole
numbers. Thus any Clojure number that has type Ratio cannot equal any
integer, so =
always gives the correct numerical answer (false
)
when comparing a ratio to an integer.
Clojure also has ==
that is only useful for comparing numbers. It
returns true whenever =
does. It also returns true for numbers that
are numerically equal, even if they are in different categories. Thus
(= 1 1.0)
is false, but (== 1 1.0)
is true.
Why does =
have different categories for numbers, you might wonder?
It would be difficult (if it is even possible) to make hash
consistent
with =
if it behaved like ==
(see section
Equality and hash).
Imagine trying to write hash
such that it was guaranteed to
return the same hash value for all of (float 1.5)
, (double 1.5)
,
BigDecimal values 1.50M, 1.500M, etc. and the ratio (/ 3 2)
.
Clojure uses =
to compare values for equality when they are used as
elements in sets, or keys in maps. Thus Clojure’s numeric categories
come into play if you use sets with numeric elements or maps with
numeric keys.
Note that floating point values might behave in ways that surprise you, if you have not learned of their approximate nature before. They are often approximations simply because they are represented with a fixed number of bits, and thus many values cannot be represented exactly and must be approximated (or be out of range). This is true for floating point numbers in any programming language.
user> (def d1 (apply + (repeat 100 0.1)))
#'user/d1
user> d1
9.99999999999998
user> (== d1 10.0)
false
There is a whole field called Numerical Analysis dedicated to studying algorithms that use numerical approximation. There are libraries of Fortran code that are used because their order of floating point operations is carefully crafted to give guarantees on the difference between their approximate answers and the exact answers. "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is good reading if you want quite a few details.
If you want exact answers for at least some kinds of problems, ratios or BigDecimals might suit your needs. Realize that these require variable amounts of memory if the number of digits required grow (e.g. after many arithmetic operations), and significantly more computation time. They also won’t help if you want exact values of pi or the square root of 2.
Clojure uses the underlying Java double-size floating point numbers
(64-bit) with representation and behavior defined by a standard, IEEE
754. There is a special value NaN
("Not A Number")
that is not even equal to itself. Clojure represents this value as
the symbolic value ##NaN
.
user> (Math/sqrt -1)
##NaN
user> (= ##NaN ##NaN)
false
user> (== ##NaN ##NaN)
false
This leads to some odd behavior if this "value" appears in your data.
While no error occurs when adding ##NaN
as a set element or a key in a
map, you cannot then search for it and find it. You also cannot
remove it using functions like disj
or dissoc
. It will appear
normally in sequences created from collections containing it.
user> (def s1 #{1.0 2.0 ##NaN})
#'user/s1
user> s1
#{2.0 1.0 ##NaN}
user> (s1 1.0)
1.0
user> (s1 1.5)
nil
user> (s1 ##NaN)
nil ; cannot find ##NaN in a set, because it is not = to itself
user> (disj s1 2.0)
#{1.0 ##NaN}
user> (disj s1 ##NaN)
#{2.0 1.0 ##NaN} ; ##NaN is still in the result!
In many cases, collections that contain ##NaN
will not be =
to another collection, even if they look like they should be, because (= ##NaN ##NaN)
is false
:
user> (= [1 ##NaN] [1 ##NaN])
false
Oddly enough, there are exceptions where collections contain ##NaN
that look like they should be =
, and they are, because (identical? ##NaN ##NaN)
is true
:
user> (def s2 #{##NaN 2.0 1.0})
#'user/s2
user> s2
#{2.0 1.0 ##NaN}
user> (= s1 s2)
true
Java has a special case in its equals
method for floating point
values that makes ##NaN
equal to itself. Clojure =
and ==
do not.
user> (.equals ##NaN ##NaN)
true
Java has equals
to compare pairs of objects for equality.
Java has a method hashCode
that is consistent with this notion of
equality (or is documented that it should be, at least). This means
that for any two objects x
and y
where equals
is true,
x.hashCode()
and y.hashCode()
are equal, too.
This hash consistency property makes it possible to use hashCode
to
implement hash-based data structures like maps and sets that use hashing
techniques internally. For example, a hash table could be used to
implement a set, and it will be guaranteed that objects with different
hashCode
values can be put into different hash buckets, and objects
in different hash buckets will never be equal to each other.
Clojure has =
and hash
for similar reasons. Since Clojure =
considers more pairs of things equal to each other than Java equals
,
Clojure hash
must return the same hash value for more pairs of
objects. For example, hash
always returns the same value regardless
of whether a sequence of =
elements is in a sequence, vector, list,
or queue:
user> (hash ["a" 5 :c])
1698166287
user> (hash (seq ["a" 5 :c]))
1698166287
user> (hash '("a" 5 :c))
1698166287
user> (hash (conj clojure.lang.PersistentQueue/EMPTY "a" 5 :c))
1698166287
However, since hash
is not consistent with =
when comparing
Clojure immutable collections with their non-Clojure counterparts,
mixing the two can lead to undesirable behavior, as shown in the
examples below.
user=> (def java-list (java.util.ArrayList. [1 2 3]))
#'user/java-list
user=> (def clj-vec [1 2 3])
#'user/clj-vec
;; They are =, even though they are different classes
user=> (= java-list clj-vec)
true
user=> (class java-list)
java.util.ArrayList
user=> (class clj-vec)
clojure.lang.PersistentVector
;; Their hash values are different, though.
user=> (hash java-list)
30817
user=> (hash clj-vec)
736442005
;; If java-list and clj-vec are put into collections that do not use
;; their hash values, like a vector or array-map, then those
;; collections will be equal, too.
user=> (= [java-list] [clj-vec])
true
user=> (class {java-list 5})
clojure.lang.PersistentArrayMap
user=> (= {java-list 5} {clj-vec 5})
true
user=> (assoc {} java-list 5 clj-vec 3)
{[1 2 3] 3}
;; However, if java-list and clj-vec are put into collections that do
;; use their hash values, like a hash-set, or a key in a hash-map,
;; then those collections will not be equal because of the different
;; hash values.
user=> (class (hash-map java-list 5))
clojure.lang.PersistentHashMap
user=> (= (hash-map java-list 5) (hash-map clj-vec 5))
false ; sorry, not true
user=> (= (hash-set java-list) (hash-set clj-vec))
false ; also not true
user=> (get (hash-map java-list 5) java-list)
5
user=> (get (hash-map java-list 5) clj-vec)
nil ; you were probably hoping for 5
user=> (conj #{} java-list clj-vec)
#{[1 2 3] [1 2 3]} ; you may have been expecting #{[1 2 3]}
user=> (hash-map java-list 5 clj-vec 3)
{[1 2 3] 5, [1 2 3] 3} ; I bet you wanted {[1 2 3] 3} instead
Most of the time you use maps in Clojure, you do not specify whether you want an array map or a hash map. By default array maps are used if there are at most 8 keys, and hash maps are used if there are over 8 keys. Clojure functions choose the implementation for you as you do operations on the maps. Thus even if you tried to use array maps consistently, you are likely to frequently get hash maps as you create larger maps.
We do not recommend trying to avoid the use of hash-based sets and
maps in Clojure. They use hashing to help achieve high performance in
their operations. Instead we would recommend avoiding the use of
non-Clojure collections as parts within Clojure collections.
Primarily this advice is because most such non-Clojure collections are
mutable, and mutability often leads to subtle bugs. Another reason is
the inconsistency of hash
with =
.
Similar behavior occurs for Java collections that implement
java.util.List
, java.util.Set
, and java.util.Map
, and any of the
few kinds of values for which Clojure’s hash
is not consistent with
=
.
If you use hash-inconsistent values as parts within any Clojure collection, even as elements in a sequential collection like a list or vector, those collections become hash-inconsistent with each other, too. This occurs because the hash value of collections is calculated by combining the hash values of their parts.
You are likely wondering why hash
is not consistent with =
for
non-Clojure collections. Non-Clojure collections have used Java’s
hashCode
method long before Clojure existed. When Clojure was
initially developed, it used the same formula for calculating a hash
function from collection elements as hashCode
did.
Before the release of Clojure 1.6.0 it was discovered that this use of
hashCode
for Clojure’s hash
function can lead to many hash
collisions when small collections are used as set elements or map
keys.
For example, imagine a Clojure program that represents the contents of
a 2-dimensional grid with 100 rows and 100 columns using a map with
keys that are vectors of two numbers in the range [0, 99]. There are
10,000 such points in this grid, so 10,000 keys in the map, but
hashCode
only gives 3,169 different results.
user=> (def grid-keys (for [x (range 100), y (range 100)]
[x y]))
#'user/grid-keys
user=> (count grid-keys)
10000
user=> (take 5 grid-keys)
([0 0] [0 1] [0 2] [0 3] [0 4])
user=> (take-last 5 grid-keys)
([99 95] [99 96] [99 97] [99 98] [99 99])
user=> (count (group-by #(.hashCode %) grid-keys))
3169
Thus there are an average of 10,000 / 3,169 = 3.16 collisions per hash bucket if the map uses the default Clojure implementation of a hash-map.
The Clojure developers
analyzed
several alternate hash functions, and chose one based on the Murmur3
hash function, which has been in use since Clojure 1.6.0. It also
uses a different way than Java’s hashCode
does to combine the hashes
of multiple elements in a collection.
At that time, Clojure could have changed hash
to use the new
technique for non-Clojure collections as well, but it was judged that
doing so would significantly slow down a Java method called hasheq
,
used to implement hash
. See
CLJ-1372 for approaches
that have been considered so far, but as of this time no one has
discovered a competitively fast way to do it.
hash
inconsistent with =
For some Float and Double values that are =
to each other, their
hash
values are inconsistent:
user> (= (float 1.0e9) (double 1.0e9))
true
user> (map hash [(float 1.0e9) (double 1.0e9)])
(1315859240 1104006501)
user> (hash-map (float 1.0e9) :float-one (double 1.0e9) :oops)
{1.0E9 :oops, 1.0E9 :float-one}
You can avoid the Float
vs Double
hash inconsistency by
consistently using one or the other types in floating point code.
Clojure defaults to doubles for floating point values, so that may be
the most convenient choice.
Rich Hickey has decided that changing this inconsistency in hash
values for types Float
and Double
is out of scope for Clojure
(mentioned in a comment of
CLJ-1036). Ticket
CLJ-1649 has been filed
suggesting a change that =
always return false when comparing floats
to doubles, which would make hash
consistent with =
by eliminating
the restriction on hash
, but there is no decision on that yet.
See the code of the projects below for examples of how to do this, and much more. In
particular, the Java methods equals
and hashCode
from standard
Java objects, and the Clojure Java methods equiv
and hasheq
are
the most relevant for how =
and hash
behave.
org.flatland/ordered but note
that it needs a change so that its custom ordered map data structure
is not =
to any Clojure record:
PR #42
The paper
"Equal Rights for Functional Objects, or, the More Things Change, The More They Are the Same" by Henry
Baker includes code written in Common Lisp for a function EGAL
that
was an inspiration for Clojure’s =
. The idea of "deep equality"
making sense for immutable values, but not as much sense for mutable
objects (unless the mutable objects are the same object in memory), is
independent of programming language.
Some differences between EGAL
and Clojure’s =
are described below.
These are fairly esoteric details about the behavior of EGAL
, and
are not necessary to know for an understanding of Clojure’s =
.
EGAL
is defined to be false
when comparing mutable objects to
anything else, unless that other thing is the same identical mutable
object in memory.
As a convenience, Clojure’s =
is designed to return true
in some
cases when comparing Clojure immutable collections to non-Clojure
collections.
There is no Java method to determine whether an arbitrary collection
is mutable or immutable, so it is not possible in Clojure to implement
the intended behavior of EGAL
, although one might consider =
"closer"
to EGAL
if it always returned false
when one of the arguments was
a non-Clojure collection.
Baker recommends that EGAL
force lazy values when comparing them
(see Section 3. J. "Lazy Values" in the "Equal Rights for Functional Objects" paper). When comparing a lazy sequence to
another sequential thing, Clojure’s =
does force the evaluation of
the lazy sequence, stopping if it reaches a non-=
sequence element.
Chunked sequences, e.g. as produced by range
, can cause evaluation
to proceed a little bit further than that point, as is the case for
any event in Clojure that causes evaluation of part of a lazy
sequence.
Clojure’s =
does not deref
delay, promise, or future objects when
comparing them. Instead, it compares them via identical?
, thus
returning true
only if they are the same identical object in memory,
even if calling deref
on them would result in values that were =
.
Baker describes in detail how EGAL
can return true
in some cases
when comparing
closures
to each other (see Section 3. D. "Equality of Functions and
Function-Closures" in the "Equal Rights for Functional Objects" paper).
When given a function or closure as an argument, Clojure’s =
only
returns true
if they are identical?
to each other.
Baker appeared to be motivated to define EGAL
this way because of
the prevalence in some Lisp family languages of using closures to
represent objects, where those objects could contain mutable state, or
immutable values (see the example below). Given that Clojure has
multiple other ways of creating immutable values and mutable objects
(e.g. records, reify, proxy, deftype), using closures to do so is
uncommon.
(defn make-point [init-x init-y]
(let [x init-x
y init-y]
(fn [msg]
(cond (= msg :get-x) x
(= msg :get-y) y
(= msg :get-both) [x y]
:else nil))))
user=> (def p1 (make-point 5 7))
#'user/p1
user=> (def p2 (make-point -3 4))
#'user/p2
user=> (p1 :get-x)
5
user=> (p2 :get-both)
[-3 4]
user=> (= p1 p2)
false ; We expect this to be false,
; because p1 and p2 have different x, y values
user=> (def p3 (make-point 5 7))
#'user/p3
user=> (= p1 p3)
false ; Baker's EGAL would return true here. Clojure
; = returns false because p1 and p3 are not identical?
Original author: Andy Fingerhut