Hash Fusing – Clojure Civitas

This is an article about fusing hashes together while having the key properties of associativity and non-commutativity. I describe the basic approach of using matrix multiplication of Upper Triangle Matrices and then show the results of two experiments showing the quality and limits of this hash fusing approach.

The basic question this article tries to answer is: can fusing hashes together provide unique identifiers for arbitrary data? The answer is no. Specifically, the approach in this article can not provide unique identifiers for sufficiently large runs of repeated values. That is, sufficiently low entropy data is not uniquely represented by fusing hashes together.

The good news is that low entropy data may always be converted into high entropy data either via compression or injecting noise. A practical fusing operator can easily detect fuses of low entropy data and throw an exception when fusing produces those ‘bad’ hash values.

Introduction

I have begun working on implementing distributed immutable data structures as a way to efficiently share structured data between multiple clients. One of the key challenges in this area is being able to efficiently reference ordered collections of data. The usual approach is to use Merkle Trees which have the limitation that they are non-associative and the shape of the tree determines the final hash value at the root of the tree. But, since I plan on using Finger Trees to efficiently represent ordered collections of data, I need computed hashes that are insensitive to the tree shape. This means that I need to be able to fuse hashes together in a way that is associative and also non-commutative. As a starting point, I have been reading the HP paper describing hash fusing via matrix multiplication: https://www.labs.hpe.com/techreports/2017/HPE-2017-08.pdf

Hash Fusing via Upper Triangular Matrix Multiplication

The basic approach of fusing two hashes is to represent each hash as an upper triangular matrix and then to multiply the two matrices together. Since matrix multiplication is associative but not commutative. The following sections define hashes and build up the necessary functions to perform hash fusing via upper triangular matrix multiplication.

Hash Representation

A hash is represented as a string of 64 hexadecimal values (256 bits). To start with, here is a zero hash and a function to generate random hashes.

(def zero-hex (apply str (vec (repeat 64 0))))
"0000000000000000000000000000000000000000000000000000000000000000"
(defn random-hex
  "Generate a random hex string representing length `n` bytes."
  [n]
  (let [hex-chars "0123456789abcdef"]
    (apply str (repeatedly (* 2 n) #(rand-nth hex-chars)))))

Generate some random hashes for testing:

(def a-hex (random-hex 32))
"43b37b393ddd8932863525848da362a2b9a2fb2203ad546dd8b2a1cf9904336f"
(def b-hex (random-hex 32))
"bc1781aee63492f19f91eec6c0287cfd9accbb16f32db8116fe0a13af7cb6f20"
(def c-hex (random-hex 32))
"381241282344a744ba9adb7279d0417a2d2db2abc6a7e58c00800fd001586ef0"

To fuse two hashes, we convert each hash to an upper triangular matrix and then multiply the two matrices together. The result is another upper triangular matrix which can be converted back to a hash by taking the elements above the main diagonal. The following several sections defines this mapping between hashes and upper triangular matrices. For the experiments below four different bit sizes of cells and four corresponding matrices are defined.

8 bit cells in a 9x9 matrix:

\[%% Example: 9x9 Upper Triangular Matrix for 8 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{31} ] \to {\begin{bmatrix} 1 & h_0 & h_8 & h_{15} & h_{21} & h_{26} & h_{29} & h_{31} & h_{25} \\ 0 & 1 & h_1 & h_9 & h_{16} & h_{22} & h_{27} & h_{30} & h_{20} \\ 0 & 0 & 1 & h_2 & h_{10} & h_{17} & h_{23} & h_{28} & h_{14} \\ 0 & 0 & 0 & 1 & h_3 & h_{11} & h_{18} & h_{24} & h_7 \\ 0 & 0 & 0 & 0 & 1 & h_4 & h_{12} & h_{19} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & h_5 & h_{13} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & h_6 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

16 bit cells in a 7x7 matrix:

\[%% Example: 7x7 Upper Triangular Matrix for 16 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_{15} ] \to {\begin{bmatrix} 1 & h_0 & h_6 & h_{10} & h_{13} & h_{15} & h_5 \\ 0 & 1 & h_1 & h_7 & h_{11} & h_{14} & 0 \\ 0 & 0 & 1 & h_2 & h_8 & h_{12} & 0 \\ 0 & 0 & 0 & 1 & h_3 & h_9 & 0 \\ 0 & 0 & 0 & 0 & 1 & h_4 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}} \]

32 bit cells in a 5x5 matrix:

\[%% Example: 5x5 Upper Triangular Matrix for 32 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3, \ldots, h_7 ] \to {\begin{bmatrix} 1 & h_0 & h_4 & h_7 & h_6 \\ 0 & 1 & h_1 & h_5 & h_3 \\ 0 & 0 & 1 & h_2 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}} \]

64 bit cells in a 4x4 matrix:

\[%% Example: 4x4 Upper Triangular Matrix for 64 bit cells %% representing a 256 bit hash hash \to [ h_0, h_1, h_2, h_3 ] \to {\begin{bmatrix} 1 & h_0 & h_3 & h_2 \\ 0 & 1 & h_1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}} \]

Hex and Byte Vector Conversion Functions

The first type of conversion is between hex strings and byte vectors.

(defn hex->hash8
  "Convert a hex string to a byte vector."
  [s]
  (with-meta
    (->> s
         (partition 2)
         (map #(-> (apply str %)
                   (Integer/parseInt 16)))
         (vec))
    {:cell-size 8}))
(defn hash8->hex
  "Convert a byte vector to a hex string."
  [bytes]
  (->> bytes
       (map #(format "%02x" %))
       (apply str)))
(= a-hex (-> a-hex hex->hash8 hash8->hex))

Hash Conversion Functions

The next type of conversion is from bytes and larger word sizes.

(defn hash8->hash16
  "Convert a vector of 32 bytes into a vector of 16 2-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 2 i)) 8)
                   (hash8 (+ 1 (* 2 i)))))
              (range 16)))
    {:cell-size 16}))
(defn hash16->hash8
  "Convert a vector 2-byte words into a vector of bytes."
  [hash16]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash16))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash16 hash16->hash8 hash8->hex))
(defn hash8->hash32
  "Convert a vector of 32 bytes into a vector of 8 4-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 4 i)) 24)
                   (bit-shift-left (hash8 (+ 1 (* 4 i))) 16)
                   (bit-shift-left (hash8 (+ 2 (* 4 i))) 8)
                   (hash8 (+ 3 (* 4 i)))))
              (range 8)))
    {:cell-size 32}))
(defn hash32->hash8
  "Convert a vector of 4-byte words into a vector of bytes."
  [hash32]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash32))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash32 hash32->hash8 hash8->hex))
(defn hash8->hash64
  "Convert a vector of 32 bytes into a vector of 4 8-byte words."
  [hash8]
  (with-meta
    (vec (map (fn [i]
                (+ (bit-shift-left (hash8 (* 8 i)) 56)
                   (bit-shift-left (hash8 (+ 1 (* 8 i))) 48)
                   (bit-shift-left (hash8 (+ 2 (* 8 i))) 40)
                   (bit-shift-left (hash8 (+ 3 (* 8 i))) 32)
                   (bit-shift-left (hash8 (+ 4 (* 8 i))) 24)
                   (bit-shift-left (hash8 (+ 5 (* 8 i))) 16)
                   (bit-shift-left (hash8 (+ 6 (* 8 i))) 8)
                   (hash8 (+ 7 (* 8 i)))))
              (range 4)))
    {:cell-size 64}))
(defn hash64->hash8
  "Convert a vector of 8-byte words into a vector of bytes."
  [hash64]
  (with-meta
    (vec (mapcat (fn [word]
                   [(bit-and (bit-shift-right word 56) 0xFF)
                    (bit-and (bit-shift-right word 48) 0xFF)
                    (bit-and (bit-shift-right word 40) 0xFF)
                    (bit-and (bit-shift-right word 32) 0xFF)
                    (bit-and (bit-shift-right word 24) 0xFF)
                    (bit-and (bit-shift-right word 16) 0xFF)
                    (bit-and (bit-shift-right word 8) 0xFF)
                    (bit-and word 0xFF)])
                 hash64))
    {:cell-size 8}))
(= a-hex (-> a-hex hex->hash8 hash8->hash64 hash64->hash8 hash8->hex))

Matrix Conversion Functions

The following functions convert between byte vectors representing hashes and four different upper triangular matrix sizes based on cell bit size.

(defn hash8->utm8
  "Convert a vector of 32 bytes into a 9x9 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15
    h16 h17 h18 h19 h20 h21 h22 h23
    h24 h25 h26 h27 h28 h29 h30 h31]]
  (with-meta
    [[1 h00 h08 h15 h21 h26 h29 h31 h25]
     [0   1 h01 h09 h16 h22 h27 h30 h20]
     [0   0   1 h02 h10 h17 h23 h28 h14]
     [0   0   0   1 h03 h11 h18 h24 h07]
     [0   0   0   0   1 h04 h12 h19   0]
     [0   0   0   0   0   1 h05 h13   0]
     [0   0   0   0   0   0   1 h06   0]
     [0   0   0   0   0   0   0   1   0]
     [0   0   0   0   0   0   0   0   1]]
    {:cell-size 8}))
(defn utm8->hash8
  "Convert a 9x9 upper triangular matrix back to a vector of 32 bytes."
  [[[_ h00 h08 h15 h21 h26 h29 h31 h25]
    [_   _ h01 h09 h16 h22 h27 h30 h20]
    [_   _   _ h02 h10 h17 h23 h28 h14]
    [_   _   _   _ h03 h11 h18 h24 h07]
    [_   _   _   _   _ h04 h12 h19   _]
    [_   _   _   _   _   _ h05 h13   _]
    [_   _   _   _   _   _   _ h06   _]
    [_   _   _   _   _   _   _   _   _]
    [_   _   _   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15
           h16 h17 h18 h19 h20 h21 h22 h23
           h24 h25 h26 h27 h28 h29 h30 h31])
    {:cell-size 8}))
(defn hash16->utm16
  "Convert a vector of 16 2-byte words into a 7x7 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07
    h08 h09 h10 h11 h12 h13 h14 h15]]
  (with-meta
    [[1 h00 h06 h10 h13 h15 h05]
     [0   1 h01 h07 h11 h14   0]
     [0   0   1 h02 h08 h12   0]
     [0   0   0   1 h03 h09   0]
     [0   0   0   0   1 h04   0]
     [0   0   0   0   0   1   0]
     [0   0   0   0   0   0   1]]
    {:cell-size 16}))
(defn utm16->hash16
  "Convert a 7x7 upper triangular matrix back to a vector of 16 2-byte words."
  [[[_ h00 h06 h10 h13 h15 h05]
    [_   _ h01 h07 h11 h14   _]
    [_   _   _ h02 h08 h12   _]
    [_   _   _   _ h03 h09   _]
    [_   _   _   _   _ h04   _]
    [_   _   _   _   _   _   _]
    [_   _   _   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07
           h08 h09 h10 h11 h12 h13 h14 h15])
    {:cell-size 16}))
(defn hash32->utm32
  "Convert a vector of 8 4-byte words into a 5x5 upper triangular matrix."
  [[h00 h01 h02 h03 h04 h05 h06 h07]]
  (with-meta
    [[1 h00 h04 h07 h06]
     [0   1 h01 h05 h03]
     [0   0   1 h02   0]
     [0   0   0   1   0]
     [0   0   0   0   1]]
    {:cell-size 32}))
(defn utm32->hash32
  "Convert a 5x5 upper triangular matrix back to a vector of 8 4-byte words."
  [[[_ h00 h04 h07 h06]
    [_   _ h01 h05 h03]
    [_   _   _ h02   _]
    [_   _   _   _   _]
    [_   _   _   _   _]]]
  (with-meta
    (mapv #(bit-and % 0xFFFFFFFF)
          [h00 h01 h02 h03 h04 h05 h06 h07])
    {:cell-size 32}))
(defn hash64->utm64
  "Convert a vector of 4 8-byte words into a 4x4 upper triangular matrix."
  [[h00 h01 h02 h03]]
  (with-meta
    [[1 h00 h03 h02]
     [0   1 h01   0]
     [0   0   1   0]
     [0   0   0   1]]
    {:cell-size 64}))
(defn utm64->hash64
  "Convert a 4x4 upper triangular matrix back to a vector of 4 8-byte words."
  [[[_ h00 h03 h02]
    [_   _ h01   _]
    [_   _   _   _]
    [_   _   _   _]]]
  (with-meta
    [h00 h01 h02 h03]
    {:cell-size 64}))

Combined Conversion Functions

The following combined conversion functions convert between hex strings into upper triangular matrices and back for the four different cell bit sizes.

(def hex->utm8
  "Convert a hex string to an upper triangular matrix with 8-bit cells."
  (comp hash8->utm8 hex->hash8))
(def utm8->hex
  "Convert an upper triangular matrix with 8-bit cells to a hex string."
  (comp hash8->hex utm8->hash8))
(= a-hex (-> a-hex hex->utm8 utm8->hex))
(def hex->utm16
  "Convert a hex string to an upper triangular matrix with 16-bit cells."
  (comp hash16->utm16 hash8->hash16 hex->hash8))
(def utm16->hex
  "Convert an upper triangular matrix with 16-bit cells to a hex string."
  (comp hash8->hex hash16->hash8 utm16->hash16))
(= a-hex (-> a-hex hex->utm16 utm16->hex))
(def hex->utm32
  "Convert a hex string to an upper triangular matrix with 32-bit cells."
  (comp hash32->utm32 hash8->hash32 hex->hash8))
(def utm32->hex
  "Convert an upper triangular matrix with 32-bit cells to a hex string."
  (comp hash8->hex hash32->hash8 utm32->hash32))
(= a-hex (-> a-hex hex->utm32 utm32->hex))
(def hex->utm64
  "Convert a hex string to an upper triangular matrix with 64-bit cells."
  (comp hash64->utm64 hash8->hash64 hex->hash8))
(def utm64->hex
  "Convert an upper triangular matrix with 64-bit cells to a hex string."
  (comp hash8->hex hash64->hash8 utm64->hash64))
(= a-hex (-> a-hex hex->utm64 utm64->hex))
(defn hex->utm
  "Convert a hex string to an upper triangular matrix with the given cell size
  (8, 16, 32, or 64 bits)."
  [hex cell-size]
  (case cell-size
    8  (hex->utm8 hex)
    16 (hex->utm16 hex)
    32 (hex->utm32 hex)
    64 (hex->utm64 hex)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(defn utm->hex
  "Convert an upper triangular matrix with the given cell size (8, 16, 32,
  or 64 bits) to a hex string."
  [utm cell-size]
  (case cell-size
    8  (utm8->hex utm)
    16 (utm16->hex utm)
    32 (utm32->hex utm)
    64 (utm64->hex utm)
    (throw (ex-info "Unsupported cell size for upper triangular matrix."
                    {:cell-size cell-size}))))
(= a-hex (-> a-hex (hex->utm 32) (utm->hex 32)))

Upper Triangular Matrix Multiplication

This is the core function that performs the multiplication of two upper triangular matrices. The multiplication ignores the lower triangular part of of the matrices since they are always 0 (and always 1 on the main diagonal).

Note that unchecked math is enabled to ignore integer overflow since the cells are treated as fixed size bit fields.

(set! *unchecked-math* true)
(defn utm-multiply
  "Multiply two upper triangular matrices `a` and `b`."
  [a b]
  (let [dim (count a)
        cell-size (-> a meta :cell-size)
        bit-mask (if (= cell-size 64)
                   -1
                   (dec (bit-shift-left 1 cell-size)))]
    (with-meta
      (vec (for [i (range dim)]
             (vec (for [j (range dim)]
                    (cond
                      (< j i) 0
                      (= j i) 1
                      :else
                      (reduce (fn [sum k]
                                (-> sum
                                    (+ (* (get-in a [i k])
                                          (get-in b [k j])))
                                    (bit-and bit-mask)))
                              0
                              (range i (inc j))))))))
      {:cell-size cell-size})))

Associativity & Non-Commutativity Properties

Show that upper triangular matrix multiplication is associative and non-commutative. Associativity is necessary for hash fusing to work with Finger Trees so that different tree shapes produce the same fused hash. Non-commutativity is necessary for sequences of data where the order of data affects the fused hash.

(-> (for [cell-size [8 16 32 64]]
      (let [a (hex->utm a-hex cell-size)
            b (hex->utm b-hex cell-size)
            c (hex->utm c-hex cell-size)
            ab (utm-multiply a b)
            ba (utm-multiply b a)
            bc (utm-multiply b c)
            ab*c (utm-multiply ab c)
            a*bc (utm-multiply a bc)]
        {:cell-size cell-size
         :associative? (= ab*c a*bc)
         :commutative? (= ab ba)}))
    (kind/table))
cell-size associative? commutative?
8 true false
16 true false
32 true false
64 true false

Experiment 1: Random Fuses

This experiment is run with two different hashes which are fused onto an accumulator iteratively. For each iteration, one of the two hashes is randomly selected and is fused to the accumulator. This random fusing is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values.

(defn random-fuses
  "Perform random fuses of two hashes onto an accumulator keeping track of all
  unique hashes produced and all duplicate hashes produced. Repeat this for all
  four utm sizes based on cell bit size."
  []
  (let [hashes {:a (random-hex 32)
                :b (random-hex 32)}
        ;; convert hashes to utms for each bit size and store in a map
        utms {8  (update-vals hashes hex->utm8)
              16 (update-vals hashes hex->utm16)
              32 (update-vals hashes hex->utm32)
              64 (update-vals hashes hex->utm64)}
        results {8  {:acc (hex->utm8 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 16 {:acc (hex->utm16 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 32 {:acc (hex->utm32 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}
                 64 {:acc (hex->utm64 zero-hex)
                     :uniques   #{}
                     :duplicates #{}}}
        results (reduce
                 (fn [results _]
                   (reduce
                    (fn [results cell-size]
                      (let [curr-acc (get-in results [cell-size :acc])
                            ;; randomly select one of the two hashes
                            selected-hash (if (< (rand) 0.5)
                                            (get-in utms [cell-size :a])
                                            (get-in utms [cell-size :b]))
                            ;; fuse the selected hash onto the accumulator
                            new-acc (utm-multiply curr-acc selected-hash)]
                        ;; update results with new accumulator and uniqueness info
                        (if (contains? (get-in results [cell-size :uniques]) new-acc)
                          (update-in results [cell-size :duplicates] conj new-acc)
                          (-> results
                              (assoc-in [cell-size :acc] new-acc)
                              (update-in [cell-size :uniques] conj new-acc)))))
                    results
                    [8 16 32 64]))
                 results
                 (range 10000))]
    ;; convert final results to totals of unique and duplicate hashes
    (->> results
         (map (fn [[cell-size {:keys [uniques duplicates]}]]
                {:cell-size   cell-size
                 :uniques     (count uniques)
                 :duplicates (count duplicates)}))
         (kind/table))))

Random Fuses Results

cell-size uniques duplicates
8 10000 0
16 10000 0
32 10000 0
64 10000 0

These results show that high entropy data can be well represented by fusing hashes together. All four bit sizes for the cells of the upper triangular matrices performed well with high entroy data. I have rerun this experiment with millions of fuses and have never observed a duplicate hash being produced. Since 64 bit cells fit in the smallest matrix, this is the fasted to compute.

Experiment 2: Folded Fuses

This experiment is run with a hash fused together with itself (i.e. folding) and then the result in turn fused with itself and so on. This folding is repeated many times and the quality of the accumulator is measured both by keeping track of global uniqueness after each fuse and by the uniform distribution of bit values. Once the accumulator becomes zero the folding stops and the number of folds to reach this state is recorded. Also, the number of lower bits that are zero across all cells is recorded after each fold.

(defn calc-zero-lower-bits
  "Calculate the number of lower bits that are zero across all upper cells in
  the upper triangular matrix."
  [utm]
  (let [cell-size (-> utm meta :cell-size)]
    (loop [mask 1 zero-lower-bits 0]
      (if (and (< zero-lower-bits cell-size)
               (->> (for [i (range (count utm))
                          j (range (inc i) (count utm))]
                      (get-in utm [i j]))
                    (map #(bit-and mask %))
                    (every? zero?)))
        (recur (bit-shift-left mask 1)
               (inc zero-lower-bits))
        zero-lower-bits))))
(defn folded-fuses
  "Perform folded fuses starting from the fixed hash `a-hex`, converted to an
  upper triangular matrix (UTM) for each cell size. After each fold, track how
  many lower bits of each cell are zero across all upper cells using
  `calc-zero-lower-bits`. Stop when all those lower bits are zero (i.e.,
  `zero-lower-bits` equals the cell size) or when 1000 folds have been
  performed. Report, for each cell size and fold, the fold count, the number
  of zero lower bits, and the resulting hash."
  []
  (let [;; convert hashes to utms for each bit size and store in a map
        results (reduce
                 (fn [results cell-size]
                   (let [fold (hex->utm a-hex cell-size)
                         zero-lower-bits (calc-zero-lower-bits fold)
                         fold-result
                         (loop [folds [{:fold fold
                                        :zero-lower-bits zero-lower-bits}]]
                           (let [{:keys [fold zero-lower-bits]} (last folds)]
                             (if (or (= zero-lower-bits cell-size)
                                     (>= (count folds) 1000))
                               folds
                               (let [fold (utm-multiply fold fold)
                                     zero-lower-bits (calc-zero-lower-bits fold)]
                                 (recur (conj folds
                                              {:fold fold
                                               :zero-lower-bits zero-lower-bits}))))))]
                     (assoc results cell-size fold-result)))
                 {}
                 [8 16 32 64])]
    ;; format results into a table
    (->> results
         (mapcat (fn [[cell-size folds]]
                   (map-indexed (fn [index {:keys [fold zero-lower-bits]}]
                                  {:cell-size      cell-size
                                   :fold-count     index
                                   :zero-lower-bits zero-lower-bits
                                   :fold (-> [:pre (utm->hex fold cell-size)]
                                             (kind/hiccup))})

                                folds)))
         (kind/table))))

Folded Fuses Results

cell-size fold-count zero-lower-bits fold
8 0 0
43b37b393ddd8932863525848da362a2b9a2fb2203ad546dd8b2a1cf9904336f
8 1 0
8666f6727aba1264e56bad9dc38bca851e814f90e63557aea11dcb355f5fee9e
8 2 0
0cccece4f47424c82edae68e2a2aac1ad05286141419ffc72f4458d30b0ad827
8 3 1
1898d8c8e8e84890ecc4fc6ce4a4b8d410c4cc18886e82fa52b0a002d2503cfa
8 4 2
3030b090d0d0902018c8b8180888f028e00898f0904c9424f40040b454304864
8 5 3
60606020a0a02040309070301010e050c01030e0205868082880802868a0d088
8 6 4
c0c0c040404040806020e0602020c0a0802060c040b0d01050000050d040a010
8 7 5
8080808080808000c040c0c0404080400040c0808060a020a00000a0a0804020
8 8 6
000000000000000080808080808000800080800000c040404000004040008040
8 9 7
0000000000000000000000000000000000000000008080808000008080000080
8 10 8
0000000000000000000000000000000000000000000000000000000000000000
16 0 0
43b37b393ddd8932863525848da362a2b9a2fb2203ad546dd8b2a1cf9904336f
16 1 0
8766f6727bba12640c6a4b082f21ba79cd6e899e8157c990d2488bc56fbf69d5
16 2 1
0eccece4f77424c818d49610adae49c6038460a436de1b60c2e8a814c0dce450
16 3 2
1d98d9c8eee8499031a82c20990ce6dca9a8f6e8a05c8300d5700dd026b03868
16 4 3
3b30b390ddd093206350584028d81af8ddd0c4501a3891009b60af409440e670
16 5 4
76606720bba02640c6a0b0802cb06af0e5a0e2a012701e0088c0a5004c000760
16 6 5
ecc0ce4077404c808d406100c560a9e073402d405ce0ac00d9802400660018c0
16 7 6
d9809c80ee8099001a80c2003ac0a3c08680fa8099c01800d300b00004005980
16 8 7
b3003900dd00320035008400358087808d007500b380300026000000e8005300
16 9 8
66007200ba0064006a0008006b000f001a00ea00670060004c00800050002600
16 10 9
cc00e4007400c800d4001000d6001e003400d400ce00c00098000000a0004c00
16 11 10
9800c800e8009000a8002000ac003c006800a8009c0080003000000040009800
16 12 11
30009000d00020005000400058007800d0005000380000006000000080003000
16 13 12
60002000a0004000a0008000b000f000a000a00070000000c000000000006000
16 14 13
c000400040008000400000006000e00040004000e0000000800000000000c000
16 15 14
800080008000000080000000c000c00080008000c00000000000000000008000
16 16 15
0000000000000000000000008000800000000000800000000000000000000000
16 17 16
0000000000000000000000000000000000000000000000000000000000000000
32 0 0
43b37b393ddd8932863525848da362a2b9a2fb2203ad546dd8b2a1cf9904336f
32 1 0
8766f6727bbb12640c6a4b081b46c544ae7a886685c3a0a218700fb0eac6f9ab
32 2 1
0ecdece4f77624c818d49610368d8a8849c75954052b2064cd0b4fa86d994caa
32 3 2
1d9bd9c8eeec499031a92c206d1b151046d7d4c8f0e5bd480ac36070e3e86fa4
32 4 3
3b37b390ddd8932063525840da362a205ad432107c096c90d839c5602eebc088
32 5 4
766f6720bbb12640c6a4b080b46c5440ea3a8620610aa120bb3f9cc01c5f4610
32 6 5
ecdece4077624c808d49610068d8a880a6bd944065f46240a1af818043bfa020
32 7 6
d9bd9c80eec499001a92c200d1b15100969d48805b654480f02023003a939040
32 8 7
b37b3900dd89320035258400a362a20051c31100f4bc89009344c60079f86080
32 9 8
66f67200bb1264006a4b080046c5440035a82200e1411200f29b8c00cb35c100
32 10 9
cdece4007624c800d49610008d8a8800b3d84400a1a22400157f1800137f8200
32 11 10
9bd9c800ec499000a92c20001b15100089d08800bfc44800ec1e30001b4f0400
32 12 11
37b39000d893200052584000362a20009c21100071889000dcbc600007de0800
32 13 12
6f672000b1264000a4b080006c5440005a422000ab112000cb78c00054bc1000
32 14 13
dece4000624c800049610000d8a880003c84400076224000def18000bd782000
32 15 14
bd9c8000c499000092c20000b1510000990880006c448000dde30000caf04000
32 16 15
7b390000893200002584000062a20000b2110000d88900003bc60000d5e08000
32 17 16
f6720000126400004b080000c544000064220000b1120000778c0000abc10000
32 18 17
ece4000024c80000961000008a880000c844000062240000ef18000057820000
32 19 18
d9c80000499000002c2000001510000090880000c4480000de300000af040000
32 20 19
b390000093200000584000002a2000002110000088900000bc6000005e080000
32 21 20
6720000026400000b080000054400000422000001120000078c00000bc100000
32 22 21
ce4000004c80000061000000a88000008440000022400000f180000078200000
32 23 22
9c80000099000000c2000000510000000880000044800000e3000000f0400000
32 24 23
390000003200000084000000a20000001100000089000000c6000000e0800000
32 25 24
7200000064000000080000004400000022000000120000008c000000c1000000
32 26 25
e4000000c8000000100000008800000044000000240000001800000082000000
32 27 26
c800000090000000200000001000000088000000480000003000000004000000
32 28 27
9000000020000000400000002000000010000000900000006000000008000000
32 29 28
200000004000000080000000400000002000000020000000c000000010000000
32 30 29
4000000080000000000000008000000040000000400000008000000020000000
32 31 30
8000000000000000000000000000000080000000800000000000000040000000
32 32 31
0000000000000000000000000000000000000000000000000000000080000000
32 33 32
0000000000000000000000000000000000000000000000000000000000000000
64 0 0
43b37b393ddd8932863525848da362a2b9a2fb2203ad546dd8b2a1cf9904336f
64 1 1
8766f6727bbb12640c6a4b091b46c5447345f644075aa8dab91387ee0b945c82
64 2 2
0ecdece4f77624c818d49612368d8a88e68bec880eb551b490e021177d588f94
64 3 3
1d9bd9c9eeec499031a92c246d1b1510cd17d9101d6aa3689ca4871c93707968
64 4 4
3b37b393ddd8932063525848da362a209a2fb2203ad546d024da21ef89de5bd0
64 5 5
766f6727bbb12640c6a4b091b46c5440345f644075aa8da0f7f892b89fb25ba0
64 6 6
ecdece4f77624c808d49612368d8a88068bec880eb551b40a90260d76f3b4740
64 7 7
d9bd9c9eeec499001a92c246d1b15100d17d9101d6aa36803649af479dd0ce80
64 8 8
b37b393ddd8932003525848da362a200a2fb2203ad546d00fda714f2390a9d00
64 9 9
66f6727bbb1264006a4b091b46c5440045f644075aa8da003f9d037067b93a00
64 10 10
cdece4f77624c800d49612368d8a88008bec880eb551b40090756d10a6027400
64 11 11
9bd9c9eeec499000a92c246d1b15100017d9101d6aa3680065d872e0a644e800
64 12 12
37b393ddd8932000525848da362a20002fb2203ad546d000df6748beb589d000
64 13 13
6f6727bbb1264000a4b091b46c5440005f644075aa8da0000da81d730f13a000
64 14 14
dece4f77624c800049612368d8a88000bec880eb551b400056b66abcae274000
64 15 15
bd9c9eeec499000092c246d1b15100007d9101d6aa3680009b0594d39c4e8000
64 16 16
7b393ddd8932000025848da362a20000fb2203ad546d0000ec6e2710389d0000
64 17 17
f6727bbb126400004b091b46c5440000f644075aa8da0000b26843c4713a0000
64 18 18
ece4f77624c800009612368d8a880000ec880eb551b40000cb005e18e2740000
64 19 19
d9c9eeec499000002c246d1b15100000d9101d6aa36800002ec01671c4e80000
64 20 20
b393ddd8932000005848da362a200000b2203ad546d00000c07d95e389d00000
64 21 21
6727bbb126400000b091b46c54400000644075aa8da000000cf0cfc713a00000
64 22 22
ce4f77624c800000612368d8a8800000c880eb551b40000049b82f8e27400000
64 23 23
9c9eeec499000000c246d1b1510000009101d6aa3680000052ca9f1c4e800000
64 24 24
393ddd8932000000848da362a20000002203ad546d000000a2fe3e389d000000
64 25 25
727bbb1264000000091b46c54400000044075aa8da0000003ba07c713a000000
64 26 26
e4f77624c800000012368d8a88000000880eb551b40000004dd0f8e274000000
64 27 27
c9eeec4990000000246d1b1510000000101d6aa368000000f5e1f1c4e8000000
64 28 28
93ddd8932000000048da362a20000000203ad546d000000054c3e389d0000000
64 29 29
27bbb1264000000091b46c54400000004075aa8da00000004d87c713a0000000
64 30 30
4f77624c800000002368d8a88000000080eb551b400000002b0f8e2740000000
64 31 31
9eeec4990000000046d1b1510000000001d6aa3680000000961f1c4e80000000
64 32 32
3ddd8932000000008da362a20000000003ad546d000000002c3e389d00000000
64 33 33
7bbb1264000000001b46c54400000000075aa8da00000000587c713a00000000
64 34 34
f77624c800000000368d8a88000000000eb551b400000000b0f8e27400000000
64 35 35
eeec4990000000006d1b1510000000001d6aa3680000000061f1c4e800000000
64 36 36
ddd8932000000000da362a20000000003ad546d000000000c3e389d000000000
64 37 37
bbb1264000000000b46c54400000000075aa8da00000000087c713a000000000
64 38 38
77624c800000000068d8a88000000000eb551b40000000000f8e274000000000
64 39 39
eec4990000000000d1b1510000000000d6aa3680000000001f1c4e8000000000
64 40 40
dd89320000000000a362a20000000000ad546d00000000003e389d0000000000
64 41 41
bb1264000000000046c54400000000005aa8da00000000007c713a0000000000
64 42 42
7624c800000000008d8a880000000000b551b40000000000f8e2740000000000
64 43 43
ec499000000000001b151000000000006aa3680000000000f1c4e80000000000
64 44 44
d893200000000000362a200000000000d546d00000000000e389d00000000000
64 45 45
b1264000000000006c54400000000000aa8da00000000000c713a00000000000
64 46 46
624c800000000000d8a8800000000000551b4000000000008e27400000000000
64 47 47
c499000000000000b151000000000000aa368000000000001c4e800000000000
64 48 48
893200000000000062a2000000000000546d000000000000389d000000000000
64 49 49
1264000000000000c544000000000000a8da000000000000713a000000000000
64 50 50
24c80000000000008a8800000000000051b4000000000000e274000000000000
64 51 51
49900000000000001510000000000000a368000000000000c4e8000000000000
64 52 52
93200000000000002a2000000000000046d000000000000089d0000000000000
64 53 53
264000000000000054400000000000008da000000000000013a0000000000000
64 54 54
4c80000000000000a8800000000000001b400000000000002740000000000000
64 55 55
9900000000000000510000000000000036800000000000004e80000000000000
64 56 56
3200000000000000a2000000000000006d000000000000009d00000000000000
64 57 57
64000000000000004400000000000000da000000000000003a00000000000000
64 58 58
c8000000000000008800000000000000b4000000000000007400000000000000
64 59 59
900000000000000010000000000000006800000000000000e800000000000000
64 60 60
20000000000000002000000000000000d000000000000000d000000000000000
64 61 61
40000000000000004000000000000000a000000000000000a000000000000000
64 62 62
8000000000000000800000000000000040000000000000004000000000000000
64 63 63
0000000000000000000000000000000080000000000000008000000000000000
64 64 64
0000000000000000000000000000000000000000000000000000000000000000

These results show that repeated folding devolves into a zero hash after only a few folds. The number of bits in a cell nearly exactly defines the number of folds needed before reaching a zero hash. Since each fold represents a doubling of the number of fuses of the same value, folding shows that fusing can not reliably represent long runs of repeated values.

Practical Usage

Based on the experiments above, here are recommendations for using hash fusing in practice.

Detecting Low-Entropy Failures

The key insight from Experiment 2 is that low-entropy data causes the lower bits to become zero progressively. We can detect this condition and fail fast rather than silently producing degenerate hashes.

The following function fuses two hashes and raises an error if the result shows signs of low-entropy degeneration (lower 32 bits all zero):

(defn high-entropy-fuse
  "Fuse two 256-bit hashes together via upper triangular matrix multiplication
  with 64-bit cells. Raise an error when the lower 32 bits of the result are
  all zero."
  [a-hex b-hex]
  (let [a (hex->utm64 a-hex)
        b (hex->utm64 b-hex)
        ab (utm-multiply a b)]
    (if (->> (for [i (range (count a))
                   j (range (inc i) (count a))]
               (get-in ab [i j]))
             (map #(bit-and 0xFFFFFFFF %))
             (every? zero?))
      (throw (ex-info "Low entropy data detected during hash fusing."
                      {:a a-hex :b b-hex :fused (utm64->hex ab)}))
      (utm64->hex ab))))

Example of high entropy data fusing successfully:

(high-entropy-fuse a-hex b-hex)
"ffcafce824121c2325c7144b4dcbdf9f546fb638f6db0c7ee2322ec2f30470f9"

Example of low entropy data causing an error:

(try
  (high-entropy-fuse zero-hex zero-hex)
  (catch Exception e
         (.getMessage e)))
"Low entropy data detected during hash fusing."

Handling Low-Entropy Data

When working with data that may contain long runs of repeated values, consider these strategies:

  1. Compression: Run-length encode or compress the data before hashing. This converts low-entropy patterns into high-entropy representations.

  2. Salting: Inject position-dependent noise into the hash sequence. For example, XOR each element’s hash with a hash of its index.

  3. Chunking: Break long sequences into smaller chunks and hash each chunk separately before fusing. This limits the damage from repetition.

Conclusion

This article described a method for fusing hashes together using upper triangular matrix multiplication. The method is associative and non-commutative.

Experiment 1 showed that high entropy data can be well represented by fusing hashes together and that the size of cell fields, from 8 to 64 bits, all performed well with high entropy data. Since 64 bit cells fit in the smallest matrix, this is the fastest to compute.

Experiment 2 showed that low entropy data, specifically long runs of repeated values, can not be well represented by fusing hashes together since they all approach a zero hash. The rate of approach is determined by the bit count in the cells. 64 bit cells were the most able to handle repeated values.

The Practical Usage section provides a working implementation with low-entropy detection, along with strategies for handling problematic data.

Appendix: Why Low Entropy Data Fails — Nilpotent Group Structure

The following explanation, based on interactions with a Claude-based AI assistant, is very well written and accurate.

The low-entropy failure observed in Experiment 2 is not an accident of implementation — it is a fundamental consequence of the algebraic structure we chose. Understanding this helps clarify both the limitations and the design space for alternatives.

Upper Triangular Matrices Form a Nilpotent Group

An n×n upper triangular matrix with 1s on the diagonal can be written as I + N, where I is the identity matrix and N is strictly upper triangular (all diagonal entries are 0). The key property of strictly upper triangular matrices is that they are nilpotent: repeated multiplication eventually yields zero.

Specifically, for an n×n strictly upper triangular matrix N:

N^n = 0 (the zero matrix)

This happens because each multiplication “shifts” the non-zero entries one diagonal further from the main diagonal. After n-1 multiplications, all entries have been shifted off the matrix.

How This Affects Hash Fusing

When we fuse a hash with itself (A × A), we are computing (I + N) × (I + N):

(I + N)² = I + 2N + N²

With our bit masking over w-bit cells, arithmetic is effectively done modulo 2^w. Multiplying by 2 shifts all bits left and, after masking back to w bits, forces the least significant bit of each cell to 0. After k squarings:

(I + N)(2k) = I + 2^k · N + (higher powers of N)

The coefficient 2^k means the lowest k bits of each cell are forced to zero under mod 2^w / bitmasking. This is exactly what Experiment 2 observed: each fold (squaring) zeros out one more bit, and after cell-size folds, all bits are zero.

Why This Is Fundamental

The nilpotent structure is intrinsic to upper triangular matrices. Any associative, non-commutative composition based on this structure will exhibit the same behavior. This is not a bug — it’s a mathematical property of the group we chose.

The Trade-off

We chose upper triangular matrices because they provide:

  • Associativity: Essential for Finger Trees
  • Non-commutativity: Essential for ordered sequences
  • Efficient representation: The matrix structure maps cleanly to hash bits

The cost is nilpotency, which manifests as the low-entropy failure mode. This trade-off is acceptable for high-entropy data (which is the common case for content hashes) and can be mitigated with detection and preprocessing for edge cases.

Alternative Structures

For applications where low-entropy data is common, one could explore non-nilpotent groups such as GL(n, F_p) — the general linear group over a finite field. However, this would sacrifice the sparse matrix efficiency and require more complex encoding. The upper triangular approach remains practical for most content-addressed data structure applications.

Appendix B: Alternatives Explored

Before settling on upper triangular matrices, several alternative approaches were investigated. This section documents those explorations and explains why they were not suitable.

Quaternion Multiplication

Quaternions are a natural candidate: they form a non-commutative algebra and have efficient 4-component representations. However, experiments with quaternion-based fusing revealed the same nilpotent-like behavior observed with upper triangular matrices.

When quaternions are encoded similarly to the UTM approach — with values close to the identity element (1 + εi + εj + εk where ε represents small perturbations) — repeated self-multiplication causes the perturbation terms to degenerate. While pure unit quaternions form a proper group (isomorphic to SU(2)), the encoding required for hash values reintroduces the same fundamental problem.

Modified Fusing Operations

Various modifications to the basic fusing operation were attempted, including:

  • XOR combined with rotations
  • Nonlinear mixing functions
  • Polynomial operations over finite fields

The consistent finding was that associativity is surprisingly fragile. Most intuitive “mixing” operations that might improve entropy preservation break associativity, which is essential for the Finger Tree use case.

Matrix multiplication is special precisely because it is one of the few operations that provides both associativity and non-commutativity naturally.

Why Upper Triangular Matrices Were Chosen

Despite the low-entropy limitation, upper triangular matrices remain the best practical choice because:

  1. Guaranteed associativity: Matrix multiplication is inherently associative — this cannot be broken by implementation choices.

  2. Guaranteed non-commutativity: For n > 1, matrix multiplication is non-commutative, preserving sequence order.

  3. Efficient sparse encoding: The upper triangular structure means only n(n-1)/2 cells carry information, mapping cleanly to hash bits.

  4. Predictable failure mode: The low-entropy degeneration is detectable and follows a known pattern (one bit per fold), making it possible to implement safeguards.

The alternatives explored either broke associativity (disqualifying them for Finger Trees) or exhibited the same degeneration behavior without the other benefits of the matrix approach.