serialization - Clojure: Store and Compile Large Derived Data Structure


I have a large data structure, a tree, that takes about 2 GB in RAM. It includes Clojure sets in the leaves and refs as the branches. The tree is built by reading and parsing a large flat file and inserting the rows into the tree, which takes about 30 seconds. Is there a way I can build the tree once, emit it to a clj file, and then compile the tree into my standalone jar so I can look up values in the tree without re-reading the large text file? I think this would trim out the 30-second tree build, and it would also let me deploy my standalone jar without needing the text file to come along for the ride.

My first swing at this failed:

    (def x (ref {:zebra (ref #{1 2 3 4})}))
    #<Ref@6781a7dc: {:zebra #<Ref@709c4f85: #{1 2 3 4}>}>

    (def y #<Ref@6781a7dc: {:zebra #<Ref@709c4f85: #{1 2 3 4}>}>)
    RuntimeException Unreadable form  clojure.lang.Util.runtimeException (Util.java:219)

Embedding data this big in compiled code may not be possible because of size limits imposed by the JVM. In particular, no single method may exceed 64 KiB in length. Embedding data in the way I describe further below also necessitates including tons of stuff in the class file it's going to live in, which doesn't seem like a great idea.

Given that you're using the data structure read-only, you can construct it once, emit it to a .clj / .edn file (that's EDN, the serialization format based on Clojure literal notation), and include that file on the class path as a "resource" so that it ends up in the überjar. With default Leiningen settings that means putting it in resources/; it will then be included in the überjar unless excluded by :uberjar-exclusions in project.clj. You can then read the resource at runtime at the full speed of Clojure's reader:

    (ns foo.core
      (:require [clojure.java.io :as io]))

    (defn get-the-huge-data-structure []
      (let [r   (io/resource "huge.edn")
            rdr (java.io.PushbackReader. (io/reader r))]
        ;; read from the PushbackReader, not from the resource URL
        (read rdr)))

    ;; if you then do this:

    (def ds (get-the-huge-data-structure))

    ;; your app will load the data as soon as this namespace is required;
    ;; for the :main namespace, this means as soon as the app starts;
    ;; note that if you use AOT compilation, it'll also be loaded at
    ;; compile time
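The "construct once, emit" step is not shown above, so here is a minimal sketch of it, assuming the tree has already been converted to plain persistent collections (no refs). The namespace `foo.emit` and the helpers `emit-edn!` and `slurp-edn` are hypothetical names, a temporary file stands in for resources/huge.edn, and `clojure.edn/read` is swapped in for `read` because it never evaluates code found in the data.

```clojure
(ns foo.emit
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn emit-edn!
  "Hypothetical helper: writes `data` to `file` as EDN text."
  [data file]
  (with-open [w (io/writer file)]
    ;; Make sure REPL print limits cannot truncate the output.
    (binding [*out* w
              *print-length* nil
              *print-level* nil]
      (pr data))))

(defn slurp-edn
  "Hypothetical helper: reads one EDN value back from `file`."
  [file]
  (with-open [rdr (java.io.PushbackReader. (io/reader file))]
    (edn/read rdr)))

;; Round trip through a temporary file (a stand-in for resources/huge.edn).
(def tmp (java.io.File/createTempFile "huge" ".edn"))
(def tree {:zebra #{1 2 3 4} :yak {:gnu #{5 6}}})
(emit-edn! tree tmp)
(def restored (slurp-edn tmp))
```

In a real build, the emit step would run once (for example from the REPL or a build task) and only the reading side would ship in the jar.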

You could also leave the file out of the überjar and instead add it to the classpath when running your app. That way the überjar itself would not have to be huge.
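If the file lives on the classpath outside the jar, it can be missing at compile time or at startup. One possible way to cope with that, sketched here as an assumption rather than as part of the answer, is to wrap the load in a `delay` so nothing is read until the first dereference, and to fail with a clear message when the resource is absent. The namespace `foo.lazy-data` and the function `load-edn-resource` are hypothetical names.

```clojure
(ns foo.lazy-data
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn load-edn-resource
  "Hypothetical helper: reads one EDN value from a classpath resource,
  throwing if the resource cannot be found."
  [resource-name]
  (if-let [r (io/resource resource-name)]
    (with-open [rdr (java.io.PushbackReader. (io/reader r))]
      (edn/read rdr))
    (throw (ex-info "Resource not found on classpath"
                    {:resource resource-name}))))

;; Nothing is read when the namespace loads (or is AOT-compiled);
;; the file is parsed on the first @ds and cached afterwards.
(def ds (delay (load-edn-resource "huge.edn")))
```

Callers would then use `@ds` wherever the earlier example used `ds`.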

Handling stuff other than persistent Clojure data can be accomplished using print-method (when serializing) and reader tags (when deserializing). Arthur demonstrated using reader tags; to use print-method, you'd do something like

    (defmethod print-method clojure.lang.Ref [x writer]
      (.write writer "#ref ")
      (print-method @x writer))

    ;; from the REPL, after doing the above:

    user=> (pr-str {:foo (ref 1)})
    "{:foo #ref 1}"

Of course you only need to have the print-method methods defined when serializing; your deserializing code can leave them alone, but it will need the appropriate data readers.
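To show both halves working together, here is a rough round-trip sketch. It assumes the `#ref` tag from the print-method example and passes a reader for it through the `:readers` option of `clojure.edn/read-string`. The namespace `foo.refs` and the names `readers` and `read-tree` are hypothetical. Note that unqualified tags such as `#ref` are by convention reserved for Clojure itself, so a namespaced tag would probably be a safer choice in real code.

```clojure
(ns foo.refs
  (:require [clojure.edn :as edn]))

;; Serializing side: print a Ref as "#ref <current value>".
(defmethod print-method clojure.lang.Ref [x ^java.io.Writer writer]
  (.write writer "#ref ")
  (print-method @x writer))

;; Deserializing side: map the tag symbol to a function that rebuilds a Ref.
(def readers {'ref ref})

(defn read-tree
  "Hypothetical helper: parses EDN text, turning #ref forms back into Refs."
  [s]
  (edn/read-string {:readers readers} s))

(def original (ref {:zebra (ref #{1 2 3 4})}))
(def restored (read-tree (pr-str original)))
```

The restored structure contains fresh Refs holding equal values; it does not share identity with the original Refs.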


Disregarding the code size issue for a moment, since I find the data embedding issue interesting:

Assuming your data structure only contains immutable data natively handled by Clojure (Clojure persistent collections, arbitrarily nested, plus atomic items such as numbers, strings (atomic for this purpose), keywords, symbols; no Refs etc.), you can indeed include it in your code:

    (defmacro embed [x]
      x)

The generated bytecode will then recreate x without reading anything, by using constants included in the class file and static methods of the clojure.lang.RT class (e.g. RT.vector and RT.map).

This is, of course, how literals are compiled, since the macro above is a noop. We can make things more interesting though:

    (ns embed-test.core
      (:require [clojure.java.io :as io])
      (:gen-class))

    (defmacro embed-resource [r]
      (let [r   (io/resource r)
            rdr (java.io.PushbackReader. (io/reader r))]
        ;; runs at macroexpansion time; read from the PushbackReader
        (read rdr)))

    (defn -main [& args]
      (println (embed-resource "foo.edn")))

This will read foo.edn at compile time and embed the result in the compiled code (in the sense of including appropriate constants and code to reconstruct the data in the class file). At run time, no further reading will be performed.
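One caveat with a macro that returns raw data: the compiler treats the result as code, so any symbols or lists inside the data would be evaluated. A possible variant, sketched here as my own assumption and not taken from the answer, wraps the parsed value in `quote`. To keep the example self-contained it parses a string literal instead of a resource. The namespace `embed-test.quoted` and the macro `embed-edn-string` are hypothetical names.

```clojure
(ns embed-test.quoted
  (:require [clojure.edn :as edn]))

(defmacro embed-edn-string
  "Hypothetical variant of embed-resource: parses EDN text at
  macroexpansion time and returns the value quoted, so symbols and
  lists in the data are kept as data instead of being evaluated."
  [s]
  (list 'quote (edn/read-string s)))

;; The string is parsed when this form is compiled, not when it runs.
(def embedded
  (embed-edn-string "{:zebra #{1 2 3 4}, :syms [a (b c)]}"))
```

The same `(list 'quote ...)` wrapper could be applied to the value read inside `embed-resource` if the EDN file may contain symbols or lists.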

