I’m not sure which exact kind of cross-correlation you need, but if it’s what ordinary cross-correlation does, Deep Diamond’s convolution forward layer computes this. If you want to use high-level API, just create a convolution layer, set the activation at :linear, and provide the data and the kernel. The forward pass computes the cross-correlation. You can also find the low-level function in the internals.
Thanks Dragan. I was mostly just looking into how to implement this int he Clojure way “manually” - had I know it’d be this involved I probably would have just reached for a library
Your new library looks interesting I’m trying to move into machine learning so I’ll be following it for sure - but there are no docs or intros yet. I also see it’s using neanderthal
- which I’ve used and I really enjoyed (It’s much better/more-sane than Nd array solutions that are popular now a days and goofy broken things like core.matrix
). But while the API is great, it does require the MKL which makes it a bit of a no-go. I’m releasing my GUI application as a JAR and asking users to install the MKL is just too much b.c they’re all nontechnical (I’m having trouble just getting people to consistently install Java 11…). And bundling the MKL for all platforms in the JAR will balloon the application size.
I know you’ve explained to me that it’s not too much work to put in a different backend , but I’m just a bit scared to jump down that rabbit hole right now. Since you have an OpenCL implementation it could possible be bundled with POCL to generate native CPU matrix math kernels and maybe make a non-MKL CPU implementation without too much extra work… but even then you need to have compilation targets. I’d like to stick to my JVM bubble as long as I can! Once I leave it then I have other options as well - like just writing C++ and using the JNI. Hence this thread
In any case, I’m sure I’ll be using your awesome libraries in subsequent projects. I’ll keep you in the loop if I make anything noteworthy. Thanks as always for all your open source work
Any reason you work almost exclusively with doubles? I profiled your code and there’s some overhead on casts which could probably be cut down.
Thanks for taking a look @bsless
Could you share with me how you profiled it? I’m very curious - b/c I feel a bit like I’m poking around in the dark with VisualVM
I kinda just tried to keep everything as double for simplicity (and because at first I was not in-lining and using helper functions - which I type-hinted as ^double
). I liked messed something up though
Well, everything which is an array index is clearly a long
(which is what returns from all the unchecked-*
functions anyway).
How I profiled:
add clj-async-profiler as dependency: https://github.com/clojure-goes-fast/clj-async-profiler
run something like:
(prof/profile (dotimes [_ 100] (test-cross)))
look at the resulting flame graph.
Make sure you’re configured correctly according to the instruction in the repo.
Optionally, add the following jvm flags:
:jvm-flags ["-XX:+PreserveFramePointer"
"-XX:+UnlockDiagnosticVMOptions"
"-XX:+DebugNonSafepoints"]
It looks like there might be a few casts and unboxings which can be avoided.
but there are no docs or intros yet.
I have to correct you here. There are plenty of docs and intros. There’s even a full fledged book
Just saw this. I might take another pass at the vector impl just for kicks, and maybe try to shorten and make faster the array implementation. Also take a look at what the upper bound on perf would be with a neanderthal-based comparison. I think your observations are on track (e.g. destructuring costs and the like). The unboxing operations will show up, but they’re not obvious… One way to ensure you’re using primitive math everywhere is to use the primitve-math lib (either the stand-alone version or the one bundled in @generatme fast-math library), import the ns, and turn on primitive-math ops. Then you will get reflection warnings where it cannot infer the type, as well as guaranteed unboxed math. It’s a very clean way to tack down the boxing/unboxing issues, and typically yields a very nice performance boost (you have to go hint everything, but it’s fairly painless). I will probably have some more observations after digging through your revisions and profiling a bit.
A note on destructuring - if unavoidable, you can gain some speedups by defining type hinted container types, i.e.
(deftype Interval [^long start ^long end])
Then pass it as a type hinted argument and get the fields by interop.
If cuts down slightly on allocation and gives a slight speedup.
Ooof, thanks for the tip @bsless. It was a bit challenging to get working
So yeah, you need to make sure you have debugging symbols installed and then make sure perf
profiling security is disabled - this part is all explained in detail in clj-async-profiler
So I’d never had to add JVM flags - this was all new to me and not really explained anywhere (everywhere just assumes you know how to do it haha ). Just notes for anyone that comes across this later:
- You need to create an “alias” in your
deps.edn
that has the options enabled
:aliases {:dev {:jvm-opts ["-Djdk.attach.allowAttachSelf"
"-XX:+PreserveFramePointer"
"-XX:+UnlockDiagnosticVMOptions"
"-XX:+DebugNonSafepoints"]}}
- Here i chose the name
dev
but this key is random and has no meaning. Then you need to force CIDER to use this alias (this part is undocumented) by adding this to yourinit.el
in Emacs (or running from the *scratch* buffer)
(setq cider-clojure-cli-global-options "-Adev")
Then if you do everything correctly it will run the profile and dump the result into /temp/
I’m now seeing all the valls to valueOf
methods which must be the type conversions. The profile doesn’t give call sites so I’ve poked around and found where all the issues were This is a great tool to have! thanks
There is some funny stuff where in Clojure you can only type hint ^long
but for instance alength
returns an int
. However I seem to have managed to keep all the indices as int
without needing to explicitly type hint anything.
Now my flame graph is dominated by clojure.lang.RT.nth
- but I’m not explicitly using any Clojure PersistentVector
s (other than as the final return value) so I’m not quite sure where this is coming from. Is this an implicit data structure used in the loop
recursion? Or are they generated by let
blocks?
@joinr do let me know if you make anything of it. The vector implementation was more a brain storming session… it’s a bit half baked. The tricky part that I couldn’t resolve is that ideally in the first pass will you get overlap start-end indices and an overlap average and then the second pass needed to start will use those indices to avoid running across the whole vector. Now that I’m thinking about it again, it seems you could filter down the arrays onthe average-counting pass - I’m forgetting why I abandoned that idea
Oh and the type hinted container trick is great! I might revisit that if it doesn’t hurt performance much b/c it would really help clean up the code
you get nth
calls due to destructuring of the vectors you return in places like shift-range
Would you like to share the flame graph?
Sorry, I meant to send this yesterday
You’re right that there is some destructing in the “outer loop” - which I left for code-clarity and b/c I figured it wasn’t getting called too many times. (Though looking at the numbers it’s not-to-small of a percentage here). However I’m seeing these nth
calls in all functions.
EDIT: Oh, I guess it could be the destructuring of the function arguments (which I’m guessing counts as in-function time - and not caller time). It would be crazy if getting the function arguments is swamping the actual function body. I will need to play around with this
I fear it might be destructuring of the function arguments. That’s why I suggested creating Interval containers with deftype
I managed to remove the destructuring in the arguments. From the previous flamegraph is looked like it would give some huge 3x improvement - but I guess there is something funny going on (maybe CPU pipeline stuff… not quite sure) - but I got a modest 10% speedup (I’m just running time
so it’s not a great benchmark)
There is still destructuring in the top level array-cross-correlation
which I’ll try to ameliorate with deftype
. I thought I could remove it entirely by inlining the code (suince the function is getting called in just one place anyway) but it’s still a return value of loop
so I inevitably need some intermediary container. I’ll give deftype
a try and post back with results
One thing I’m a bit unclear on is long
vs int
. You have things like alength
that give you int
s and then you have things like unchecked-inc
and -dec
that return long
values. Some places they get mixed up inevitably… though seemingly there is big performance penalty there. But I’m not sure what the best solution is there
Well after an hour I seemingly can’t figure out the right syntax with deftype
(and I can’t find anything with on the web)
I restructure the code to return a deftype:
(deftype averaging-result [^double average ^long start-index ^long end-index])
(defn count-average-over-range ^averaging-result
[^double start-position
^double end-position
^doubles positions
^doubles counts]
;; ... stuff
;; return site:
(averaging-result. (/ total-counts
number-of-positions)
start-index
end-index)
;;
)
and then later at the call site I restructure to use it
;; ...
(let [[overlap-start ;; this part still need to be rewritten to use a deftype :)
overlap-end] (overlap-area overlay-positions shifted-positions)
overlay-averaging-result (count-average-over-range overlap-start
overlap-end
overlay-positions
overlay-counts)
^double average-overlay (.average overlay-averaging-result)
^long overlay-start-index (.start-index overlay-averaging-result)
^long overlay-end-index (.end-index overlay-averaging-result)
;; ...
But this leads to reflection warnings b/c when I use interop to get at the values it seemingly can’t determine the that the return type is an instance of the deftype
.
Reflection warning, /home/geokon/Projects/correlation/src/correlation/arrays.clj:389:43 - reference to field average on java.lang.Object can't be resolved.
When I try to annotate it, it gets angry
Can't type hint a local with a primitive initializer
I suspected it may be my type hint syntax… so I’ve tried many alternatives ^averaging-result
, ^"averaging-result"
, ^correlation.arrays.averaging-result
and ^"correlation.arrays.averaging-result"
but nothing seems to stick…
That said, even with the reflection warning, I am seeing a speedup
Ah, you got bit by how the compiler munges names. try averaging_result
, that should work.
The convention for def-ed types is usually camel case, then it wouldn’t even have happened, but the type hinting is for the emitted class so you need the class name.
Good catch
But I still get the same error
I’ve update the Gist to the latest code: https://gist.github.com/geokon-gh/342083acc137c602a3f64f9d4694afd4#file-array-style-clj-L395
If you change the line to look like
overlay-averaging-result ^averagingResult (count-average-over-range overlap-start
overlap-end
overlay-positions
overlay-counts)
Then you get
Can't type hint a local with a primitive initializer
I believe the class is detected b/c If you put in some gibberish like ^NotARealType
then you get a different error
For some reason it thinks the return type of count-average-over-range
is a primitive,
The function signature reads
(defn count-average-over-range ^correlation.arrays.averagingResult
[^double start-position
^double end-position
^doubles positions
^doubles counts]
But the return type can be removed or changed to just ^averagingResult
to no effect
This is all for fun at this point. If you have no other ideas off the top of your head - then it’s no big deal
Try this:
^averagingResult overlay-averaging-result (count-average-over-range overlap-start
overlap-end
overlay-positions
overlay-counts)
average-overlay (.average overlay-averaging-result)
overlay-start-index (.start-index overlay-averaging-result)
overlay-end-index (.end-index overlay-averaging-result)
^averagingResult shifted-averaging-result (count-average-over-range overlap-start
overlap-end
shifted-positions
shifted-counts)
average-shifted (.average shifted-averaging-result)
shifted-start-index (.start-index shifted-averaging-result)
shifted-end-index (.end-index shifted-averaging-result)
The type hinting got wacky when you type hinted the primitives after destructuring.
I think you can replace all the unchecked-...
with regular (or primitive-math) operators when you have (set! *unchecked-math* :warn-on-boxed)
set.
Two more corrections:
-
array-average
- doesn’t calculate average (check parentheses) - all type hints like
new-floor-index ^long (unchecked-dec new-ceiling-index)
should be replaced by (of course if necessary) with^long new-floor-index (unchecked-dec new-ceiling-index)
I’m playing with this a little bit, so maybe I’ll come up with some more hints.
Yep, I went a bit too crazy with the type hints, but the final version worked and I got rid of all the reflection more or less
One thing that’s been very useful has been
The resulting Java is quite clear and easy to understand - and you see immediately what types it’s inferring and where it’s messing up.
The only outstanding things I’m seeing now are:
- B/c
array-correlation
take too many arguments I can’t type hint them. So the resulting code needs casts (on the input of theloop
- type hints there don’t seem to do anything). the only solution is more inlining (it’s only called in one spot - so it’s not horrible… I guess) - There are a ton of Int casts injected (basically everywhere you use
aget
) and it’s seemingly unavoidable b/c I guess you can’t increment/decrement bare ints through Clojure
But these two things are barely registering on the flamegraph
- Sorry, I should have removed that function.
array-average
is not being used. It ended up being inlined. Instead I havecount-average-over-range
which finds the array index range where the array overlaps while simultaneously calculating the average. - Yeah, I noticed later in the docstring for
*unchecked-math*
that I don’t need all theunchecked-
stuff. I should go around and remove them thanks for the reminder - And thanks for telling me about the proper type hint syntax. It wasn’t really clear to me which one I should be using haha