Well, in my understanding it’s not so much about collisions as about the output having to appear random.
They’re not GUIDs, so collisions are going to occur, because the output range is restricted and the domain of inputs can be bigger than the range of outputs. But when they do collide, it shouldn’t be because of an obvious pattern, so you shouldn’t be able to predict which inputs are likely to collide.
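To make the "restricted range" point concrete, here is a small sketch. The `tiny_hash` function is something I made up for illustration (it keeps only the first byte of SHA-256, so there are just 256 possible outputs); hashing 257 different inputs then has to produce at least one collision.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: first byte of SHA-256, so only 256 outputs."""
    return hashlib.sha256(data).digest()[0]

def first_collision(limit: int = 257):
    """Hash the strings "0", "1", ... and return the first pair sharing a hash."""
    seen = {}
    for i in range(limit):
        key = str(i).encode()
        h = tiny_hash(key)
        if h in seen:
            # Two different inputs landed on the same output value.
            return seen[h], key, h
        seen[h] = key
    return None

# With 257 distinct inputs and 256 possible outputs a collision is guaranteed.
print(first_collision())
```

Which two inputs collide first is not something you can guess by looking at the inputs, and that is the property you want.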
For example, say abcd and abce give almost the same hash. You start to notice a pattern: it looks like similar inputs land next to each other alphabetically, and the beginning of the hash stays the same:
abcd -> shdhs23
abce -> shdhs24
This would be a terrible crypto hash. You could quickly build a probabilistic model and narrow down the input range based on what you observe in the output.
shdhs35 most probably comes from an input that starts with abc, so now I have far fewer combinations left to try when cracking the password.
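For contrast, a real hash does the opposite of the made-up example above. This is a minimal sketch using SHA-256 from Python’s standard `hashlib`; the helper name `bit_difference` is my own. Changing one character of the input should flip roughly half of the 256 output bits, so the two hashes share no visible pattern.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 SHA-256 output bits differ for two inputs."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

print(hashlib.sha256(b"abcd").hexdigest())
print(hashlib.sha256(b"abce").hexdigest())
print(bit_difference(b"abcd", b"abce"), "of 256 bits differ")
```

So seeing the hash of abcd tells you nothing useful about what the hash of abce will look like.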
Also, at least when they’re used for passwords, they need to be slow, so the cost of brute forcing them is high enough to be infeasible. Though still fast enough not to get in the way of their use case. That’s why there is a range of crypto hashes from faster to slower. But in terms of resisting brute force, slower is better.
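Here is a rough sketch of the fast versus slow idea, using a single SHA-256 call against PBKDF2 from `hashlib`. The helper names, the fixed salt, and the iteration count of 200,000 are assumptions picked for the demo; a real system would use a random salt per password and a work factor tuned to its own hardware.

```python
import hashlib
import time

def fast_hash(password: bytes, salt: bytes) -> bytes:
    """One SHA-256 call: cheap for you, but just as cheap for an attacker."""
    return hashlib.sha256(salt + password).digest()

def slow_hash(password: bytes, salt: bytes, iterations: int = 200_000) -> bytes:
    """PBKDF2-HMAC-SHA256: the iteration count makes every guess more costly."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def time_call(fn, *args) -> float:
    """Return the wall-clock seconds taken by a single call to fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

salt = b"demo-salt"  # fixed only for this demo; use os.urandom(16) in practice
print("fast:", time_call(fast_hash, b"hunter2", salt), "seconds")
print("slow:", time_call(slow_hash, b"hunter2", salt), "seconds")
```

The slow version is still quick enough for a single login check, but it multiplies the cost of every guess in a brute-force run.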
Now generally, for the output to appear random, it also has to have an even/uniform distribution. That will inevitably also mean that collisions are rare. But it’s about more than fewer collisions: for the probability distribution to be uniform, each new hash has to be able to land anywhere in the output range, with equal probability every time. Consider:
a -> 1
b -> 2
c -> 3
d -> 1
e -> 2
f -> 3
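Let’s say we hash from ASCII to ints from 1 to 3. Now this avoids collisions as much as possible, and the outputs are even spread evenly. But they aren’t unpredictable: they cycle 1, 2, 3, so once you’ve seen one hash you know what the next input maps to, which means each new hash is not equally likely to be any value in the range. So even though it is perfect at minimizing collisions, it’s still a weak crypto hash.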
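The mapping above can be written out as code to show how predictable it is. Both function names are hypothetical: `toy_hash` is my reading of the a -> 1, b -> 2, c -> 3 table, and `sha_bucket` squeezes SHA-256 into the same 1 to 3 range for comparison.

```python
import hashlib

def toy_hash(ch: str) -> int:
    """The cyclic mapping from the example: a -> 1, b -> 2, c -> 3, d -> 1, ..."""
    return (ord(ch) - ord("a")) % 3 + 1

def sha_bucket(ch: str) -> int:
    """Reduce SHA-256 to the same 1..3 range (bias from the modulo is negligible)."""
    digest = hashlib.sha256(ch.encode()).digest()
    return int.from_bytes(digest, "big") % 3 + 1

letters = "abcdef"
print([toy_hash(c) for c in letters])    # cycles 1, 2, 3, 1, 2, 3
print([sha_bucket(c) for c in letters])  # no simple rule links neighbours

# Knowing one toy hash gives away the hash of the next letter.
for c in "abcde":
    assert toy_hash(chr(ord(c) + 1)) == toy_hash(c) % 3 + 1
```

Both functions have plenty of collisions with only three outputs, but only the toy one lets you work out the next hash from the previous one.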
Disclaimer: I’m not actually a security engineer, just a senior software engineer with enough security know-how to design secure services and software. So I might be wrong.