Current Twt Hash spec and probability of hash collision:
The probability of a Twt Hash collision depends on the size of the hash and the number of possible values it can take. For the Twt Hash, which uses a Blake2b 256-bit hash, Base32 encoding, and takes the last 7 characters, the space of possible hash values is significantly reduced.
Breakdown:- Base32 encoding: Each character in the Base32 encoding represents 5 bits of information (since ( 2^5 = 32 )).
- 7 characters: With 7 characters, the total number of possible hashes is:
[ 32^7 = 3,518,437,208 ] This gives about 3.5 billion possible hash values.
The probability of a hash collision depends on the number of hashes generated and can be estimated using the Birthday Paradox. The paradox tells us that collisions are more likely than expected when hashing a large number of items.
The approximate formula for the probability of at least one collision after generating n
hashes is:
[
P(\text{collision}) \approx 1 - e^{-\frac{n^2}{2M}}
]
Where:
- (n) is the number of generated Twt Hashes.
- (M = 32^7 = 3,518,437,208) is the total number of possible hash values.
For practical purposes, here are some example probabilities for different numbers of hashes (n
):
- For 1,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{1000^2}{2 \cdot 3,518,437,208}} \approx 0.00014 \, \text{(0.014%)}
]
- For 10,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{10000^2}{2 \cdot 3,518,437,208}} \approx 0.14 \, \text{(14%)}
]
- For 100,000 hashes:
[ P(\text{collision}) \approx 1 - e^{-\frac{100000^2}{2 \cdot 3,518,437,208}} \approx 0.999 \, \text{(99.9%)}
]
- For small to moderate numbers of hashes (up to around 1,000–10,000), the collision probability is quite low.
- However, as the number of Twts grows (above 100,000), the likelihood of a collision increases significantly due to the relatively small hash space (3.5 billion).