Binary search optimization; memory usage optimization

feature-optimized-md5
Inga 🏳‍🌈 7 years ago
parent c94b6b3eaa
commit 3116f22082
  1. README.md (20 changes)
  2. WhiteRabbit/VectorsProcessor.cs (39 changes)

@@ -15,29 +15,21 @@ Performance
Memory usage is minimal (for that kind of task), around 10-30MB.
This solution is partially optimized for multi-threading.
It is also somewhat optimized for likely intended phrases, as anagrams consisting of longer words are generated first.
That's why the given hashes are solved much sooner than it takes to check all anagrams.
Single-threaded performance on Sandy Bridge @2.8GHz is as follows:
* If only phrases of at most 3 words are allowed, then it takes 2.5 seconds to find and check all 4560 anagrams; all relevant hashes are solved in first 0.4 seconds;
* If phrases of 4 words are allowed as well, then it takes 40 seconds to find and check all 7433016 anagrams; all hashes are solved in first 3 seconds;
For comparison, certain other solutions available on GitHub seem to require 3 hours to find all 3-word anagrams (i.e. this solution is faster by a factor of 4000 in 3-word case).
Anagrams generation is not parallelized, as even single-threaded performance for 4-word anagrams is high enough; and 5-word (or larger) anagrams are frequent enough for most of the time being spent on computing hashes, with full CPU load.
Multi-threaded performance with RyuJIT (.NET 4.6, 64-bit system) is as follows:
Multi-threaded performance with RyuJIT (.NET 4.6, 64-bit system) on quad-core Sandy Bridge @2.8GHz is as follows:
* If only phrases of at most 4 words are allowed, then it takes less than 10 seconds to find and check all 7433016 anagrams; all hashes are solved in first 1 second.
* If only phrases of at most 4 words are allowed, then it takes less than 5.5 seconds to find and check all 7433016 anagrams; all hashes are solved in first 0.7 seconds.
* If phrases of 5 words are allowed as well, then it takes around 18 minutes to find and check all anagrams; all hashes are solved in first 25 seconds. Most of time is spent on MD5 computations for correct anagrams, so there is not a lot to optimize further.
* If phrases of 5 words are allowed as well, then it takes around 17 minutes to find and check all anagrams; all hashes are solved in first 25 seconds. Most of time is spent on MD5 computations for correct anagrams, so there is not a lot to optimize further.
* If phrases of 6 words are allowed as well, then "more difficult" hash is solved in 50 seconds, "easiest" in 3.5 minutes, and "hard" in 6 minutes.
* If phrases of 6 words are allowed as well, then "more difficult" hash is solved in 30 seconds, "easiest" in 3 minutes, and "hard" in 6 minutes.
* If phrases of 7 words are allowed as well, then "more difficult" hash is solved in 6 minutes.
Note that all measurements were done on a Release build; Debug build is significantly slower.
For comparison, certain other solutions available on GitHub seem to require 3 hours to find all 3-word anagrams. This solution is faster by 5-7 orders of magnitude (it finds and checks all 4-word anagrams in 1/2000th fraction of time required for other solution just to find all 3-word anagrams, with no MD5 calculations).
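The "1/2000th" ratio in the comparison can be checked against the multi-threaded 4-word timing. This assumes the comparison uses the figure of less than 5.5 seconds to find and check all 7433016 anagrams; the README does not say which figure it uses:

$$
3\ \text{h} = 3 \times 3600\ \text{s} = 10\,800\ \text{s}, \qquad \frac{10\,800\ \text{s}}{5.5\ \text{s}} \approx 1\,964 \approx 2\,000
$$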

@@ -134,7 +134,7 @@
var requiredRemainder = (remainderNorm + allowedRemainingWords - 1) / allowedRemainingWords;
#endif
for (var i = currentDictionaryPosition; i < this.Dictionary.Length; i++)
for (var i = FindFirstWithNormLessOrEqual(remainderNorm, currentDictionaryPosition); i < this.Dictionary.Length; i++)
{
Vector<byte> currentVector = this.Dictionary[i].Vector;
@@ -162,7 +162,7 @@
}
else
{
for (var i = currentDictionaryPosition; i < this.Dictionary.Length; i++)
for (var i = FindFirstWithNormLessOrEqual(remainderNorm, currentDictionaryPosition); i < this.Dictionary.Length; i++)
{
Vector<byte> currentVector = this.Dictionary[i].Vector;
@@ -182,6 +182,41 @@
}
}
// BCL BinarySearch would find any vector with required norm, not the first one; or would find nothing if there is no such vector
private int FindFirstWithNormLessOrEqual(byte expectedNorm, int offset)
{
var start = offset;
var end = this.Dictionary.Length - 1;
if (this.Dictionary[start].Norm <= expectedNorm)
{
return start;
}
if (this.Dictionary[end].Norm > expectedNorm)
{
return this.Dictionary.Length;
}
// Norm for start is always greater than expected norm, or start is the required position; norm for end is always less than or equal to expected norm
// The loop always ends, because the difference always decreases; if start + 1 = end, then middle will be equal to start, and either end := middle = start or start := middle + 1 = end.
while (start < end)
{
var middle = (start + end) / 2;
var newNorm = this.Dictionary[middle].Norm;
if (newNorm <= expectedNorm)
{
end = middle;
}
else
{
start = middle + 1;
}
}
return start;
}
private IEnumerable<T[]> GeneratePermutations<T>(T[] original)
{
foreach (var permutation in PrecomputedPermutationsGenerator.HamiltonianPermutations(original.Length))
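Below is a minimal standalone sketch of the same lower-bound search, for experimenting outside the project. The comparisons in FindFirstWithNormLessOrEqual imply that dictionary entries are sorted by non-increasing Norm, and the sketch assumes the same ordering. The `NormSearch` class and the plain `byte[] norms` array are hypothetical stand-ins for `this.Dictionary` and its entries. Two small deviations from the diff are labeled in comments: a guard for an empty range (offset at or past the end) and an overflow-safe midpoint.

```csharp
using System;

// Hypothetical stand-in: norms[i] plays the role of this.Dictionary[i].Norm.
// Assumes norms are sorted in non-increasing order.
public static class NormSearch
{
    // Returns the first index >= offset whose norm is <= expectedNorm,
    // or norms.Length if there is no such index.
    public static int FindFirstWithNormLessOrEqual(byte[] norms, byte expectedNorm, int offset)
    {
        // Added guard (not in the diff): an empty range has nothing to find.
        if (offset >= norms.Length)
        {
            return norms.Length;
        }

        var start = offset;
        var end = norms.Length - 1;
        if (norms[start] <= expectedNorm)
        {
            return start;
        }
        if (norms[end] > expectedNorm)
        {
            return norms.Length;
        }

        // Invariant: every index in [offset, start) has norm > expectedNorm,
        // and norms[end] <= expectedNorm, so the answer lies in [start, end].
        while (start < end)
        {
            // Overflow-safe midpoint (the diff uses (start + end) / 2).
            var middle = start + (end - start) / 2;
            if (norms[middle] <= expectedNorm)
            {
                end = middle;
            }
            else
            {
                start = middle + 1;
            }
        }
        return start;
    }

    public static void Main()
    {
        byte[] norms = { 9, 7, 7, 5, 5, 5, 3, 1 };
        Console.WriteLine(FindFirstWithNormLessOrEqual(norms, 5, 0)); // 3: first of the three 5s
        Console.WriteLine(FindFirstWithNormLessOrEqual(norms, 6, 0)); // 3: no exact 6, first smaller norm
        Console.WriteLine(FindFirstWithNormLessOrEqual(norms, 0, 0)); // 8: nothing fits
        Console.WriteLine(FindFirstWithNormLessOrEqual(norms, 9, 2)); // 2: offset already fits
    }
}
```

This assumes that an entry whose norm exceeds the remaining norm can never fit into the remainder. Under that assumption, starting the loop at the returned index only skips entries the loop would have rejected anyway. The BCL's `Array.BinarySearch` returns the index of an arbitrary match when several elements compare equal, and a negative number (the bitwise complement of the insertion point) when there is no exact match. Neither gives the first qualifying index directly.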
