The problem's phase space consists of pairs ("current step (modulo the number of directions)", "current node"),
with a total of ~200 thousand states for the puzzle input.

For every state, there is a defined transition to a single next state.
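
As a rough illustration of that state space, here is a minimal Python sketch (not the solution's actual code) that flattens the (step, node) pairs into integers and fills in the single successor of each. The name `build_transitions`, the dense `step * n + node` numbering and the three-node map are all made up for the example.

```python
def build_transitions(directions, nodes):
    """Return (next_state, names), where state = step * len(names) + node_index."""
    names = sorted(nodes)
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    next_state = [0] * (len(directions) * n)
    for step, turn in enumerate(directions):
        next_step = (step + 1) % len(directions)  # steps wrap around the direction list
        for name, (left, right) in nodes.items():
            target = left if turn == "L" else right
            next_state[step * n + index[name]] = next_step * n + index[target]
    return next_state, names


# Hypothetical map: 2 directions x 3 nodes = 6 states.
TINY_NODES = {"AAA": ("BBB", "ZZZ"), "BBB": ("ZZZ", "AAA"), "ZZZ": ("ZZZ", "ZZZ")}
next_state, names = build_transitions("LR", TINY_NODES)
```

Every entry of `next_state` holds exactly one successor, which is what makes the whole system a deterministic walk over a finite set.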

There are six starting states (all pairs with "current step" being zero and "current node" ending with 'A'),
and 263*6 ending states (all pairs with any "current step" and "current node" ending with 'Z').

Transitions are eventually periodic: since the successor of every state is uniquely defined and there is a finite number of states,
no matter which state we start from, we will eventually find ourselves in a loop of length at most ~200k.
There might be several non-intersecting loops.
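
The "path leading into a loop" shape can be seen with a small sketch. `tail_and_loop` is a hypothetical helper that walks a successor table from a start state until some state repeats; the five-entry table is invented for illustration.

```python
def tail_and_loop(next_state, start):
    """Walk from `start` until a state repeats; return (tail_length, loop_length)."""
    first_seen = {}
    state, steps = start, 0
    while state not in first_seen:
        first_seen[state] = steps
        state = next_state[state]
        steps += 1
    tail = first_seen[state]      # steps taken before entering the loop
    return tail, steps - tail     # loop length = total distinct states - tail


# Hypothetical table: 0 -> 1 -> 2 -> 3 -> 4 -> 2 -> ...
TABLE = [1, 2, 3, 4, 2]
shape_from_0 = tail_and_loop(TABLE, 0)
shape_from_3 = tail_and_loop(TABLE, 3)
```

Since the walk can visit each state at most once before repeating, it terminates after at most as many iterations as there are states.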

One way to solve the problem would be to use some complicated math to compute the result.
Another is to brute-force the result naively, by doing what the puzzle describes:
running several "ghosts", one from each starting state, and checking on every step whether all the current states are "ending".
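
A minimal sketch of that naive approach, assuming the map is held as a dict from node name to a (left, right) pair. The function name and the small map (modelled on the puzzle's part-two example) are assumptions made for illustration.

```python
def ghost_steps(directions, nodes):
    """Advance all ghosts in lockstep; return the first step count at which all stand on '..Z' nodes."""
    ghosts = [name for name in nodes if name.endswith("A")]
    steps = 0
    while not all(name.endswith("Z") for name in ghosts):
        side = 0 if directions[steps % len(directions)] == "L" else 1
        ghosts = [nodes[name][side] for name in ghosts]
        steps += 1
    return steps


# Small illustrative map with two ghosts (starting at 11A and 22A).
SMALL_NODES = {
    "11A": ("11B", "XXX"), "11B": ("XXX", "11Z"), "11Z": ("11B", "XXX"),
    "22A": ("22B", "XXX"), "22B": ("22C", "22C"), "22C": ("22Z", "22Z"),
    "22Z": ("22B", "22B"), "XXX": ("XXX", "XXX"),
}
answer = ghost_steps("LR", SMALL_NODES)
```

This version does string comparisons and dict lookups on every step, which is exactly the per-step overhead the rest of these notes try to remove.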

For brute force to run as fast as possible,
we need to reduce the number of conditions, dereferences and computations within the loop.

There is only so much that we can do regarding the storage
(200k states means at least 18 bits per state to store the next state; times 200k, that's 450KB,
way larger than any L1 cache).
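
The size estimate can be double-checked in a couple of lines. The 200k figure is the approximate state count quoted above, and the variable names exist only for this sketch.

```python
states = 200_000                              # approximate number of states
bits_per_state = (states - 1).bit_length()    # bits needed to address any state
packed_bytes = states * bits_per_state // 8   # size of a tightly bit-packed table
```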

For simplicity, here I store the states in an array of 270*1024 u32 values (i.e. about one megabyte),
still just a bit more than a modern per-core L2 cache;
and the array layout is optimized for access: the index is "current step" * 1024 + "current node",
so on every step we stay more or less in the same region of the array
(on average we move forward by 1k entries, or 4KB of memory, per step).
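
A sketch of that layout, assuming a row stride of 1024 slots per step (which is what the 270*1024 array size suggests). `encode` and `decode` are hypothetical helper names, not taken from the actual solution.

```python
NODE_BITS = 10            # 770 nodes fit in a 10-bit index
ROW = 1 << NODE_BITS      # 1024 slots reserved per "current step"
STEPS = 270               # number of rows in the array described above


def encode(step, node):
    """Flatten (step, node) into an array index."""
    return step * ROW + node


def decode(state):
    """Split an array index back into (step, node)."""
    return state >> NODE_BITS, state & (ROW - 1)


table_bytes = STEPS * ROW * 4   # one u32 (4 bytes) per slot
```

With a power-of-two stride, the node part of a state is simply its low 10 bits, so it can be inspected without any division.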

To make the "final" check cheap, I slightly renumber the list of nodes:
nodes that end with Z get the high three bits of their 10-bit index set to 1
(those indices are free, since the total number of nodes in the puzzle input is 770).
Unfortunately, the puzzle input contains collisions
(there are "final" nodes on lines 320 and 694, with the same last seven bits),
so I had to manually reorder the puzzle input;
it was easier to move all nodes ending with Z to the end of the file,
to make sure that there will be no collisions.
This way, a state is final iff it has its eighth, ninth and tenth bits set.
It's also easy enough to check all six current states at once
(just bitwise-and them all, bitwise-and the result with a `0b1110000000` mask, and check that the result matches the mask).
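
A sketch of the renumbering and the combined check, under the assumption that a node's index is its position in the input with the mask OR-ed in for nodes ending in Z. The helper names are hypothetical.

```python
FINAL_MASK = 0b1110000000


def renumber(names):
    """Map node name -> 10-bit index; names ending in 'Z' get the three high bits set."""
    if len(names) > FINAL_MASK:
        raise ValueError("too many nodes: plain indices would reach the final range")
    index = {}
    for i, name in enumerate(names):
        index[name] = (i | FINAL_MASK) if name.endswith("Z") else i
    if len(set(index.values())) != len(index):
        raise ValueError("two final nodes share the same low seven bits")
    return index


def all_final(states):
    """True iff every state has all three mask bits set."""
    acc = FINAL_MASK
    for s in states:
        acc &= s
    return acc == FINAL_MASK


index = renumber(["AAA", "BBZ", "CCC", "DDZ"])
```

The collision check in `renumber` corresponds to the problem described above: OR-ing the mask erases the top three bits of the original position, so two final nodes whose positions agree in the low seven bits end up with the same index.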

So ultimately, every step is just six bitwise-ands, one comparison
(which is only true once we have found the result, so there is no performance penalty for branch misprediction),
and six dereferences and assignments.
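
The loop itself might look roughly like this. It is a Python sketch of the logic only (the throughput discussed here needs a compiled language), run on an invented two-ghost table with two direction steps; `run` and `max_steps` are hypothetical names.

```python
FINAL_MASK = 0b1110000000
ROW = 1024   # assumed stride: state = step * ROW + node


def run(next_state, starts, max_steps=10**6):
    """Step all ghosts in lockstep; return the first step count at which all states are final."""
    states = list(starts)
    for steps in range(max_steps):
        acc = FINAL_MASK
        for s in states:          # one bitwise-and per ghost
            acc &= s
        if acc == FINAL_MASK:     # the single comparison per step
            return steps
        states = [next_state[s] for s in states]   # one dereference per ghost
    return None


# Invented example: ghost 1 cycles 0 -> 896 -> 0, ghost 2 cycles 1 -> 2 -> 897 -> 1.
node_next = {0: 896, 896: 0, 1: 2, 2: 897, 897: 1}
table = [0] * (2 * ROW)
for step in range(2):
    for node, nxt in node_next.items():
        table[step * ROW + node] = ((step + 1) % 2) * ROW + nxt
first_all_final = run(table, [0, 1])
```

In this example the first ghost is on a final node at odd step counts and the second at step counts 2, 5, 8, ..., so the loop stops at step 5.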

The resulting performance is over 100 million steps per second (single-threaded),
meaning that we get to ~250 billion steps in just half an hour.

Unfortunately, the result it produces (around 250 billion) is apparently incorrect;
it is not accepted by the AoC website.
There must be a bug somewhere, even though it works correctly on the (modified) sample input.

Another option, with math, would be to iterate over all possible direction numbers,
and for every direction number (out of 270) and for each assignment of final nodes to the six ghosts (6^6 ~= 47k), compute:
for each of the six starting states, how many steps does it take to get to its assigned final node? And to get to it again?
(Answering that question by brute force would require on the order of 200k operations for every pair of starting state and final state,
and another 200k for every final state, so that's about 200k*(270*6 + 270*6*6) ~= 2 billion operations
to precompute all ~10k values,
but it can be optimized if we identify the shape of the transitions
and untangle the transition matrix into a set of loops and of paths leading to these loops.)
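
The per-pair question (first arrival, and time to return) can be answered by a plain walk, sketched below. `first_hit_and_period` is a hypothetical helper, the table is invented, and this is the unoptimized brute-force variant, not the untangled one.

```python
def first_hit_and_period(next_state, start, target):
    """Return (a, b) such that `target` is reached from `start` after a + b*k steps, k >= 0.

    b is 0 when the target lies on the path into the loop and is visited only once.
    Returns None when the target is never reached.
    """
    first_seen = {}
    state, steps = start, 0
    while state not in first_seen:
        first_seen[state] = steps
        state = next_state[state]
        steps += 1
    if target not in first_seen:
        return None
    loop_start = first_seen[state]      # step count at which the loop is entered
    loop_len = steps - loop_start
    a = first_seen[target]
    return (a, loop_len) if a >= loop_start else (a, 0)


# Hypothetical table: 0 -> 1 -> 2 -> 3 -> 4 -> 2 -> ..., and 5 -> 5 on its own.
TABLE = [1, 2, 3, 4, 2, 5]
on_loop = first_hit_and_period(TABLE, 0, 3)
on_tail = first_hit_and_period(TABLE, 0, 1)
unreachable = first_hit_and_period(TABLE, 0, 5)
```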

The answer to such a question would have the form a_i + b_i*k, for some a_i and b_i, for every integer k >= 0.
Knowing a_i and b_i for each of the six questions, we could use arithmetic to find A and B such that
for every k >= 0, A + B*k steps from the starting states produce exactly this configuration,
with A being the first time we reach this configuration.
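
One way to do that arithmetic is to intersect the progressions pairwise, in the style of the generalized Chinese remainder theorem. The sketch below assumes every b_i is positive (targets visited only once would need separate handling); `combine` and `combine_all` are hypothetical names.

```python
from math import gcd


def combine(p, q):
    """Intersect {a1 + b1*k} and {a2 + b2*k} for k >= 0; return (A, B) or None if empty."""
    (a1, b1), (a2, b2) = p, q
    g = gcd(b1, b2)
    if (a2 - a1) % g:
        return None                      # the two progressions never coincide
    lcm = b1 // g * b2
    m = b2 // g
    # Solve a1 + b1*x == a2 (mod b2) for x modulo m, using a modular inverse.
    x = ((a2 - a1) // g * pow(b1 // g, -1, m)) % m if m > 1 else 0
    t = a1 + b1 * x
    lo = max(a1, a2)                     # both progressions must already have started
    if t < lo:
        t += -(-(lo - t) // lcm) * lcm   # round up to the next multiple of lcm
    return t, lcm


def combine_all(progressions):
    """Fold `combine` over a list of (a, b) pairs."""
    result = progressions[0]
    for p in progressions[1:]:
        result = combine(result, p)
        if result is None:
            return None
    return result


A, B = combine_all([(1, 2), (2, 3), (5, 5)])
```

Here the three progressions 1, 3, 5, ..., then 2, 5, 8, ..., then 5, 10, 15, ... first coincide at 5 and then every 30 steps. `pow` with a negative exponent and a modulus requires Python 3.8 or later.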

And then we would just need to find the smallest A over all ~10 million configurations.

But I can't be bothered to do this now.