parent
17a9896398
commit
c949603386
@ -0,0 +1,69 @@ |
The problem's phase space consists of pairs ("current step (modulo the number of directions)", "current node"),
for a total of ~200 thousand states for the puzzle input.
|
For every state, there is a well-defined transition to a single next state.
|
There are six starting states (all pairs with "current step" being zero and "current node" ending with 'A'),
and 263*6 ending states (all pairs with any "current step" and "current node" ending with 'Z').
|
Transitions are eventually periodic: since the successor of every state is uniquely defined and there is a finite number of states,
no matter which state we start from, we will eventually find ourselves in a loop of length at most ~200k.
There might be several non-intersecting loops.
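
To make the "tail, then loop" shape concrete, here is a minimal Python sketch (not the code from this commit) that follows the successor function from one state and reports how long the tail and the loop are. The names `tail_and_cycle` and `toy` are made up for illustration.

```python
def tail_and_cycle(start, step):
    """Return (tail_length, cycle_length) for the orbit of `start` under `step`."""
    first_seen = {}  # state -> number of steps at which it was first reached
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    tail = first_seen[state]  # steps taken before entering the loop
    return tail, t - tail


# Toy successor map: 0 -> 1 -> 2 -> 3 -> 4 -> 2 (a tail of 2, then a loop of 3).
toy = {0: 1, 1: 2, 2: 3, 3: 4, 4: 2}
print(tail_and_cycle(0, toy.__getitem__))  # (2, 3)
```

The dictionary makes this much slower than the array-based brute force described below, but it only has to visit each state once per starting point.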
|
One way to solve the problem would be to use some complicated math to compute the result.
Another is to brute-force the result naively by doing what the puzzle describes:
running several "ghosts", one from each starting state, and checking on every step whether all the current states are "ending".
|
For brute force to run as fast as possible,
we need to reduce the number of conditions, dereferences and computations within the loop.
|
There is only so much that we can do regarding the storage
(200k states means at least 18 bits per state to store the next state; times 200k, that's 450KB,
way larger than any L1 cache).
|
For simplicity, here I store states in an array of 270*1024 u32 (i.e. about one megabyte),
which is still just a bit more than a modern per-core L2 cache;
the array layout is optimized for access: the index is "current step" * 1024 + "current node",
so on every step we stay more or less in the same region of the array
(on average, we move forward by 1k entries, or 4KB of memory, on every step).
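
A minimal Python sketch of such a table, assuming a stride of 1024 slots per step (so the node index sits in the low 10 bits of the state) and already-parsed `left`/`right` lists of node indices. `STRIDE` and `build_table` are hypothetical names, not taken from the actual solution.

```python
from array import array

STRIDE = 1024  # slots per step; assumes fewer than 1024 nodes


def build_table(directions, left, right):
    """left[i] / right[i] are the node indices reached from node i."""
    n_dirs = len(directions)
    table = array("I", [0]) * (n_dirs * STRIDE)  # unsigned ints, zero-filled
    for step, d in enumerate(directions):
        next_base = ((step + 1) % n_dirs) * STRIDE  # region of the next step
        targets = left if d == "L" else right
        for node, target in enumerate(targets):
            table[step * STRIDE + node] = next_base + target
    return table


# Two directions, three nodes: 0 = (1, 2), 1 = (2, 0), 2 = (0, 1)
table = build_table("LR", left=[1, 2, 0], right=[2, 0, 1])
state = 0  # step 0, node 0
for _ in range(3):
    state = table[state]
print(state // STRIDE, state % STRIDE)  # 1 1
```

With this layout a single lookup `table[state]` advances both the step counter and the node, so the hot loop needs no modulo and no separate direction lookup.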
|
To make it simple to check whether a state is "final", I slightly renumber the list of nodes:
nodes that end with Z get the high three bits of their 10-bit index set to 1
(this works because the total number of nodes in the puzzle input is 770, below `0b1110000000` = 896,
so no ordinary node has all three of these bits set).
Unfortunately, the puzzle input contains collisions
(there are "final" nodes on lines 320 and 694, with the same last seven bits),
so I had to manually reorder the puzzle input;
it was easier to move all nodes ending with Z to the end of the file,
to make sure that there will be no collisions.
This way, a state is final iff it has its eighth, ninth and tenth bits set.
It's also easy enough to check all six current states at once
(just bitwise-and them all, bitwise-and the result with a `0b1110000000` mask, and check that the result matches the mask).
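
For comparison, here is a Python sketch of a different renumbering that avoids collisions without touching the input file: ordinary nodes are numbered from 0 and nodes ending with Z are numbered from 896 upwards. This is not what the notes above describe (they reorder the file instead), and it assumes fewer than 896 ordinary nodes and at most 128 final ones. `renumber` and `is_final` are hypothetical names.

```python
FINAL_BASE = 0b1110000000  # 896: the three high bits of a 10-bit index


def renumber(names):
    """Map node names to indices so that only names ending in 'Z' get the high bits."""
    index = {}
    next_plain, next_final = 0, FINAL_BASE
    for name in names:
        if name.endswith("Z"):
            index[name] = next_final
            next_final += 1
        else:
            index[name] = next_plain
            next_plain += 1
    return index


def is_final(state):
    """True iff bits 8, 9 and 10 (counting from 1) of the state are all set."""
    return state & FINAL_BASE == FINAL_BASE


index = renumber(["PPA", "PPZ", "QQA", "QQZ", "XXX"])
print(index)  # {'PPA': 0, 'PPZ': 896, 'QQA': 1, 'QQZ': 897, 'XXX': 2}
print(is_final(index["QQZ"]), is_final(index["XXX"]))  # True False
```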
|
So ultimately, every step is just six bitwise-ands, one comparison
(which is only true once we have found the result, so there is no branch misprediction penalty),
and six dereferences and assignments.
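
The whole loop can be sketched in a few lines of Python. This is an illustration of the idea only, far slower than the numbers quoted in these notes, and it never terminates if the ghosts are never all final at once. The function name and the toy table are made up.

```python
FINAL_MASK = 0b1110000000  # final nodes have all three high index bits set


def steps_until_all_final(table, starts):
    """Follow table[state] for every ghost until all states are final at once."""
    states = list(starts)
    steps = 0
    while True:
        acc = FINAL_MASK
        for s in states:
            acc &= s  # one bitwise-and per ghost
        if acc == FINAL_MASK:  # the single comparison per step
            return steps
        states = [table[s] for s in states]  # one dereference per ghost
        steps += 1


# Toy table with a single direction, so the state is just the node index.
table = [0] * 1024
table[0], table[896] = 896, 0               # ghost 1: loop of length 2
table[1], table[2], table[897] = 2, 897, 1  # ghost 2: loop of length 3
print(steps_until_all_final(table, [0, 1]))  # 5
```

In the toy table the first ghost is final on odd steps and the second on steps 2, 5, 8 and so on, so the first common step is 5.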
|
The resulting performance is over 100 million steps per second (single-threaded),
meaning that we get to ~250 billion steps in just half an hour.
|
Unfortunately, the result it produces (around 250 billion) is apparently incorrect;
it is not accepted by the AoC website.
There must be a bug somewhere, even though it works correctly on the (modified) sample input.
|
Another option, with math, would be to iterate over all possible direction numbers,
and for every direction number (out of 270) and for each choice of one final node per ghost (6^6 ~= 47k), compute:
for each of the six starting states, how many steps does it take to get to this node? And to get to it again?
(Answering that question by brute force would require on the order of 200k operations for every starting state and final state,
and another 200k for every final state, so that's about 200k*(270*6 + 270*6*6) ~= 2 billion operations
to precompute all ~10k values,
but it could be optimized if we identified the shape of the transitions
and untangled the transition matrix into a set of loops and of paths leading to these loops).
|
The answer to such a question would have the form a_i + b_i*k, for some a_i and b_i, for every integer k>=0.
Knowing a_i and b_i for each of the six questions, we could use arithmetic to find A and B such that
for every k>=0, A + B*k steps from the starting states produce exactly this configuration,
with A being the first time we reach this configuration.
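
The "arithmetic" step can be sketched as merging two progressions at a time with the extended Chinese remainder theorem. This is a sketch under assumptions: it requires b1, b2 > 0 and Python 3.8+ (for `pow` with a negative exponent), and `combine` is a hypothetical helper, not part of the solution in this commit.

```python
from math import gcd


def combine(a1, b1, a2, b2):
    """Merge t = a1 + b1*k1 and t = a2 + b2*k2 (k1, k2 >= 0, b1, b2 > 0).

    Returns (A, B) such that the common values are A + B*k for k >= 0,
    or None if the two progressions never meet.
    """
    g = gcd(b1, b2)
    if (a2 - a1) % g:
        return None
    lcm = b1 // g * b2
    m = b2 // g
    # Solve a1 + b1*k = a2 (mod b2) for the smallest k >= 0.
    k = ((a2 - a1) // g * pow(b1 // g, -1, m)) % m
    t = a1 + b1 * k
    # t >= a1 already holds; lift it so that t >= a2 holds as well.
    low = max(a1, a2)
    if t < low:
        t += -(-(low - t) // lcm) * lcm  # ceiling division
    return t, lcm


print(combine(1, 2, 2, 3))  # (5, 6)
print(combine(0, 2, 9, 3))  # (12, 6)
print(combine(0, 2, 1, 2))  # None
```

Folding `combine` over all six (a_i, b_i) pairs, for example with `functools.reduce` plus a check for `None`, would give the A and B for one configuration.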
|
And then we would just need to find the smallest A over all ~10 million configurations.
|
But I can't be bothered to do this now.
@ -1,10 +1,12 @@ |
LR
RL

PPA = (PPB, XXX)
PPA = (PPL, PPL)
PPL = (PPB, XXX)
PPB = (XXX, PPZ)
PPZ = (PPB, XXX)
QQA = (QQB, XXX)
QQA = (QQL, QQL)
QQL = (QQB, XXX)
QQB = (QQC, QQC)
QQC = (QQZ, QQZ)
QQZ = (QQB, QQB)
XXX = (XXX, XXX)
XXX = (XXX, XXX)
PPZ = (PPB, XXX)
QQZ = (QQB, QQB)