parent
17a9896398
commit
c949603386
@ -0,0 +1,69 @@ |
|||||||
|
The problem's phase space consists of pairs ("current step (modulo number of directions)", "current node"), |
||||||
|
for a total of ~200 thousand states for the puzzle input. |
||||||
|
|
||||||
|
For every state, there is a defined transition to the single next state. |
||||||
|
|
||||||
|
There are six starting states (all pairs with "current step" being zero and "current node" ending with 'A'). |
||||||
|
There are also 263*6 ending states (all pairs with any "current step" and a "current node" ending with 'Z'). |
||||||
|
|
||||||
|
Transitions are eventually periodic; since the successor of every state is uniquely defined, and there is a finite number of states, |
||||||
|
this means that no matter which state we start from, we will eventually end up in a loop of length at most ~200k. |
||||||
|
There might be several non-intersecting loops. |
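
The loop structure described above can be sketched in a few lines. This is only an illustration, not the code from this commit: the function names (`parse`, `successor`, `tail_and_cycle`) are made up here, and `SAMPLE` is my reading of the new side of the sample input changed in this commit. The sketch follows a single start state until it revisits a state, then reports how many steps lead into the loop and how long the loop is.

```python
SAMPLE = """
RL

PPA = (PPL, PPL)
PPL = (PPB, XXX)
PPB = (XXX, PPZ)
QQA = (QQL, QQL)
QQL = (QQB, XXX)
QQB = (QQC, QQC)
QQC = (QQZ, QQZ)
XXX = (XXX, XXX)
PPZ = (PPB, XXX)
QQZ = (QQB, QQB)
"""


def parse(text):
    """Return (directions, nodes) from 'RL...' plus 'AAA = (BBB, CCC)' lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    nodes = {}
    for ln in lines[1:]:
        name, pair = ln.split(" = ")
        left, right = pair.strip("()").split(", ")
        nodes[name] = (left, right)
    return lines[0], nodes


def successor(state, directions, nodes):
    """The single next state of a (step mod len(directions), node) pair."""
    step, node = state
    turn = 0 if directions[step] == "L" else 1
    return ((step + 1) % len(directions), nodes[node][turn])


def tail_and_cycle(start, directions, nodes):
    """Walk from `start` until a state repeats; return (tail length, loop length)."""
    seen = {}
    state, t = start, 0
    while state not in seen:
        seen[state] = t
        state = successor(state, directions, nodes)
        t += 1
    tail = seen[state]
    return tail, t - tail
```

Calling `tail_and_cycle((0, "PPA"), *parse(SAMPLE))` walks the first ghost of the sample; the loop length it reports is counted in full states, so it is always a multiple of the period seen when looking at node names alone.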
||||||
|
|
||||||
|
One way to solve the problem would be to use some complicated math in order to compute the result. |
||||||
|
Another, to brute force the result naively, by doing what the puzzle describes: |
||||||
|
running several "ghosts", one from each starting state, and on every step checking if all the current states are "ending". |
||||||
|
|
||||||
|
In order for brute force to work as fast as possible, |
||||||
|
we need to reduce the number of conditions, dereferences and computations within the loop. |
||||||
|
|
||||||
|
There is only so much that we can do regarding the storage |
||||||
|
(200k states means at least 18 bits per state to store the next state, times 200k that's 450KB, |
||||||
|
way larger than any L1 cache). |
||||||
|
|
||||||
|
For simplicity, here I store states in an array of 270*1024 u32 values (i.e. about one megabyte), |
||||||
|
still just a bit more than a modern L2 cache per core; |
||||||
|
and the array layout is optimized for access: index is "current step" * 1024 + "current node", |
||||||
|
so on every step we stay more or less in the same region of the array |
||||||
|
(we traverse 1k entries, or 4KB of memory, on average for every step). |
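
A rough sketch of such a flat layout, written in Python for readability rather than speed. It assumes each step owns a row of 1024 slots, which is what the 270*1024 array size and the "1k entries per step" figure suggest; `STRIDE`, `build_table` and the `left`/`right` lists of node ids are names invented for this sketch, not taken from the actual code.

```python
STRIDE = 1024  # assumed number of slots reserved per step (node ids must be < 1024)


def build_table(directions, left, right):
    """Build a flat successor table indexed by step * STRIDE + node id.

    `left` and `right` map a node id to the id of its left/right neighbour.
    Each entry already holds the full index of the next state, so one
    lookup per ghost per step is enough.
    """
    n_steps = len(directions)
    table = [0] * (n_steps * STRIDE)
    for step, turn in enumerate(directions):
        targets = left if turn == "L" else right
        base = step * STRIDE
        next_base = ((step + 1) % n_steps) * STRIDE
        for node, target in enumerate(targets):
            table[base + node] = next_base + target
    return table
```

With this layout the low 10 bits of any state index are the node id, and consecutive steps read from adjacent rows of the table.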
||||||
|
|
||||||
|
For simplicity, in order to check that the state is "final", I slightly renumber the list of nodes; |
||||||
|
nodes that end with Z get the high three bits of their 10-bit index set to 1 |
||||||
|
(since the total number of nodes in the puzzle input is 770, which still fits in 10 bits). |
||||||
|
Unfortunately, the puzzle input contains collisions |
||||||
|
(there are "final" nodes on lines 320 and 694, with the same last seven bits), |
||||||
|
so I had to manually reorder the puzzle input; |
||||||
|
it was easier to move all nodes ending with Z to the end of the file, |
||||||
|
to make sure that there will be no collisions. |
||||||
|
This way, the state is final iff it has its eighth, ninth and tenth bits set. |
||||||
|
It's also easy enough to check all six current states at once |
||||||
|
(just bitwise-and them all, bitwise-and the result with a `0b1110000000` mask, and check that the result matches the mask). |
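
The combined check can be sketched as follows. `FINAL_MASK` and `all_final` are illustrative names, and the sketch assumes that the node id occupies the low 10 bits of every state value.

```python
FINAL_MASK = 0b1110000000  # eighth, ninth and tenth bits of the node id


def all_final(states):
    """True iff every state has all three mask bits set."""
    acc = FINAL_MASK
    for state in states:
        acc &= state  # a bit survives only if it is set in every state
    return acc == FINAL_MASK
```

Because a mask bit is cleared as soon as one state lacks it, a single comparison at the end covers all ghosts at once.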
||||||
|
|
||||||
|
So ultimately, every step is just six bitwise-ands, one comparison |
||||||
|
(which is only true once we have found the result, so the branch is always predicted correctly and costs almost nothing), |
||||||
|
and six dereferences and assignments. |
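
Put together, the inner loop looks roughly like the sketch below. It is a readable Python rendering of the idea, not the optimized code: `run` and `max_steps` are names made up here, the number of ghosts is not fixed to six, and the table is assumed to map each state index directly to the index of its successor.

```python
def run(table, starts, mask, max_steps):
    """Step all ghosts in lockstep; return the first step count at which
    every state has all `mask` bits set, or None if `max_steps` is exceeded."""
    states = list(starts)
    for steps in range(max_steps + 1):
        acc = mask
        for state in states:
            acc &= state                        # one bitwise-and per ghost
        if acc == mask:                         # the single comparison
            return steps
        states = [table[s] for s in states]     # one lookup per ghost
    return None


# Toy table with a single direction step: two ghosts with loops of length 2 and 3.
toy = [0] * 1024
toy[0], toy[1], toy[896] = 1, 896, 1            # 0 -> 1 -> 896 -> 1 -> ...
toy[2], toy[3], toy[4], toy[897] = 3, 4, 897, 3  # 2 -> 3 -> 4 -> 897 -> 3 -> ...
first_hit = run(toy, [0, 2], 0b1110000000, 100)
```

In the toy table the first ghost is on a final node at steps 2, 4, 6, ... and the second at steps 3, 6, 9, ..., so `first_hit` is 6.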
||||||
|
|
||||||
|
The resulting performance is over 100 million steps per second (single-threaded), |
||||||
|
meaning that we get to ~250 billion steps in just half an hour. |
||||||
|
|
||||||
|
Unfortunately, the result it produces (around 250 billion) is apparently incorrect; |
||||||
|
it is not accepted by the AoC website. |
||||||
|
Must be some bug somewhere, even though it works correctly on the (modified) sample input. |
||||||
|
|
||||||
|
Another option, with math, would be to iterate over all possible direction numbers, |
||||||
|
and for every direction number (out of 270), and for each assignment of a final node to every ghost (6^6 ~= 47k), compute: |
||||||
|
For each one out of the six starting states, how many steps does it take to get to this node? And to get to it again? |
||||||
|
(Answering that question with brute-forcing would require on the order of 200k operations for every starting state and final state, |
||||||
|
and another 200k for every final state, so that's about 200k*(270*6 + 270*6*6) ~= 2 billion operations |
||||||
|
to precompute all ~10k values, |
||||||
|
but it can be optimized if we identify the shape of transitions, |
||||||
|
and untangle the transition matrix into a set of loops, and of paths leading to these loops). |
||||||
|
|
||||||
|
The answer to such a question would have the form a_i+b_i*k, for some a_i and b_i, for every integer k>=0. |
||||||
|
Knowing a and b, for each of the six questions, we could use arithmetic to find A and B such that |
||||||
|
for every k>=0, A+Bk steps from the starting states produce exactly this configuration. |
||||||
|
With A being the first time when we reach this configuration. |
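
The arithmetic in question is a generalized Chinese remainder step. Below is a minimal sketch of how two such progressions could be merged; `combine` is a name invented here, both periods are assumed to be positive, and `pow(x, -1, m)` needs Python 3.8 or newer. Folding it over all six progressions would give A and B, or show that the configuration is never reached.

```python
from math import gcd


def combine(a1, b1, a2, b2):
    """Intersect {a1 + b1*k} and {a2 + b2*k} for k >= 0 (b1, b2 > 0).

    Returns (A, B) such that the common values are exactly A + B*k, k >= 0,
    or None if the two progressions never meet.
    """
    g = gcd(b1, b2)
    if (a2 - a1) % g:
        return None                      # residues disagree modulo gcd
    lcm = b1 // g * b2
    m = b2 // g
    # Solve a1 + b1*x == a2 (mod b2) for the smallest x >= 0.
    x = 0 if m == 1 else ((a2 - a1) // g * pow(b1 // g, -1, m)) % m
    t = a1 + b1 * x                      # smallest common solution >= a1
    if t < a2:                           # must also be >= a2
        t += -(-(a2 - t) // lcm) * lcm   # round up to the next multiple of lcm
    return t, lcm
```

For example, a ghost that is on its final node at steps 3 + 2k and another at steps 4 + 3k first coincide at step 7 and then every 6 steps, so `combine(3, 2, 4, 3)` gives `(7, 6)`.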
||||||
|
|
||||||
|
And then we would just need to find the smallest A for all ~10 million configurations. |
||||||
|
|
||||||
|
But I can't be bothered to do this now. |
@ -1,10 +1,12 @@ |
|||||||
LR |
RL |
||||||
|
|
||||||
PPA = (PPB, XXX) |
PPA = (PPL, PPL) |
||||||
|
PPL = (PPB, XXX) |
||||||
PPB = (XXX, PPZ) |
PPB = (XXX, PPZ) |
||||||
PPZ = (PPB, XXX) |
QQA = (QQL, QQL) |
||||||
QQA = (QQB, XXX) |
QQL = (QQB, XXX) |
||||||
QQB = (QQC, QQC) |
QQB = (QQC, QQC) |
||||||
QQC = (QQZ, QQZ) |
QQC = (QQZ, QQZ) |
||||||
QQZ = (QQB, QQB) |
|
||||||
XXX = (XXX, XXX) |
XXX = (XXX, XXX) |
||||||
|
PPZ = (PPB, XXX) |
||||||
|
QQZ = (QQB, QQB) |