About 490 million positions are at distance 20
Submitted by rokicki on Sat, 09/14/2013 - 13:43.
Even though we now know the diameter of Rubik's Cube group in the half-turn
metric, there is still much yet to be discovered. The diameter in the QTM
and the STM are unproved (although they are almost certainly 26 and 18,
respectively). The exact count of positions at distance 16, 17, 18, 19,
and 20 in the half-turn metric is unknown. This note reports some progress
on an estimate for the count of 20's in the half-turn metric.
It is fascinating to me how problems of distinctly different difficulty
exist around the 3x3x3 cube in the half-turn metric. Initially, back in
the early days, we could solve individual positions non-optimally.
With appropriate ideas and some moderate computer power (about 1/1000
of a modern cell phone), we got to the point where we could solve
arbitrary positions near-optimally, with Kociemba's algorithm.
Eventually ideas and computer speeds got to the point where we could
solve arbitrary positions optimally with Korf's solver (initially in
about a day, now in much less than a second). With some further ideas
and effort we got to the point where we could solve huge chunks of the
space optimally at a high rate of speed, and even, with a heroic
amount of CPU, calculate the actual diameter of the group. But even
today, finding a distance-20 position is difficult---they are rare,
and proving a candidate position to be at distance 20 still requires
a few minutes of computer time.
I have made some earlier approximations of the count of distance-20
positions in this forum, but the uncertainty in those numbers has
always been very high. About a year ago I decided to work towards a
better estimate. To obtain this estimate, I solved 2500 randomly
chosen cosets of Kociemba's group (the group generated by
{U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took
about five months of CPU time, distributed across a number of
machines, and found optimal solutions for 48,771,072,000,000
positions at a rate of about four million optimal solves per CPU second.
This effort netted a total of 647 distance-20 positions (not all of
which were new), which is an average of 0.2588 distance-20 positions
per coset. Since there are 2,217,093,120 total cosets, simple
multiplication predicts a total of 574 million distance-20 positions.
Unfortunately, the variance of the sample set was very high. More
than 90% of the cosets had no distance-20 positions (2286 of the 2500).
Most of the remainder (142) had only one. And the cosets with the
greatest count of distance-20 positions had 159 and 59 such positions,
respectively. The standard deviation of the sample was 3.5, more than
thirteen times the mean; thus, the estimate derived from this sample was
highly uncertain.
Luckily, there is a way to reduce this uncertainty without too much
additional work. There is a fair amount of variability in the count
of length-19 canonical sequences that lead to each coset, and a
simple dynamic programming program can calculate, for each coset all
at once, how many length-19 canonical sequences lead to that coset.
Intuitively, the more length-19 canonical sequences that lead to a
coset, the fewer distance-20 positions that coset is likely to have,
and this is supported by all of the empirical evidence as well. Thus,
distance-20 positions are concentrated in those cosets with few
length-19 canonical positions, so we should sample those cosets more
heavily.
I created a list of all 138,639,780 symmetry-distinct cosets, sorted
by a decreasing count of distance-19 canonical sequences, and assigned
each an index from 1..138,639,780 based on that ordering. I broke
this set into four subsets and, for the three subsets with higher than
average incidence of distance-20s, I calculated additional sample points.
It turns out that the cosets with a higher incidence of distance-20s
are, in some sense, faster to calculate. We compute the distance-20
positions by performing a full search out to distance 19; the greater
the number of length-19 canonical sequences, the longer this takes.
Thus, those cosets that yield more distance-20 positions are usually
faster to run to completion in addition. (This observation breaks
down when you get further down the list, because for the majority of
the cosets it's sufficient to search through depth 18, do a "prepass"
to depth 19, and solve the few remaining positions with an optimal
solver, but for the portion of the space that is rich in 20s, the
observation holds true.)
The first sample set was for the cosets numbered from 1 to 5,000. I
solved every such coset in this sample exactly (I have been doing this
for some time). From these 5,000 cosets I obtained a total of 884,425
distance-20 positions (not all unique, but we will handle that below).
The mean is 176.9 and the standard deviation is 147.5. The high
standard deviation does not matter because we explored the entire
set, so there is no sampling error.
The next sample set was for the cosets numbered 5,001 to 15,000. I
solved 200 of these cosets, found 16,239 distance-20 positions,
with a mean of 81.645 and a standard deviation of 50.0. Extrapolating
these statistics, we expect a total of 816,450 distance-20
positions from this range.
The next sample set was for cosets numbered 15,001 to 1,500,000.
I solved 297 of these cosets, found 2,129 distance-20 positions,
with a mean of 7.168 and a standard deviation of 8.429. Extrapolating
these statistics, we expect a total of 10,645,000 distance-20
positions from this range.
The last range is for cosets numbered 1,500,001 to 138,639,780.
I solved 2479 of these cosets (these were entirely from the
original 2500 cosets solved that were indexed at greater than 1,500,000).
From this subset, I found 334 distance-20 positions, with a mean
of 0.1347 and a standard deviation of 0.7233. Extrapolating, we
expect this range to yield a total of 18,477,082 distance-20
positions. The ratio of standard deviation to mean is still
extremely high, but it is much better than before; in addition, the
range only accounts for about 60% of the total so the impact of
uncertainty is reduced.
The total expected distance-20 positions from all four ranges added
up is thus 30,822,957. Since we only explored symmetrically-unique
cosets, we need to multiply this by the 16 symmetries of the Kociemba
subgroup to obtain a final total of 490,000,000 distance-20 positions.
I believe the true count is likely to be within 10 or 15 percent of
this estimate.
It is striking that there can be so many distance-20 positions (more
than one for every man, woman, and child in the United States), yet
they are so rare (fewer than one in 80 billion positions overall,
equivalent to finding a single grain of sand in a dump truck full
of sand). This is why I am fascinated by them; they are rare, yet
plentiful, and very hard to find.
I am continuing my search for distance-20 positions. I occasionally
get another idea for speeding up the program, and I am still mining
the richest portion of the cosets, so success is frequent; I am
currently finding about 200,000 new ones a day using multiple
machines. As I get further down the list of cosets, solving each
coset gets harder, and yields fewer new distance-20 positions.
Including all the cosets listed above, and additional cosets solved
in prior experiments over the years, and the early results of Radu
and Kociemba in solving all symmetrical positions, I presently have
a total of 63,222,090 distance-20 positions, or about one in eight.
My computers continue churning through cosets extending this set.
In a few decades it may be child's play to list all distance-20
positions, but in the meantime, I will continue to chip away until
I have them all.
metric, there is still much yet to be discovered. The diameter in the QTM
and the STM are unproved (although they are almost certainly 26 and 18,
respectively). The exact count of positions at distance 16, 17, 18, 19,
and 20 in the half-turn metric is unknown. This note reports some progress
on an estimate for the count of 20's in the half-turn metric.
It is fascinating to me how problems of distinctly different difficulty
exist around the 3x3x3 cube in the half-turn metric. Initially, back in
the early days, we could solve individual positions non-optimally.
With appropriate ideas and some moderate computer power (about 1/1000
of a modern cell phone), we got to the point where we could solve
arbitrary positions near-optimally, with Kociemba's algorithm.
Eventually ideas and computer speeds got to the point where we could
solve arbitrary positions optimally with Korf's solver (initially in
about a day, now in much less than a second). With some further ideas
and effort we got to the point where we could solve huge chunks of the
space optimally at a high rate of speed, and even, with a heroic
amount of CPU, calculate the actual diameter of the group. But even
today, finding a distance-20 position is difficult---they are rare,
and proving a candidate position to be at distance 20 still requires
a few minutes of computer time.
I have made some earlier approximations of the count of distance-20
positions in this forum, but the uncertainty in those numbers has
always been very high. About a year ago I decided to work towards a
better estimate. To obtain this estimate, I solved 2500 randomly
chosen cosets of Kociemba's group (the group generated by
{U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took
about five months of CPU time, distributed across a number of
machines, and found optimal solutions for 48,771,072,000,000
positions at a rate of about four million optimal solves per CPU second.
This effort netted a total of 647 distance-20 positions (not all of
which were new), which is an average of 0.2588 distance-20 positions
per coset. Since there are 2,217,093,120 total cosets, simple
multiplication predicts a total of 574 million distance-20 positions.
Unfortunately, the variance of the sample set was very high. More
than 90% of the cosets had no distance-20 positions (2286 of the 2500).
Most of the remainder (142) had only one. And the cosets with the
greatest count of distance-20 positions had 159 and 59 such positions,
respectively. The standard deviation of the sample was 3.5, more than
thirteen times the mean; thus, the estimate derived from this sample was
highly uncertain.
Luckily, there is a way to reduce this uncertainty without too much
additional work. There is a fair amount of variability in the count
of length-19 canonical sequences that lead to each coset, and a
simple dynamic programming program can calculate, for each coset all
at once, how many length-19 canonical sequences lead to that coset.
Intuitively, the more length-19 canonical sequences that lead to a
coset, the fewer distance-20 positions that coset is likely to have,
and this is supported by all of the empirical evidence as well. Thus,
distance-20 positions are concentrated in those cosets with few
length-19 canonical positions, so we should sample those cosets more
heavily.
I created a list of all 138,639,780 symmetry-distinct cosets, sorted
by a decreasing count of distance-19 canonical sequences, and assigned
each an index from 1..138,639,780 based on that ordering. I broke
this set into four subsets and, for the three subsets with higher than
average incidence of distance-20s, I calculated additional sample points.
It turns out that the cosets with a higher incidence of distance-20s
are, in some sense, faster to calculate. We compute the distance-20
positions by performing a full search out to distance 19; the greater
the number of length-19 canonical sequences, the longer this takes.
Thus, those cosets that yield more distance-20 positions are usually
faster to run to completion in addition. (This observation breaks
down when you get further down the list, because for the majority of
the cosets it's sufficient to search through depth 18, do a "prepass"
to depth 19, and solve the few remaining positions with an optimal
solver, but for the portion of the space that is rich in 20s, the
observation holds true.)
The first sample set was for the cosets numbered from 1 to 5,000. I
solved every such coset in this sample exactly (I have been doing this
for some time). From these 5,000 cosets I obtained a total of 884,425
distance-20 positions (not all unique, but we will handle that below).
The mean is 176.9 and the standard deviation is 147.5. The high
standard deviation does not matter because we explored the entire
set, so there is no sampling error.
The next sample set was for the cosets numbered 5,001 to 15,000. I
solved 200 of these cosets, found 16,239 distance-20 positions,
with a mean of 81.645 and a standard deviation of 50.0. Extrapolating
these statistics, we expect a total of 816,450 distance-20
positions from this range.
The next sample set was for cosets numbered 15,001 to 1,500,000.
I solved 297 of these cosets, found 2,129 distance-20 positions,
with a mean of 7.168 and a standard deviation of 8.429. Extrapolating
these statistics, we expect a total of 10,645,000 distance-20
positions from this range.
The last range is for cosets numbered 1,500,001 to 138,639,780.
I solved 2479 of these cosets (these were entirely from the
original 2500 cosets solved that were indexed at greater than 1,500,000).
From this subset, I found 334 distance-20 positions, with a mean
of 0.1347 and a standard deviation of 0.7233. Extrapolating, we
expect this range to yield a total of 18,477,082 distance-20
positions. The ratio of standard deviation to mean is still
extremely high, but it is much better than before; in addition, the
range only accounts for about 60% of the total so the impact of
uncertainty is reduced.
The total expected distance-20 positions from all four ranges added
up is thus 30,822,957. Since we only explored symmetrically-unique
cosets, we need to multiply this by the 16 symmetries of the Kociemba
subgroup to obtain a final total of 490,000,000 distance-20 positions.
I believe the true count is likely to be within 10 or 15 percent of
this estimate.
It is striking that there can be so many distance-20 positions (more
than one for every man, woman, and child in the United States), yet
they are so rare (fewer than one in 80 billion positions overall,
equivalent to finding a single grain of sand in a dump truck full
of sand). This is why I am fascinated by them; they are rare, yet
plentiful, and very hard to find.
I am continuing my search for distance-20 positions. I occasionally
get another idea for speeding up the program, and I am still mining
the richest portion of the cosets, so success is frequent; I am
currently finding about 200,000 new ones a day using multiple
machines. As I get further down the list of cosets, solving each
coset gets harder, and yields fewer new distance-20 positions.
Including all the cosets listed above, and additional cosets solved
in prior experiments over the years, and the early results of Radu
and Kociemba in solving all symmetrical positions, I presently have
a total of 63,222,090 distance-20 positions, or about one in eight.
My computers continue churning through cosets extending this set.
In a few decades it may be child's play to list all distance-20
positions, but in the meantime, I will continue to chip away until
I have them all.