There are approximately 700,000,000 distance-20 positions.

Inspired by Jerry Bryan's recent Cubelovers article, I decided to run a few large randomly-selected cosets such that optimal solutions were found for *all* positions (rather than just proving all positions were within 20, as I usually do). From this, I hoped to gain some insight into how long it would take to enumerate the entire cube space and find counts for every distance. I also hoped to get an estimate for the total count of 20s in the entire cube space.

I first wrote a program to select a random coset, and then set those cosets to running, specifying a maximum search ply of 18. This way the coset solver will find optimal solutions for all distance 18 and lower positions in the coset, but may report some distance 19 and larger positions with a non-optimal solution. My hope was the number of positions reported to be 20 or larger (these were the only ones that could have een suboptimal) would be small. Then, using various techniques, I would prove find optimal solutions to those remaining positions.

The overall results of this experiment are listed below; each line gives the sequence for the coset I ran, along with the count of positions at each depth, individually. I am listing all the sets independently so we can see the distribution of results.

Sequence              9  10   11    12     13      14       15         16         17          18         19 20
B3L1F1L2D1B2R1B1L3U2  0  14  534 11657 196080 2937991 41299927  545853895 5607291938 12899502606  411334158
  L3F2R1D3U1B3U3F2U3 12 283 3909 50440 625227 7523487 89006836  999573895 7827313152 10476371239  107960320
L3F1L2U1B2D2F1D1L2U2  0  20  764 14703 234163 3400017 46716211  604913894 5990416554 12551161618  311570856
F3D2R1D1U3L1B1L2R1U1  0  22  492  8151 125669 1862220 26879999  374704976 4367304184 13962505002  775038085
  L3F3D1R2F1U1B3D1U1  2  74 1380 21783 316425 4286841 55717758  686831327 6360050438 12129869062  271333710
  R3D3B3R1D3L3B2U3R1  2  56 1136 19561 296206 4156497 55650919  704614310 6661129884 11901749295  180810934
  L3D1L2B1U2R1F2L2R2  8 106 1211 16568 227090 3076878 41339979  536945482 5502539731 12978986433  445295314
R3F2R1B1L1U1B2D2R2F3  0  32  760 12845 195089 2808490 39037907  517927601 5407989120 13088176128  452280828
  L3U2B1U2L3U1F2U3F1  8 156 2640 39405 534214 6879989 84822334  967020846 7665063334 10684674249   99391625
  R3U3B3F1L3B2D1U1R1  2  30  403  6033  92543 1404341 20938697  302867007 3751195863 14234790504 1197133376  1
F3L1D1U2B3R2D1B1F1U3  0  20  450  7705 123526 1899926 28201017  399903983 4642837613 13778032783  657421777
R3U1L3D3B1U1F2R3F2U3  0  24  578 10948 179778 2732629 39229330  530023329 5552342430 12971260550  412649204
  F3U3L1R3D3R2D2R1B1  2  64 1155 16178 217814 2906652 38648789  501931830 5233010583 13189478300  542217433
L3D2U3R2F3R2B2D1R3F1  0  12  424  8501 144634 2269991 33717351  472272190 5218904943 13303468931  477641823
R3D1F1D2U3L3U2R3B3R1  0   2   64  1499  32779  641740 11438656  189621858 2731617168 14471674772 2103400257  5
R3U3L3U1F1L1R1D3U2F3  0  84 1956 29756 400644 5089831 62507133  735152905 6559001792 11906025930  240218769
R3U3F1D3R3U2L3F1R3U1  0  10  284  5490  92885 1484265 22734030  332734145 4080540996 14148691639  922145056
  B3L1R2B1U3B3U2L3U3 10 204 3290 46672 605665 7528346 90953829 1031730649 7979095436 10297296089  101168610
B3F3D3L3U2F1L3R3U3L2  0   8  193  4407  86689 1506073 24152141  360843785 4387953895 13978651255  755230354
B3D1F1U1F2L3B1U3R3U3  0   4  122  2776  54152  942002 15377610  237716122 3186471721 14480961746 1586902543  2
  F3L2F3D1F3L1U1F3R2  8 100 1272 17514 235531 3157654 42082764  543623011 5560903157 12957057043  401350746
  F3L3U1F2L2R2D2F1R2  4  60  992 15451 223273 3138907 42994730  564953975 5817748645 12758252133  321100630
R3D3R2U3L3D3F1U2F1U2  0  28  650 11400 184487 2796606 40283689  546115156 5720024846 12859012686  339999252
  L3D2R3B1L1R2U3L3F1  4  92 1688 24280 329068 4363545 56379502  692932021 6353167841 12098244479  302986280
  R3B1R2U1F2L3U3L3U1 24 320 3383 37966 444285 5297015 63940256  750573237 6561647196 11848676521  277808597
There are a total of 8 distance-20 positions found; they were found in 3 of the 25 cosets. Each of these positions lacked both symmetry and antisymmetry. Since there are 2,217,093,120 different cosets (before symmetry reduction), this allows us to project that there are approximately 700 million total distance-20 positions (about 7 million essentially unique with respect to symmetry and inversion). I believe this is the first estimate of the number of distance-20 positions. Since the sample size is small it is only a rough estimate, but it may well be within an order of magnitude of the correct result.

We can compare the overall distance distribution from these 25 cosets with previous attempts to estimate the distance distribution for the entire group:

      25 cosets           Korf   Kociemba      Rokicki
 9           86 0.000
10         1825 0.000
11        29730 0.000
12       441689 0.000
13      6197916 0.000
14     84091933 0.000                          1 0.000
15   1114051394 0.002             3 0.003     37 0.002
16  14131381429 0.029  1 0.100   15 0.015    579 0.025
17 138725562460 0.284  3 0.300  260 0.260   6063 0.266
18 319954570993 0.656  6 0.600  692 0.692  15266 0.671
19  13694390537 0.028            30 0.030    811 0.036
20            8
   ------------       --       ----        -----
   487710721000       10       1000        22757
Even though the 25 coset experiment found optimal solutions for a vastly greater number of positions, these positions are not statistically independent (since they came from the same 25 cosets) so it is unclear how accurately we can assume the distribution matches the true distribution. Nonetheless, the numbers appear roughly inline with prior estimates.

Each of the cosets took approximately a core-day to run. The great bulk of the time was in the final depth-18 search, and I have not yet multi-threaded that in any way. So on my 8GB machine, I ran two at a time (each run takes approximately 3.4GB of RAM.) The table below includes the runtime in hours for each coset.

            Sequence  hrs  rem 20
B3L1F1L2D1B2R1B1L3U2 21.1   23
  L3F2R1D3U1B3U3F2U3 29.3    0
L3F1L2U1B2D2F1D1L2U2 22.1    3
F3D2R1D1U3L1B1L2R1U1 16.4   47
  L3F3D1R2F1U1B3D1U1 23.2    0
  R3D3B3R1D3L3B2U3R1 24.8    0
  L3D1L2B1U2R1F2L2R2 20.7    8
R3F2R1B1L1U1B2D2R2F3 19.9   10
  L3U2B1U2L3U1F2U3F1 31.0    0
  R3U3B3F1L3B2D1U1R1 14.3  974  1
F3L1D1U2B3R2D1B1F1U3 18.0   36
R3U1L3D3B1U1F2R3F2U3 21.3    4
  F3U3L1R3D3R2D2R1B1 19.7   26
L3D2U3R2F3R2B2D1R3F1 20.2   10
R3D1F1D2U3L3U2R3B3R1 11.6 9877  5
R3U3L3U1F1L1R1D3U2F3 23.7    2
R3U3F1D3R3U2L3F1R3U1 15.5  190
  B3L1R2B1U3B3U2L3U3 31.2    0
B3F3D3L3U2F1L3R3U3L2 17.1   95
B3D1F1U1F2L3B1U3R3U3 12.7 2534  2
  F3L2F3D1F3L1U1F3R2 20.5    4
  F3L3U1F2L2R2D2F1R2 22.0    0
R3D3R2U3L3D3F1U2F1U2 21.6    2
  L3D2R3B1L1R2U3L3F1 24.0    1
  R3B1R2U1F2L3U3L3U1 24.5    3
The rem column above lists the number of positions for which an optimal solution may not have been found (those solutions that were distance 20 or greater). For 6 of the 25 cosets, all solutions found were optimal and no post-processing was needed. For the remaining cosets, there were a number of positions that needed to be analyzed separately.

For all the cosets but the one with 9877 remainders, the leftover positions were analyzed first by running them through a six-axis non-optimal Kociemba solver in hopes of finding a distance 19 solution; for more than 99% of the leftover positions, this succeeded in a reasonably short time. For the remaining positions, an optimal solver was used. In three cases the position was shown to be at distance 20, and in all other cases it was shown to be at distance 19. Overall an additional two days of CPU time was used for these positions.

For the coset with 9877 leftover positions, I estimated that it would be faster to go ahead and run the coset solver to a ply of 19 rather than attempt to solve all 9877 positions with the six-axis solver and then the optimal solver for any leftover. Running this coset to a depth of 19 took 7.3 days, showed that only five of the 9877 leftover positions had a distance of 20.

I plan to improve my coset solver to improve these runtimes in several ways. First, the actual search itself can probably be sped up; since this search does not take the majority of the time in the depth 16 runs, I haven't given that much attention to this speed so far, but for the depth 17 and deeper runs, it domainates. I may attempt to multithread the search. In addition, I plan to improve the handling of leftover positions in other ways, including making my six-axis solver and my optimal solver both know about the search space for solutions that has already been searched by the coset solver, so those solvers don't end up repeating a lot of that work.

If we plan to run the entire search space, however, the postprocessing can be eliminated easily simply by running all 138 million symmetry-reduced cosets at about a CPU day each. If a leftover position occurs in the cosets that contain all three of its orientations, then it is truly a distance-20 position; otherwise, it a distance-19 solution is known. So in this case the total amount of postprocessing is negligible. (On the other hand, all 138 million cosets must be run.)

I will follow this note with a further note dissecting the step-by-step performance of the coset solver, to further explain how it gets the speed that it does.

Comment viewing options

Select your preferred way to display the comments and click 'Save settings' to activate your changes.

Faster and faster

[Comment withdrawn; need more thinking here]

I also think that solving com

I also think that solving complete H-cosets is a good method to find better empirical approximations for the number of cubes for some distance, especially for distances 12,13,14,15,16,19,20 where the probabilites for a random cube having such a distance is small. Below 12 Jerry Bryan already computed the numbers.
From Jerry we get probability 4.06836*10-10, 5.36965*10-9 and 7.08242×10-8 for distances 9, 10 and 11, while your values are 1.76334*10-10, 3.74197*10-9, 6.09583*10-8.

Especially for depth 9 this is more than 100% difference. Because your probability for depth 20 is 1.64032*10-11 which is even less than the depth 9 probability I would not be surprised if there is a deviation from the true value by a few hundred percent too.

So I believe we should at least take a few hundred random cosets to get a more reliable approximation for the number of depth 20 positions.

Complete H-cosets

If the value I gave for the count of distance-20 positions is within a factor of 10 of accurate, I'll be happy. I just wanted to get *some* estimate; to my knowledge, this is the first estimate actually based on any sort of sampling.

The depth-9 difference is not too surprising; most of the depth-9 positions are probably from distance-8 and distance-7 cosets, of which my random sample included none. We may expect similar issues for 20, as you say.

I probably won't run any more random cosets until I spend some time to make my coset solver faster when run with plys this deep.

I can probably get a fairly good estimate for 12..16 because I do have 1.29 million cosets solved to that depth. The problem is these 1.29 million cosets are not randomly selected, but instead chosen to enable the proof of 22; thus, they are not statistically independent. Maybe I can come up with a way to select a statistically-independent subset of this larger set, but that may be a difficult problem. Or, since running these cosets at a ply of 16 is so much faster, perhaps I can quickly run another 1000 statistically-independent samples and see what they show.