There are approximately 700,000,000 distance-20 positions.
I first wrote a program to select a random coset, and then set those cosets to running, specifying a maximum search ply of 18. This way the coset solver will find optimal solutions for all distance 18 and lower positions in the coset, but may report some distance 19 and larger positions with a non-optimal solution. My hope was the number of positions reported to be 20 or larger (these were the only ones that could have een suboptimal) would be small. Then, using various techniques, I would prove find optimal solutions to those remaining positions.
The overall results of this experiment are listed below; each line gives the sequence for the coset I ran, along with the count of positions at each depth, individually. I am listing all the sets independently so we can see the distribution of results.
Sequence 9 10 11 12 13 14 15 16 17 18 19 20 B3L1F1L2D1B2R1B1L3U2 0 14 534 11657 196080 2937991 41299927 545853895 5607291938 12899502606 411334158 L3F2R1D3U1B3U3F2U3 12 283 3909 50440 625227 7523487 89006836 999573895 7827313152 10476371239 107960320 L3F1L2U1B2D2F1D1L2U2 0 20 764 14703 234163 3400017 46716211 604913894 5990416554 12551161618 311570856 F3D2R1D1U3L1B1L2R1U1 0 22 492 8151 125669 1862220 26879999 374704976 4367304184 13962505002 775038085 L3F3D1R2F1U1B3D1U1 2 74 1380 21783 316425 4286841 55717758 686831327 6360050438 12129869062 271333710 R3D3B3R1D3L3B2U3R1 2 56 1136 19561 296206 4156497 55650919 704614310 6661129884 11901749295 180810934 L3D1L2B1U2R1F2L2R2 8 106 1211 16568 227090 3076878 41339979 536945482 5502539731 12978986433 445295314 R3F2R1B1L1U1B2D2R2F3 0 32 760 12845 195089 2808490 39037907 517927601 5407989120 13088176128 452280828 L3U2B1U2L3U1F2U3F1 8 156 2640 39405 534214 6879989 84822334 967020846 7665063334 10684674249 99391625 R3U3B3F1L3B2D1U1R1 2 30 403 6033 92543 1404341 20938697 302867007 3751195863 14234790504 1197133376 1 F3L1D1U2B3R2D1B1F1U3 0 20 450 7705 123526 1899926 28201017 399903983 4642837613 13778032783 657421777 R3U1L3D3B1U1F2R3F2U3 0 24 578 10948 179778 2732629 39229330 530023329 5552342430 12971260550 412649204 F3U3L1R3D3R2D2R1B1 2 64 1155 16178 217814 2906652 38648789 501931830 5233010583 13189478300 542217433 L3D2U3R2F3R2B2D1R3F1 0 12 424 8501 144634 2269991 33717351 472272190 5218904943 13303468931 477641823 R3D1F1D2U3L3U2R3B3R1 0 2 64 1499 32779 641740 11438656 189621858 2731617168 14471674772 2103400257 5 R3U3L3U1F1L1R1D3U2F3 0 84 1956 29756 400644 5089831 62507133 735152905 6559001792 11906025930 240218769 R3U3F1D3R3U2L3F1R3U1 0 10 284 5490 92885 1484265 22734030 332734145 4080540996 14148691639 922145056 B3L1R2B1U3B3U2L3U3 10 204 3290 46672 605665 7528346 90953829 1031730649 7979095436 10297296089 101168610 B3F3D3L3U2F1L3R3U3L2 0 8 193 4407 86689 1506073 24152141 360843785 4387953895 13978651255 755230354 B3D1F1U1F2L3B1U3R3U3 0 4 122 2776 54152 942002 15377610 237716122 3186471721 14480961746 1586902543 2 F3L2F3D1F3L1U1F3R2 8 100 1272 17514 235531 3157654 42082764 543623011 5560903157 12957057043 401350746 F3L3U1F2L2R2D2F1R2 4 60 992 15451 223273 3138907 42994730 564953975 5817748645 12758252133 321100630 R3D3R2U3L3D3F1U2F1U2 0 28 650 11400 184487 2796606 40283689 546115156 5720024846 12859012686 339999252 L3D2R3B1L1R2U3L3F1 4 92 1688 24280 329068 4363545 56379502 692932021 6353167841 12098244479 302986280 R3B1R2U1F2L3U3L3U1 24 320 3383 37966 444285 5297015 63940256 750573237 6561647196 11848676521 277808597There are a total of 8 distance-20 positions found; they were found in 3 of the 25 cosets. Each of these positions lacked both symmetry and antisymmetry. Since there are 2,217,093,120 different cosets (before symmetry reduction), this allows us to project that there are approximately 700 million total distance-20 positions (about 7 million essentially unique with respect to symmetry and inversion). I believe this is the first estimate of the number of distance-20 positions. Since the sample size is small it is only a rough estimate, but it may well be within an order of magnitude of the correct result.
We can compare the overall distance distribution from these 25 cosets with previous attempts to estimate the distance distribution for the entire group:
25 cosets Korf Kociemba Rokicki 9 86 0.000 10 1825 0.000 11 29730 0.000 12 441689 0.000 13 6197916 0.000 14 84091933 0.000 1 0.000 15 1114051394 0.002 3 0.003 37 0.002 16 14131381429 0.029 1 0.100 15 0.015 579 0.025 17 138725562460 0.284 3 0.300 260 0.260 6063 0.266 18 319954570993 0.656 6 0.600 692 0.692 15266 0.671 19 13694390537 0.028 30 0.030 811 0.036 20 8 ------------ -- ---- ----- 487710721000 10 1000 22757Even though the 25 coset experiment found optimal solutions for a vastly greater number of positions, these positions are not statistically independent (since they came from the same 25 cosets) so it is unclear how accurately we can assume the distribution matches the true distribution. Nonetheless, the numbers appear roughly inline with prior estimates.
Each of the cosets took approximately a core-day to run. The great bulk of the time was in the final depth-18 search, and I have not yet multi-threaded that in any way. So on my 8GB machine, I ran two at a time (each run takes approximately 3.4GB of RAM.) The table below includes the runtime in hours for each coset.
Sequence hrs rem 20 B3L1F1L2D1B2R1B1L3U2 21.1 23 L3F2R1D3U1B3U3F2U3 29.3 0 L3F1L2U1B2D2F1D1L2U2 22.1 3 F3D2R1D1U3L1B1L2R1U1 16.4 47 L3F3D1R2F1U1B3D1U1 23.2 0 R3D3B3R1D3L3B2U3R1 24.8 0 L3D1L2B1U2R1F2L2R2 20.7 8 R3F2R1B1L1U1B2D2R2F3 19.9 10 L3U2B1U2L3U1F2U3F1 31.0 0 R3U3B3F1L3B2D1U1R1 14.3 974 1 F3L1D1U2B3R2D1B1F1U3 18.0 36 R3U1L3D3B1U1F2R3F2U3 21.3 4 F3U3L1R3D3R2D2R1B1 19.7 26 L3D2U3R2F3R2B2D1R3F1 20.2 10 R3D1F1D2U3L3U2R3B3R1 11.6 9877 5 R3U3L3U1F1L1R1D3U2F3 23.7 2 R3U3F1D3R3U2L3F1R3U1 15.5 190 B3L1R2B1U3B3U2L3U3 31.2 0 B3F3D3L3U2F1L3R3U3L2 17.1 95 B3D1F1U1F2L3B1U3R3U3 12.7 2534 2 F3L2F3D1F3L1U1F3R2 20.5 4 F3L3U1F2L2R2D2F1R2 22.0 0 R3D3R2U3L3D3F1U2F1U2 21.6 2 L3D2R3B1L1R2U3L3F1 24.0 1 R3B1R2U1F2L3U3L3U1 24.5 3The rem column above lists the number of positions for which an optimal solution may not have been found (those solutions that were distance 20 or greater). For 6 of the 25 cosets, all solutions found were optimal and no post-processing was needed. For the remaining cosets, there were a number of positions that needed to be analyzed separately.
For all the cosets but the one with 9877 remainders, the leftover positions were analyzed first by running them through a six-axis non-optimal Kociemba solver in hopes of finding a distance 19 solution; for more than 99% of the leftover positions, this succeeded in a reasonably short time. For the remaining positions, an optimal solver was used. In three cases the position was shown to be at distance 20, and in all other cases it was shown to be at distance 19. Overall an additional two days of CPU time was used for these positions.
For the coset with 9877 leftover positions, I estimated that it would be faster to go ahead and run the coset solver to a ply of 19 rather than attempt to solve all 9877 positions with the six-axis solver and then the optimal solver for any leftover. Running this coset to a depth of 19 took 7.3 days, showed that only five of the 9877 leftover positions had a distance of 20.
I plan to improve my coset solver to improve these runtimes in several ways. First, the actual search itself can probably be sped up; since this search does not take the majority of the time in the depth 16 runs, I haven't given that much attention to this speed so far, but for the depth 17 and deeper runs, it domainates. I may attempt to multithread the search. In addition, I plan to improve the handling of leftover positions in other ways, including making my six-axis solver and my optimal solver both know about the search space for solutions that has already been searched by the coset solver, so those solvers don't end up repeating a lot of that work.
If we plan to run the entire search space, however, the postprocessing can be eliminated easily simply by running all 138 million symmetry-reduced cosets at about a CPU day each. If a leftover position occurs in the cosets that contain all three of its orientations, then it is truly a distance-20 position; otherwise, it a distance-19 solution is known. So in this case the total amount of postprocessing is negligible. (On the other hand, all 138 million cosets must be run.)
I will follow this note with a further note dissecting the step-by-step performance of the coset solver, to further explain how it gets the speed that it does.