
cube files
Indexed Cube Lovers Archive
![]() |
cube archivesGAP filesBlogs
Forum topicsActive forum topics:New forum topics:User loginNavigation |
About 490 million positions are at distance 20
Submitted by rokicki on Sat, 09/14/2013 - 13:43.
Even though we now know the diameter of Rubik's Cube group in the half-turn
metric, there is still much yet to be discovered. The diameter in the QTM and the STM are unproved (although they are almost certainly 26 and 18, respectively). The exact count of positions at distance 16, 17, 18, 19, and 20 in the half-turn metric is unknown. This note reports some progress on an estimate for the count of 20's in the half-turn metric. It is fascinating to me how problems of distinctly different difficulty exist around the 3x3x3 cube in the half-turn metric. Initially, back in the early days, we could solve individual positions non-optimally. With appropriate ideas and some moderate computer power (about 1/1000 of a modern cell phone), we got to the point where we could solve arbitrary positions near-optimally, with Kociemba's algorithm. Eventually ideas and computer speeds got to the point where we could solve arbitrary positions optimally with Korf's solver (initially in about a day, now in much less than a second). With some further ideas and effort we got to the point where we could solve huge chunks of the space optimally at a high rate of speed, and even, with a heroic amount of CPU, calculate the actual diameter of the group. But even today, finding a distance-20 position is difficult---they are rare, and proving a candidate position to be at distance 20 still requires a few minutes of computer time. I have made some earlier approximations of the count of distance-20 positions in this forum, but the uncertainty in those numbers has always been very high. About a year ago I decided to work towards a better estimate. To obtain this estimate, I solved 2500 randomly chosen cosets of Kociemba's group (the group generated by {U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took about five months of CPU time, distributed across a number of machines, and found optimal solutions for 48,771,072,000,000 positions at a rate of about four million optimal solves per CPU second. This effort netted a total of 647 distance-20 positions (not all of which were new), which is an average of 0.2588 distance-20 positions per coset. Since there are 2,217,093,120 total cosets, simple multiplication predicts a total of 574 million distance-20 positions. Unfortunately, the variance of the sample set was very high. More than 90% of the cosets had no distance-20 positions (2286 of the 2500). Most of the remainder (142) had only one. And the cosets with the greatest count of distance-20 positions had 159 and 59 such positions, respectively. The standard deviation of the sample was 3.5, more than thirteen times the mean; thus, the estimate derived from this sample was highly uncertain. Luckily, there is a way to reduce this uncertainty without too much additional work. There is a fair amount of variability in the count of length-19 canonical sequences that lead to each coset, and a simple dynamic programming program can calculate, for each coset all at once, how many length-19 canonical sequences lead to that coset. Intuitively, the more length-19 canonical sequences that lead to a coset, the fewer distance-20 positions that coset is likely to have, and this is supported by all of the empirical evidence as well. Thus, distance-20 positions are concentrated in those cosets with few length-19 canonical positions, so we should sample those cosets more heavily. I created a list of all 138,639,780 symmetry-distinct cosets, sorted by a decreasing count of distance-19 canonical sequences, and assigned each an index from 1..138,639,780 based on that ordering. I broke this set into four subsets and, for the three subsets with higher than average incidence of distance-20s, I calculated additional sample points. It turns out that the cosets with a higher incidence of distance-20s are, in some sense, faster to calculate. We compute the distance-20 positions by performing a full search out to distance 19; the greater the number of length-19 canonical sequences, the longer this takes. Thus, those cosets that yield more distance-20 positions are usually faster to run to completion in addition. (This observation breaks down when you get further down the list, because for the majority of the cosets it's sufficient to search through depth 18, do a "prepass" to depth 19, and solve the few remaining positions with an optimal solver, but for the portion of the space that is rich in 20s, the observation holds true.) The first sample set was for the cosets numbered from 1 to 5,000. I solved every such coset in this sample exactly (I have been doing this for some time). From these 5,000 cosets I obtained a total of 884,425 distance-20 positions (not all unique, but we will handle that below). The mean is 176.9 and the standard deviation is 147.5. The high standard deviation does not matter because we explored the entire set, so there is no sampling error. The next sample set was for the cosets numbered 5,001 to 15,000. I solved 200 of these cosets, found 16,239 distance-20 positions, with a mean of 81.645 and a standard deviation of 50.0. Extrapolating these statistics, we expect a total of 816,450 distance-20 positions from this range. The next sample set was for cosets numbered 15,001 to 1,500,000. I solved 297 of these cosets, found 2,129 distance-20 positions, with a mean of 7.168 and a standard deviation of 8.429. Extrapolating these statistics, we expect a total of 10,645,000 distance-20 positions from this range. The last range is for cosets numbered 1,500,001 to 138,639,780. I solved 2479 of these cosets (these were entirely from the original 2500 cosets solved that were indexed at greater than 1,500,000). From this subset, I found 334 distance-20 positions, with a mean of 0.1347 and a standard deviation of 0.7233. Extrapolating, we expect this range to yield a total of 18,477,082 distance-20 positions. The ratio of standard deviation to mean is still extremely high, but it is much better than before; in addition, the range only accounts for about 60% of the total so the impact of uncertainty is reduced. The total expected distance-20 positions from all four ranges added up is thus 30,822,957. Since we only explored symmetrically-unique cosets, we need to multiply this by the 16 symmetries of the Kociemba subgroup to obtain a final total of 490,000,000 distance-20 positions. I believe the true count is likely to be within 10 or 15 percent of this estimate. It is striking that there can be so many distance-20 positions (more than one for every man, woman, and child in the United States), yet they are so rare (fewer than one in 80 billion positions overall, equivalent to finding a single grain of sand in a dump truck full of sand). This is why I am fascinated by them; they are rare, yet plentiful, and very hard to find. I am continuing my search for distance-20 positions. I occasionally get another idea for speeding up the program, and I am still mining the richest portion of the cosets, so success is frequent; I am currently finding about 200,000 new ones a day using multiple machines. As I get further down the list of cosets, solving each coset gets harder, and yields fewer new distance-20 positions. Including all the cosets listed above, and additional cosets solved in prior experiments over the years, and the early results of Radu and Kociemba in solving all symmetrical positions, I presently have a total of 63,222,090 distance-20 positions, or about one in eight. My computers continue churning through cosets extending this set. In a few decades it may be child's play to list all distance-20 positions, but in the meantime, I will continue to chip away until I have them all. |
Browse archives
Pollwww.olympicube.com need cube lovers opinion on which cube to produce first olympic cube 6a 83% olympic cube 6b 17% Total votes: 23 Syndicate |