# About 490 million positions are at distance 20

metric, there is still much yet to be discovered. The diameter in the QTM

and the STM are unproved (although they are almost certainly 26 and 18,

respectively). The exact count of positions at distance 16, 17, 18, 19,

and 20 in the half-turn metric is unknown. This note reports some progress

on an estimate for the count of 20's in the half-turn metric.

It is fascinating to me how problems of distinctly different difficulty

exist around the 3x3x3 cube in the half-turn metric. Initially, back in

the early days, we could solve individual positions non-optimally.

With appropriate ideas and some moderate computer power (about 1/1000

of a modern cell phone), we got to the point where we could solve

arbitrary positions near-optimally, with Kociemba's algorithm.

Eventually ideas and computer speeds got to the point where we could

solve arbitrary positions optimally with Korf's solver (initially in

about a day, now in much less than a second). With some further ideas

and effort we got to the point where we could solve huge chunks of the

space optimally at a high rate of speed, and even, with a heroic

amount of CPU, calculate the actual diameter of the group. But even

today, finding a distance-20 position is difficult---they are rare,

and proving a candidate position to be at distance 20 still requires

a few minutes of computer time.

I have made some earlier approximations of the count of distance-20

positions in this forum, but the uncertainty in those numbers has

always been very high. About a year ago I decided to work towards a

better estimate. To obtain this estimate, I solved 2500 randomly

chosen cosets of Kociemba's group (the group generated by

{U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took

about five months of CPU time, distributed across a number of

machines, and found optimal solutions for 48,771,072,000,000

positions at a rate of about four million optimal solves per CPU second.

This effort netted a total of 647 distance-20 positions (not all of

which were new), which is an average of 0.2588 distance-20 positions

per coset. Since there are 2,217,093,120 total cosets, simple

multiplication predicts a total of 574 million distance-20 positions.

Unfortunately, the variance of the sample set was very high. More

than 90% of the cosets had no distance-20 positions (2286 of the 2500).

Most of the remainder (142) had only one. And the cosets with the

greatest count of distance-20 positions had 159 and 59 such positions,

respectively. The standard deviation of the sample was 3.5, more than

thirteen times the mean; thus, the estimate derived from this sample was

highly uncertain.

Luckily, there is a way to reduce this uncertainty without too much

additional work. There is a fair amount of variability in the count

of length-19 canonical sequences that lead to each coset, and a

simple dynamic programming program can calculate, for each coset all

at once, how many length-19 canonical sequences lead to that coset.

Intuitively, the more length-19 canonical sequences that lead to a

coset, the fewer distance-20 positions that coset is likely to have,

and this is supported by all of the empirical evidence as well. Thus,

distance-20 positions are concentrated in those cosets with few

length-19 canonical positions, so we should sample those cosets more

heavily.

I created a list of all 138,639,780 symmetry-distinct cosets, sorted

by a decreasing count of distance-19 canonical sequences, and assigned

each an index from 1..138,639,780 based on that ordering. I broke

this set into four subsets and, for the three subsets with higher than

average incidence of distance-20s, I calculated additional sample points.

It turns out that the cosets with a higher incidence of distance-20s

are, in some sense, faster to calculate. We compute the distance-20

positions by performing a full search out to distance 19; the greater

the number of length-19 canonical sequences, the longer this takes.

Thus, those cosets that yield more distance-20 positions are usually

faster to run to completion in addition. (This observation breaks

down when you get further down the list, because for the majority of

the cosets it's sufficient to search through depth 18, do a "prepass"

to depth 19, and solve the few remaining positions with an optimal

solver, but for the portion of the space that is rich in 20s, the

observation holds true.)

The first sample set was for the cosets numbered from 1 to 5,000. I

solved every such coset in this sample exactly (I have been doing this

for some time). From these 5,000 cosets I obtained a total of 884,425

distance-20 positions (not all unique, but we will handle that below).

The mean is 176.9 and the standard deviation is 147.5. The high

standard deviation does not matter because we explored the entire

set, so there is no sampling error.

The next sample set was for the cosets numbered 5,001 to 15,000. I

solved 200 of these cosets, found 16,239 distance-20 positions,

with a mean of 81.645 and a standard deviation of 50.0. Extrapolating

these statistics, we expect a total of 816,450 distance-20

positions from this range.

The next sample set was for cosets numbered 15,001 to 1,500,000.

I solved 297 of these cosets, found 2,129 distance-20 positions,

with a mean of 7.168 and a standard deviation of 8.429. Extrapolating

these statistics, we expect a total of 10,645,000 distance-20

positions from this range.

The last range is for cosets numbered 1,500,001 to 138,639,780.

I solved 2479 of these cosets (these were entirely from the

original 2500 cosets solved that were indexed at greater than 1,500,000).

From this subset, I found 334 distance-20 positions, with a mean

of 0.1347 and a standard deviation of 0.7233. Extrapolating, we

expect this range to yield a total of 18,477,082 distance-20

positions. The ratio of standard deviation to mean is still

extremely high, but it is much better than before; in addition, the

range only accounts for about 60% of the total so the impact of

uncertainty is reduced.

The total expected distance-20 positions from all four ranges added

up is thus 30,822,957. Since we only explored symmetrically-unique

cosets, we need to multiply this by the 16 symmetries of the Kociemba

subgroup to obtain a final total of 490,000,000 distance-20 positions.

I believe the true count is likely to be within 10 or 15 percent of

this estimate.

It is striking that there can be so many distance-20 positions (more

than one for every man, woman, and child in the United States), yet

they are so rare (fewer than one in 80 billion positions overall,

equivalent to finding a single grain of sand in a dump truck full

of sand). This is why I am fascinated by them; they are rare, yet

plentiful, and very hard to find.

I am continuing my search for distance-20 positions. I occasionally

get another idea for speeding up the program, and I am still mining

the richest portion of the cosets, so success is frequent; I am

currently finding about 200,000 new ones a day using multiple

machines. As I get further down the list of cosets, solving each

coset gets harder, and yields fewer new distance-20 positions.

Including all the cosets listed above, and additional cosets solved

in prior experiments over the years, and the early results of Radu

and Kociemba in solving all symmetrical positions, I presently have

a total of 63,222,090 distance-20 positions, or about one in eight.

My computers continue churning through cosets extending this set.

In a few decades it may be child's play to list all distance-20

positions, but in the meantime, I will continue to chip away until

I have them all.

## Comment viewing options

### Connected 20s

735099 1 652 2 77 3 13 4 4 5 3 6 1 8Almost all were completely isolated. The biggest connected component (using both moves and premoves) is only of size 8. I really thought there was going to be a big one with 100 positions or something. What this tells me is almost all 20f*'s are strict local maxima. Even though we only have about 1 in 8 20f*'s in this set, the space we searched is highly connected because of the coset approach.

### the cluster of 8?

I've found a cluster of five 20f positions. Can I ask for a scramble in the cluster of 8? Thank you so much.

### Cluster of 9 (6 unique mod M+inv)

There are only six unique mod-M+inv in this set of 9.

FL RU BL LU FR RD BR LD UF DF UB DB FDR BRD URB UFR DFL FUL BLU DLB C_4h A 8

FU RU BL LU FD RD BR LD RF LF UB DB FLD BRD URB RFD LFU FRU BLU DLB C_4 A 4

FU RU BD LU FD RD BU LD RF LF LB RB FLD BUR LUB RFD LFU FRU BDL RDB C_4 A 4

FR RU BD LU FL RD BU LD DF UF LB RB FUL BUR LUB DFL UFR FDR BDL RDB C_4 A 4

FR RU BR LU FL RD BL LD DF UF DB UB FUL BLU DLB DFL UFR FDR BRD URB C_4h A 8

FD RU BR LU FU RD BL LD LF RF DB UB FRU BLU DLB LFU RFD FLD BRD URB C_4 A 4

FD RU BU LU FU RD BD LD LF RF RB LB FRU BDL RDB LFU RFD FLD BUR LUB C_4 A 4

FL RU BU LU FR RD BD LD UF DF RB LB FDR BDL RDB UFR DFL FUL BUR LUB C_4 A 4

FU RU BU LU FD RD BD LD RF LF RB LB FLD BDL RDB RFD LFU FRU BUR LUB C_4h A 8

The scrambles are here:

B1L1B3L3F2D3R1B3U1D2B1L1B2R1D1R3B2R2B2R3

L2F3L2D1F2B2L1B3R2B1D3R3F1U1L1R2D2B2L3R2

F3U1B3U3F2L3D1B3L2R1B1U1B2D1L1D3B2D2B2D3

L1B3L3F2D3R1B3U1D2B1L1B2R1D1R3B2R2B2R3F2

L3B2F3L3B1D1L1U3B3R3F3D1L2F1D2F3B2L1D2L2

L3F1L1B2D1R3F1U3D2F3L3F2R3D3R1F2R2F2R1B2

B1U3F1U1B2L1D3F1L2R3F3U3F2D3L3D1F2D2F2D1

L2B1L2D3B2F2L3F1R2F3D1R1B3U3L3R2D2F2L1R2

U1L1U2B2R2F3D1L3F3B3D3F1R3F1B1R2B3D2R2D3

### I am wondering if the cluster

### Number of symmetrically/antis

I also have the other number, but I can't put my hands on it

right at the moment.

I'll report the actual size-8 cluster when I get home to my data.

-tom

### Tom, much thanks for checking

Jerry

### Howdy, Jerry! This is an i

This is an interesting idea. I've pursued it earlier when I had fewer positions, but with 600K+ unique positions (mod M+inv) it's getting difficult. Even with my new super-fast cube solver, that's a lot of 20's to solve.

Since a coset is defined as aS*, where a is a representative position and S is the set of moves {U,F2,R2,D,B2,L2}, we know immediately that solving a coset also finds implicitly all neighbors connected by 10 of the 18 moves. So at least some data exists.

I'll do some analysis of the existing positions, in that I'll see how many clusters I have and how large each is based on the set I have; that should be relatively easy to do. It will be interesting to determine the largest connected component. I'll use both moves and "premoves" (that is, both left and right group multiplication by the 18 generators).

My intuition says most 20f*s are strong maxima.

-tom

### I wonder how you can compute

### Calculating the count of canonical sequences that "reach" a cose

To compute this value, we used dynamic programming to compute the function

cs(d, c)where d is the length of the canonical sequence, and c is the specific coset. This function cannot be decomposed recursively, but a related function can:

cs(d, c, f)This counts canonical sequences ending in a turn of face f, that end in coset c, and have length d. The recursive formulation is simply

cs(d+1, c, f) = sum(m, p) cs(d, c . m, p)where m is a move of face f, and p is a face such that the sequence of face turns (p, f) is permitted. (For instance, a move of face U followed by a move of face D is permitted, but not the reverse, and not two consecutive moves of any given face.)

Then, of course,

cs(d, c) = sum(f) cs(d, c, f)where f is any face of the cube.

To compute cs(d, c, f), an enormous table can can be used; the total state space is 19 (lengths 1..19) times 6 (faces) times 2,217,093,120 for a total of 252,748,615,680 entries. This can be reduced by computing the values in order of d and by reducing the space of cosets by symmetry (since values between two cosets related by symmetry will be the same); this cuts the entry requirements down to 2 times 6 times 138,639,780 which is 1,663,677,360. Using single-precision floating point, this requires only 6.5GB of RAM.

(I originally performed this calculation using doubles on a 4GB machine, by using Hoey syllables and axis instead of faces, and by storing the table for each value of d into a disk file, and rereading those disk files when calculating a new value of d, but in these days of 16GB machines such heroics are no longer needed.)

Since I'm on the topic, I'll share some of the results. We know there are 3,292,256,598,848,819,251,200 canonical sequences of length 19; this means the average coset has about 1,484,942,860,158 canonical sequences that lead to it. The coset with the fewest canonical sequences to depth 19 is

F3U2R3D3B3L1F1U1R1D3F3It has only 558,579,323,856 length-19 canonical sequences that lead to it (this is about 37.6% of the average.) Out of 19,508,428,800 distinct positions in the coset, these sequences (and the shorter canonical sequences that lead to it) manage to miss 2,767 distance-20 positions. This is almost certainly the coset with the greatest number of distance-20 positions; the first-place coset is at index 738 on my list, with 4367 distance-20 positions; that coset has 634,379,723,072 length-19 canonical sequences that lead to it:

L3U2B1R3F3U3L3D3R2F1U3R3The coset with the greatest count of length-19 canonical sequences, with 14,979,482,126,237,464, or about 10,000 times the average, was the subgroup itself.

Almost certainly the only cosets with 1000 or more distance-20 positions are these:

index sym 20s representative ----- --- ---- ------------------------ 738 8 4367 L3U2B1R3F3U3L3D3R2F1U3R3 1 8 2767 F3U2R3D3B3L1F1U1R1D3F3 92 8 2221 B3L3F3R1U1R2B3D2L1F3U3F3 4 4 2160 R3D2B1U1L1F3R3D3B3U1R3 24 4 2130 F3R3D2L3D1B1U3R1U2L1F3U3 8 2 1861 R3U3R2F1L3D1B3U1F2L3F3 26 2 1533 B3R2U1L3B1F1R1D3U3L3F3 14 2 1485 F3R2U3L3B3F3R1D1U1L3F3 23 2 1434 B3R2U1L3B1F1R1D3U3L3F3U3 6 1 1390 F3R3D1U1L1B3F3R3U3L2F3 71 1 1318 R3D1L2B1U1L3B2F3U1B1R3 7 2 1287 F3L1D2R1D3B3U1L3U2R3F1 2 4 1195 R3D2F1L1U3L3B1R1B2D1F3U3 227 1 1157 F3R1D1L3R2B3F2D1F1R3 829 4 1117 L3B2F1U1R1B1D1L1F1U2R3 90 2 1038 L3U2B1U3F1D3F3R3B2F3L3 28278 8 1036 R3F2D3B1R2U1L1D1B1R1F3U3 154 1 1031 R3D1L2B1U1L3B2F3U1B1R1The first value is the index, the second is the symmetry of the coset, the third is the count of distance-20 positions, and the fourth value is a representative position. The coset at index 28,278 had 734,351,183,840 length-19 canonical sequences. Note that symmetrical cosets can have a higher incidence of 20s (although usually many of the 20s in symmetrical cosets are related to each other by the symmetry of the coset).

### Coset Symmetry

### Symmetry

Most distance-20 positions exhibit no symmetry or antisymmetry. The work you and Radu did found all symmetrical distance-20 positions, and you present that information at http://kociemba.org/symmetric2.htm; you find a total of 1,091,994 distinct symmetrical positions, 32,625 mod M+inv. All remaining distance-20 positions must be, of course, not symmetrical (but they are sometimes antisymmetrical or self-inverse).

The antisymmetry classes of the distance-20 positions that exhibit no symmetry are as follows. Mod m, I have found 622,907 positions lacking antisymmetry, 47,282 positions that are antisymmetric but not self-inverse, and 1,281 that are self-inverse. Thus, the total number of known distance-20 positions is

622907 * 96 + 47282 * 48 + 1281 * 48 + 1091994 = 63,222,090

Very few elements of the Kociemba subgroup exhibit any symmetry (as a fraction of the total positions); but yes, they do have a higher incidence in the "20s-richer" portions of my sorted list, I do not believe this has a first-order effect on my estimate, however.

## Length of the neighbors of the 20f* positions

The reason I ask is as follows. Because all positions in the face turn metric can be solved in 20 moves or less, the 20f* positions are all local maxima. In the face turn metric, a local maximum can be a weak local maximum or a strong local maximum. The distinction is that for a strong local maximum, all 18 neighbors are closer to Start than the position itself. For a weak local maximum, one or more of the neighbors is the same distance from Start as the position itself.

In my experience, most local maxima in the face turn metric are weak local maxima. If that is true of the 20f* positions (and it may not be), then there will be a certain clustering of the 20f* positions within the Cayley graph of the face turn metric. However, I can't even remotely picture if there is any relationship between clustering in the Cayley graph and clustering in a coset of the Kociemba group. Which is to say, I can't picture if making one move tends switch to a different coset or to remain in the same coset.

Jerry