A Generalization on the Shamir Method
I have not posted in a very long time, but I have continued to work on ideas for a better program to calculate God's Algorithm for the full cube. The time has long passed when a single desktop-class machine could make further progress on the problem as a whole. The search space is just too large for that. Instead, I have been working on ideas for a program that could at least visit all the positions in a single coset no matter how far the positions in the coset are from Start.
I have such a program which works, but its performance is not acceptable. Therefore, I'm not going to report anything about it. Instead, I'm going to report on a plan for a new and similar program which I believe will have acceptable performance. I have developed most of the pieces that will be required for this new program, but it will take me a few months to put all the pieces together. The main idea in the new program is very old and is not original with me. The idea borrows heavily from a message to the original CubeLovers mailing list by Alan Bawden on 27 May 1987. Alan's message was based on a talk given by Adi Shamir.
Shamir's method is actually a very slow way to solve a single cube position. Therefore, it has never been widely deployed. However, embedded in the method is a way to produce products in lexicographic order. I have used that portion of Shamir's method ever since without embracing the entire method. My idea now is to embrace the entire Shamir method, except to apply it to an entire coset of positions at the same time rather than applying it only to a single position. My preliminary testing suggests that applying the Shamir method to a large number of positions at the same time will produce acceptable performance, which applying it to a single position does not.
My motivation for using the full Shamir method at this time is that producing products in lexicographic order only works for products of 2 sets of positions. On a desktop/laptop class machine, it is possible to store out to 7 moves from Start in the face turn metric and out to 8 moves from Start in the quarter turn metric. That makes it possible to use lexicographic ordering to visit positions out to 14 moves from Start in the face turn metric and out to 16 moves from Start in the quarter turn metric. But the diameter of the Cube group in the face turn metric is 20 moves from Start and in the quarter turn metric is 26 moves from Start.
Shamir's method is based on producing products of 4 sets of positions. Computer memory in 1987 was very limited as compared to today, and it was possible to store all the positions only out to 4 moves from Start. In Alan's email, positions of length N from Start were stored in an array called P[N], so positions could be stored out to P[4]. Therefore, the set of products taken from P[4]P[4]P[4]P[4] would visit all the positions that are 16 moves from Start. If we are seeking a solution for a particular position ζ, we can write the problem as ζ = P[4]P[4]P[4]P[4]. But it would take an extraordinarily long time to form all the products in P[4]P[4]P[4]P[4] in the hope that eventually one of them would be ζ. Therefore, the Shamir method rewrites the equation as ζP[4]P[4] = P[4]P[4] and it produces the products on both sides of the equation in lexicographic order.
All the positions in P[8] will be in the product P[4]P[4]. Shamir's method is therefore equivalent in a sense to ζP[8] = P[8]. A justification for this equation is that ζP[8] contains all the positions that are 8 moves from ζ. If ζ is 16 moves from Start, then at least one position which is 8 moves from ζ will also be 8 moves from Start. Therefore, finding a position that is in both ζP[4]P[4] and in P[4]P[4] (which is to say, finding a position that is in both ζP[8] and P[8]) is equivalent to finding a 16 move solution for ζ.
Finding a solution to ζP[8] = P[8] is most convenient if ζP[8] is in lexicographic order and if P[8] is in lexicographic order. The bulk of Alan's message discussed how to produce P[8] = P[4]P[4] in lexicographic order. The message didn't discuss how to identify positions in P[4]P[4] that are less than 8 moves from Start. I suppose that part of the problem was left as an exercise for the student. I have discussed my approach to that part of the problem in previous messages. There is also a bit of Alan's message that discusses how to produce ζP[8] = ζP[4]P[4] in lexicographic order, and doing so requires only a fairly minor adjustment to the idea of producing P[8] = P[4]P[4] in lexicographic order.
I have come to think that it might be better to write the idea for Shamir's method as the expression ζP[4]P[4] ∩ P[4]P[4] rather than writing the idea as an equation. If the expression ζP[4]P[4] ∩ P[4]P[4] is not null, then the result identifies a 16 move solution for ζ. And if the expression ζP[4]P[4] ∩ P[4]P[4] is null, then there is not a 16 move solution for ζ.
Mathematically, there is no requirement for ζP[4]P[4] or P[4]P[4] to be produced in lexicographic order in order to calculate their intersection. But computationally, it is much easier to calculate the intersection of two large sets if each of the sets is in lexicographic order. In the case of the Shamir method, lexicographic order is not just easier but completely necessary, because neither of the two large sets is resident in memory and we have to produce the elements of each set one at a time.
The Shamir method requires that positions be represented as words which are permutations on an alphabet that represents the cubies. ζ will work as a coset rather than as an individual position in the expression ζP[4]P[4] ∩ P[4]P[4] provided two conditions are met. The conditions are that ζ must be a left coset of a stabilizer subgroup and that the stabilizer subgroup must fix the leftmost k letters of a word that represents a Rubik's Cube position. In combination, these two conditions mean that the coset ζ is defined as all the positions for which the leftmost k letters of the words which represent them are the same.
My programs store Rubik's Cube positions as 20 letters. 8 of the letters are taken from an alphabet of 24 letters to represent the position and twist for each of the 8 corner cubies. 12 of the letters are taken from an alphabet of 24 letters to represent the position and flip for each of the 12 edge cubies. Therefore, ζ will work very nicely as a coset in the expression ζP[4]P[4] ∩ P[4]P[4]. Namely, the coset can be defined as all the positions for which the first k of the 20 letters are the same.
If the number k of letters which define the coset ζ is large enough, then the number of positions in the coset ζ can be made small enough to store in memory. For example, if we use all 20 letters to define the coset ζ, then there is only 1 position in the coset and the idea of using Shamir's method with a coset reduces to Shamir's original idea of using the method for only a single position. We will want the cosets to be larger than just 1 position, and the choice of k to determine coset size is an important design decision for the program.
Generally speaking, using Shamir's method for cosets instead of for individual positions will be most efficient if the cosets are as large as possible. Therefore we will make the number k of letters as small as possible while still fitting the coset into memory. And rather than storing positions in memory, we will use a single bit as a Boolean variable for each position to keep track of whether that position has been visited or not.
It is very intuitive to choose ζ to be a coset based on the 8 letters representing the 8 corner cubies being the same for all positions in the coset, or else to choose ζ to be a coset based on the 12 letters representing the 12 edge cubies being the same for all positions in the coset. In addition to being very intuitive, either of these choices has the advantage of simplifying the symmetry and antisymmetry calculations required to reduce the size of the search space by symmetry and antisymmetry. In particular, if the corner cubies have no symmetry or antisymmetry, then no symmetry or antisymmetry calculations need be performed on the edge cubies, and vice versa.
When I have tried basing the coset ζ on the 8 corner cubies, there seem to be too few cosets and the cosets are too large. And when I have tried basing the coset ζ on the 12 edge cubies, there seem to be too many cosets and the cosets are too small. Therefore, the new design compromises by having two levels of cosets. For the first level of cosets, the coset ζ is based on the 8 corner cubies. The coset is defined by its first k letters, where k is 8. For the second level of cosets, the coset ζ is partitioned into 528 subcosets based on the first 2 edge cubies. A subcoset is therefore defined by its first k letters, where k is 10. The 11^{th} through the 20^{th} letters can take on any legal values and their configuration defines a particular position within the subcoset.
We denote the subcosets as ζ_αβ where α and β are the first 2 of the 12 letters which define the positions of the 12 edge cubies. The first of the 528 subcosets is ζ_ab and the last is ζ_xw. The choice of 2 of the 12 edge letters to define the subcosets is somewhat arbitrary, but it seems the best choice. Such a subcoset contains 928,972,800 positions, which in turn require 928,972,800 visit bits stored in 116,121,600 bytes. A typical desktop/laptop machine cannot store many more visit bits than that. For example, if the first 9 letters were used to define a coset instead of the first 10, there would be 20,437,401,600 visit bits to be stored in 2,554,675,200 bytes, and each thread needs its own set of visit bits.
The 528 subcosets arise because there are 24 ways to choose the first edge cubie and its flip, and there are 22 ways to choose the second edge cubie and its flip. For most cosets, the symmetry and antisymmetry of the full cube can be based on the symmetry and antisymmetry of the 8 corner cubies alone. In these cases, the two edge cubies required to define the subcoset do not change the symmetry and antisymmetry characteristics of the full cube.
Because we will be treating ζ as a coset rather than as a complete cube position, we need to look a little more rigorously into the original Shamir method. Essentially, the original Shamir method is looking for cases where the intersection ζ ∩ P[4]P[4]P[4]P[4] is not null, and it does so by finding cases where the intersection ζP[4]P[4] ∩ P[4]P[4] is not null. The method requires four separate copies of P[4]. We write these copies as Σ[4], Τ[4], Π[4], and Φ[4], and the intersections become ζ ∩ Σ[4]Τ[4]Π[4]Φ[4] and ζΦ^{-1}[4]Π^{-1}[4] ∩ Σ[4]Τ[4].
The sets Φ[4] and Π[4] become the sets Φ^{-1}[4] and Π^{-1}[4] when they are moved to the left side of the intersection. On its face, that is a distinction without a difference because Φ^{-1}[4] = Φ[4] and Π^{-1}[4] = Π[4]. In the case of the original Shamir method, it is indeed a distinction without a difference. But when ζ is a coset rather than being an individual position, we will need to multiply out the cubies to the right of the k^{th} cubie. In other words, we use φ^{-1} and π^{-1} in the expression ζφ^{-1}π^{-1} ∩ στ when we are processing the first k letters to get into the coset ζ. But we use π and φ in the expression ζ ∩ στπφ when we are multiplying out the letters after the first k letters to identify the position within the coset ζ.
As we have just seen, the program needs to use both π and π^{-1}, and it needs to use both φ and φ^{-1}. It is perhaps less obvious, but the program also needs to use both τ and τ^{-1}. The only inverse the program does not need is σ^{-1}. To avoid calculating inverses an excessive number of times, the program stores a position p as an ordered pair of the form (p, &p^{-1}). The & character is the C++ address-of operator, so the program is storing ordered pairs consisting of a position p and the address of its inverse p^{-1}. Storing the address of an inverse requires much less memory than storing the actual inverse, and fetching the address and then the inverse through it costs only a single extra hardware instruction as compared to fetching the inverse directly.
Just as the program stores inverses as pointers, it stores the arrays Σ[N], Τ[N], Π[N], and Φ[N] as pointers to the array P[N].
Σ[N] is an exception. It does not have to be stored as pointers nor stored at all. Rather, P[N] serves as Σ[N] and no extra memory is required for Σ[N]. The program creates P[N] during program initialization. Thereafter, P[N] is readonly and is thread safe. It is shared in this manner freely by all the threads as if it were Σ[N].
The program stores Φ[N] as pointers to P[N]. The program creates Φ[N] once per coset ζ and sorts it so that ζΦ^{-1}[N] is in lexicographic order. The program creates Φ[N] while it is in single threaded mode. Thereafter Φ[N] is readonly and is thread safe. It is shared by multiple threads as they process the 528 subcosets of ζ.
The program stores Τ[N] and Π[N] as pointers to P[N]. Each thread creates its own local copy of Τ[N] and Π[N] because each thread needs to sort Τ[N] and Π[N] in a different order. Within a thread, Τ[N] and Π[N] need to be sorted in the same order as each other. This provides an opportunity for Τ[N] and Π[N] to be literally the same array in memory and to be sorted only once. This opportunity needs to be investigated because there are times when Τ[N] and Π[N] need to be sorted only piecewise and in part. It would therefore be a tricky coding problem to share Τ[N] and Π[N] as the same array.
The sorting of Τ[N] and Π[N] is the most computationally expensive part of producing products in lexicographic order. The sorting of Σ[N] and ζΦ^{-1}[N] is the least computationally expensive part. Therefore, the program makes N as large as possible for Σ[N] and Φ[N], and it makes N as small as possible for Τ[N] and Π[N].
Our Shamir method example so far has been about positions that are 16 moves from Start. More generally, we need to produce positions at any given distance from Start. For example, we could use ζΦ^{-1}[5]Π^{-1}[4] ∩ Σ[7]Τ[2] to find positions of length 18 because 5 + 4 + 7 + 2 = 18.
In order to visit all positions in ζ out to P[20] in the face turn metric, we need to calculate in turn something like the following table. The significance of the table is that positions which are N moves from Start are visited and counted before positions which are N + 1 moves from Start. Therefore, each position is counted at its correct distance from Start. Elements of ζ which are 20 moves from Start are included in the table for completeness, but they do not actually have to be visited: any position remaining unvisited after 19 moves from Start must be 20 moves from Start, so the program merely has to note and count such positions.
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[0]Τ[0] elements of ζ of length 0
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[1]Τ[0] elements of ζ of length 1
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[2]Τ[0] elements of ζ of length 2
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[3]Τ[0] elements of ζ of length 3
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[4]Τ[0] elements of ζ of length 4
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[5]Τ[0] elements of ζ of length 5
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[6]Τ[0] elements of ζ of length 6
ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[7]Τ[0] elements of ζ of length 7
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[1]Τ[0] elements of ζ of length 8
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[2]Τ[0] elements of ζ of length 9
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[3]Τ[0] elements of ζ of length 10
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[4]Τ[0] elements of ζ of length 11
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[5]Τ[0] elements of ζ of length 12
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[6]Τ[0] elements of ζ of length 13
ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[7]Τ[0] elements of ζ of length 14
ζΦ^{-1}[7]Π^{-1}[1] ∩ Σ[6]Τ[1] elements of ζ of length 15
ζΦ^{-1}[7]Π^{-1}[1] ∩ Σ[7]Τ[1] elements of ζ of length 16
ζΦ^{-1}[7]Π^{-1}[2] ∩ Σ[6]Τ[2] elements of ζ of length 17
ζΦ^{-1}[7]Π^{-1}[2] ∩ Σ[7]Τ[2] elements of ζ of length 18
ζΦ^{-1}[7]Π^{-1}[3] ∩ Σ[6]Τ[3] elements of ζ of length 19
ζΦ^{-1}[7]Π^{-1}[3] ∩ Σ[7]Τ[3] elements of ζ of length 20
The arrangement of the sets of positions and their products and intersections above perhaps requires a bit of explanation and justification.

We let N_{Σ} be the distance from Start for positions in Σ.
We let N_{Τ} be the distance from Start for positions in Τ.
We let N_{Π} be the distance from Start for positions in Π.
We let N_{Φ} be the distance from Start for positions in Φ.
The general form of the expression for elements of each length is ζΦ^{-1}[N_{Φ}]Π^{-1}[N_{Π}] ∩ Σ[N_{Σ}]Τ[N_{Τ}], where the length of the positions being visited is N_{Σ}+N_{Τ}+N_{Φ}+N_{Π}. All cases where this expression is not null represent situations where the product στπφ is in the coset ζ.
For elements of ζ of length 0 through 7, the expression ζΦ^{-1}[0]Π^{-1}[0] ∩ Σ[N_{Σ}]Τ[0] reduces mathematically to ζ ∩ Σ[N_{Σ}]. In other words, all the program has to do for elements of ζ of length 0 through 7 is to look them up in Σ[N_{Σ}]. However, the program uses the more complicated expression so that the computer code is the same for elements of ζ of all lengths. There is a tiny bit of extra processing to evaluate the more complicated expression, but it is too small to measure.
For elements of ζ of length 8 through 14, the expression ζΦ^{-1}[7]Π^{-1}[0] ∩ Σ[N_{Σ}]Τ[0] likewise reduces mathematically to ζΦ^{-1}[7] ∩ Σ[N_{Σ}], and again the program uses the more complicated-looking expression so that the computer code is the same for elements of ζ of all lengths.
N_{Φ} only takes on the values 0 or 7 so that arrays containing positions of intermediate length do not have to be created for Φ[N_{Φ}]. Arrays containing positions of intermediate length for Σ[N_{Σ}] already exist without needing to be created again. Σ[N_{Σ}] of different lengths is therefore used with Φ^{-1}[7] to create products of length 8 through length 14.
After finding a product στπφ such that the product of its first 10 letters is in the coset ζ, the product of the last 10 letters is produced to determine a position within the coset ζ. The visit bit for the position is then tested. If the visit bit is already on, the position is a duplicate and is discarded. Otherwise, the visit bit is turned on and the position is counted.