# More details about my new program

Introduction

On 02/23/2016, I posted a message about a new program I had developed that had succeeded in enumerating the complete search space for the edges only group. It was not a new result because Tom Rokicki had solved the same problem back in 2004, but it was important to me because the problem served as a testbed for some new ideas I was developing to attack the problem of the full cube. I am now in the process of adapting the new program to include both edges and corners. In this message, I will include some additional detail about my new program that was not included in the first message.

Review

First, here is a little review of what I have already posted. Let P[k] be the set of all positions of length k. The program first calculates and stores P, P, P, etc. until there is not enough memory to proceed further. To calculate the edges group in the face turn metric, the program is able to calculate and store through P. Having done so, the program calculates P as P = P × P without storing P, the program calculates P as P = P × P without storing P, etc. and ends when it calculates P as P = P × P without storing P. There is nothing novel about this approach and I think it's about the same way that everybody does it.

The key problem with this approach it to keep track of which positions have been visited so that they can be counted only once. For many years my approach has been to produce positions in lexicographic order. For the new program, I have changed to having a bit array with one bit per position, and with the bit for a given position being turned on when that position has been visited. Again, there is nothing novel about this approach and I think it's probably the way that most everybody else does it. For the edges group, there are nearly 1012 positions which is too many bits for desktop machines. But I will begin the description with the fiction that all the bits can be stored in memory at the same time. It will be a straightforward matter later on to describe the approach to partitioning the problem into cosets that are small enough that all the bits for a given coset actually can be stored in memory at the same time.

The program operates on permutations of the cube on a color tab by color tab basis. The program considers the 24 color tabs for the edges to be labeled from a to x. Each cubie is considered to have a primary color tab and a secondary color tab. The primary color tabs are labeled from a through l, and the secondary tabs are labeled from m through x. The secondary color tab for each cubie is labeled with the letter from the primary color tab shifted to the right 12 letters in the alphabet. So the a and m color tabs are on the same cubie, the b and n color tabs are on the same cubie, etc. A cubie is represented by its primary color tab when it is not flipped and by its secondary color tab when it is flipped. Therefore, each position is represented by a string of 12 letters from an alphabet of 24 letters. We refer to these strings of 12 letters as words. We consider the letters in the words to be the 0-th through the 11-th letters rather than the 1-st through the 12-th letters.

To form products, the program forms the words that make up the products one letter at a time from left to right. A substring of letters at the beginning of a word has the effect of defining a coset. As the program adds letters to a word, the letters define a sequence of nested cosets. After the 0-th letter has been added to a word, the program will have defined a 1-letter coset. After the 1-st letter has been added to a word, the program will have defined a 2-letter coset. Etc. This method of counting can seem a little confusing, but it is very useful in practice.

The aforementioned nested cosets are based on stabilizer subgroups. The subgroup H0 fixes the 0-th letter. The subgroup H1 fixes the 0-th and 1-st letters. Etc. Each 1-letter coset is a right coset of H0 and is defined by its 0-th letter. Each 2-letter coset is a right coset of H1 and is defined by its 0-th and 1-st letters. Etc.

Alphabets and Indexings

In several sections that follow, there will be discussions of alphabets and indexings for those alphabets. The motivation for these alphabets and indexings is to calculate a bijection between the permutations of the cubies and the set of integers from 0 to |G| - 1. The integers in turn are used as indexes into the bit arrays which remember which positions have been visited and which sub-cosets have been completed. There is a more traditional way to calculate such a bijection, but the approach with alphabets ad indexings is faster for this particular problem than is the traditional approach.

Three Algorithms

Within this basic scheme, the program implements three algorithms. The fastest of the three algorithms we call Algorithm #3. In limiting conditions, Algorithm #3 reduces to a somewhat slower and simpler algorithm we call Algorithm #2. In limiting conditions, Algorithm #2 reduces to a somewhat slower and even simpler algorithm we call Algorithm #1. The limiting conditions occur rather frequently, and all three algorithms have many elements in common. Indeed, the three algorithms are a single integrated piece of computer code which is a single integrated algorithm. We break the single integrated algorithm down into three algorithms only for ease of exposition. We start with Algorithm #1.

Algorithm #1     ζ = σ × τ

Consider the calculation of P as P = P × P. We speak more generally of calculating a set of permutations Z as Z = S × T. Hence, P = P × P is a special case where Z = P, S = P, and T = P. In the general case, Algorithm #1 forms Z as the set of all products of the form ζ = σ × τ for each σ in S and for each τ in T, yielding each ζ in Z. We use Greek letters for the set elements rather than z = s × t simply to avoid any confusion with the letters making up the permutations. Algorithm #1 forms the products σ × τ without any particular regard for the order in which the σ values are chosen from S and the τ values are chosen from T. The products are formed left to right and one letter at a time for each of the 12 letters in ζ and certain processing is performed after the calculation of each letter of ζ.

We choose examples for σ and τ such that σ = fbxcajedgkti and τ = axcpbeghijkf. Hence, we set up our product ζ = σ × τ as

ζ              σ              τ
???????????? = fbxcajedgkti × axcpbeghijkf

We highlight the letters we need to look at to calculate the 0-th letter of ζ as

ζ              σ              τ
???????????? = fbxcajedgkti × axcpbeghijkf

Hence, we have

ζ              σ              τ
e??????????? = fbxcajedgkti × axcpbeghijkf

The general rule for forming the products is that position of the letter in the word for ζ has to match the position of the letter in the word for σ, the position of the letter in τ has to match the value of the letter in σ, and the value of the letter in ζ has to match the value of the letter in τ. Because only 12 color tabs are stored instead of 24, if a color tab in σ is in the second half of the alphabet, it maps to a color tab that is not stored in τ. When that happens it means that the letter in σ represents a flipped cubie. Therefore, the unflipped color tab for that cubie in σ is mapped to a color tab in τ and the color tab found at that position in τ is flipped to its other color tab to produce the value for ζ.

Before proceeding to calculate any of the additional letters, the program perform some calculations on the partial product e???????????. The 0-th letter in the product is e and in general it could have been any of a through x. We consider the 0-th letter of the product to have been taken from the alphabet ω0 where

ω0 = a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x

The definition of an alphabet is a finite sequence, so an alphabet is not just a set. Rather, the definition of an alphabet produces an ordering on the elements of the alphabet. At this point in the calculation, the program has the alphabet ωi where i =0. We also define a master alphabet ω where the master alphabet is

ω = a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x

and where ωi = ω at this point. However, in general it will not be the case that ωi = ω.

We also define an indexing on the alphabet ω0 with respect to the master alphabet ω. The indexing γ0 of the alphabet with respect to the master alphabet ω is the array

γ0 = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23

That means that the letter a in ω0 is the 0-th letter of ω, the letter b in ω0 is the 1-st letter of ω, etc. As the program gets further into the calculation of the product, the current alphabet ωi will change as each new letter is added to the product but the master alphabet ω will not change. Therefore, the indexing γi of the current alphabet ωi will change as each new letter as added to the product.

The 0-th letter of ζ = e??????????? is e, and γ0(e) = 4. That means that e is the 4-th letter of the master alphabet ω. There are two useful interpretations of the index γ0(e). The more traditional interpretation is that we may consider 4 to be a coefficient of a mixed based number χ. The mixed based number χ indexes the product ζ within the entirety of the edges group G. If |G| is the order of the edge group G, the mixed base for the 0-th coefficient is |G| / 24 and the contribution of the coefficient 4 to the overall number χ(ζ) is 4 × |G| / 24.

Instead, our interpretation for γ0(e) = 4 will be as an index into cosets. The 0-th letter of ζ partitions the overall edges group G into 24 cosets. We define an indexing χ0 where χ0 = γ0. For subsequent letters in ζ, the indexing χi will be calculated from γi but it will not equal γi. For the 0-th letter in ζ, because γ0(e) = 4, we are in the 4-th 1-letter coset of G. Informally, we may think of the 24 different 1-letter cosets as the a coset, as the b coset, and so forth through the x coset. For 1-letter cosets, the program has a bit array υ0 of 24 bits, one bit per coset. Having set χ0 equal to γ0, χ0 serves as an index into υ0. The bit in υ0 for each coset will be turned on when every position in that coset has been visited. When all the bits in υ0 have been turned on, the program will know that the problem has been completed.

As we begin our calculations for the second letter in ζ, we may observe that we have a new alphabet

ω1 = a,b,c,d,f,g,h,i,j,k,l,m,n,o,p,r,s,t,u,v,w,x

ω1 contains 22 letters rather than 24 letters.  It omits e and q because we have already used the e-q cubie.

The indexing of the alphabet ω1 with respect to the master alphabet ω is

γ1 = 0,1,2,3,~,4,5,6,7,8,9,10,11,12,13,14,~,15,16,17,18,19,20,21

In other words, there are still 24 items in the list for the indexing γ1 even though there are only 22 items in the alphabet ω1. The letters e and q are not indexed because they are no longer in the alphabet. The reason for this indexing model rather than defining γ1 as

γ1 = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21

is to enable γ1 to be referenced directly by the elements of ω1 rather than needing to perform some sort of search of ω1 to discover the location of each letter in the alphabet.

With that in mind, we can calculate the second letter of ζ as

ζ              σ              τ
ex?????????? = fbxcajedgkti × axcpbeghijkf

x is the 21-st letter of ω1 which we can determine by looking at the 23-rd entry of γ1 and where x is the 23-rd letter in the master alphabet ω. Again, there are two useful interpretations of this index. The more traditional is that we may consider 21 to be the coefficient of a mixed based number χ where the base for this digit is |G| / (24 × 22) and the contribution of the coefficient 21 to the overall number χ(ζ) is 21 × |G| / (24 × 22).

Our interpretation for the index γ1(x)=21 will be that the index partitions the e coset of the edges group G into sub-cosets where each sub-coset is 1/22 of the e coset. When the 1-st letter of ζ is x, we are in the 21-st of the sub-cosets of the e coset. Informally, we may think of the cosets which are sub-cosets of the e coset as the ea coset, as the be coset, and so forth through the ex coset. There are 24 × 22, or 528 different 2-letter cosets, which we index from 0 to 527. Rather than thinking of γ1(x)=21 as a mixed base coefficient for the overall number which indexes the position ζ within the overall edges group G, we use γ1(x)=21 to calculate an index from 0 to 527 among all the 2-letter cosets. The calculation is χ1 = 24 × χ0(e) + γ1(x). In this case the calculation yields 117, meaning that the ex coset is the 117-th out of 528 different 2-letter cosets. The index χ1 = 117 is therefore an index into the 528 different 2-letter cosets.

For ω1, we will have a bit array υ1 of 528 bits, one bit per coset. The bit in υ1 for each coset will be turned on when every position in that coset has been visited. We may think of the 528 bits in υ1 as 24 strings of 22 bits each. Each string of 22 bits corresponds to a 1-letter coset. When all 22 bits in υ1 for a particular 1-letter coset have been turned on, we will know that particular 1-letter coset has been completed and we turn on the bit in υ0 for that particular 1-letter coset. At the same time we are turning on a bit in υ0, we check to see if all the bits in υ0 are turned on to determine if the problem has been completed.

The same general process continues for each letter αi in ζ. We have a new alphabet ωi for each letter and a new indexing γi for each letter. We have a new bit array υi for each letter. As we add letters, the bit arrays υi contain respectively 24 bits, then 24 × 22 bits, then 24 × 22 × 20 bits, etc. For each new letter αi, we determine γii) and determine an indexing χi to index the bit arrays. The calculation is χi = (24 − 2 × (i − 1)) × χ(i − 1) + γii). The calculation is much simpler than it looks and is very fast. The part about (24 − 2 × (i − 1)) reduces to 24 for the 1-st letter, to 22 for the 2-nd letter, to 20 for the 3-rd letter, etc. It doesn't actually have to be calculated because it's simply the number of letters in the current alphabet ωi.

The next to last letter is the 10-th letter. After the 10-th letter, χ10 will be an index into the collection of 11-letter cosets, but the 11-letter cosets each contain only one element of the edge group G. Therefore, we may interpret χ10 as index into the group elements themselves.

The calculation χi = (24 − 2 × (i − 1)) × χ(i − 1) + γi is applied recursively from the 1-st letter out through the 10-th letter. The result for χ10 is the same as would be calculated for χ(ζ) by treating all the γ values as mixed based coefficients.

The last letter is the 11-th letter. There is only one way to choose the 11-th letter, so it need not enter into the indexing calculations. If we did choose to calculate χ11, it would have the same value as χ10. Therefore, when the program reaches the 10-th letter, it tests the bit indexed by χ10. If the bit is not on, it is turned on and the position is counted. If the bit is on, the position is a duplicate position. The program backtracks without counting the duplicate position.

The duplicate position performance problem

Strictly speaking, the program would not need any bit arrays other than υ10 because it is υ10 that keeps track of which positions have been visited and which have not. The reason for the bit arrays after each letter is that the program runs into a severe performance problem towards the end of the search because at that point a very high percentage of positions are duplicate positions and calculating them is a waste of time. The bit arrays after each letter attempt to detect duplicate positions after the calculation of as few a number of letters in ζ as possible.

Therefore, as it turns on a visit bit in υ10 for an 11-letter coset, the program determines if all 4 of the positions associated with the containing 10-letter coset have been visited. If so, it turns on the respective coset completed bit in υ9 for the containing 10-letter coset. As it turns on a coset completed bit in υ9 for a 10-letter coset, the program determines if all 6 of the cosets associated with the containing 9-letter coset have been completed. If so, it turns on the respective coset completed bit in υ8 for the containing 9-letter coset. The program follows the same procedure all the way back to the 0-th letter. It stops the backtracking procedure of turning on visit bits for larger and larger cosets when every position has been visited or when not all the containing cosets at a particular level in the backtracking procedure have been completed.

In turn, as it is moving forward and calculating additional letters for each word ζ, the program tests the visit bit or coset completed bit associated with the coset indicated by the letter most recently calculated. If the bit is on, the partial product is abandoned and the program tries the next letter in the alphabet as the rightmost letter in the current substring of letters at the beginning of the word. This process has very little effect on the performance of the program in the early part of the enumeration of all the positions in a coset. But towards the end of the enumeration of all the positions in a coset, the performance of the program is greatly enhanced because duplicate positions are detected before all of their letters are calculated.

Partitioning the problem into cosets

As previously mentioned, 1012 visit bits are too many bits for the memory of typical desktop machines. Partitioning the problem into 1-letter cosets requires about 1012/24 visit bits which is still too many for a typical desktop machine. Partitioning the problem into 2-letter cosets requires about 1012 / (24 × 22) visit bits. This is about 0.3GB which will fit on most desktop machines. Hence, the program is designed to be able to partition the problem into 2-letter cosets or into 3-letter cosets etc. through 10-letter cosets. The fewer letters defining the cosets, the larger the cosets and vice versa. Most typically, I run the program with large cosets, for example with 2-letter cosets to 4-letter cosets. When symmetry is not considered by the program, the fastest run times are with the largest cosets so I usually run the program with 2-letter cosets. When symmetry is considered by the program, the fastest run times for the program are with 4-letter cosets.

The accommodations that Algorithm #1 makes when the problem is partitioned into cosets are as follows.

• The number of visit bits is reduced appropriately.
• For each σ × τ product, if the product is not in the current coset after the current letter in ζ is calculated, the product can be abandoned and the program can go on to the next product. Initiating a product and abandoning the product when it proves to be in the wrong coset is an inelegant procedure. But Algorithm #2 and Algorithm #3 make sure that products are in the current coset. Algorithm #1 is not a free standing algorithm but rather is only a part of Algorithm #2 and Algorithm #3. Therefore, the program doesn't actually initiate products that are not in the current coset.
• Until the number of letters calculated exceeds the number of letters that define the coset, the coefficient γi is set to 0. The indexing χi is derived from γi and therefore also is set to zero until the number of letters calculated exceeds the number of letters that define the coset. For all subsequent letters, the indexing formula calculates the correct indexing values for the array of visit bits.
• The program is multi-threaded and each thread operates on one coset at a time. There is a function called next_coset that each thread calls to obtain the next coset to process. The counters and accumulators and totals for each coset are added together to obtain the overall totals for the entire search space. The multiple threads operate completely independently of each other except that a lock is used to serialize the next_coset function and a lock is used to serialize the adding together of the overall totals from each coset. The portion of time that the multiple threads hold locks is so small that in practice the multiple threads never lock each other out.

Algorithm #2     Z = S × τ

Algorithm #2 differs from Algorithm #1 in that it forms the i-th letter of the product for many different values of S at the same time. To see how this works, we revisit our product

ζ              σ              τ
e??????????? = fbxcajedgkti × axcpbeghijkf

We observe that we can calculate the 0-th letter of ζ by considering only the 0-th letter of σ without regard to the value of any other letter of σ. In this case, the 0-th letter of σ is f.

Z              S              τ
e??????????? = f??????????? × axcpbeghijkf

There may be many different words in S that begin with the same letter. If we could group these letters together somehow or other, then we could calculate the 0-th letter for perhaps millions of products S × τ in about the same time as is required to calculate the 0-th letter for a single product σ × τ. Serendipitously, the process of producing the sets of words P[k] places the words in lexicographic order. Therefore, the time required to determine the range of all the letters in S which start with a particular letter is very minimal and for all practical purposes we may treat this time as zero. It is not only the calculation of the 0-th letter of the product where we can calculate perhaps millions of partial products in the same time as producing a single product, it's also the alphabets ωi, the coefficients γi, and the coset indexing χi that can be calculated for perhaps millions of different words all at the same time.

The processing described in the previous paragraph for calculating the 0-th letter of ζ is repeated in an iterative fashion for the 1-st letter, for the 2-nd letter, etc. Each time the process is repeated for the next letter, the range of S values being processed is reduced considerably in size. After the 0-th letter, the list will be about 1 / 24 of the original list. After the 1-st letter, the list will be about 1 / 22 of the reduced list or about 1 / (24 × 22) of the original list. Even with many millions in the original S list, the list becomes very short very quickly as letters are added to the product.

We may think of Algorithm #2 as producing the product Z = S × τ. Algorithm #2 is more efficient than is Algorithm #1, but the advantage of Algorithm #2 over Algorithm #1 becomes less and less as successive letters are calculated. That's because the number of words in S which have the same first two letters is less than the number of words in S which have the same first letter, the number of words in S which have the same first three letters is less than the number of words in S which have the same first two letters, etc. By about the 5-th or 6-th letter, the number of words in S which have the same letters at the beginning of the word is usually reduced to just one word and Algorithm #2 reduces itself to Algorithm #1. Nevertheless, it proves very useful to implement the program using Algorithm #2 rather than using Algorithm #1 because the overall program performance is better.

It's useful to consider what happens if the first letter of S takes on each possible value. For the first half of the alphabet as the first letter of S, here are the results.

Z      =        S     ×      τ

a??????????? = a??????????? × axcpbeghijkf
x??????????? = b??????????? × axcpbeghijkf
c??????????? = c??????????? × axcpbeghijkf
p??????????? = d??????????? × axcpbeghijkf
b??????????? = e??????????? × axcpbeghijkf
e??????????? = f??????????? × axcpbeghijkf
g??????????? = g??????????? × axcpbeghijkf
h??????????? = h??????????? × axcpbeghijkf
i??????????? = i??????????? × axcpbeghijkf
j??????????? = j??????????? × axcpbeghijkf
k??????????? = k??????????? × axcpbeghijkf
f??????????? = l??????????? × axcpbeghijkf

For the second half of the alphabet as the first letter in S, we have to make an adjustment for the fact that we are storing only the 12 letters of τ rather than 24 letters. Therefore, we must infer the 12 letters we are not storing from the 12 letters that we are storing. The results are as follows.

Z      =        S     ×      τ

m??????????? = m??????????? × axcpbeghijkf
l??????????? = n??????????? × axcpbeghijkf
o??????????? = o??????????? × axcpbeghijkf
d??????????? = p??????????? × axcpbeghijkf
n??????????? = q??????????? × axcpbeghijkf
q??????????? = r??????????? × axcpbeghijkf
s??????????? = s??????????? × axcpbeghijkf
t??????????? = t??????????? × axcpbeghijkf
u??????????? = u??????????? × axcpbeghijkf
v??????????? = v??????????? × axcpbeghijkf
w??????????? = w??????????? × axcpbeghijkf
r??????????? = x??????????? × axcpbeghijkf

If we process S in lexicographic order, Z will not be produced in lexicographic order. But because we have a bit array with one bit per position, we do not need to produce Z in lexicographic order. Nevertheless, most early versions of the new program did produce Z in lexicographic order. The early versions did so by processing S in an order that was not lexicographic and where the order of S was chosen so that Z would be produced in lexicographic order. The order required for S depends on τ, and for the particular τ value in this example, the required order for S to produce Z in lexicographic order is a, e, c, p, f, etc. You can determine the required S values simply by looking at the table above. The computer code to accomplish the same thing does so by calculating τ-1. The best way to see why the τ-1 trick works is simply to rearrange the equation Z = S × τ into the equation Z × τ-1 = S and for a fixed τ-1 letting Z be an independent variable that varies in lexicographic order. The Z × τ-1 = S version of the equation determines the order in which S must appear.

The Z × τ-1 = S equation is what I have used for over two decades to produce products in lexicographic order, and I suppose it was only natural to continue to use this technique in the new program even though it was not strictly speaking necessary. I should emphasize that the Z × τ-1 = S equation alone is not sufficient to eliminate the need for a bit array to keep up with which positions have been visited and which positions have not. To eliminate the need for the bit array, it is necessary to produce all the products in lexicographic order, not just the products for a particular τ. And to produce the products in lexicographic order it is necessary to use the tournament of losers algorithm or something similar to it to combine the results of the Z × τ-1 = S equation for every different possible value of τ

When I developed the new program, one of the goals was to eliminate the tournament of losers algorithm because it is slow and it requires a great deal of memory. What I didn't realize until very late in the development of the new program is that the Z × τ-1 = S equation itself includes a performance problem even after eliminating the tournament of losers algorithm. The problem with the Z × τ-1 = S equation is fairly subtle. The equation requires constant looking up of the value of Z × τ-1 in S. In and of itself, that's ok and it's no worse than looking up all the words in S that begin with the same letter. The problem rather is that after enough letters of ζ are calculated - say five or six letters or so - the list of S values will have been reduced to a sub list containing very few elements. Under these circumstances, most Z × τ-1 values will not be found in S. And the program spends most if its time precisely in the portion of the search space where most Z × τ-1 values will not be found in S. So the program spends most of its time looking for S values that do not exist. Every program I have ever written which operated on the basis of producing positions in lexicographic order has suffered this performance problem without me ever realizing it. When I realized the problem and changed to using the Z = S × τ form of the equation, the speed of the program nearly doubled.

There was a method to the madness of using the Z × τ-1 = S equation in the early versions of the new program. Namely, if the letter calculated for S indicates a coset for which all the elements have already been visited (or for the 10-th letter, indicates a position that has already been visited), then no search for Z × τ-1 is required in S. In other words, don't try to go to a position that has already been visited. So early versions of the new program tested the visit bits in lexicographic order and only visited cosets or positions that had not already been visited. That sounds like a fairly optimal way for the program to work. Nevertheless, testing showed that using the Z = S × τ form of the equation before testing the visit bit is nearly twice as fast as using the Z × τ-1 = S form of the equation after the visit bit. And testing the visit bit first implies using the Z × τ-1 = S form of the equation.

Algorithm #3     Z = S × T

Algorithm #2 may be thought of as a generalization of Algorithm #1. Similarly, Algorithm #3 may be thought of as a generalization of Algorithm #2. Namely, instead of a single element τ of T in the equation Z = S × τ, we use all the elements of T that have the same letter in the same position that is mapped to by the chosen letter in S. For example, for the equation

Z               S            τ
x??????????? = b??????????? × axcpbeghijkf

we find all the τ values in T where the 1-st letter is x. In other words, the equation becomes

Z               S            T
x??????????? = b??????????? × ?x??????????

It is not as easy or fast to partition T in this manner as it is to partition S. In particular, the letters in S are used in a left to right fashion from a list of positions that is already in lexicographical order. The letters in T are used in a fashion where the position of the letter being processed jumps all around and the words are in no particular order with respect to the position of the letters being processed. Also, we must partition T without disturbing the lists of P[k] that are being used to represent S and T. Therefore, the program processes T by making a second copy of the P[k] list that is being used to represent T. The second copy of the P[k] list is a list of pointers to the items in P[k]. For example, to process the product P × P, the program makes a list of pointers to all the items in P. The list of pointers effectively becomes T and the program partitions T by manipulating the list of pointers.

For the equation Z = S × T, the program partitions the overall search space into cosets of Z to reduce the size of the array of visited bits. Having done so, the program operates in a multi-threaded fashion and each thread processes one coset of Z at a time. Generally speaking, there are as many threads as there are processor cores in the machine. That means that each thread has to have its own copy of the array of visited bits. In addition, each thread has to have its own copy of T. The threads can share S because S is static and is read-only, but each thread has to have its own copy of T because T is being reordered continuously.

Each position in each P[k] requires 28 bytes of storage. Each position in S therefore requires 28 bytes of storage but no extra storage is required for S in each thread other than the storage required for each P[k] in the first place. As 64 bit pointers, each T requires 8 bytes of storage and each thread has its own copy of the pointers. If the storage required for T should become a problem, the list of T pointers can be further partitioned into smaller lists which are sub-lists of the P[k] list which is being used to represent T, and each element of the partition of T can be processed separate from the other elements of the partition. Partitioning T in this manner can slow the algorithm down a little bit and it has not been necessary thus far, but the technique is available if memory used to store the T pointers in each thread becomes excessive.

Having created the list of T pointers, the list of T pointers is partitioned as required. In the example above, the list would be partitioned into a sub-list of pointers that point to all the T values where the 1-st letter is x and into a sub-list of pointers that point to all the T values where the 1-st letter is not x. The partitioning is an O(n) operation. Generally speaking, the size of the partitions after the 1-st letter is smaller than the number of partitions after the 0-th letter. The size of the partitions after the 2-nd letter is smaller than the partitions after the 1-st letter. Etc. Also, the sub-list of T values where the 1-st letter is not x is used to create subsequent partitions where the 1-st letter is other than x. Using the "not x" words to make the next partition for the next letter after x, the size of the list being partitioned is being reduced each time new partition is made even while still working on just the 1-st letter.

The previous paragraph needs further discussion. There are two different partitioning procedures that are used by the program for T. Suppose the program were processing 2-letters cosets of Z. While processing the 0-th and 1-st letters, one partitioning procedure would be used. The partitioning procedure would always produce particular letters in the product Z. While processing the 2-nd, 3-rd, and subsequent letters, a different partitioning procedure would be used. The partitioning procedure would produce all possible letters in the product Z.

We illustrate with words containing four letters and where the four letter words are taken from a four letter alphabet. We first consider the case where a letter in Z is required to have a particular value. To produce products where the 0-th letter of Z is required to be c, the partitioning would be as follows. The program first partitions S by its 0-th letter, in turn where the 0-th letter of S is a, then b, then c, and finally d. As it is partitioning S, the program partitions T into those words where the 0-th letter is c or not c, where the 1-letter is c or not c, where the 2-nd letter is were c or not c, and where the 3-rd letter is were c or not c.

Z  =  S   ×  T

c??? = a??? × c???
c??? = b??? × ?c??
c??? = c??? × ??c?
c??? = d??? × ???c

In the program itself, at least the 0-th and 1-st letter of Z will be required to have a particular value when 2-letter cosets are in use, and additional letters in Z will be required to have a particular value when cosets with additional letters are in use. The processing for the additional letters that are required to have a particular value is the same as for the 0-th letter. When requiring a particular letter of Z to have a particular value, it will sometimes happen that the program is looking for something that doesn't exist. Using the example above, it's possible when S = d??? that there are no words in T of the form ???c. That situation cannot be completely avoided, but it is rare and if it happens it takes place in the domain of the search space where the program spends the least time.

When k-letter cosets are in use, letters after the first k are not required to have any particular value. In this case, each time T is partitioned it is partitioned according to the first word remaining in the list of T values. Even though at least the 0-th and 1-st letter of Z will be required to have a particular value, we illustrate the processing when a letter of Z can have any value by using the 0-th letter as the example. To produce products where the 0-th letter can be any letter, the partitioning would first choose in turn a, b, c, and d as the 0-th letter of S. So the first approximation would be as follows.

Z  =  S   ×  T

???? = a??? × ????
???? = b??? × ????
???? = c??? × ????
???? = d??? × ????

We consider the case where S = b???. Therefore, we are partitioning T by the 1-st letter in each word, viz. T = ????. The list of words in T will have an initial word and the 1-st letter in the initial word in the list will have some value. Let's assume it's d, e.g., τ = bdac. Therefore, if we partition T into those words where the 1-st letter is d and where the first letter is not d, the partition element of those words containing d as the 1-st letter is known by construction not to be null. If instead we looked for a as the 1-st letter of T, then looked for b as the 1-st letter of T, etc., we might be looking for things that aren't there. This is the Algorithm #3 version of the idea from Algorithm #2 not to look for things that might not be there. Because Algorithm #2 is not a freestanding algorithm but rather is a special case of Algorithm #3, this is the place in the code where all three algorithms avoid looking for things that aren't there.

Depending on the order in which we discover letters as the 1-st letter in the first word remaining in T, we might end up partitioning T as follows when S = b???.

Z  =  S   ×  T

d??? = b??? × ?d??
c??? = b??? × ?c??
a??? = b??? × ?a??

For the purposes of the example, we are supposing that we find that the first word in T has d as its 1-st letter. So we partition T into those words that do have d their 1-st letter and words that don't have d as their 1-st letter. We are then supposing that the first word that doesn't have d as its 1-st letter has c as its 1-st letter and we partition the remaining words in the list into those that do and don't have c as their 1-st letter. We continue in this manner until the list of T values is exhausted. Therefore, are always partitioning the list of T values by a word in the list that is known by construction to exist.

The example illustrates the possibility that a particular letter might not be found at a particular position in T. In this case, it's b that is not found as the 1-st letter of T. The method of partitioning never looks for the b that's not there.

Just like with Algorithm #1 and Algorithm #2, and remembering that Algorithm #1 and Algorithm #2 are both just special cases of Algorithm #3, Algorithm #3 proceeds to produce additional letters in Z, one letter at a time. The letters are produced left to right, requiring that S be partitioned one letter at a time left to right, and that T be partitioned one letter at a time but in an order mandated by the value of the letters in S. As the letters are produced in Z, the list of words in S and T will grow shorter and shorter. When the number of words in T is reduced to one, Algorithm #3 acts like Algorithm #2. When the number of words in T is reduced to one and the number of words in S is also reduced to one, Algorithm #3 acts like Algorithm #1.

In summary, Algorithm #3 operates on letters in S left to right, one letter at a time. It chooses a letter in the current left to right position in S and partitions S by that letter. Doing so chooses a position for a letter in T. It chooses the letter in that position from the first word in T and partitions T by the that letter. As it forms the partitions of S and T, letters appear left to right for Z. The letters for Z yield the values for ζ, γ, and χ as previously described. Algorithm #3 continues through all the letters and all the words in S and T in an iterative fashion until all the letters of all the words in Z are produced. As it visits a complete word in Z that has not been visited before, the visit bit is turned on for that word and the word is counted. The word represents a position, and the distance from Start for the position is k + j if S is P[k] and T is P[j].

Calculating Alphabets and their Indexings Very Quickly

As it adds letters to a word ζi, the program needs to calculate new alphabets ωi and their respective indexings γi. γi is completely determined by ωi, and there are only 4,096 different alphabets ωi. For that reason, all 4,096 possible alphabets and their respective indexings are pre-calculated and stored during program initialization. The reason there are 4,096 possible alphabets is that an alphabet is determined by the cubies that have not yet been used, and there are 12 cubies. Therefore, there are 212 or 4,096 different ways to choose a set of cubies that have not yet been used. When the feature of pre-calculating all the possible alphabets was added to the program, a test run that previously required 33 hours was immediately reduced to 27 hours. That means that 7 of the 33 hours in the first test run were spent doing nothing else than calculating alphabets and their indexings. All that time was eliminated by pre-calculating the alphabets and their respective indexings.

## Comment viewing options

### Vector units

The 'multiplication' you are doing is an permutation and a toggeling of a bit. This can be done without pre-calculated lookup tables on CPUs when using a vector unit.
f.e.
The AltiVec haves an instruction where bytes can be permuted. One vector register is 128 bits (16 bytes) wide. The vperm instruction puts two v registers together as a 32 byte string and in the third v register is a 16 byte string with values between 0x0 and 0x1f. Theses values are index numbers pointing to bytes in the 32 byte string. The values are picked out an put in a fourth v register as result of the permutation. (The idea of vperm was to do fast left and right shifts for adjustment vector values loaded from memory/stored to memory, if they are not adjusted to 16-byte-boundaries in the memory)

The code would look like this:

v0=0x0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f0f (mask to get the index numbers and a dummy for index numbers 0x10-0x1f)
v1=0x80808080808080808080808080808080 (mask to get the flip bits of a cube)

v2=0x05018b0200090403060a87080f0f0f0f (fbxcajedgkti)
v3=0x008b02830104060708090a050f0f0f0f (axcpbeghijkf)

first 12 bytes are the edge cubes (used values: 0x0-0xb; highest bit of an byte is the bit which shows if the cube is flipped [0x80], last 4 contain invalid values

vand v4,v2,v0
vperm v5,v3,v0,v4
vand v6,v2,v1
vxor v7,v5,v6

v7 contains the result and would be 0x048b850200090183060a87080f0f0f0f

I do not know the vector units of other CPUs, so maybe vperm is unique to AltiVec. The last desktop computer with AltiVec sold, was the Apple Power Mac G5. The G5 CPU contains a very fast vector unit - so fast, that loading/storing from memory is the bottle neck. But it is more than 10 years old. Currently AltiVec can only be found in servers and super computers from IBM (POWER8 is the newest generation)

BTW: would not be P x P be faster than P x P, since you have less combinations to calculate?

### I have a friend who for years

I have a friend who for years has suggested to me machines that I might be able to afford at home that have high end graphics processors. He suggests that high end graphics processors can sometimes be accessed for purposes other than rendering graphics and that they might serve in a fashion similar to that which you describe for the AltiVec vector units. I just haven't had time to pursue such things.

On your comment about P x P vs. P x P, it is definitely the case that P x P is faster - or at least it is for my algorithm. There are two reasons.

One reason is that there are fewer products that way. For the edges only cube in the face turn metric, P contains 42,807 positions, and 42,807 × 42,807 is 1,832,439,249. On the other hand, for the same problem it is the case that P contains 87,801,812 positions and P contains 18 positions. 87,801,812 × 18 is 1,580,432,616. These kinds of calculations provide a sense of the "efficiency" of my algorithm. The actual value of P for the edges only on the face turn metric is 1,050,559,626. Therefore, P x P is about 57.3% efficient when it comes to the number of products that are not duplicated and P x P is about 66.5% efficient when it comes to the number of products that are not duplicated.

The other reason is that for the product P[i] × P[k], my algorithm is much more efficient for P[i] than it is for P[k]. That is, the algorithm has to partition both P[i] and P[k], but the letters which define cosets in P[i] are processed left to right and the letters which define cosets in P[j] are processed in more or less a random order. It's like in a dictionary finding quickly all the words whose first letter is E vs. finding quickly in a dictionary all the words whose third letter is E. The time to partition P[i] is virtually zero and the time to partition p[j] is not. Therefore, the algorithm is faster if P[i] contains as many positions as possible and if P[j] contains as few positions as possible.

### I see. I had the full cube in

I see. I had the full cube in mind and not the edges only. So it is clear that that way is faster. :)

I do not know much about GPUs, but afaik they are optimized for 3D graphic calculations which required a lot of Floatingpoint-Multiply-Add (vector) instructions. So they lack in integer manipulation/calculation.

I found an other CPU architecture having something similar like the AltiVec "vperm" instruction:
the SPARC VIS 2 subset haves 'bshuffle'. It works similar, but not on full 16 Bytes - it puts two 8 bytes register together and the mask reduces them into an third 8 byte register. Since the mask is inside a special register (GSR.mask) a full 16 [12 edges] Byte permutation requires reloading a new mask and a second 'bshuffle'. Also SPARC does this inside the floating-point registers while AltiVec is having own vector registers for that.

### Details of Symmetry and Anti-Symmetry in my New Program

Symmetry and Anti-Symmetry Introduction

The new program reduces the search space for the problem by both symmetry and anti-symmetry. The primary purpose of reducing the search space by symmetry and anti-symmetry is to reduce the computer time required to enumerate the search space. But it's also useful and interesting to know the number of positions at each distance from Start for the search space reduced by symmetry, and also the number of positions at each distance from Start for the search space reduced both by symmetry and anti-symmetry.

Reducing the size of the search space means reducing the number of distinct values of ζ in Z that must be calculated for the equation Z = S × T. Most of my previous programs have reduced S by symmetry, they have reduced T by symmetry, and they have reduced Z by symmetry. Reducing Z by symmetry has the obvious advantage of reducing the computer time required to enumerate the search space. But when S and T are reduced by symmetry, the computer time required to enumerate the search space is greatly increased. That's because elements of both S and T that have been reduced by symmetry have to be expanded to their original sizes at various points in the program. The expansion is done on the fly and has to be done over and over again. Therefore, a design goal of the new program was to avoid reducing S and T by symmetry. Doing so increases the memory requirements for S and T by about 48 times, but the improvement in processing speed is well worth the increase in memory.

None of my previous programs have reduced anything by anti-symmetry. That's because my approach to symmetry calculations has never lent itself very well to calculating anti-symmetry. But I have at last figured out a way to incorporate anti-symmetry into the program in a way that is compatible with the program's symmetry calculations.

Symmetry

We begin describing how the new program calculates symmetry and we defer anti-symmetry until later. We let the group of 48 symmetries of the cube be denoted by M. Two positions η and ζ in the Rubik's Cube search space are equivalent under symmetry if there exists an element μ in M such that η = μ-1 × ζ × μ. With respect to Rubik's Cube, the significance of two positions being equivalent under symmetry is that such positions are the same distance from Start.

Distance is a property of the Cayley graph of a group rather than being a property of the group itself, and the Cayley graph depends on a generating set. For example, if the generating set of the Rubik's cube is taken to be the set of quarter turns, then the position reached after the sequence of moves FF is two moves from Start. But if the generating set of the Rubik's cube is taken to be the set of quarter turns and the set of half turns, then the position reached after the sequence of moves FF is one move from Start because the same position can be reached after the sequence of moves F2. For a position ζ, we denote its distance from Start as |ζ|. Whether |ζ| is calculated in the quarter turn metric or the face turn metric, it remains the case that |η| = |ζ| when η = μ-1 × ζ × μ.

The equivalence class for ζ under symmetry is the set of all values of the form μ-1 × ζ × μ for all μ in M. We denote this equivalence class as ζM. My approach to symmetry has always been to select a representative element from each equivalence class under symmetry. The representative element is chosen as the first element of ζM in lexicographic order. The only positions that need be visited are the representative elements. There are about 1/48 as many representative elements as there are members of the overall group.

It can be a little jarring when first exposed to the standard way that symmetry calculations are performed on Rubik's Cube to realize that the calculation is μ-1 × ζ × μ rather than simply ζ × μ. The ζ × μ approach is what is typically used for symmetry calculations for such things as crystals, atoms, and molecules.

The reason for the difference is the color tabs on a Rubik's Cube. To preserve a particular position, a symmetry operation must preserve the color tabs. By contrast, consider the symmetries of a water molecule which has two hydrogen atoms and one oxygen atom. The symmetries for the water molecule follow the ζ × μ model. That's because the two hydrogen atoms are identical and can be exchanged by a symmetry operation without changing the appearance of the molecule. It's not as if there were one red hydrogen atom and one green hydrogen atom so that you could tell the difference after the hydrogen atoms were exchanged. Of course, the symmetries of a water molecule are not the same as the symmetries of a cube. But the important point is that a single multiplication by a symmetry suffices for a water molecule but the symmetry of Rubik's cube requires conjugation.

From a group theory point of view, a useful way to think of symmetries of Rubik's Cube is that if G is the entire Rubik's Cube group, then for any fixed symmetry μ it is the case that Gμ is a distance preserving automorphism of the group. Such an automorphism is distance preserving provided that conjugation by μ preserves the set of generators for the group. In the quarter turn metric and the face turn metric, we have chosen generators such that conjugation by μ does preserve the set of generators.

Indeed, we have to be careful to be sure that conjugation by μ does preserve the set of generators. For example, suppose that instead of generating G in the quarter turn metric as G = <Q> where Q is the set of 12 quarter turns, we generate G with the 10 quarter turns that remain if we omit B and B-1. The group is the same with these 10 generators, but distances and the Cayley graph are not the same. We can still use symmetry for the 10 generator Cayley graph, but the only symmetries we can use are those which preserve the set of 10 generators.

The program produces each element ζ of Z one letter at a time. We denote the value of ζ after the i-th letter has been produced as ζi. Hence each ζ appears as a sequence ζ0, ζ1, etc. through ζ11 as we add letters to the word. For each ζi that is produced, it is possible to perform a partial symmetry calculation, forming the value μ-1 × ζi × μ for as many values of μ in M as possible.

Such calculations are a bit tricky because there may be letters in μ-1 which are mapped to letters in ζi which have not yet been calculated. Therefore, there are certain combinations of value for μ-1 × ζi × μ where the product is not defined, or where fewer letters are defined for μ-1 × ζi × μ than for ζi itself. For example, suppose we have ζ0 = d???????????. If the 0-th letter of μ-1 is mapped to anything other than the 0-th letter of ζ then we cannot calculate the 0-th letter of μ-1 × ζ0 × μ. In such a case, we cannot compare the two words ζ0 and μ-1 × ζ0 × μ. It's as if you were asked to compare two words in the English language where the 0-th letter of one of the words were d and the 0-th letter of the other word were unknown.

Nevertheless, there are many cases where the program can make a meaningful symmetry calculation after a very limited number of the letters of ζ have been calculated, often after ζ0 or ζ1. For example, it turns out that for the way the program labels the color tabs, there is one μ in M such that the 0-th letter of μ-1 is mapped to the 0-th letter of ζ0 and such that ζ0 = d??????????? is mapped to b?????????? by μ-1 × ζ0 × μ. Therefore, no word beginning with the letter d can ever be a symmetry representative. In other words, even though μ-1 × ζi × μ might be undefined for many or most values of μ, if there is at least one value of μ where μ-1 × ζi × μ is defined and if μ-1 × ζi × μ precedes ζi in lexicographic order, then ζi is not a representative element.

Therefore, the determination that a partial word ζi beginning with a particular letter or particular string of letters can never be a symmetry representative can often be made with very few letters and by checking very few values of μ. It can sometimes be sufficient to test just one value of μ provided the correct value of μ is chosen. To that end, the program builds some lookup tables for M where the tables include a great deal of information about which values of μ should be tested after ζ0  is calculated, after ζ1  is calculated, etc. These lookup tables serve to minimize the number of μ values that have to be tested and greatly to speed up the program's symmetry calculations.

When a partial word of the form ζ0 is tested to determine whether it is a symmetry representative or not, the only possible answers are no and maybe. If the answer is no, the respective visit bit in υ0 for the partial word ζ0 is turned on and there is nothing else to be done with this particular ζ0. The program proceeds to the next possible letter of ω0 to be the 0-th letter of ζ0. If the answer is maybe, then additional letters of ζ0 have to be calculated for the current partial word. After an additional letter is calculated, the program has ζ1. When a partial word of the form ζ1 is tested to determine whether it is a symmetry representative or not, the only possible answers are no and maybe. If the answer is no, the respective visit bit in υ1 for the partial word ζ1 is turned on and there is nothing else to be done with this particular ζ1. The program proceeds to the next possible letter of ω1 to be the 1-st letter of ζ1. If the answer is maybe, then additional letters of ζ1 have to be calculated for the current partial word.

The program continues this process for each ζi until the answer is no or until it reaches ζ11. When ζ11 is tested, there will not be a maybe and the answer will be a definitive yes or no. Only if the answer is yes for ζ11 can the program declare that ζ is a symmetry representative. If so, the program counts the position and marks the position as visited. If testing ζ11 yields a no, the position is a not a symmetry representative. The program marks the position as visited without counting it.

When the program is processing positions without taking symmetry into account, it can stop processing after the 10-th letter because there is only one choice remaining for the 11-th letter and the 11-th letter therefore does not have to be processed. When the program it is taking symmetry into account, it remains the case that there is only one choice remaining for the 11-th letter but nevertheless the 11-th letter must be processed. The symmetry calculation to determine that a position definitely is a symmetry representative is not complete and correct until all the letters have been processed.

At the time a visit bit is turned on, it does not matter if the visit bit is turned on for a position that is symmetry representative or if it is turned on for a position or coset that is not a symmetry representative. In either case, the position or coset should not be visited again. To that end, the same backtracking procedure is performed for the visit bits when the program is taking symmetry into account as is performed when not taking symmetry into account. That is, visit bits are turned on when all the positions in their respective cosets have been visited, all the way back to the 0-th letter or to when not all the positions in a coset has been visited. And similarly, when the program is moving forward and calculating new letters for new positions, it tests the visit bit associated with each new letter as it is calculated. If the visit bit is on, the position or coset is abandoned even before any additional symmetry calculations are performed. Because about 47/48 of the positions in the search space are not symmetry representatives, the function of the visit bits in speeding up the program is therefore more efficacious when symmetry is taken into account than symmetry it is not taken into account.

When a position ζ is a symmetry representative, it is counted when it is first visited. In addition, the total number of positions |ζM| in its equivalence class is counted at the same time. The total number of positions in its equivalence class depends on the number of symmetries associated with the position. For a position ζ, its symmetries are the set of all μ in M such that ζ = μ-1 × ζ × μ which we denote as Symm(ζ). The program's process of determining if a position ζ is a symmetry representative or not develops the set Symm(ζ) with no additional calculations required. Symm(ζ) is a subgroup of M and there are 98 different subgroups of M. The program has a lookup table for the 98 different subgroups. The lookup table contains a variety of useful information about each subgroup of M, including its order. The order of Symm(ζ) divides 48, and we may calculate the size of the equivalence class under symmetry for ζ as |ζM| = 48 / |Symm(ζ)|.

For most positions ζ it is the case that the identity is the only symmetry. If so, then |Symm(ζ)| = 1 and |ζM| = 48. Such positions provide the maximum possible reduction in the size of the search space. A relatively few positions ζ have more symmetries than just the identity. If so, then |Symm(ζ)| > 1 and |ζM| < 48. That is why the use of symmetry reduces the size of the search space by approximately 48 rather than by exactly 48.

Pre-calculation of Symmetry

Because of the way the equation Z = S × T is processed by the program, some of the symmetry calculations may have to be repeated numerous times. When testing any particular ζi determines that it is not a symmetry representative, the visit bit associated with the particular ζi is turned on and its symmetry never has to be tested again. When the result of the testing is maybe, the visit bit is not turned on (or at least, it is not turned on immediately). Therefore, the same test may have to be repeated many times on the same ζi. In particular, the same test will have to be repeated each time the processing of the Z = S × T equation reaches that particular ζi until the visit bit is turned on for that particular ζi. Even though the search space is reduced by about 48 times by symmetry, it is the case that about 20% to 30% of the CPU processing consumed by the program is taken up by symmetry testing.

Therefore, the program has an option to pre-calculate symmetry at the beginning of the processing of each coset. During the pre-calculation of symmetry, all positions ζi in the coset are examined for symmetry without regard to which σ and τ values might be multiplied together to form ζi. The positions ζi in the coset are considered one letter at a time in a left to right fashion so that any particular ζi that cannot be a symmetry representative serves as a cutoff point and no further letters have to be examined for that particular ζi. When a cutoff point is encountered, the visit bit for the associated ζi is turned on. Any visit bits that are not turned on by the pre-calculation of symmetry therefore represent values of ζi where the testing for symmetry representatives is known to result in a maybe. Because the results of symmetry testing for all positions are known and are stored in the visit bits when symmetry is pre-calculated, symmetry calculations do not have to take place while processing ζi values for the Z = S × T equation. When symmetry is pre-calculated in this manner, the amount of CPU time spent calculating symmetry is less than 1% of the overall CPU time.

When symmetry is pre-calculated as described in the previous paragraph, there remains one problem. Namely, the program needs to know the size of the equivalence class under symmetry that is associated with each position in order to count the number of positions not reduced by symmetry. When symmetry is pre-calculated, the symmetries associated with each position could be stored until needed later but the storage of the symmetries for each position would require a great deal of memory. For the vast majority of positions it is the case that Symm(ζ) contains only the identity symmetry. Therefore, during the pre-calculation of symmetry the program creates a table of positions and their respective symmetries, but only stores the positions and respective symmetries for those positions which have more symmetries than just the identity symmetry. During the processing of products for the Z = S × T equation, a lookup is performed for each product ζ = σ × τ in the positions and symmetries table. If the product is not found in the table, then its only symmetry is the identity symmetry. Otherwise, the product's symmetries can be determined from the lookup table.

Anti-symmetry

Anti-symmetry is the word used to describe the reduction in the search space that can be achieved by the fact that |ζ-1| = |ζ|. In turn, that means that |μ-1 × ζ-1 × μ| = |ζ-1| = |ζ| for all μ in M. The net result is that the size of the search space can be reduced by about 96 times if both symmetry and inverses are taken into account.

In my history of working on Rubik's Cube, I have had difficulty including anti-symmetry in my calculations. The problem has been that as letters in a product appear left to right in ζ0, ζ1, ζ2, etc., the corresponding letters do not appear left to right in ζ0-1, ζ1-1, ζ2-1, etc. That makes it tricky to compare any of ζi-1 to ζi.

Such comparisons are possible. For example, suppose ζ0 = e???????????. Then it is the case that ζ0-1 = ????a???????. What we then need is one or more elements μ in M such that the 0-th letter of μ-1 is mapped to the 4-th letter of ζ0-1. In general, such values of μ exist for ζ0-1, ζ1-1, ζ2-1, etc. But it has always seemed to me that the efficient processing of the μ values in M for ζi-1 would be extraordinarily difficult. I finally figured out how to do it. But before writing the complicated computer code that would be required, I decided on a much simpler approach. In light of the efficiencies obtained by pre-calculating symmetry as previously described, the simpler approach is about as efficient as is the more complicated approach would have been.

No matter the exact algorithm for calculating anti-symmetry, the approach is to take advantage of the fact that ζM and (ζ-1)M are either identical or disjoint. To make the determination of which is the case, we need only compare the first element of ζM in lexicographic order to the first element of (ζ-1)M in lexicographic order. Our calculation of symmetry provides us with the first element of ζM in lexicographic order which we use as the representative element of ζM. Our process for ζ-1 does not actually give us the first element of ζM in lexicographic order. Rather, we compare the representative element of ζM in turn with μ-1 × ζ-1 × μ for each μ in M. When making such a comparison, there are three cases.

• μ-1 × ζ-1 × μ < ζ
• μ-1 × ζ-1 × μ = ζ
• μ-1 × ζ-1 × μ > ζ

These calculations are performed after all the letters of ζ have been calculated and after ζ has been determined to be a symmetry representative.

• In case #1, ζ is not a combined symmetry/anti-symmetry representative. The visit bit for ζ is turned on without the position being counted, and no additional values of μ have to be tested.
• In case #2, ζ is a combined symmetry/anti-symmetry representative. The visit bit for ζ is turned on and counters are updated. The counters reflect that ζM and (ζ-1)M are the same set. In other words, ζ is a representative for up to 48 positions. No additional values of μ have to be tested.
• In case #3, there is nothing else to do with this μ and we proceed to the next μ. If case #3 persists for every μ in M, ζ is a combined symmetry/anti-symmetry representative. The visit bit for ζ is turned on and the counters are updated. The counters reflect that ζM and (ζ-1)M are disjoint sets. In other words, ζ is a representative for up to 96 positions.

In practice, case #2 is a very fast calculation and occurs a small percentage of the time. Cases #1 and #3 each occur slightly less than half the time. Case #1 is usually a very fast calculation because usually only one or two values of μ have to be tested. Case #3 requires testing all 48 values of μ in M, but for each μ it usually requires testing only one or two letters of ζ and ζ-1.