More details about my new program
Introduction
On 02/23/2016, I posted a message about a new program I had developed that had succeeded in enumerating the complete search space for the edges only group. It was not a new result because Tom Rokicki had solved the same problem back in 2004, but it was important to me because the problem served as a testbed for some new ideas I was developing to attack the problem of the full cube. I am now in the process of adapting the new program to include both edges and corners. In this message, I will include some additional detail about my new program that was not included in the first message.
Review
First, here is a little review of what I have already posted. Let P[k] be the set of all positions of length k. The program first calculates and stores P[0], P[1], P[2], etc. until there is not enough memory to proceed further. To calculate the edges group in the face turn metric, the program is able to calculate and store through P[7]. Having done so, the program calculates P[8] as P[8] = P[7] × P[1] without storing P[8], the program calculates P[9] as P[9] = P[7] × P[2] without storing P[9], etc. and ends when it calculates P[14] as P[14] = P[7] × P[7] without storing P[14]. There is nothing novel about this approach and I think it's about the same way that everybody does it.
The key problem with this approach it to keep track of which positions have been visited so that they can be counted only once. For many years my approach has been to produce positions in lexicographic order. For the new program, I have changed to having a bit array with one bit per position, and with the bit for a given position being turned on when that position has been visited. Again, there is nothing novel about this approach and I think it's probably the way that most everybody else does it. For the edges group, there are nearly 10^{12} positions which is too many bits for desktop machines. But I will begin the description with the fiction that all the bits can be stored in memory at the same time. It will be a straightforward matter later on to describe the approach to partitioning the problem into cosets that are small enough that all the bits for a given coset actually can be stored in memory at the same time.
The program operates on permutations of the cube on a color tab by color tab basis. The program considers the 24 color tabs for the edges to be labeled from a to x. Each cubie is considered to have a primary color tab and a secondary color tab. The primary color tabs are labeled from a through l, and the secondary tabs are labeled from m through x. The secondary color tab for each cubie is labeled with the letter from the primary color tab shifted to the right 12 letters in the alphabet. So the a and m color tabs are on the same cubie, the b and n color tabs are on the same cubie, etc. A cubie is represented by its primary color tab when it is not flipped and by its secondary color tab when it is flipped. Therefore, each position is represented by a string of 12 letters from an alphabet of 24 letters. We refer to these strings of 12 letters as words. We consider the letters in the words to be the 0-th through the 11-th letters rather than the 1-st through the 12-th letters.
To form products, the program forms the words that make up the products one letter at a time from left to right. A substring of letters at the beginning of a word has the effect of defining a coset. As the program adds letters to a word, the letters define a sequence of nested cosets. After the 0-th letter has been added to a word, the program will have defined a 1-letter coset. After the 1-st letter has been added to a word, the program will have defined a 2-letter coset. Etc. This method of counting can seem a little confusing, but it is very useful in practice.
The aforementioned nested cosets are based on stabilizer subgroups. The subgroup H_{0} fixes the 0-th letter. The subgroup H_{1} fixes the 0-th and 1-st letters. Etc. Each 1-letter coset is a right coset of H_{0} and is defined by its 0-th letter. Each 2-letter coset is a right coset of H_{1} and is defined by its 0-th and 1-st letters. Etc.
Alphabets and Indexings
In several sections that follow, there will be discussions of alphabets and indexings for those alphabets. The motivation for these alphabets and indexings is to calculate a bijection between the permutations of the cubies and the set of integers from 0 to |G| - 1. The integers in turn are used as indexes into the bit arrays which remember which positions have been visited and which sub-cosets have been completed. There is a more traditional way to calculate such a bijection, but the approach with alphabets ad indexings is faster for this particular problem than is the traditional approach.
Three Algorithms
Within this basic scheme, the program implements three algorithms. The fastest of the three algorithms we call Algorithm #3. In limiting conditions, Algorithm #3 reduces to a somewhat slower and simpler algorithm we call Algorithm #2. In limiting conditions, Algorithm #2 reduces to a somewhat slower and even simpler algorithm we call Algorithm #1. The limiting conditions occur rather frequently, and all three algorithms have many elements in common. Indeed, the three algorithms are a single integrated piece of computer code which is a single integrated algorithm. We break the single integrated algorithm down into three algorithms only for ease of exposition. We start with Algorithm #1.
Algorithm #1 ζ = σ × τ
Consider the calculation of P[8] as P[8] = P[7] × P[1]. We speak more generally of calculating a set of permutations Z as Z = S × T. Hence, P[8] = P[7] × P[1] is a special case where Z = P[8], S = P[7], and T = P[1]. In the general case, Algorithm #1 forms Z as the set of all products of the form ζ = σ × τ for each σ in S and for each τ in T, yielding each ζ in Z. We use Greek letters for the set elements rather than z = s × t simply to avoid any confusion with the letters making up the permutations. Algorithm #1 forms the products σ × τ without any particular regard for the order in which the σ values are chosen from S and the τ values are chosen from T. The products are formed left to right and one letter at a time for each of the 12 letters in ζ and certain processing is performed after the calculation of each letter of ζ.
We choose examples for σ and τ such that σ = fbxcajedgkti and τ = axcpbeghijkf. Hence, we set up our product ζ = σ × τ as
ζ σ τ
???????????? = fbxcajedgkti × axcpbeghijkf
We highlight the letters we need to look at to calculate the 0-th letter of ζ as
ζ σ τ
???????????? = fbxcajedgkti × axcpbeghijkf
Hence, we have
ζ σ τ
e??????????? = fbxcajedgkti × axcpbeghijkf
The general rule for forming the products is that position of the letter in the word for ζ has to match the position of the letter in the word for σ, the position of the letter in τ has to match the value of the letter in σ, and the value of the letter in ζ has to match the value of the letter in τ. Because only 12 color tabs are stored instead of 24, if a color tab in σ is in the second half of the alphabet, it maps to a color tab that is not stored in τ. When that happens it means that the letter in σ represents a flipped cubie. Therefore, the unflipped color tab for that cubie in σ is mapped to a color tab in τ and the color tab found at that position in τ is flipped to its other color tab to produce the value for ζ.
Before proceeding to calculate any of the additional letters, the program perform some calculations on the partial product e???????????. The 0-th letter in the product is e and in general it could have been any of a through x. We consider the 0-th letter of the product to have been taken from the alphabet ω_{0} where
ω_{0} = a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x
The definition of an alphabet is a finite sequence, so an alphabet is not just a set. Rather, the definition of an alphabet produces an ordering on the elements of the alphabet. At this point in the calculation, the program has the alphabet ω_{i} where i =0. We also define a master alphabet ω where the master alphabet is
ω = a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x
and where ω_{i} = ω at this point. However, in general it will not be the case that ω_{i} = ω.
We also define an indexing on the alphabet ω_{0} with respect to the master alphabet ω. The indexing γ_{0} of the alphabet with respect to the master alphabet ω is the array
γ_{0} = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
That means that the letter a in ω_{0} is the 0-th letter of ω, the letter b in ω_{0} is the 1-st letter of ω, etc. As the program gets further into the calculation of the product, the current alphabet ω_{i} will change as each new letter is added to the product but the master alphabet ω will not change. Therefore, the indexing γ_{i} of the current alphabet ω_{i} will change as each new letter as added to the product.
The 0-th letter of ζ = e??????????? is e, and γ_{0}(e) = 4. That means that e is the 4-th letter of the master alphabet ω. There are two useful interpretations of the index γ_{0}(e). The more traditional interpretation is that we may consider 4 to be a coefficient of a mixed based number χ. The mixed based number χ indexes the product ζ within the entirety of the edges group G. If |G| is the order of the edge group G, the mixed base for the 0-th coefficient is |G| / 24 and the contribution of the coefficient 4 to the overall number χ(ζ) is 4 × |G| / 24.
Instead, our interpretation for γ_{0}(e) = 4 will be as an index into cosets. The 0-th letter of ζ partitions the overall edges group G into 24 cosets. We define an indexing χ_{0} where χ_{0} = γ_{0}. For subsequent letters in ζ, the indexing χ_{i} will be calculated from γ_{i} but it will not equal γ_{i}. For the 0-th letter in ζ, because γ_{0}(e) = 4, we are in the 4-th 1-letter coset of G. Informally, we may think of the 24 different 1-letter cosets as the a coset, as the b coset, and so forth through the x coset. For 1-letter cosets, the program has a bit array υ_{0} of 24 bits, one bit per coset. Having set χ_{0} equal to γ_{0}, χ_{0} serves as an index into υ_{0}. The bit in υ_{0} for each coset will be turned on when every position in that coset has been visited. When all the bits in υ_{0} have been turned on, the program will know that the problem has been completed.
As we begin our calculations for the second letter in ζ, we may observe that we have a new alphabet
ω_{1} = a,b,c,d,f,g,h,i,j,k,l,m,n,o,p,r,s,t,u,v,w,x
ω_{1} contains 22 letters rather than 24 letters. It omits e and q because we have already used the e-q cubie.
The indexing of the alphabet ω_{1} with respect to the master alphabet ω is
γ_{1} = 0,1,2,3,~,4,5,6,7,8,9,10,11,12,13,14,~,15,16,17,18,19,20,21
In other words, there are still 24 items in the list for the indexing γ_{1} even though there are only 22 items in the alphabet ω_{1}. The letters e and q are not indexed because they are no longer in the alphabet. The reason for this indexing model rather than defining γ_{1} as
γ_{1} = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
is to enable γ_{1} to be referenced directly by the elements of ω_{1} rather than needing to perform some sort of search of ω_{1} to discover the location of each letter in the alphabet.
With that in mind, we can calculate the second letter of ζ as
ζ σ τ
ex?????????? = fbxcajedgkti × axcpbeghijkf
x is the 21-st letter of ω_{1} which we can determine by looking at the 23-rd entry of γ_{1} and where x is the 23-rd letter in the master alphabet ω. Again, there are two useful interpretations of this index. The more traditional is that we may consider 21 to be the coefficient of a mixed based number χ where the base for this digit is |G| / (24 × 22) and the contribution of the coefficient 21 to the overall number χ(ζ) is 21 × |G| / (24 × 22).
Our interpretation for the index γ_{1}(x)=21 will be that the index partitions the e coset of the edges group G into sub-cosets where each sub-coset is 1/22 of the e coset. When the 1-st letter of ζ is x, we are in the 21-st of the sub-cosets of the e coset. Informally, we may think of the cosets which are sub-cosets of the e coset as the ea coset, as the be coset, and so forth through the ex coset. There are 24 × 22, or 528 different 2-letter cosets, which we index from 0 to 527. Rather than thinking of γ_{1}(x)=21 as a mixed base coefficient for the overall number which indexes the position ζ within the overall edges group G, we use γ_{1}(x)=21 to calculate an index from 0 to 527 among all the 2-letter cosets. The calculation is χ_{1} = 24 × χ_{0}(e) + γ_{1}(x). In this case the calculation yields 117, meaning that the ex coset is the 117-th out of 528 different 2-letter cosets. The index χ_{1} = 117 is therefore an index into the 528 different 2-letter cosets.
For ω_{1}, we will have a bit array υ_{1} of 528 bits, one bit per coset. The bit in υ_{1} for each coset will be turned on when every position in that coset has been visited. We may think of the 528 bits in υ_{1} as 24 strings of 22 bits each. Each string of 22 bits corresponds to a 1-letter coset. When all 22 bits in υ_{1} for a particular 1-letter coset have been turned on, we will know that particular 1-letter coset has been completed and we turn on the bit in υ_{0} for that particular 1-letter coset. At the same time we are turning on a bit in υ_{0}, we check to see if all the bits in υ_{0} are turned on to determine if the problem has been completed.
The same general process continues for each letter α_{i} in ζ. We have a new alphabet ω_{i} for each letter and a new indexing γ_{i} for each letter. We have a new bit array υ_{i} for each letter. As we add letters, the bit arrays υ_{i} contain respectively 24 bits, then 24 × 22 bits, then 24 × 22 × 20 bits, etc. For each new letter α_{i}, we determine γ_{i}(α_{i}) and determine an indexing χ_{i} to index the bit arrays. The calculation is χ_{i} = (24 − 2 × (i − 1)) × χ_{(i − 1)} + γ_{i}(α_{i}). The calculation is much simpler than it looks and is very fast. The part about (24 − 2 × (i − 1)) reduces to 24 for the 1-st letter, to 22 for the 2-nd letter, to 20 for the 3-rd letter, etc. It doesn't actually have to be calculated because it's simply the number of letters in the current alphabet ω_{i}.
The next to last letter is the 10-th letter. After the 10-th letter, χ_{10} will be an index into the collection of 11-letter cosets, but the 11-letter cosets each contain only one element of the edge group G. Therefore, we may interpret χ_{10} as index into the group elements themselves.
The calculation χ_{i} = (24 − 2 × (i − 1)) × χ_{(i − 1)} + γ_{i} is applied recursively from the 1-st letter out through the 10-th letter. The result for χ_{10} is the same as would be calculated for χ(ζ) by treating all the γ values as mixed based coefficients.
The last letter is the 11-th letter. There is only one way to choose the 11-th letter, so it need not enter into the indexing calculations. If we did choose to calculate χ_{11}, it would have the same value as χ_{10}. Therefore, when the program reaches the 10-th letter, it tests the bit indexed by χ_{10}. If the bit is not on, it is turned on and the position is counted. If the bit is on, the position is a duplicate position. The program backtracks without counting the duplicate position.
The duplicate position performance problem
Strictly speaking, the program would not need any bit arrays other than υ_{10} because it is υ_{10} that keeps track of which positions have been visited and which have not. The reason for the bit arrays after each letter is that the program runs into a severe performance problem towards the end of the search because at that point a very high percentage of positions are duplicate positions and calculating them is a waste of time. The bit arrays after each letter attempt to detect duplicate positions after the calculation of as few a number of letters in ζ as possible.
Therefore, as it turns on a visit bit in υ_{10} for an 11-letter coset, the program determines if all 4 of the positions associated with the containing 10-letter coset have been visited. If so, it turns on the respective coset completed bit in υ_{9} for the containing 10-letter coset. As it turns on a coset completed bit in υ_{9} for a 10-letter coset, the program determines if all 6 of the cosets associated with the containing 9-letter coset have been completed. If so, it turns on the respective coset completed bit in υ_{8} for the containing 9-letter coset. The program follows the same procedure all the way back to the 0-th letter. It stops the backtracking procedure of turning on visit bits for larger and larger cosets when every position has been visited or when not all the containing cosets at a particular level in the backtracking procedure have been completed.
In turn, as it is moving forward and calculating additional letters for each word ζ, the program tests the visit bit or coset completed bit associated with the coset indicated by the letter most recently calculated. If the bit is on, the partial product is abandoned and the program tries the next letter in the alphabet as the rightmost letter in the current substring of letters at the beginning of the word. This process has very little effect on the performance of the program in the early part of the enumeration of all the positions in a coset. But towards the end of the enumeration of all the positions in a coset, the performance of the program is greatly enhanced because duplicate positions are detected before all of their letters are calculated.
Partitioning the problem into cosets
As previously mentioned, 10^{12} visit bits are too many bits for the memory of typical desktop machines. Partitioning the problem into 1-letter cosets requires about 10^{12}/24 visit bits which is still too many for a typical desktop machine. Partitioning the problem into 2-letter cosets requires about 10^{12} / (24 × 22) visit bits. This is about 0.3GB which will fit on most desktop machines. Hence, the program is designed to be able to partition the problem into 2-letter cosets or into 3-letter cosets etc. through 10-letter cosets. The fewer letters defining the cosets, the larger the cosets and vice versa. Most typically, I run the program with large cosets, for example with 2-letter cosets to 4-letter cosets. When symmetry is not considered by the program, the fastest run times are with the largest cosets so I usually run the program with 2-letter cosets. When symmetry is considered by the program, the fastest run times for the program are with 4-letter cosets.
The accommodations that Algorithm #1 makes when the problem is partitioned into cosets are as follows.
- The number of visit bits is reduced appropriately.
- For each σ × τ product, if the product is not in the current coset after the current letter in ζ is calculated, the product can be abandoned and the program can go on to the next product. Initiating a product and abandoning the product when it proves to be in the wrong coset is an inelegant procedure. But Algorithm #2 and Algorithm #3 make sure that products are in the current coset. Algorithm #1 is not a free standing algorithm but rather is only a part of Algorithm #2 and Algorithm #3. Therefore, the program doesn't actually initiate products that are not in the current coset.
- Until the number of letters calculated exceeds the number of letters that define the coset, the coefficient γ_{i} is set to 0. The indexing χ_{i} is derived from γ_{i} and therefore also is set to zero until the number of letters calculated exceeds the number of letters that define the coset. For all subsequent letters, the indexing formula calculates the correct indexing values for the array of visit bits.
- The program is multi-threaded and each thread operates on one coset at a time. There is a function called next_coset that each thread calls to obtain the next coset to process. The counters and accumulators and totals for each coset are added together to obtain the overall totals for the entire search space. The multiple threads operate completely independently of each other except that a lock is used to serialize the next_coset function and a lock is used to serialize the adding together of the overall totals from each coset. The portion of time that the multiple threads hold locks is so small that in practice the multiple threads never lock each other out.
Algorithm #2 Z = S × τ
Algorithm #2 differs from Algorithm #1 in that it forms the i-th letter of the product for many different values of S at the same time. To see how this works, we revisit our product
ζ σ τ
e??????????? = fbxcajedgkti × axcpbeghijkf
We observe that we can calculate the 0-th letter of ζ by considering only the 0-th letter of σ without regard to the value of any other letter of σ. In this case, the 0-th letter of σ is f.
Z S τ
e??????????? = f??????????? × axcpbeghijkf
There may be many different words in S that begin with the same letter. If we could group these letters together somehow or other, then we could calculate the 0-th letter for perhaps millions of products S × τ in about the same time as is required to calculate the 0-th letter for a single product σ × τ. Serendipitously, the process of producing the sets of words P[k] places the words in lexicographic order. Therefore, the time required to determine the range of all the letters in S which start with a particular letter is very minimal and for all practical purposes we may treat this time as zero. It is not only the calculation of the 0-th letter of the product where we can calculate perhaps millions of partial products in the same time as producing a single product, it's also the alphabets ω_{i}, the coefficients γ_{i}, and the coset indexing χ_{i} that can be calculated for perhaps millions of different words all at the same time.
The processing described in the previous paragraph for calculating the 0-th letter of ζ is repeated in an iterative fashion for the 1-st letter, for the 2-nd letter, etc. Each time the process is repeated for the next letter, the range of S values being processed is reduced considerably in size. After the 0-th letter, the list will be about 1 / 24 of the original list. After the 1-st letter, the list will be about 1 / 22 of the reduced list or about 1 / (24 × 22) of the original list. Even with many millions in the original S list, the list becomes very short very quickly as letters are added to the product.
We may think of Algorithm #2 as producing the product Z = S × τ. Algorithm #2 is more efficient than is Algorithm #1, but the advantage of Algorithm #2 over Algorithm #1 becomes less and less as successive letters are calculated. That's because the number of words in S which have the same first two letters is less than the number of words in S which have the same first letter, the number of words in S which have the same first three letters is less than the number of words in S which have the same first two letters, etc. By about the 5-th or 6-th letter, the number of words in S which have the same letters at the beginning of the word is usually reduced to just one word and Algorithm #2 reduces itself to Algorithm #1. Nevertheless, it proves very useful to implement the program using Algorithm #2 rather than using Algorithm #1 because the overall program performance is better.
It's useful to consider what happens if the first letter of S takes on each possible value. For the first half of the alphabet as the first letter of S, here are the results.
Z = S × τ
a??????????? = a??????????? × axcpbeghijkf
x??????????? = b??????????? × axcpbeghijkf
c??????????? = c??????????? × axcpbeghijkf
p??????????? = d??????????? × axcpbeghijkf
b??????????? = e??????????? × axcpbeghijkf
e??????????? = f??????????? × axcpbeghijkf
g??????????? = g??????????? × axcpbeghijkf
h??????????? = h??????????? × axcpbeghijkf
i??????????? = i??????????? × axcpbeghijkf
j??????????? = j??????????? × axcpbeghijkf
k??????????? = k??????????? × axcpbeghijkf
f??????????? = l??????????? × axcpbeghijkf
For the second half of the alphabet as the first letter in S, we have to make an adjustment for the fact that we are storing only the 12 letters of τ rather than 24 letters. Therefore, we must infer the 12 letters we are not storing from the 12 letters that we are storing. The results are as follows.
Z = S × τ
m??????????? = m??????????? × axcpbeghijkf
l??????????? = n??????????? × axcpbeghijkf
o??????????? = o??????????? × axcpbeghijkf
d??????????? = p??????????? × axcpbeghijkf
n??????????? = q??????????? × axcpbeghijkf
q??????????? = r??????????? × axcpbeghijkf
s??????????? = s??????????? × axcpbeghijkf
t??????????? = t??????????? × axcpbeghijkf
u??????????? = u??????????? × axcpbeghijkf
v??????????? = v??????????? × axcpbeghijkf
w??????????? = w??????????? × axcpbeghijkf
r??????????? = x??????????? × axcpbeghijkf
If we process S in lexicographic order, Z will not be produced in lexicographic order. But because we have a bit array with one bit per position, we do not need to produce Z in lexicographic order. Nevertheless, most early versions of the new program did produce Z in lexicographic order. The early versions did so by processing S in an order that was not lexicographic and where the order of S was chosen so that Z would be produced in lexicographic order. The order required for S depends on τ, and for the particular τ value in this example, the required order for S to produce Z in lexicographic order is a, e, c, p, f, etc. You can determine the required S values simply by looking at the table above. The computer code to accomplish the same thing does so by calculating τ^{-1}. The best way to see why the τ^{-1} trick works is simply to rearrange the equation Z = S × τ into the equation Z × τ^{-1} = S and for a fixed τ^{-1} letting Z be an independent variable that varies in lexicographic order. The Z × τ^{-1} = S version of the equation determines the order in which S must appear.
The Z × τ^{-1} = S equation is what I have used for over two decades to produce products in lexicographic order, and I suppose it was only natural to continue to use this technique in the new program even though it was not strictly speaking necessary. I should emphasize that the Z × τ^{-1} = S equation alone is not sufficient to eliminate the need for a bit array to keep up with which positions have been visited and which positions have not. To eliminate the need for the bit array, it is necessary to produce all the products in lexicographic order, not just the products for a particular τ. And to produce the products in lexicographic order it is necessary to use the tournament of losers algorithm or something similar to it to combine the results of the Z × τ^{-1} = S equation for every different possible value of τ
When I developed the new program, one of the goals was to eliminate the tournament of losers algorithm because it is slow and it requires a great deal of memory. What I didn't realize until very late in the development of the new program is that the Z × τ^{-1} = S equation itself includes a performance problem even after eliminating the tournament of losers algorithm. The problem with the Z × τ^{-1} = S equation is fairly subtle. The equation requires constant looking up of the value of Z × τ^{-1} in S. In and of itself, that's ok and it's no worse than looking up all the words in S that begin with the same letter. The problem rather is that after enough letters of ζ are calculated - say five or six letters or so - the list of S values will have been reduced to a sub list containing very few elements. Under these circumstances, most Z × τ^{-1} values will not be found in S. And the program spends most if its time precisely in the portion of the search space where most Z × τ^{-1} values will not be found in S. So the program spends most of its time looking for S values that do not exist. Every program I have ever written which operated on the basis of producing positions in lexicographic order has suffered this performance problem without me ever realizing it. When I realized the problem and changed to using the Z = S × τ form of the equation, the speed of the program nearly doubled.
There was a method to the madness of using the Z × τ^{-1} = S equation in the early versions of the new program. Namely, if the letter calculated for S indicates a coset for which all the elements have already been visited (or for the 10-th letter, indicates a position that has already been visited), then no search for Z × τ^{-1} is required in S. In other words, don't try to go to a position that has already been visited. So early versions of the new program tested the visit bits in lexicographic order and only visited cosets or positions that had not already been visited. That sounds like a fairly optimal way for the program to work. Nevertheless, testing showed that using the Z = S × τ form of the equation before testing the visit bit is nearly twice as fast as using the Z × τ^{-1} = S form of the equation after the visit bit. And testing the visit bit first implies using the Z × τ^{-1} = S form of the equation.
Algorithm #3 Z = S × T
Algorithm #2 may be thought of as a generalization of Algorithm #1. Similarly, Algorithm #3 may be thought of as a generalization of Algorithm #2. Namely, instead of a single element τ of T in the equation Z = S × τ, we use all the elements of T that have the same letter in the same position that is mapped to by the chosen letter in S. For example, for the equation
Z S τ
x??????????? = b??????????? × axcpbeghijkf
we find all the τ values in T where the 1-st letter is x. In other words, the equation becomes
Z S T
x??????????? = b??????????? × ?x??????????
It is not as easy or fast to partition T in this manner as it is to partition S. In particular, the letters in S are used in a left to right fashion from a list of positions that is already in lexicographical order. The letters in T are used in a fashion where the position of the letter being processed jumps all around and the words are in no particular order with respect to the position of the letters being processed. Also, we must partition T without disturbing the lists of P[k] that are being used to represent S and T. Therefore, the program processes T by making a second copy of the P[k] list that is being used to represent T. The second copy of the P[k] list is a list of pointers to the items in P[k]. For example, to process the product P[7] × P[2], the program makes a list of pointers to all the items in P[2]. The list of pointers effectively becomes T and the program partitions T by manipulating the list of pointers.
For the equation Z = S × T, the program partitions the overall search space into cosets of Z to reduce the size of the array of visited bits. Having done so, the program operates in a multi-threaded fashion and each thread processes one coset of Z at a time. Generally speaking, there are as many threads as there are processor cores in the machine. That means that each thread has to have its own copy of the array of visited bits. In addition, each thread has to have its own copy of T. The threads can share S because S is static and is read-only, but each thread has to have its own copy of T because T is being reordered continuously.
Each position in each P[k] requires 28 bytes of storage. Each position in S therefore requires 28 bytes of storage but no extra storage is required for S in each thread other than the storage required for each P[k] in the first place. As 64 bit pointers, each T requires 8 bytes of storage and each thread has its own copy of the pointers. If the storage required for T should become a problem, the list of T pointers can be further partitioned into smaller lists which are sub-lists of the P[k] list which is being used to represent T, and each element of the partition of T can be processed separate from the other elements of the partition. Partitioning T in this manner can slow the algorithm down a little bit and it has not been necessary thus far, but the technique is available if memory used to store the T pointers in each thread becomes excessive.
Having created the list of T pointers, the list of T pointers is partitioned as required. In the example above, the list would be partitioned into a sub-list of pointers that point to all the T values where the 1-st letter is x and into a sub-list of pointers that point to all the T values where the 1-st letter is not x. The partitioning is an O(n) operation. Generally speaking, the size of the partitions after the 1-st letter is smaller than the number of partitions after the 0-th letter. The size of the partitions after the 2-nd letter is smaller than the partitions after the 1-st letter. Etc. Also, the sub-list of T values where the 1-st letter is not x is used to create subsequent partitions where the 1-st letter is other than x. Using the "not x" words to make the next partition for the next letter after x, the size of the list being partitioned is being reduced each time new partition is made even while still working on just the 1-st letter.
The previous paragraph needs further discussion. There are two different partitioning procedures that are used by the program for T. Suppose the program were processing 2-letters cosets of Z. While processing the 0-th and 1-st letters, one partitioning procedure would be used. The partitioning procedure would always produce particular letters in the product Z. While processing the 2-nd, 3-rd, and subsequent letters, a different partitioning procedure would be used. The partitioning procedure would produce all possible letters in the product Z.
We illustrate with words containing four letters and where the four letter words are taken from a four letter alphabet. We first consider the case where a letter in Z is required to have a particular value. To produce products where the 0-th letter of Z is required to be c, the partitioning would be as follows. The program first partitions S by its 0-th letter, in turn where the 0-th letter of S is a, then b, then c, and finally d. As it is partitioning S, the program partitions T into those words where the 0-th letter is c or not c, where the 1-letter is c or not c, where the 2-nd letter is were c or not c, and where the 3-rd letter is were c or not c.
Z = S × T
c??? = a??? × c???
c??? = b??? × ?c??
c??? = c??? × ??c?
c??? = d??? × ???c
In the program itself, at least the 0-th and 1-st letter of Z will be required to have a particular value when 2-letter cosets are in use, and additional letters in Z will be required to have a particular value when cosets with additional letters are in use. The processing for the additional letters that are required to have a particular value is the same as for the 0-th letter. When requiring a particular letter of Z to have a particular value, it will sometimes happen that the program is looking for something that doesn't exist. Using the example above, it's possible when S = d??? that there are no words in T of the form ???c. That situation cannot be completely avoided, but it is rare and if it happens it takes place in the domain of the search space where the program spends the least time.
When k-letter cosets are in use, letters after the first k are not required to have any particular value. In this case, each time T is partitioned it is partitioned according to the first word remaining in the list of T values. Even though at least the 0-th and 1-st letter of Z will be required to have a particular value, we illustrate the processing when a letter of Z can have any value by using the 0-th letter as the example. To produce products where the 0-th letter can be any letter, the partitioning would first choose in turn a, b, c, and d as the 0-th letter of S. So the first approximation would be as follows.
Z = S × T
???? = a??? × ????
???? = b??? × ????
???? = c??? × ????
???? = d??? × ????
We consider the case where S = b???. Therefore, we are partitioning T by the 1-st letter in each word, viz. T = ????. The list of words in T will have an initial word and the 1-st letter in the initial word in the list will have some value. Let's assume it's d, e.g., τ = bdac. Therefore, if we partition T into those words where the 1-st letter is d and where the first letter is not d, the partition element of those words containing d as the 1-st letter is known by construction not to be null. If instead we looked for a as the 1-st letter of T, then looked for b as the 1-st letter of T, etc., we might be looking for things that aren't there. This is the Algorithm #3 version of the idea from Algorithm #2 not to look for things that might not be there. Because Algorithm #2 is not a freestanding algorithm but rather is a special case of Algorithm #3, this is the place in the code where all three algorithms avoid looking for things that aren't there.
Depending on the order in which we discover letters as the 1-st letter in the first word remaining in T, we might end up partitioning T as follows when S = b???.
Z = S × T
d??? = b??? × ?d??
c??? = b??? × ?c??
a??? = b??? × ?a??
For the purposes of the example, we are supposing that we find that the first word in T has d as its 1-st letter. So we partition T into those words that do have d their 1-st letter and words that don't have d as their 1-st letter. We are then supposing that the first word that doesn't have d as its 1-st letter has c as its 1-st letter and we partition the remaining words in the list into those that do and don't have c as their 1-st letter. We continue in this manner until the list of T values is exhausted. Therefore, are always partitioning the list of T values by a word in the list that is known by construction to exist.
The example illustrates the possibility that a particular letter might not be found at a particular position in T. In this case, it's b that is not found as the 1-st letter of T. The method of partitioning never looks for the b that's not there.
Just like with Algorithm #1 and Algorithm #2, and remembering that Algorithm #1 and Algorithm #2 are both just special cases of Algorithm #3, Algorithm #3 proceeds to produce additional letters in Z, one letter at a time. The letters are produced left to right, requiring that S be partitioned one letter at a time left to right, and that T be partitioned one letter at a time but in an order mandated by the value of the letters in S. As the letters are produced in Z, the list of words in S and T will grow shorter and shorter. When the number of words in T is reduced to one, Algorithm #3 acts like Algorithm #2. When the number of words in T is reduced to one and the number of words in S is also reduced to one, Algorithm #3 acts like Algorithm #1.
In summary, Algorithm #3 operates on letters in S left to right, one letter at a time. It chooses a letter in the current left to right position in S and partitions S by that letter. Doing so chooses a position for a letter in T. It chooses the letter in that position from the first word in T and partitions T by the that letter. As it forms the partitions of S and T, letters appear left to right for Z. The letters for Z yield the values for ζ, γ, and χ as previously described. Algorithm #3 continues through all the letters and all the words in S and T in an iterative fashion until all the letters of all the words in Z are produced. As it visits a complete word in Z that has not been visited before, the visit bit is turned on for that word and the word is counted. The word represents a position, and the distance from Start for the position is k + j if S is P[k] and T is P[j].
Calculating Alphabets and their Indexings Very Quickly
As it adds letters to a word ζ_{i}, the program needs to calculate new alphabets ω_{i} and their respective indexings γ_{i}. γ_{i} is completely determined by ω_{i}, and there are only 4,096 different alphabets ω_{i}. For that reason, all 4,096 possible alphabets and their respective indexings are pre-calculated and stored during program initialization. The reason there are 4,096 possible alphabets is that an alphabet is determined by the cubies that have not yet been used, and there are 12 cubies. Therefore, there are 2^{12} or 4,096 different ways to choose a set of cubies that have not yet been used. When the feature of pre-calculating all the possible alphabets was added to the program, a test run that previously required 33 hours was immediately reduced to 27 hours. That means that 7 of the 33 hours in the first test run were spent doing nothing else than calculating alphabets and their indexings. All that time was eliminated by pre-calculating the alphabets and their respective indexings.