3x3x3 cube ignoring edge locations

Hello! (I am new to the group.)
I have done an analysis of the 3x3x3 Rubik's cube ignoring the locations of the edges (but not ignoring the orientations). That is, the locations and orientations of the corners were used, as well as the orientation of the edges. I used symmetry to reduce the number of positions from 180,592,312,320 to 3,772,354,560 (1,841,970 corner sym-coordinate values * 2048).
I have created files giving the distances for each of the positions of this problem space for the face-turn and quarter-turn metrics. I have listed the summary of the results I got below (where the numbers given are not symmetry-reduced). I was wondering if anyone has done this analysis before as I haven't been able to find any other such data on the internet to verify my results against.
face turn metric         positions        cumulative
distance  0f:                    1                 1
distance  1f:                   18                19
distance  2f:                  243               262
distance  3f:                3,090             3,352
distance  4f:               38,666            42,018
distance  5f:              480,654           522,672
distance  6f:            5,926,677         6,449,349
distance  7f:           71,470,701        77,920,050
distance  8f:          825,654,684       903,574,734
distance  9f:        8,503,518,654     9,407,093,388
distance 10f:       60,902,212,428    70,309,305,816
distance 11f:      106,280,694,040   176,589,999,856
distance 12f:        4,002,310,460   180,592,310,316
distance 13f:                2,004   180,592,312,320

quarter turn metric
distance  0q:                    1                 1
distance  1q:                   12                13
distance  2q:                  114               127
distance  3q:                1,068             1,195
distance  4q:                9,939            11,134
distance  5q:               91,668           102,802
distance  6q:              836,232           939,034
distance  7q:            7,515,984         8,455,018
distance  8q:           65,904,748        74,359,766
distance  9q:          545,333,600       619,693,366
distance 10q:        3,985,539,719     4,605,233,085
distance 11q:       22,921,454,744    27,526,687,829
distance 12q:       71,219,543,521    98,746,231,350
distance 13q:       66,815,043,888   165,561,275,238
distance 14q:       15,024,321,886   180,585,597,124
distance 15q:            6,715,196   180,592,312,320

Comment viewing options

Select your preferred way to display the comments and click 'Save settings' to activate your changes.

Pruning table comparison

I wrote a small program to simulate performing pruning table lookups for 100 million cube positions, based upon the distribution of the distance values in the table. The program shows the results for the distance table I generated for the face-turn metric as well as the results for the pruning table used in the Huge Optimal Solver in the Cube Explorer program by H. Kociemba. Note that the Cube Explorer program makes use of the pruning table 3 times per position, making that table significantly more effective than if only 1 lookup per position were performed. (The maximum value from the three lookups can be used as the pruning table value.)

So while it calculated the average value for the Huge Optimal Solver pruning table to be about 10.306, the effective value came out to be about 10.804. It is not clear to me whether or not the Huge Optimal Solver in Cube Explorer can and does incorporate a further technique attributed to Michiel de Bondt. If that is the case, my simulation indicates the effective average pruning table value for a cube position increases to 10.991. (The readme file distributed with the program appears to say it's only used with the program's standard solver, not the huge optimal solver.)

The pruning table I generated that considers the permutation of the corners, and the orientations of both the corners and the edges, comes out to an average distance of approximately 10.575. So although my table has a higher average distance, the pruning table used in the Huge Optimal Solver of the Cube Explorer appears to be somewhat better when you take into account how it is used. (Since Cube Explorer uses the face-turn metric, all numbers are in terms of that metric.)

I thought I would make one additional comment here. In the Huge Optimal Solver, the sym-coordinate used related only to edge permutation information (as compared to my program, where two raw cooordinates, corner permutation and corner orientation, are used in creating a sym-coordinate). I assume this greatly simplifies the translation from "raw" coordinates to "sym" coordinates. I believe if I had used only corner permutation information to create a sym-coordinate, the number of positions for my table would needed to have been 984*2187*2048 = 4,407,312,384, I believe. At 5 positions per byte, the amount of RAM for the main table would have been over 881 million bytes, about 127 million bytes extra, and I suspect it may have been a little too much for the program to run completely from RAM on a 1GB Windows XP machine.

Finally, I would like to thank other people for their prior work and articles/documentation that helped me in creating the program, including Mike Reid, Herbert Kociemba, Jaap Scherphuis, and others. Also thanks to Jerry and Claude for their comments.

- Bruce Norskog

3x3x3 cube ignoring edge locations

How did you store so many configurations ?
That would need 90Gbytes of storage !
How long did you take to run all this ?

Have you considered using this table as a base for
the heuristic function used IDE-A* search methods?

great work!


Re: 3x3x3 cube ignoring edge locations

> How did you store so many configurations ?
> That would need 90Gbytes of storage !

I used the "coordinate space" of size 1841970*2048 to reduce the number of values stored to 3,772,354,560. This requires about 1.886*(10^9) bytes of disk storage at 4 bits per element. This space uses symmetry in the corner cubies. (Note, the "real" number of positions, using symmetry of the edge orientations as well as the corners, according to brute-force counting by some code I added to the program, appears to be 3,769,762,452.)

Note that I used a certain edge orientation convention that allows use of the full 48 symmetries of the cube. Another edge orientation convention might provide a higher average distance, but using fewer symmetries would require a larger table.

> How long did you take to run all this ?

The face turn metric took about 16h 17m for the main loop to execute (including one extra iteration to verify no new positions were found). The initialization I believe was under 10 minutes. I had optimized the file I/O when I ran the quarter turn metric, and used an internal hard drive instead of an external USB2 drive. It took about 10h 20m to run the first fourteen iterations. (I initially did the quarter-turn metric in two steps, but found that I really didn't need to since all the remaining positions were reached in the 15th iteration.) I reran the quarter-turn metric in a single step, but I don't seem to have the timing information from that. I think the main reason the quarter-turn metric was faster is that there are only 12 moves instead of 18 to process for each position. (Note: I used a Pentium4, 2.4GHz, 1GB RAM).

Note, I am not mentioning time spent to build and save lookup table data that was done prior to the final "runs" that produced the results. The final runs loaded some of the lookup tables from disk files.

> Have you considered using this table as a base for
> the heuristic function used IDE-A* search methods?

Yes, I am working on incorporating this with some code of mine based roughly on the IDA* pseudo-code I found on Jaap's web site. (I am not quite sure if you mean exactly the same thing by IDE-A*.)

- Bruce Norskog

Simple optimal solver results

I have implemented a couple of simple optimal solvers that use the distance table I generated as a pruning table. (I used the face-turn metric distance table, although with a few changes to my code, I should be able to use the quarter-turn metric table.) The good news is that the correct optimal solutions were obtained for the test cases I tried, providing some additional validation to the distance tables. The programs are not any where near the performance needed to compete with optimal solvers already available.

The full distance table has a size in excess of 1.886E+9 bytes (about 1.9 GB), so it is too large to use as is in a computer with 1GB of RAM. I have used a couple of techniques to reduce the amount of memory used for the pruning table.

First, I used a technique I've experimented with where I used a "perimeter search" in combination with the IDA* algorithm, and a pruning table with reduced information. The second approach used a modulo 3 pruning table.

The first method essentially used an IDA* algorithm, except when the remaining search depth was reduced to a distance of 6, I used a table lookup to determine if the position was actually distance 6 from the solved cube state. This used a totally separate table that was a hash table storing all the positions of the cube of distance 6 (or less) from the solved state. As a result, a pruning table value in the range of 1 to 7 becomes essentially useless. This means I could reduce the pruning table to 3 bits wide (per position) and have the three bits represent the values 0, 8, 9, 10, 11, 12, and 13 with one bit-combination left over. (13 is the maximum value in my table for the face-turn metric.) But this would still require more than 1GB. I wanted to compress 5 pruning table values into a byte, so instead I used the equivalent of one ternary digit to represent only the values 0, 10, and 11. If the value in the original table is 0 to 9, 0 is used. 11 is used for any values of 11 or more in the original table. This unfortunately reduces the effectiveness of the pruning table. The increased memory for the "perimeter table" seemed to be enough to cause the computer to use the computer swap file to some extent, perhaps due to background processes waking up.

The 2nd approach uses a trick mentioned in the documentation of the Cube Explorer program by H. Kociemba. In memory, I store the distance value from the original table modulo 3. With this method, there is no effective loss of information in the pruning table, but the time it takes to determine the actual value is much longer. Generally, many lookups and cube move operations must be performed to arrive at the proper value. See the Cube Explorer documentation for more details on this method. My code for performing a basic move is not currently optimized (it translates a symmetry-reduced representation to a non-symmetry-reduced representation, performs the move in that representation, and then converts back to the symmetry-reduced representation), further increasing the time to calculate the proper pruning table value.

The table below gives a summary of results I obtained using three test cases using cubes of distances 14 to 16 from the solved cube. (The last test case using the second method was not run to completion, but progress information allowed me estimate what the total execution would have been.) It can be seen how the reduced effectiveness of the pruning table in the first method increased the number of nodes generated. The node count would have been much higher without the perimeter search were not employed. However, since the pruning table lookups were much faster with this method, the overall execution time was actually much less than with the second method.

(Note: the number of nodes is for the final iteration of the IDA* algorithm, and the final iteration continued to completion after the first solution was found to look for other possible optimal solutions. The execution times were for all iterations. Note that with the first method, a hash table lookup checked if the distance was less than or equal to 6, so the IDA* algorithm started with a search depth of 7.)

       first method (with perimeter search)   second method (with mod 3 table)
distance           nodes   execution time          nodes   execution time
 14            14,017,014           55s         6,305,748        4m  0s
 15           244,926,576       10m 41s       115,281,765    1h 11m  0s
 16         2,848,008,711    2h 03m 36s                 ?   13h (estimated)

The use of a 2nd pruning table that uses edge permutation information would probably help to improve the performance of the program, but for a machine with 1 GB of RAM, there is not a lot of memory to spare.

- Bruce Norskog

I doubt that anybody has done

I doubt that anybody has done this particular calcuation before. I certainly haven't. It's very nice work. Congratulations.

It would probably be useful if you could post the symmetry reduced numbers. I usually consider them more significant than then non-reduced numbers because I tend to think in terms of the "real size" of cube space rather than the "group size".

It would also be useful if you could describe a few more details of your program. The search space is pretty large because 1841970x2048 is 3772354560. I think the best you can do is 2 bits per position, storing the distance from Start mod 3, with one other bit pattern indicating that you haven't calculated the position yet. At that rate, you would need 943088640 bytes to store all the results, which is about a gigabyte. Were you able to keep all the results in memory, or did you have to use disk? I also assume that you would have to treat the "times 1841970" index as sort of an associative array, because the symmetry classes are so ill-behaved. I would assume you would treat the "times 2048" index as a bit string, with each bit representing flipped or not. It's 2^11 rather than 2^12 because the flip of the first eleven edge cubies dictates the flip of the twelfth cubie.

Where are the symmetry-reduced numbers?

My program output contained symmetry-reduced numbers, but the numbers were in relation to the "coordinate space" I used. My program did not calculate the "real" symmetry-reduced numbers.

There are "real" positions that do not map to a unique value in my coordinate space. I put in code to ensure that when a "new" position was found, any other position corresponding to the same "real" position was also marked "new." So it looks like to me I could have easily have generated the "real" symmetry-reduced numbers. It should not be necessary to rerun from scratch to get these numbers, so I think I can do this in the next day or so.

- Bruce Norskog

Symmetry-reduced results

Finally, I have "real" symmetry-reduced numbers for the 3x3x3 cube when corner locations and orientations are considered along with edge orientation using (UF,UB,DF,DB,LU,LD,RU,RD,FL,FR,BL,BR) to define the edge orientation reference framework. These results (and the results of my original post) are unconfirmed.

By "real" symmetry-reduced numbers, I mean that positions that are equivalent under one of the 48 cube symmetries are considered the same position. Anti-symmetry was not considered.

           "real" positions   cumulative total

face-turn metric
distance  0f              1              1
distance  1f              2              3
distance  2f              9             12
distance  3f             73             85
distance  4f            839            924
distance  5f         10,100         11,024
distance  6f        123,829        134,853
distance  7f      1,490,278      1,625,131
distance  8f     17,206,529     18,831,660
distance  9f    177,181,959    196,013,619
distance 10f  1,268,898,678  1,464,912,297
distance 11f  2,214,419,000  3,679,331,297
distance 12f     83,438,084  3,762,769,381
distance 13f             71  3,762,769,452

quarter-turn metric
distance  0q              1              1
distance  1q              1              2
distance  2q              5              7
distance  3q             25             32
distance  4q            218            250
distance  5q          1,934          2,184
distance  6q         17,530         19,714
distance  7q        156,835        176,549
distance  8q      1,374,168      1,550,717
distance  9q     11,364,530     12,915,247
distance 10q     83,047,697     95,962,944
distance 11q    477,575,701    573,538,645
distance 12q  1,483,874,938  2,057,413,583
distance 13q  1,392,144,562  3,449,558,145
distance 14q    313,070,169  3,762,628,314
distance 15q        141,138  3,762,769,452
- Bruce Norskog

I have verified these numbers

I have verified these numbers with a totally separate program. Great work, Bruce!

Re: I have verified these numbers

Thank you for verifying my results, Tom. So I assume you verified both QTM and FTM, since you didn't give any details about what you did.

By the way, have you considered trying this analysis with a different edge orientation convention, such as the one used in Mike Reid's program? That doesn't allow full M-symmetry to be used, but I speculate that it may result with a higher average distance. But it also has a cost of approximately 3 times the number of "real" positions.

- Bruce Norskog

Yep, did both QTM and FTM.

Yep, did both QTM and FTM.

I'm confused how a different edge orientation convention would change the average distance? Wouldn't it by necessity yield precisely the same numbers?

edge orientation convention and distance distributions

In the absence of sufficient information about which edge cubie is where, I believe it does make a difference what edge orientation convention is used. In this case, there is no information about the permutation of the edges. Try doing God's algorithm calculations where only edge orientation is considered (2048 positions). Try different edge orientation conventions (such as Reid's convention and an M-symmetric convention) and see if the results have the same distribution. (I just tried this myself, and if my quickly hacked program worked correctly, the results are close, but not exactly the same.)

I remember awhile back I tried duplicating Mike Reid's pruning table with my own code. I ran my program and compared the distribution of distances with those given by Reid's program. My results were different, so I assumed I must have had a bug in my program. After analyzing the discrepancy, I concluded that the reason my results were different was not because my program was working incorrectly, but because I was defining edge orientation differently. I believe I had totally correct results, but for a different subgroup. After changing the edge orientation convention, I got the same results. I suspect Mike Reid chose the edge orientation that he did because of the quality of the resulting pruning table.

- Bruce Norskog

sym-coordinates and edge orientation coordinates

My program needed to convert between "raw" coordinates and sym-coordinates (both directions). I used a procedure analogous to that used in Mike Reid's optimal solver program to do that, only I was dealing with 8 corner cubies whereas Mike's code was dealing with 4 edge cubies.

In my case, the arrays required over 700 million bytes. This was too much, of course. So I looked for patterns within the arrays, and eventually came up with a way to do the same conversions with 14 lookup tables using a total of about 48 million bytes. This could probably be improved, but I felt it was good enough.

After verifying my new conversion code produced the same results as the direct conversion tables for all possible values, I could eliminate the direct conversion tables from the program. This provided room for the "main array" of the program.

Yes, my edge orientation coordinate was an 11-bit number with a bit containing a 1 corresponding to a flipped edge. The MSB represented the UF edge position, and the LSB represented the BL edge position. The BR edge position was omitted from the 11-bit number.

- Bruce Norskog

I used less than 2 bits per position in memory, but ...

I was working with a computer with 1GB of memory, so I couldn't afford two bits per position in memory, and had to perform some file I/O at the end of each iteration. Here are the details.

I figured using two bits for each of the 3,772,354,560 elements would use up too much memory. I needed to find a way to use less than two bits. Packing 5 elements per byte would reduce the memory requirement for this data to less than 755 million bytes. This would mean I would have essentially a ternary digit of storage for each element. That is, each element could have a value of 0, 1, or 2. I came up with a way to make this work, using a 4-bit per element disk file, along with the large RAM array.

My program performs has a main loop that determines the elements at a distance one more than the previous iteration. Each iteration starts with a disk file containing the distances of each of the 3.772+ billion positions (except a distance value of 15 is used for any element for which the distance is still unknown) and the RAM array has all ternary digits being 0 except for those elements that have the maximum distance determined so far. Let's say the previous iteration determined the elements at distance d from the solved state. Then, the new iteration takes each element whose array value is 1, and computes all positions one move away (quarter-turn, face-turn, or whatever metric is used), and sets the corresponding array elements to the value 2 (except for those where the existing value was 1). This means the elements with an array value of 2 will be either distance d+1 or d-1 from the solved state. Thus, more "new" elements are generated than we really want. This is remedied by using the disk file.

After all the elements are processed for the iteration, the disk file from the previous iteration is read. If the disk file contains a value other than 15, the distance value is valid and is merely copied to the disk file for this new iteration. If the disk file contains the value 15, then the RAM array is checked. If the RAM array value is 2, the value d+1 is written to the new disk file. We know the distance in this case must be d+1, not d-1; otherwise the old file would have had the distance d-1 for it, and the value d-1 would have been copied to the new file. Also, the RAM array values are fixed up for the next iteration during this processing.

In summary, data for 5 positions were packed into a byte of memory, with the penalty of having to concurrently read one large file and write one large file at the end of each iteration. No random access files were used.

- Bruce Norskog

A note about the edge orientation

In my previous post (3x3x3 cube ignoring edge locations), I neglected to mention how I defined edge orientation.

My definition of edge orientation is based upon this list: (UF,UB,DF,DB,LU,LD,RU,RD,FL,FR,BL,BR).

The letter pairs indicate the various positions of the edge cubes in terms of the two faces of the cube the edge position is part of. The first letter of each pair is considered the "marked" face or reference face. For example, the "U" face is considered the reference face of the UF edge position. (Likewise the edge cubie with the "U" and "F" colors has the "U"-colored face as its reference face.) If the reference face for an edge position is aligned with the reference face of the cubie located in that position, then the cubie is considered aligned. Else it is considered flipped.

This convention for defining edge orientations allows all 48 cube symmetries to be applied without knowledge about which edge cubie is in what location. Applying a particular symmetry either preserves all the edge orientation reference faces, or flips all of them. So in either case, the edge orientation coordinate (after applying the symmetry) can be determined without regard to what cubie is in each position. This type of edge orientation convention was necessary for me to take full advantage of the 48 symmetries of the cube in order to reduce the number of effective cube positions to under 4 billion.

- Bruce Norskog

This note is remindful of som

This note is remindful of some messages that appeared on Cube-Lovers. Generally speaking, proofs of the conservation of flip involve "marking" one of the two faces of each edge cubie as you describe. I usually refer to such marking as establishing a frame of reference. For each each cubie, it really doesn't matter which face you mark as long as you are consistent thereafter. Hence, there are 2^12=4096 possible frames of references in which to describe the flip status of all the edge cubies. Your list (UF,UB,DF,DB,LU,LD,RU,RD,FL,FR,BL,BR) chooses one of the 4096 possible frames of reference.

Occasionally a proof of conservation of flip is offered that claims not to establish such a frame of reference. It may be only a matter of terminology, but it always seems to me that such proofs really do establish a frame of reference. It's just that with some proofs the frame of reference is stated very explicitly, and with other proofs the frame of reference is much more implicit. In particular, a fairly abstract proof for conservation of flip can be given involving a wreath product representation of the edges group, and a wreath product representation makes the frame of reference fairly implicit rather than explicit.

But I think the frame of reference always there. The issue is that when an edge cubie is in its home cubicle, it's easy to tell if it's flipped or not. But if an edge cubie is not in it's home cubicle, then you can't tell if it's flipped or not without a frame of reference.