Calculating Symmetry using Representative Elements

I have long been curious how other people who write cube programs and who incorporate symmetry into their programs actually do the symmetry calculations.  The general way I do it has been outlined in Cube-Lovers.

For a given position x, I calculate a representative element of its m-conjugates, where M is the standard Cube-Lovers terminology for the 48 symmetries of the cube.  I then store and manipulate only the representatives.

We denote this calculation as y = Rep(x) = min{xM} = min{xm | m in M}.  The minimum element of xM is taken to be the one that is first in lexicographic order.  And as usual, xm means m-1xm, and xM means {xm | m in M}.

The calculation seems simple enough. You calculate in turn
 
    (m0)-1xm0,  (m1)-1xm1,  ....  (m47)-1xm47
 
and keep the smallest conjugate as the representative.  But it's not a very fast calculations.  There are lot of loops within loops within loops.  Here are some things I have done through the years to make the calculations faster.

Suppose we are comparing two strings such as "cat" and "dog".  Any program would compare the "c" to the "d" and then stop.  It's only necessary to compare additional characters to break ties, as in comparing "cat" to "car".  In the later case, the tie would not be broken until you reached the third character of each string.

The situation with comparing x to mm is slightly different than the situation with comparing "cat" to "dog"  Presumably, in the case of "cat" and "dog", both strings already exist.  But in the case of x and mm, x presumably already exists but mm doesn't.  Rather, mm has to be calculated.

It's as if you were comparing the string "cat" to the unknown string "???" that first has to be calculated.  You could calculate the entire unknown string "???" as "dog", and then compare "cat" to "dog".  Or, you could calculate the unknown string one character at time and compare after each character.  So you could calculate the first character of "???" yielding "d??", then compare the "c" to the "d" and you are done.  But if you calculate the first character of "???" yielding "c??", then you have to go on and calculate the next character yielding "ca?" and so forth.  In any case, you calculate and compare in the same loop, you stop the loop after the first unequal compare, and there are usually many fewer characters to calculate that way.

Similarly, I don't calculate the entire permutation xm and then compare it to the permutation x.  Rather, I calculate each unitary partial permutation of xm in turn, and compare each unitary partial permutation of xm to the corresponding unitary partial permutation of x.

Here's an example.  The basic model for the cube in my programs is S24 x S24, with S24 acting on 0...23 rather than on 1...24.  A position for the corners might look something like
 
  x=(9 2 5 0 22 3 7 4 17 10 13 8 6 11 15 12 1 18 21 16 14 19 23 20)
 
For some fixed m, we might calculate the first unitary partial permutation as
 
  xm=(15 * * * * * * * * * * * * * * * * * * * * * * *)
 
Therefore, because 15 is greater than 9, we know that our particular choice of m does not yield a representative element for the permutation x without calculating the rest of the permutation xm, and we go on to the next m.

Before proceeding further, I will point out that with an S24 model for the corners or for the edges, it's not necessary to store all 24 unitary partial permutations.  It's only necessary to store 8 of them for the corners, corresponding to one facelet per corner cubie.  And it's only necessary to store 12 of them for the edges, corresponding to one facelet per edge cubie.  That's why I think most programs model the cube with a wreath product representation for the corners and for the edges, rather than an S24 representation for the corners and for the edges.  But some of what I'm about the explain is easier if I temporarily maintain the fiction that I'm storing all 24 unitary partial permutations, both for the corners and for the edges.

Based on what I have described so far, the basic scheme for finding Rep(x) is to take x itself as a first approximation for the representative.  There is then a loop that goes through all 48 values for m, calculating and comparing xm one unitary partial permutation at a time to the current approximation for the representative.  Any time the current xm is smaller than the current approximation, it becomes the new approximation.  A very minor enhancement is that I make sure that m0 is the identity element of M, and then the loops only have to run from 1 to 47 rather than from 0 to 47.

The procedure I have described is not too bad.  Most values of xm can be discarded after the calculation of only one unitary partial permutation.  But there is still a performance problem.  As you loop through the 48 values for m, there will usually be several times where you do encounter a value for m that yields a value for xm that is smaller than the current best approximation for the representative.  So the value of xm becomes a new current approximation.  The new current approximation for the representative has to be calculated in its entirety, even though it may not become the final representative.

We can address this performance problem as follows.  We have a loop that for each value of m calculates only its first unitary partial permutation.  As we go through the loop, we maintain a list of m values that minimize the first unitary partial permutation.  And it is only those values of m that possibly can yield the representative for x.  Such a list of m values usually only contains two or three values for m, and it is then only those two or three values that have to be used to calculate xm, rather than all 48 values of m.

Of course, the list of m values is quite dynamic as it is being built.  The list starts out consisting only of m0.  Let's say that m0 yields a first unitary partial permutation of (9 * * ....).  If m1 yields (12 * * ....) we don't add m1 to the list.  If If m2 yields (9 * * ....) we add m2 to the list, and the list now consists of m0 and m2.  If m3 yields (7 * * ....), we start the list over again, containing just m3.  Etc.

This new approach is much faster, but it still requires looping through all 48 values of m, even if just to calculate just one unitary partial permutation for each m.  It's possible to do a little better still.

Normally, we calculate m-1xm left to right in the obvious fashion.  For example, suppose we want to calculate m-1xm: 0 -> k, which we have been denoting as m-1xm = (k * * * ...).  We would send 0 through m-1, yielding i.  We would send i through x yielding j.  And we would send j through m yielding k.  But it's also possible do the calculation "from the middle", as it were.

Let's go back to our example for the corners of a position where
 
  x=(9 2 5 0 22 3 7 4 17 10 13 8 6 11 15 12 1 18 21 16 14 19 23 20)
 
The first unitary partial permutation is 0->9.  There are exactly two elements of M for which m-1 maps 0->9.  The way my tables are set up, they are m0 and m35.  m0 is a rotation and m35 is a reflection.  Conjugation by m0 yields (0->0)(0->9)(9->9)=9.  Conjugation by m35 yields (0->0)(0->9)(9->19)=19.  So conjugation by m0 might minimize xm, but conjugation by m35 cannot.

The second unitary partial permutation is 1->2.  There are exactly two elements of M for which m-1 maps 0->1.  The way my tables are set up, they are m3 and m34.  m3 is a rotation and m34 is a reflection.  Conjugation by m3 yields (0->1)(1->2)(2->1)=1.  Conjugation by m35 yields (0->1)(1->2)(2->1)=1.  So conjugation by m3 might minimize xm, and conjugation by m34 might also minimize xm.  But we have now eliminated m0 as a possibility.

We can continue in this vein through the last unitary partial permutation, which is 23->20.  For each i->j, there will be exactly two possible values of m for which m-1 maps 0->i.  One of the m values will be a rotation, and the other m value will be a reflection.  By the time we have examined each of the 24 unitary partial permuations in this manner, we will have considered each of the 48 possible values for m as a candidate for minimizing xm.

On it's face, it may not seem that we have made much progress.  We have reduced 48 loops down to 24 loops to try to find our list of initial m values that might minimize xm.  But remember that we really only store 8 of the unitary partial permutations, not all 24 of them, were are unitary partial permutation corresponds to a single facelet of the cube.

For example, instead of storing
 
  x=(9 2 5 0 22 3 7 4 17 10 13 8 6 11 15 12 1 18 21 16 14 19 23 20)
 
we really only store
 
  x=(9 2 5 0 22 3 7 4)
 
For facelet number 9, we know that it is connected to facelet number 17 and to facelet number 1 to form a complete cubie.  So for example, we can take,
 
  x=(9 * * * * * * *)
 
and determine that it really means
 
  x=(9 * * * * * * * 17 * * * * * * * 1 * * * * * * *)
 

That is, facelet 9, facelet 17, and facelet 1 are a part of a single cubie that we might call the (9,17,1) cubie.  It turns out that the way I number the facelets, the (9,17,1) cubie is the urf cubie in Singmaster notation.  For our purposes here, that means that we know that if see the unitary partial permutation 0->9, we are also going to have the unitary partial permutations 8->17 and 16->1.  So instead of 2 possible m values we have identified to look at, we have 6 possible m values, namely the two for 0->9, the two for 8->17, and the two for 16->1.  Our loop only has to go 8 times instead of 24, and each loop can consider 6 possible m values.

Essentially all of this information can be placed in tables during program initialization.  For example, the following might be the case for
 
  x=(9 2 5 0 22 3 7 4)
 

  • 0->9  m0 yields 0->9
  • 1->2  m34 and m3 yield 0->1
  • 2->5  m19 yields 0->12
  • 3->0  m17 and m35 yields 0->4
  • 4->22  m5 m16 and m29 yield 0->6
  • 5->3  m23 yields 0->5
  • 6->7  m29 yields 0->11
  • 7->4  m28 yields 0->1

So clearly, the best we can do is 0->1, and the only values for m that we need to consider are m34 m3 and m28.  The loop only had to run 8 times, and it quickly reduced the number of possible m values from 48 down to only 3.  Most of the m values are actually eliminated during the building of the lookup tables during program initialization.  For example, during its consideration of 7->4, the initialization code did actually consider six different possibilities for m.  But only one of the six, namely m28, mapped 7->4 to 0->1.  The other five mapped 7->4 to 0->k where k was greater than 1.

Comment viewing options

Select your preferred way to display the comments and click 'Save settings' to activate your changes.

Re: Calculating Symmetry using Representative Elements

For handling symmetry, I use the idea used in Mike Reid's optimal solver program where one or two "coordinates" describing the cube position are converted to a "sym-coordinate" (the term used in the Cube Explorer documentation), and a symmetry value (an index specifying which element of M is used to convert from the representative to the particular position). I have written most of this before I saw Herbert Kociemba's reply, but basically I am using the same technique as he describes. I may express it in different words, and will use as an example my program for analyzing the permutation of the corners and edges (ignoring orientation).

In my analysis of the permutations of the corners and edges (ignoring orientation), each position consists of an edge permutation value gep (0..12!-1) and a corner permutation gcp (0..8!-1). I used symmetry to convert gcp to a corner sym-coordinate gcs (0..983) and symmetry code gsym (0..47). Lookup tables allow for converting between the two coordinate systems.

So let's say I know the corner permutation gcp and edge permutation gep of a cube position g. I use a lookup table to get which element m of M was used to generate gcp from its representative. I use m', the inverse of m, to generate the corner permutation of the representative. So let r be the cube position of the representative. Then gcp = m'rcpm or rcp = mgcpm'. I also use m' to generate the corresponding edge permutation coordinate (rep = mgepm'). However, this edge coordinate will not always be the correct value for the "true" representative, as m was determined from the corner permutation information only. In some cases, there are multiple edge permutations paired with the same corner permutation in a set of symmetrically equivalent elements.

Note: I may be a little lax in my notation, using numerical coordinates as if they are a group element. But in my example, the edge permutation and corner permutation are independent coordinates, in the sense that one can make the calculations (conjugating by a symmetry element) for either coordinate without knowing the other coordinate's value.

Mike Reid's program calculates a "stabilizer" value for each possible value of the sym-coordinate. Essentially, this is a bitvector specifying which elements in M have the property m'gcpm = gcp. (Note, Mike Reid's optimal solver program used a subgroup of M with 16 elements, not the full M of 48 elements. This is due to the properties of the cube subgroup he was working with.) In my case, I was using the full symmetry group M, so in my case the stabilizer values contain 48 bits. To find the representative for a position, I only need to consider the elements of M for which the corresponding bit in the stabilizer value is a 1. Note that a large proportion of the corner permutation representatives have a stabilizer value of 1 (47 bits that are 0 and only the LSB is 1), meaning that there is "no symmetry" in that particular corner permutation. When this is the case, there are no other possibilities for m to consider regardless of the value of the edge permutation coordinate.

My program uses a table of 984 files, one for each corner sym-coordinate. A 4-bit value is stored for each edge permutation with parity matching that of the corner permutation associated with the particular file. So the table has a total size of 984*(12!/2) nibbles. This is larger than the number of true representatives, so there are some elements of the table that are not true representatives. (One can consider that all the table elements are "representatives" for some set of elements in the whole group, but in some cases, it takes more than one of these sets to make up a complete set of elements that are equivalent when taking symmetry into account. Saying this another way, a set of equivalent elements may be represented by more than one element of the table, but only one of those table elements correspond to the true representative, that one which comes first lexicographically.)

I have chosen to set all table elements having the same representative to the same distance value, once I determine the value for one of them. Then, when looking up a value, I don't need to worry about whether or not I have found the "correct" representative. On the other hand, I have the task of finding all table entries equivalent to a given one. This I do by iterating over the elements of M where the corresponding stabilizer bit is a 1, and calculating the corresponding edge permutation for each. While this code to search for the other equivalent table elements is somewhat expensive (and could be optimized further I am sure), I actually avoid the problem of ever having to find the true representative for a position.

If I later go through the table, for example, to count the number of true representatives in the table for a specified distance, I only have to check for each element if it is the representative. I don't actually have to find the representative for each element. That is, I only have to search (using the stabilizer bitvector as a search filter) until I find an equivalent element that is lexicographically smaller. If the element has the right distance, and it is the true representative, I count it. Otherwise I don't count it. Still, my code is very time-consuming as it stands right now (when iterating over the whole table of over 200 billion elements). I was more concerned with making sure it worked correctly than in getting the highest performance possible.

- Bruce Norskog

Re: Calculating symmetry

I also heavily use representitive elements in my Cube Explorer program. Though I only reduce by the 16 symmetries, which fix the UD-axis of the cube. The algorithm of finding the representatives has not to be fast in my program, because the computation only has to be done once on initialization and never during the search.
Staying with your example of the corners. I seperate the corner permutations p and orientations o and I only build a table for the representatives for the 40320 permutations. I do this on the coordinate-level, and the representative is the smallest number from 0..40319 which can be achieved when conjugating the given permutation by the 16 symmetries.
If the corners position is described by the coordinate pair(p,o), and the representative element for the corner permutation p is p' = m^-1*p*m for some symmetry m, the representative for the the complete corner positions is (p',m*o*m^-1). Actually I replace p' by a number z from 0 to 2767, because there are 2768 equivalence classes.
There is a problem here, because the above representation is not unique for corner permutations p which are symmetric themself, that means if there is a symmetry m1 (identity excluded) with m1^-1*p*m1 =p. This is unimportant for my program, but I think it is no problem to get uniqueness back with the help of a precomputed table which gives the smallest orientation o' in dependence of a given orientation o and the subgroup of M, which keeps p invariant under conjugation.

But maybe also using only the corner permutation could speed up your programs, provided that you keep track of the corner permutation of your given positions and you order your representatives according to that coordinate. Then just a single table lookup will show, if that coordinate is a representative of the corner permutations, and if not you can continue. Only if yes, you have to take a closer look.

So can you share what your fi

So can you share what your final canonicalization code looks like? I'm not sure I fully understand everything you wrote, and actually writing out the final canonicalization code (without the initialization code) would I think be very helpful. Thanks! -tom

I'll include some of the code

I'll include some of the code below as requested.  I don't think you will find it very useful without a little more context.  It's a part of a C++ class library I've been working on.
 
First, a couple of comments/corrections. I worked very hard to try to make the original post look halfway decent with proper subscripts and superscripts and such. It looked fine when I posted it using Internet Explorer. But I looked at it later using Firefox, and the font was so small as to be nearly unreadable. My apologies. I'll try to avoid that problem in the future. You can, however, make the font larger by using the View option on most browsers if the browser you are using makes it look too small.
 
Second, I tried very hard to avoid typographical errors. But there is one very serious one that can make what I said a little more confusing that it already is. At one point I said,
 
Let's go back to our example for the corners of a position where
 
  x=(9 2 5 0 22 3 7 4 17 10 13 8 6 11 15 12 1 18 21 16 14 19 23 20)
 
The first unitary partial permutation is 0->9.  There are exactly two elements of M for which m-1 maps 0->9. 
 
Of course, for any of this to make sense, I should have said: There are exactly two elements of M for which m-1 maps 0->0. 
 
It may seem like a minor error, but the whole discussion makes no sense without correcting the error.
 
Finally, here's the code you requested.
 
// ----------------------------------------------------------------------


      // Form the representative element y = rep(x) = min{m'xm}, where
      // the minimum is taken over the 48 elements of M, the symmetry
      // group of the cube.  This is a faster version than the
      // basic code in repslow.  Invoke as y.repfast(x);


void corners::repfast(const corners &x )
    {


     unsigned char i, i2, lasti2, xti, t1, j, m, wmin;

               // determine minimum value of w for m'xm: 0->w
               // across all m values.  this has been partially
               // precalculated, but the completion of the
               // calculation depends upon the specific x
               // being conjugated

     // (for j=0; j <=7; j++))  unrolled for speed
     wmin   =   w[0][ x.tabs[0] ][0];
     if (wmin > w[1][ x.tabs[1] ][0]) wmin = w[1][ x.tabs[1] ][0];
     if (wmin > w[2][ x.tabs[2] ][0]) wmin = w[2][ x.tabs[2] ][0];
     if (wmin > w[3][ x.tabs[3] ][0]) wmin = w[3][ x.tabs[3] ][0];
     if (wmin > w[4][ x.tabs[4] ][0]) wmin = w[4][ x.tabs[4] ][0];
     if (wmin > w[5][ x.tabs[5] ][0]) wmin = w[5][ x.tabs[5] ][0];
     if (wmin > w[6][ x.tabs[6] ][0]) wmin = w[6][ x.tabs[6] ][0];
     if (wmin > w[7][ x.tabs[7] ][0]) wmin = w[7][ x.tabs[7] ][0];

     tabs[0] = wmin;     // known to be the case by the way wmin was calculated
     tabs[1] = 24;       // force scan of all cubies first time through


              // process only those values for m such that
              // m'xm: 0->w.  the values are precalculated in
              // the w array.

     for (i = 0; i <= 7; i++) if (w[i][ x.tabs[i] ][0] == wmin )
        {

         lasti2 = w[i][ x.tabs[i] ][1];   //  calculate outside i2 loop
         xti    = x.tabs[i];              //  calculate outside i2 loop

         for (i2 = 2; i2 <= lasti2; i2++)
            {
             m = w[i][xti] [i2];
             for (j = 1; j <= 7; j++)   //  loop through 7 bytes of the corner
                                        //  (first byte is already set)
                {
                 t1 = x.rep1(m,j);
                 if (t1 > tabs[j])        //  y is better than m'xm, go to next m
                    {                     //  by forcing the end of the j loop
                     j = 8;
                    }
                 else if (t1 < tabs[j])   //  m'xm is better than y, so
                    {                     //  set y = m'xm (new candidate)
                     tabs[j] = t1;
                     for (j++; j <= 7; j++)   // loop through rest of j's
                        {
                         tabs[j] = x.rep1(m,j);  // y = m'xm for rest of new y
                        }
                     j=8;   // force end of j loop
                    }

                 // if (t1 == tabs[j]), then do nothing except go to next byte
                 // and going to the next byte is implicit.  So the if
                 // (t1 == tabs[j]) test need not be made.
                }
            }
        }
    };



 
If it helps a little, compare the unrolled loop at the beginning of the function to the example at the end of the original post in this thread.  The unrolled loop is finding the minimum of 9, 1 12, 4, 6, 5, 11, 1. The least value is of course 1. The first 1 corresponds to m34 and m3. The second 1 corresponds to m28.  The rest of the code then calculates xm for each of m34, m3, and m28.  The trick is that the other 45 values for m do not need to be tested at all.
 
The following function will also be helpful:
 
// ----------------------------------------------------------------------

      // The rep1 routine is used as a part of the process of
      // calculating the representative element y = rep(x) = min{m'xm}, where
      // the minimum is taken over the 48 elements of M, the symmetry
      // group of the cube.  x is the *this argument.

      // The rep1 routine is called by rep to calculate only one byte
      // of m'xm in order to keep the logic of rep itself much cleaner.


inline unsigned char corners::rep1(const unsigned char &m, const unsigned char &j) const
    {

     unsigned char t1, t2;

     t1 = M_inv[m][j];

     if      (t1 <=  7) t2 = tabs[t1];
     else if (t1 <= 15)
         {
          t2 = (unsigned char)(tabs[ t1-8 ] + 8);
          if (t2 >= 24) t2 -= (unsigned char)24;
         }
     else
         {t2 = (unsigned char)(tabs[ t1-16 ] + 16);
          if (t2 >= 24) t2 -= (unsigned char)24;
         };

//               m'x  has been calculated.  Now calculate
//               (m'x)m by multiplying by m.

     return M[m][t2];

    };

 
The code above is for the corners only.  But the code is very little different when dealing with both corners and edges.  I compare corners before edges, comparing edges only to break a tie on the corners.  And for the trick of coming up with a short list of m values, I only need calculate the first partial permuation for the corners, not all eight partial permutations for the corners.
 
Finally, I should probably mentions that there are pathological cases where this "fast" version of the Rep function is no faster than any other way of doing it.  If a position is highly symmetric, you will have to test a large number of m values no matter what.  And in particular, for the case where Symm(x)=M, all 48 values for M will have to be tested no matter what.  Fortunately, the percentage of such pathological cases is vanishingly small.

I decided to add another litt

I decided to add another little tidbit that may help in deciphering what I'm trying to say.

I think everybody knows what a conjugate in general is, e.g., xy=y-1xy is the conjugate of x by y.  And if you have x and y in hand, the calculaton of the conjugate is simplicity itself.

But what if x is not a permutation, but rather what if x is a partial permutation.  In particular, what if x is a unitary partial permutation.  For example, what if x is the partial permutation 4->1 which I have been writing as (* * * * 1).  Can you calculate its conjugate?

Of course, you can.  And it's actually quite simple.  You have to find the pre-image of 4 in y-1, and you have to find the image of 1 in y.  Computationally in a program, finding the image of 1 in y is totally trivial.  Finding the pre-image of 4 in y-1 may not be quite so trivial, but there is a better way.  That's because finding the pre-image of 4 in y-1 is equivalent to finding the image of 4 in y, which is totally trivial.  So you just map both 4 and 1 to their image in y, and you are done.

For example, suppose y=(1 2 3 4 0).  Then, the image of 4 in y is 0, and the image 1 in y is 2, so 4->1 becomes 0->2 under conjugation by y.

Here's another way to do it.

We know that y-1=(4 0 1 2 3). So we can calculate

y-1xy=(4 0 1 2 3)(* * * * 1)(1 2 3 4 0)

The patter in our head is:

0 goes to 4, 4 goes to 1, 1 goes to 2:

(2 ....)

1 goes to 0, 0 goes to undefined:

(2 * ....)

2 goes to 1, 1 goes to undefined:

(2 * * ...)

3 goes to 2, 2 goes to undefined:

(2 * * * ...)

4 goes to 3, 3 goes to undefined:

(2 * * * *)

and we are done. (2 * * * *) means 0->2, nothing more, nothing less.  So the second way we did the calculation gave us the same answer as the first way.

The second way I calculated the conjugate is the obvious and direct way. But it's not the easiest or the fastest. The first way sort of "starts in the middle", and it is not as obvious and direct (at least, to me it's not -- it may to others). But it's the easiest and the fastest. And it helps to motivate the notion of calculating m-conjugates "from the middle" as a part of the technique of doing the calculation of Rep(x) as fast as possible.

But the problem of calculating Rep(x) does involve a little wrinkle that's a little trickier.  In this note so far, we have been given a partial permutation x and a permutation y, and we have been calculating the partial permutation z where z=y-1xy.

As a first approximation of what we need to calculate Rep(x), we need to be given a partial permutation x and a partial permutation z, and we need to find a permutation y that satisfies z=y-1xy. And in particular (and no longer an approximation), we need for z to be of the form 0->k, and we need to find a permutation y that satifies (0->k)=y-1xy while minimizing k.  Except that we need this to be the Cube group instead of Sn, and we need m to take the role of y because we are doing m-conjugation.

The reason z must be of the form 0->k rather than of the form j->k is that we must calculate the first unitary partial permutation (sort of like calculating the first letter in a word), because it's the first unitary partial permutation that we compare first to put permutations in lexicographic order (like comparing the first letter of each word as the first step in putting them in alphabetic order).

Very nice! I'm going to have

Very nice! I'm going to have to use some of these ideas in my programs.

Right now my canonicalizer just iterates over the 48 symmetries, building each projection left-to-right in the target and doing the usual less/more/equal sequence. But as you say, you may end up finding "less" several or even many times. I'll have to see how much your ideas speed things up.

Jerry, do you ever use the inverse too, to get 96-fold compression?

Well I experimented with this

Well I experimented with this technique (the very last one because I'm pretty much using the others)---and only got a slight speedup (on the order of 10%). A different technique gave me a much greater speedup (nearly 3X for the canonicalization step). Let's assume I'm doing canonicalization of edge positions, although this technique also works for edge cubies (with orientation), corner positions, corner cubies, and of course the full cube. What I do is separate the 48 symmetry classes into 6 sets, one for each face. I simply have a table of size 12*12*12*12 that gives me, for a given set of four edges on a face, the best of the eight orientations that preserve that face, and its result as an integer (base 12). Let us assume the first four cubies in our representation pertain to face 0. Essentially, my code then simply does:
int bestseen = 999999999 ; some impossibly high value
int mapping_to_use = -1 ; none seen yet
for (face f in FACES) {
   // calculate *a* mapping of this face, to the face sharing
   // the first four cubies of our cube representation.  We only
   // need to calculate four cubies, and we can immediately
   // sum them into a table index
   mapping m = firstmapping(f, 0) ; // find a mapping that takes f to 0
   int ind = calcnewcubie(m, 3) + 12 *
            (calcnewcubie(m, 2) + 12 *
            (calcnewcubie(m, 1) + 12 *
             calcnewcubie(m, 0))) ;
   if (bestvalueofeight[ind] < bestseen) {
      // new better mapping!
      bestseen = bestvalueofeight[ind] ;
      mapping_to_use = m + secondarymapping[ind] ;
   } else if (bestvalueofeight[ind] == bestseen) {
      // saw another way to get this value; use slow method
      mapping_to_use = -1 ;
   }
}
if (mapping_to_use < 0) {
   use_slow_method() ;
} else {
   remap(mapping_to_use) ;
}
This way, we get and compare the results of four cubies at a time, which almost always gives us a unique minimum (with a unique mapping that gets us to that minimum). And the code is very simple and fast. Heck, even the multiplications by 12 listed above are actually done by the table lookups that do the orientations themselves. Of course this is still slower than using a separate orientation from permutation "symmcoordinate", but I haven't figured out how to get that to work outside of a UD symmetry world (for instance, for full M-symmetry). Maybe someone has some ideas on this?

A possible improvement

This appears to me to be a good idea, Tom. Thanks for sharing.

When working with M-symmetric problems, I have been subdividing the twelve edges based upon which inner slice each belongs to, rather than subdividing according to which of three parallel slices they belong to. So my four most-significant edges are UF, UB, DF, and DB. With this convention, it appears to me your technique could be modified so that the code could iterate over 3 inner slices, rather than 6 faces. The lookups would provide a best of 16 rather than a best of 8. It seems to me this may provide another 2x improvement (approximately) in speed.

I have been looking at optimizing my code for iterating over my distance table of corner and edge permutations and tabulating only the representatives (unique wrt M and unique wrt M+inv). Since in my case, the edge permutation is considered less significant than the corner permutation, I must restrict the elements of M used to those that do not affect the value of the corner permutation. So this technique doesn't look like it would work for my immediate need, since I need to consider different subsets of the 16 possibilities depending upon what corner permutation I am working on, rather than getting the best of all 16.

Sym-coordinates for orientations

Sym-coordinates for the orientations of the edges with full M-symmetry are no problem, because you can define a reference frame for the orientations which has M-symmetry. The simplest way to describe this is that every quarter turn changes the orientation of an involved edge.

For the corners there is no sym-coordinate with full M-symmetry for the orientations alone. The best you can do to use a coordinate which combines the orientation of the corners with the positions of four corners, which build tetraeder (e.g. urf,urb, dlf and drb, disregarding the order of these corners (70 possibilities), so your sym-coordinate would have about 2187*70/48 different states.

Right after I posted that com

Right after I posted that comment, I went for a run with the yellow Labrador, and during that run I figured out that the edge orientation, flipping with each quarter turn, would indeed preserve M-symmetry. Thanks for the confirmation!

On the corner orientation, how did you figure that to be the best you could do? Is there a lengthy explanation somewhere? I've been working through some candidate coordinates and have not been able to come up with a better one. Do you have a concise description of what the coordinate is, exactly? I.e., what moves generate what twists?

No, I have no really bulletpr

No, I have no really bulletproof argumentation, that this is the best.

Coordinates correspond to right cosets of a certain subgroup H of the cube group G. The usual corner orientation coordinate is defined by the subgroup H, which keeps all corner orientation =0. If the coordinate should be able to be transformed to a sym-coordinate, this only is possible, if the subgroup H is invariant under conjugation with all elements of symmetry group. If this symmetry group is M, then for all x in M we must have x^-1 H x=H.

This is not valid with the corner orientations defined in the usual way, where H =[U,U2,U',D,D2,D',R2,F2,L2,B2].

But with the "extended" corner coodinate which also takes into account the position of the four "tetraeder corners", the corresponding subgroup H is defined by all elements of G, where the orientations stay 0 *and* all corners stay in their tetraeder. This subgroup has 4!*4! = 576 elements (ignoring the edges), and the number of cosets is 8!*3^7/576, which is the 70*2187. And H is invariant under conjugation by all 48 symetries.

But I have no strict proof, that there is no possibility to define the orientations in different way so that there is a smaller number of cosets which respect M-symmetry.

I have never done the 96-fold

I have never done the 96-fold compression. I have long planned to do so, but never got around to it. It's a feature I plan to add to the C++ class library I'm building for Rubik's cube.

Right now, I don't have as much time as I would like to spend on the class library project. But more importantly, I want to include the capability of ignoring some number of cubies. My model for doing so is fine when not taking symmetry into account, but I'm having conceptual difficulties (e.g., mathematical and group theory difficulties) when I try to ignore some number of cubies and take symmetry into account at the same time. I haven't wanted to proceed with my class library project until I solve this problem to my satisfaction.

If I could get this all worked out, I would like to take a crack at ignoring seven corner cubies. That would mean including all the edge cubies plus one corner cubie, which would be a *very* big problem to tackle. Maybe somebody else can solve it instead.