The heuristic calculation basically consists of two steps: The first finds a suitable ordering of the rows and columns not yet fixed by the B&B search tree descent. The second step factorizes the system and returns the cost as the heuristic.
A simple algorithm for finding a suitable ordering of rows given a fixed column ordering, simply fixes one suitable row, then recursively attempts to fix the remaining rows. If this fails, another row is fixed as the first, and the recursion attempted again. If everything fails, no valid ordering (an ordering with non-zero diagonal) exists. If we succeed, we can either just use this ordering, or generate all possible orderings. Either way, this problem is hard.
I liked this approach because of its obvious simplicity. It was easy to generate all possible orderings, or just some. But the factorial run-time of the algorithm made it virtually unusable on systems of dimensions above , or just more dense systems. It can easily be seen, that the algorithm will explode in run-time as either the dimension or the density of the system increases.
One way to work around the non-polynomial run-time of this algorithm would be to apply dynamic programming, or ``memorization''. If we remembered how some sub-set of rows could be ordered, chances seemed to be good, that this sub-set would need to be ordered very often, and thus if it was memorized and the ordering could be retrieved fast, we would save a recursive descent saving us from some of the unpleasantness.
Storing and retrieving the ordering of some sub-set of the rows needs to be done as fast as possible. I chose a trie as the data structure for storing the solutions. Tries yield a search time which is constant with respect to the number of stored solutions, and linear with respect to the dimension of the system. The overall algorithm seemed viable to this point.
During a series of experiments, I found two major flaws in this design. The first was, that the ``hit-rate'' (the ratio between hits and misses) in the search trie was around 1%. No matter how fast I could retrieve the orderings, it was only a minor part of the overall orderings that where to be found that I could actually retrieve from the trie. Thus, the memorization did not improve run-time significantly.
The second pitfall was, that search tries have exorbitant memory requirements. Even for smaller systems () the memory consumption of the program exceeded half a gigabyte. This made the algorithm unsuitable for execution on commodity hardware, and suggested that it would be impossible to treat large systems even on supercomputers.
A third observation is, that even if we could have dealt with the memory consumption and if the hit-rate had been more significant, the algorithm would still have run in non-polynomial time. The need for ordering a large number of sets of rows recursively would still exist. The run-time might have been better measured in real time, but the complexity would still have been non-polynomial, and we would have met the wall of combinatorial explosion for some not-too-large size of system anyway.
These sad observations led to the abandoning of this approach. Perhaps storing the orderings in a balanced tree (eg. a red-black tree) would solve the memory consumption problem. Perhaps a more sophisticated search logic would be able to find more usable memorized solutions, and thus improve the hit-rate. I doubt this approach will lead to anything as good as what I use in the final optimizer, but still I felt it deserved to be mentioned.