Thursday, March 19, 2015

GCC and UD2 instructions

A few colleagues and I are working on OS development. While most of the development has taken place on Mac OS, I prefer Linux and primarily use a rolling release distribution called Arch. On the Mac, my colleagues obtained GCC 4.8 from MacPorts and everything compiles just fine for them. However, running a rolling release distribution means I always have the latest and greatest versions on my system. Usually that is fine; sometimes, as in this scenario, it is not. At some point, GCC started emitting UD2 instructions instead of reporting errors. Specifically, UD2 is the x86 "undefined instruction": executing it raises an invalid-opcode exception, which in our case halts the system. Why on earth would any compiler do this? It was absolutely baffling to see this behavior from a program that compiled cleanly with -Wall.

So I did some searching in the assembly output to find where the UD2 instruction was being generated and found one in the following code snippet:

static struct pci_func *
alloc_pci_func()
{
    if (pci_dev_list.total_dev == N_PCI_DEV) {
        KERN_DEBUG("Alloc pci_func from pci_dev_list error! no available \n");
        return NULL;
    }
    return &pci_dev_list.dev[pci_dev_list.total_dev++];
}
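To poke at this outside the kernel tree, the snippet can be wrapped in a small self-contained file. This is only a sketch: the struct layout, the value of N_PCI_DEV, and the KERN_DEBUG macro are stand-ins I made up so the file compiles on its own, since the real definitions are not shown here. The function is also left non-static so it can be called from another file.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kernel definitions, which are not shown in the post. */
#define N_PCI_DEV 4                                   /* assumed capacity */
#define KERN_DEBUG(...) fprintf(stderr, __VA_ARGS__)  /* assumed logging macro */

struct pci_func {
    int id;  /* placeholder field; the real struct has other members */
};

static struct {
    struct pci_func dev[N_PCI_DEV];
    int total_dev;
} pci_dev_list;

/* Same logic as the snippet above: hand out the next free slot, or NULL
 * once all N_PCI_DEV slots have been used. */
struct pci_func *
alloc_pci_func(void)
{
    if (pci_dev_list.total_dev == N_PCI_DEV) {
        KERN_DEBUG("Alloc pci_func from pci_dev_list error! no available \n");
        return NULL;
    }
    return &pci_dev_list.dev[pci_dev_list.total_dev++];
}
```

Compiling a file like this with gcc -Os -S and searching the assembly for ud2 is one way to check a given GCC version. Whether the trap shows up will depend on the compiler version and on how the callers use the returned pointer.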


Where do you think the problem is? My initial reaction was that some fancy overflow detection was not working quite right: notice that we increment total_dev but never allow it to grow beyond N_PCI_DEV. That theory did not pan out.

So I tried a different approach. Our optimization level was -Os, which is effectively -O2 with some tweaks for output size. I tried -O2 and then -O1: at -O2 the issue still existed, whereas at -O1 it did not. Looking at the list of options enabled by -O2, I set the compilation to -O1 and began enabling the -O2 options explicitly, one at a time, until I stumbled upon the culprit: -fisolate-erroneous-paths-attribute. This flag does the following:

"Detect paths which trigger erroneous or undefined behaviour due to a NULL value being used in a way which is forbidden by a "returns_nonnull" or "nonnull" attribute. Isolate those paths from the main control flow and turn the statement with erroneous or undefined behaviour into a trap."

Brilliant. The authors figured it was better to treat return NULL as undefined behavior than to warn us that maybe we should look into a different convention. Frankly, I'm not sure what the correct convention should be. Perhaps a panic? But that seems a little harsh, especially if the system can handle running out of a limited resource.

To keep our -Os setting, I added the following compiler flag: -fno-isolate-erroneous-paths-attribute. Fortunately I found my issue, but it seems to be expected behavior from GCC. Mind you, this isn't the only example of a GCC UD2 issue.
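On the question of convention, one candidate short of a panic is to return a status code and hand the pointer back through an out-parameter, so that running out of slots is an ordinary, explicit result. The following is only a sketch under stated assumptions: alloc_pci_func_checked, the pci_alloc_status enum, the struct layout, and the value of N_PCI_DEV are all hypothetical names and values made up for illustration. Whether this shape actually avoids the trap would still need to be checked against the generated assembly.

```c
#include <stddef.h>

/* Stand-ins for the kernel definitions, which are not shown in the post. */
#define N_PCI_DEV 4  /* assumed capacity */

struct pci_func {
    int id;  /* placeholder field */
};

static struct {
    struct pci_func dev[N_PCI_DEV];
    int total_dev;
} pci_dev_list;

/* Hypothetical status codes for the allocator. */
enum pci_alloc_status {
    PCI_ALLOC_OK   = 0,
    PCI_ALLOC_FULL = -1
};

/* Hypothetical variant of alloc_pci_func: the result is reported through
 * the return value, and *out is set to the slot (or NULL when full).
 * The caller must pass a valid, non-NULL out pointer. */
enum pci_alloc_status
alloc_pci_func_checked(struct pci_func **out)
{
    if (pci_dev_list.total_dev == N_PCI_DEV) {
        *out = NULL;
        return PCI_ALLOC_FULL;
    }
    *out = &pci_dev_list.dev[pci_dev_list.total_dev++];
    return PCI_ALLOC_OK;
}
```

A caller would then branch on the status before touching the pointer, which at least makes the "no slots left" case hard to overlook.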