Monday, November 23, 2015

Optimizing Server Interaction with Screen and Friends

Since joining industry, I've come to see the advantage of doing my work on a remote machine (a DevServer). Historically, I had been a Linux nut who loved using tiling window managers. My workflow was to launch a handful of terminals to begin working and launch more as needed. It was a rather mundane setup that just worked. Unfortunately, it doesn't work in industry. I encountered several challenges:

  • Launching a terminal connects me only to my local machine
  • I cannot work locally (the challenges to make this even possible are through the roof)
  • Accessing the server requires SSH, and SSH requires two-factor auth (no auto login)
  • Long-lived SSH sessions are impossible: they cannot withstand a computer suspend and rarely survive even a momentary loss of network connectivity

So the question became: how can I reproduce, if not improve on, my previous environment with minimal effort?

  • Find the simplest terminal emulator (st)
    • For better colors, add the solarized patch
    • I tried a few other patches, but they didn't seem to work on my system
  • Use screen locally
    • A keyboard-only way to access the scrollback buffer (output that no longer fits on the screen)
    • Make the local screen respond to easier-to-reach keys: Alt-a instead of Ctrl-a
    • Quicker access to the screen window I'm looking for (windowlist -b), bound to the easier Alt-a ' (instead of Alt-a ")
  • Rename the terminal and screen window with the PWD and the command executing inside (fun stuff with PROMPT_COMMAND)
  • Use screen remotely
    • Always ssh into a long-lived state
    • Restart that state automatically, in case I manually close it out or the machine is rebooted
    • No need to SSH in for each additional window, just create new windows in the current screen session
  • Use mosh
    • No need to deal with SSH flakiness -- automatic connection reestablishment after lossy network usage or suspend / resume
    • Mosh can be flaky off of our corporate network -- I can switch to SSH and resume my screen session there
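
The mosh-or-SSH choice above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `connect` and the `RUN` dry-run variable are hypothetical names that do not appear in my setup, and the host name in the usage note is a placeholder. The screen invocation is the same one my .bashrc uses.

```shell
# connect HOST [TRANSPORT]
# Hypothetical helper: attach to the long-lived "home" screen session on HOST,
# creating it if it does not exist. TRANSPORT is "mosh" (default) or "ssh".
# Set RUN=echo to print the command instead of running it (assumed convention).
connect() (
  host=$1
  transport=${2:-mosh}
  # Attach to the shared session if it exists, otherwise create it.
  attach='screen -AUx home || screen -AUS home'
  case $transport in
    mosh) ${RUN:-} mosh "$host" -- /bin/sh -c "$attach" ;;
    ssh)  ${RUN:-} ssh -t "$host" "$attach" ;;
    *)    echo "unknown transport: $transport" >&2; exit 1 ;;
  esac
)
```

Usage would look like `connect devserver` on the corporate network and `connect devserver ssh` when mosh misbehaves. Either way the same screen session comes back.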

The only thing really missing is the ability to create multiple distinct views of my remote screen session without multiple SSH sessions. Think of it this way: if I have 3 windows open in a remote screen session, I can only view one of them at a time unless I SSH in again and attach to that same session. Ideally, I could move the multiplexing to the local side. Alas, I couldn't figure out a clean way of moving the screen Unix domain socket to my machine and having the local screen connect to it.

Now it is time for the useful code bits.

My .screenrc:

vbell off
startup_message off
autodetach on
defscrollback 100000
shelltitle '$ |/home/davidiw bash'
#hardstatus string "%h"
#caption always "%{= kw} %-w%{= wk}%n*%t%{-}%+w%{= kw} %=%d %M %0c %{g}%H%{-}"
#hardstatus alwayslastline

escape ^Aa
register S ^A
bindkey "^[a" process S # On the remote machine, I set this to "^[s", so I don't have to type Alt-a a
bind "\'" windowlist -b
bind "\"" select

Append this to .bashrc to get nice names for the screen and xterm titles and to start screen with the default session (home):

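function make_title {
  # The cmd assignments were lost from the original; these are assumptions.
  if [[ $BASH_COMMAND == 'trap "make_title" DEBUG' ]]; then
    cmd="bash"           # the prompt is about to be drawn, so the shell is idle
  else
    cmd=$BASH_COMMAND    # the command about to execute
  fi
  echo -ne "\033]0;$PWD $cmd\007"   # xterm title
  echo -ne "\033k$PWD $cmd\033\\"   # screen window title
}

# The case patterns were lost from the original; the ones below are
# assumptions, written as a single case so only one branch runs per shell.
case $TERM in
  screen*)      # inside screen: fix up TERM and keep the titles updated
    export TERM="xterm-256color"
    export PROMPT_COMMAND='trap "make_title" DEBUG'
    ;;
  xterm*|st*)   # fresh terminal or login: attach to "home", or create it
    exec /bin/bash -c "screen -AUx home || screen -AUS home"
    ;;
esac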
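
The two echo lines in make_title emit two different escape sequences, one understood by xterm-style terminals and one by screen. A minimal sketch that isolates just that part, assuming a hypothetical helper name `title_sequences` (not part of my .bashrc) and using printf so it behaves the same in any POSIX shell:

```shell
# title_sequences DIR CMD
# Hypothetical helper: emit the title escape sequences for "DIR CMD".
title_sequences() {
  # OSC 0 (ESC ] 0 ; text BEL): sets the xterm window and icon title.
  printf '\033]0;%s %s\007' "$1" "$2"
  # ESC k text ESC \: sets the title of the current screen window.
  printf '\033k%s %s\033\\' "$1" "$2"
}
```

Calling `title_sequences "$PWD" vim` from a shell inside screen should rename both the terminal window and the screen window, which is what the DEBUG trap does before every command.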

Future work:

  • Shared local vim buffer
  • Shared remote vim buffer
  • A git repository to make reproduction easy
  • Fix for mosh to not break on non-corp networks
  • Clickable URLs in the Terminal

Thursday, March 19, 2015

GCC and UD2 instructions

A few colleagues and I are working on OS development. While most of the development has taken place on MacOS, I prefer Linux and primarily use a rolling-release distribution called Arch. On the Mac, my colleagues obtained GCC 4.8 from MacPorts, and everything compiles just fine for them. However, running a rolling-release distribution means I always have the latest and greatest versions on my system. Usually that is fine; sometimes, as in this scenario, it is not. At some point, GCC started emitting UD2 instructions instead of reporting errors. UD2 is the x86 undefined-instruction opcode: executing it raises an invalid-opcode exception, which halted our system. Why on earth would any compiler do this? It was absolutely baffling to see this behavior from a program that compiled cleanly with -Wall.

So I did some searching in the assembly output to find where the UD2 instruction was being generated and found one in the following code snippet:

static struct pci_func *
alloc_pci_func()
{
    if (pci_dev_list.total_dev == N_PCI_DEV) {
        KERN_DEBUG ("Alloc pci_func from pci_dev_list error! no available \n");
        return NULL;
    }
    return &[pci_dev_list.total_dev++];
}

Where do you think the problem is? My initial guess was that some fancy overflow detection was not working quite right: notice that we increment total_dev but prevent it from going beyond N_PCI_DEV. That turned out to be a dead end. So I tried a different approach. Our optimization level happened to be -Os, which is effectively -O2 with some tweaks for output size. I tried -O2 and then -O1: at -O2 the issue still existed, whereas at -O1 it did not. Taking a peek at the list of options enabled by -O2, I set the compilation to -O1 and began enabling -O2 options explicitly until I stumbled upon the culprit: -fisolate-erroneous-paths-attribute. This flag does the following:

Detect paths which trigger erroneous or undefined behaviour due to a NULL value being used in a way which is forbidden by a "returns_nonnull" or "nonnull" attribute. Isolate those paths from the main control flow and turn the statement with erroneous or undefined behaviour into a trap.

Brilliant. The authors figured it was better to treat return NULL as undefined behavior than to warn us that maybe we should look into a different convention. Frankly, I'm not sure what the correct convention should be. Perhaps a panic? But that seems a little harsh, especially if the system can handle running out of a limited resource. So to keep our -Os setting, I added the following compiler flag: -fno-isolate-erroneous-paths-attribute. Fortunately I found my bug, though it seems to be expected behavior from GCC. Mind you, this isn't the only example of a GCC UD2 issue.
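
The flag hunt described above can be shortened by asking GCC which optimizations each level turns on and comparing the two lists. What follows is a sketch rather than the exact procedure I followed: `newly_enabled` is a hypothetical helper, and o1.txt and o2.txt are assumed to hold the output of `gcc -Q -O1 --help=optimizers` and `gcc -Q -O2 --help=optimizers`.

```shell
# newly_enabled FILE1 FILE2
# Hypothetical helper: print the flags marked [enabled] in FILE2 that are not
# marked [enabled] in FILE1. Each file is a saved `gcc -Q --help=optimizers`
# dump, where a line looks like "  -fsome-flag    [enabled]".
newly_enabled() {
  awk '$2 == "[enabled]" {
         if (NR == FNR) seen[$1] = 1        # first file: remember the flag
         else if (!($1 in seen)) print $1   # second file: report new flags
       }' "$1" "$2"
}

# Typical use (assumes gcc is on PATH):
#   gcc -Q -O1 --help=optimizers > o1.txt
#   gcc -Q -O2 --help=optimizers > o2.txt
#   newly_enabled o1.txt o2.txt
```

Each flag it prints is a candidate to add on top of -O1, one at a time, until the UD2 shows up in the assembly. Keep in mind that GCC does not expose every optimization as a flag, so -O1 plus all of the listed flags is not guaranteed to behave exactly like -O2.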