CXX Calling Conventions

Posted on 2011-09-09 Last modified on 2011-09-09 17:43:52

My advisor presented me with an interesting problem the other day. Calling some simple C++ free functions that took C++ objects by value using Python's ctypes produced strange results. We eventually narrowed the test case down to the following:

#include <stdio.h>

struct Simple {
  int x,y,z;
};

struct Fancy {
  int x,y,z;
  ~Fancy() {
    printf("Destroying a Fancy\n");
  }
};

extern "C" {
  void printSimple(Simple s) {
    fprintf(stderr, "Entering printSimple\n");
    fprintf(stderr, "x=%d, y=%d, z=%d\n", s.x, s.y, s.z);
    fprintf(stderr, "Exiting printSimple\n");
  }

  void printFancy(Fancy f) {
    fprintf(stderr, "Entering printFancy\n");
    fprintf(stderr, "x=%d, y=%d, z=%d\n", f.x, f.y, f.z);
    fprintf(stderr, "Exiting printFancy\n");
  }
}

This code sample exports the two interesting functions with unmangled names to make calling them from Python easier. The two structs are structurally identical; the only difference is that Fancy has a user-defined destructor. Allocating these in Python is easy enough using ctypes. Calling the first function with a structure by value works as expected. Calling the second by value prints out nonsense. However, calling the second with a pointer to a struct works just fine. The raw x86_64 assembly is a bit hard to read, but a dump of the LLVM IR for this code is slightly more enlightening:

define void @printSimple(%"struct.<anonymous namespace>::Simple"* byval %s) {
; ...
  ret void
}

define void @printFancy(%"struct.<anonymous namespace>::Fancy"* %f) {
; ...
  ret void
}

Clearly, the second function must be called with the parameter passed by pointer. But why? We figured that it must be some kind of calling convention issue. It looks like the relevant standard is the Itanium C++ ABI (Section 3.1.1). Nobody uses Itanium anymore, but g++ uses this ABI for most (all?) platforms that do not define their own (including i386 and x86_64). Basically, if a C++ object has a non-trivial copy constructor or destructor, the caller allocates a temporary to hold the copy of the object being passed by value. When making the call, it passes the address of this temporary to the callee instead of the object itself. Presumably this simplifies exception handling and unwinding. This was a bit surprising and adds a whole new set of complexities to calling C++ from other languages.
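For reference, here is a minimal sketch of what the ctypes side of this experiment might look like. The helper names (declare_signatures, call_both) are made up for illustration, and the sketch assumes the test case above has been built as a shared library. The point is only that printSimple is declared to take the struct by value, while printFancy is declared to take a pointer, matching what the IR shows.

```python
import ctypes


class Simple(ctypes.Structure):
    # Mirrors `struct Simple { int x, y, z; };`
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int), ("z", ctypes.c_int)]


class Fancy(ctypes.Structure):
    # Same layout as Simple. ctypes knows nothing about the C++ destructor,
    # so nothing here hints that the calling convention differs.
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int), ("z", ctypes.c_int)]


def declare_signatures(lib):
    """Attach argument types to the two exported functions (hypothetical helper)."""
    # Simple is trivially copyable, so it really is passed by value.
    lib.printSimple.argtypes = [Simple]
    lib.printSimple.restype = None
    # Fancy has a non-trivial destructor, so the callee expects an address.
    lib.printFancy.argtypes = [ctypes.POINTER(Fancy)]
    lib.printFancy.restype = None


def call_both(lib):
    """Call both functions using the conventions they actually expect."""
    declare_signatures(lib)
    lib.printSimple(Simple(1, 2, 3))
    fancy = Fancy(4, 5, 6)
    lib.printFancy(ctypes.byref(fancy))
```

With the library built under some name (say, a hypothetical ./libconv.so), call_both(ctypes.CDLL("./libconv.so")) should print sensible values for both structs. Declaring printFancy with argtypes = [Fancy] instead reproduces the nonsense output.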

Recent Bug Fixes

Posted on 2011-09-08 Last modified on 2011-09-09 17:43:52

First, I finally went back and added support for clang in my whole-program-llvm wrapper. Usage is easy: just set LLVM_COMPILER=clang and it should Just Work. In the process of implementing this clang support I ended up feeling pretty bad about some of the old code. I ended up refactoring out quite a bit and it is definitely cleaner now. I'm still not sold on this whole "object-oriented programming" thing, but it worked tolerably for a small chunk of Python code. The resulting objects are, of course, completely arbitrary bundles of functionality.

I also fixed a bug in taffybar that prevented non-ASCII window titles in xmonad from rendering properly in the XMonadLog widget. It turns out that xmonad lies about the types of strings it gets from X. While the type signature claims that they are Strings (lists of Char), they are actually lists of bytes stuffed into Char. That is, the underlying utf-8 string was not decoded before being converted into a String. A simple call to decodeString from the utf8-string library fixed the issue.
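The failure mode is easy to reproduce outside of Haskell. The following is a Python analogue, not the taffybar code: mis_decoded and decode_string are names I made up, with decode_string standing in for what decodeString from utf8-string does. Decoding bytes as Latin-1 yields one character per byte, which is the same shape as the strings xmonad hands back.

```python
def mis_decoded(raw: bytes) -> str:
    # One character per byte: a "list of bytes stuffed into Char".
    return raw.decode("latin-1")


def decode_string(s: str) -> str:
    # Turn each character back into the byte it really was,
    # then decode the byte sequence as UTF-8.
    return s.encode("latin-1").decode("utf-8")


title = "naïve café"
broken = mis_decoded(title.encode("utf-8"))

# The broken string is longer because every multi-byte character
# shows up as several separate characters.
assert broken != title
assert len(broken) > len(title)
assert decode_string(broken) == title
```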

An xmobar alternative

Posted on 2011-08-13 Last modified on 2011-08-13 15:50:33

I've been an XMonad user for a while now. The window manager doesn't include any status bar-type functionality, instead relying on external programs. The two most common bars in use seem to be dzen and xmobar. I didn't like the methods for feeding data into dzen, so I went with xmobar. It was tolerable. I added a freedesktop.org notification widget a few months ago and it got a little better. Unfortunately, the text-only interface bothered me a little bit, and there was no easy way to get a system tray working well with xmobar.

This itch bothered me enough that I scratched it last week and wrote an alternative status bar for xmonad: taffybar. The bar is actually very simple and based on gtk2hs. If you find that offensive, taffybar probably isn't for you. Visually, it is very similar to xmobar (I don't really know how much room there is for visual variation in the system bar space). A few feature highlights:

  • System tray
  • XMonad log widget that works over DBus (so you can restart xmonad and taffybar separately and not worry about pipes filling up)
  • Freedesktop notification widget
  • Battery widget using UPower
  • Time-based graph widgets (similar to those in Awesome)

There is still a lot of work that could be done and a few widgets I want to add, but I wanted to have an early release so people could try it and give me feedback. The documentation is in the System.Taffybar module haddock; unfortunately, hackage doesn't like my package and the build failed, so the documentation is temporarily elsewhere until I figure out how to make hackage happy. The installation goes fine on the machines I've tried it on.

Haskell in Emacs

Posted on 2011-07-15 Last modified on 2011-07-15 15:18:55

Like many people, I use Emacs to edit my Haskell code. The standard haskell-mode works fine, but I always felt like I could use a bit more help from my editor. I recently ran across scion, which is something like a Haskell IDE library that provides deep information about programs to editors. It also happens to include Emacs integration.

A few notable features:

  • In-buffer error and warning highlighting
  • Expression typechecking
  • Completion of LANGUAGE pragmas
  • Go-to-definition of symbols

It does all of this by using the Cabal library to determine your project settings and then invoking the GHC API to compile everything. The information is very accurate and compilations after the initial setup are too fast to notice. It re-checks your code every time you save, so feedback is also basically immediate.

I really like scion so far. I had one project that was fairly complex with a fancy custom build system involving a configure script -- this proved to be too complicated for scion to handle correctly. It ended up having some linking issues with some FFI calls I was using, so I split those out into a separate package so I could use scion with my main project.

Whole program LLVM bitcode

Posted on 2011-07-01 Last modified on 2011-09-09 17:43:52

Sometimes it is useful to be able to analyze an entire program at once, rather than analyzing individual compilation units. LLVM has some infrastructure for this: the llvm-link program (and associated library methods) combines multiple bitcode files into one as a linker might. Unfortunately, getting all of the bitcode to pass to llvm-link in the first place can be challenging in the face of strange and arcane build systems (e.g., autotools, libtool, and other impolite tools).

Official Aside

There is actually an official LLVM way to build whole-program LLVM bitcode using the GCC link time optimization (LTO) framework and a plugin for the gold linker. Some of the relevant details are documented, but that is not the whole story. Long story short, after you:

  • build binutils > 2.20.1 with LTO and plugin support (--enable-lto and --enable-plugins),
  • build LLVM with the --with-binutils-include flag to tell it where to find the plugin-api header,
  • build gcc 4.5.2 with LTO and plugin support (--enable-lto and --enable-plugin) with the patch from dragonegg that did not make it into the 4.5 mainline (also, gcc dependencies mpc, mpfr, ppl, and cloog), and
  • build the dragonegg plugin

you can finally use the link time optimization plugin LLVMgold.so to generate whole-program bitcode files. One way to do this is to just use gcc with dragonegg as your compiler for a project with the extra flags -flto -fplugin=dragonegg.so -S. This tells the dragonegg plugin to generate LLVM IR for each file; the -S is necessary to prevent gcc from attempting to run the system assembler over LLVM assembly, which it will not understand. During the link stage, you can add the flag -Wl,-plugin-opt=also-emit-llvm and the linker will spit out a bitcode file along with a binary.

Note that the -S flag is not entirely necessary. You can convince the compiler to use llvm-as to assemble the output of the dragonegg plugin into an actual bitcode file if you pass the -Bdir flag, where dir is the path to a directory containing a single symlink that presents llvm-as as just as. I also had to provide a link for my plugin-enabled ld, as the system version was far too old.

This process does work for simple builds, but falls down if your build process generates intermediate static libraries. Since LLVM assembly and bitcode files are not true object files, the resulting static library archives are not valid linker inputs and the compilation will error out when it tries to use them.

A Hackish Solution

Instead of doing things the right way, I decided to do them my way. I wrote a simple wrapper script that pretends to be gcc and compiles every file in a project twice. The first compilation produces an authentic object file. The second produces an LLVM bitcode file and writes the full path to the bitcode file into an ELF section in the object file. The objects can be moved around and linked arbitrarily and these ELF sections are merged appropriately. After the compilation finishes, there is a script to read this ELF section and link together all of the named bitcode files.
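To make the idea concrete, here is a rough sketch of the commands such a wrapper could plan for a single source file. This is not the code from my scripts: the function names, the .llvm_bc section name, and the .bcpath side file are all assumptions for illustration, and I use clang's -emit-llvm for the second compilation just to keep the sketch short. The sketch only builds command lines; a real wrapper would also write the bitcode path into the side file and run each command.

```python
import os

# Hypothetical section name; the real scripts may use a different one.
BITCODE_SECTION = ".llvm_bc"


def plan_commands(source, obj, cc="gcc", bitcode_cc="clang"):
    """Plan the two compilations and the ELF annotation for one source file."""
    # Record an absolute path so the object file can be moved around freely.
    bitcode = os.path.abspath(os.path.splitext(obj)[0] + ".bc")
    # Side file whose contents (the bitcode path) become the section data.
    path_file = obj + ".bcpath"
    commands = [
        # 1. The authentic object file the build system asked for.
        [cc, "-c", source, "-o", obj],
        # 2. The same translation unit again, this time as LLVM bitcode.
        [bitcode_cc, "-emit-llvm", "-c", source, "-o", bitcode],
        # 3. Stash the bitcode path inside the object file.
        ["objcopy", "--add-section", f"{BITCODE_SECTION}={path_file}", obj],
    ]
    return bitcode, path_file, commands


def plan_link(bitcode_paths, output="whole-program.bc"):
    """Plan the final llvm-link step over every recorded bitcode file."""
    return ["llvm-link", *bitcode_paths, "-o", output]
```

Because the linker concatenates same-named sections from its inputs, the final binary ends up carrying the paths of every bitcode file that went into it, which is what the extraction script reads back before calling llvm-link.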

I posted the scripts on github. They have worked for me so far but there are probably bugs. If you happen to find some, just let me know and I'll see about fixing them.

TODO

Currently, the scripts only support dragonegg. I need to make it work with clang, too.

Wacom and RHEL5

Posted on 2011-05-10 Last modified on 2011-05-10 16:53:56

I have been making my presentation slides in LaTeX Beamer for a few years now and it is a pretty pleasant process for the various reasons I mentioned in the previous post. More recently I decided that I needed to depend less heavily on the staple of Beamer presentations: the bullet point. You can generally spot a Beamer presentation from a few miles away, and I did not really want to be that guy any more. Without bullet points, though, what is really left for presentation slides? I was at a loss and had no idea what to do. The TikZ package is an amazing and terrifying library for building diagrams in LaTeX, but I usually end up feeling slightly ill at the prospect of making any non-trivial figures with it.

That leaves drawing. I am reasonably sure that a pigeon holding a stick in its beak could draw better than me. That has not really stopped me, though, and I decided that I would start drawing my diagrams. I don't have the motor skills required to use a mouse, so I went and picked up the cheapest Wacom drawing tablet I could find: some Bamboo model. I just plugged it into my laptop running Debian Sid and everything Just Worked. It works amazingly well in Inkscape and The Gimp, so I am covered from the software end.

Of course, my laptop runs recent versions of software. At the office, my workstation has to run Red Hat Enterprise Linux 5. The kernel in RHEL5 is several years old and hardware support is often lacking. In fact, it has no clue what to do with this drawing tablet. My first workaround was to just use my laptop and hook up an extra monitor. That did work, but I do not really like leaving my laptop on and idle for extended periods of time like that. My next workaround has proved sufficient: VMWare. I just run an up-to-date Debian install in a VMWare virtual machine and give it ownership of the drawing tablet. That kernel recognizes it and everything works fine. The performance of the VM on my workstation is far from stellar but it is sufficient for drawing.

Now I just have to see if my presentations improve at all.

Fragile LaTeX Beamer Slides

Posted on 2011-05-05 Last modified on 2011-05-06 00:41:54

I am currently working on the talk slides for my thesis proposal. My preferred presentation medium is LaTeX Beamer -- a documentclass for generating slides. It gives you access to the usual selection of excellent LaTeX tools. Many argue that most of what LaTeX gives you is exactly what you do not want in a presentation:

  • Ease of content generation without worrying about formatting
  • Easy abuse of well-typeset math
  • Lots of bullet points

While it is true that you can use these tools to make the standard Awful Beamer Presentation, you can also ignore those things and make good presentations. Currently, I do not actually have much text at all (and zero bullet points); instead, I mostly have diagrams and illustrations. I could have made all of the diagrams in TikZ, but that can be a little tedious. I have been using Inkscape, which is actually really good and a lot of fun.

That is not really the important point here, though. I also have many short source code listings in the presentation. For papers, I usually stick to the listings package. However, it is not very appealing for presentations because it is very painful to get properly syntax-highlighted code listings out of it. Instead, I much prefer minted, which uses the excellent pygments program to do the actual syntax highlighting. Both of these packages work very well, except for one small problem when combined with Beamer: the output of both tools is fragile. My understanding is that this means that the content cannot be moved freely. Beamer has provisions to work around this, though. A slide can optionally be marked as fragile, which changes the method LaTeX uses to process it.

\begin{frame}[fragile]
  %% Fragile content goes here
  \begin{minted}{haskell}
    ...
  \end{minted}
\end{frame}

This works and fragile content like code listings can be embedded as one would expect. However, there is a major restriction: fragile slides cannot have overlays. Overlays are the mechanism by which Beamer allows content in a single logical slide to be hidden or progressively revealed. They are a major workhorse of a Beamer presentation, so this restriction is unpleasant. I only found the solution yesterday in some presentation, and this post is mostly to make sure I remember it. The idea is to construct the fragile content outside of a slide, save it, and then use it in a slide as desired later.

\defverbatim[colored]\contentName{
  \begin{minted}{haskell}
    ...
  \end{minted}
}

\begin{frame}
  \contentName
\end{frame}

Note that the slide is not marked as fragile and overlays will work as expected. Anyway, I thought it was pretty useful and very under-documented. One warning: be very careful to put the end tag for the fragile environment (listings, minted, etc.) on its own line. In particular, the closing brace of the macro being defined to hold the fragile content should come on a separate line from the end of the environment. The terrifying things LaTeX does to construct some of these pieces of fragile content seem to depend on the end of the environment being on its own line. The error messages that it produces when this is not the case are, as always in LaTeX, completely useless.

GitHub

Posted on 2011-04-17 Last modified on 2011-04-17 17:18:30

Today I finally decided to stop hosting my own git repositories and just move to GitHub. Hosting them on my VPS was never particularly difficult or demanding, but I was never happy with the web interface I had set up (cgit). cgit itself is great and easy enough to use. I had my web server just dispatch CGI requests to it and everything was fine. The only problem was that I didn't put enough effort into making it fit with the rest of my site visually. I was also never quite sure about the stability of the hosting setup that I had constructed.

GitHub solved both of these issues: it seems reasonably stable and doesn't give me the option to customize the appearance of the repositories. This way I don't have to agonize over it. I went to go sign up and was shocked to see that travitch was taken. Annoyed, I looked to see what this person was hosting. Well, it turns out that I had already signed up a few months ago and completely forgotten about it. That was pretty convenient, so I ran with it (after I remembered the password). In the process of migrating my repositories, I got rid of a few that I didn't want to maintain anymore. I'm just deleting my scripts for the uzbl browser since I don't use it anymore. The code for my PLDI 2009 work is going away temporarily (tarballs are still available on my research page); this code will be superseded by something better soon.

My former git hosting solution was obtuse enough that it actually took me about 10 minutes to figure out how to disable it. I set it up months ago and never documented how I was starting all of the associated services. Eventually, I just uninstalled the service and my package manager figured out what I was doing, somehow.

Emacs fonts

Posted on 2011-04-16 Last modified on 2011-04-16 22:02:41

I keep forgetting to update this blog. I feel bad about that; I thought that I would have more free time after I finished taking classes. I am going to try to force myself to update more frequently. I won't force myself to make the content insightful, though.

Anyway, I have been writing code in Haskell lately, and the haskell-mode for emacs is very good. This mode has support for replacing common multi-character operators and identifiers with their Unicode equivalents. For example, -> becomes →. The replacement is done only at the level of the text renderer; the underlying file still has the original multi-character ASCII sequences. While this feature is far from necessary, it makes scanning code more pleasant. There is only one problem: my programming font (Anonymous Pro) does not have full glyph coverage for the Unicode symbols that haskell-mode uses.

Emacs notices this, of course, and uses another font to render these symbols. Unfortunately, whatever font it picks is 1) ugly and 2) the wrong size. I eventually figured out that emacs uses fontsets to determine the search order for glyphs; I modified my default fontset to prioritize a better-looking and more appropriately sized font for Unicode symbols, using set-fontset-font. The hardest part was figuring out the second argument to this function: TARGET. I still haven't found a definitive list of the possible values of this argument, but I found enough examples to at least let me fix my problem:

  (set-fontset-font "fontset-default" 'ucs "dejavu sans mono-11")

I also tried ghc-mod, which extends haskell-mode with some more interactive features using ghc (or hlint). It is very impressive and can offer syntax checking, useful symbol completion, type signature inference, and documentation lookup. Unfortunately it automatically saves the buffer you are working on whenever emacs is idle so that it can redo the syntax check. I found this to be incredibly annoying and had to disable the mode. Perhaps future versions will allow the auto-save to be disabled. If the saving doesn't bother you, though, it is an impressive aid to Haskell development.

Browser extensions

Posted on 2011-02-15 Last modified on 2011-02-15 16:44:48

I am not really a huge fan of browser extensions. Most seem kind of excessive or silly and can easily make the browser seem slower than it really is. That said, I do appreciate the extensions that let me browse more easily without using a mouse.

The most important feature that I want from a browser is to be able to easily select links to follow using the keyboard. The typical user interface for this feature is to label each link with hints after some keystroke is hit; you follow a link by typing in the hint attached to the link you want. In Firefox, vimperator does a good job of providing this functionality. As the name implies, this extension brings vim-like behavior to the browser. Of course, I switched to Chrome a while ago. The equivalent in Chrome is somewhat more lightweight: vimium.

That extension covers most of my requirements. Unfortunately, switching between tabs is still a pain, so I finally got around to finding a more keyboard-friendly way to switch tabs. The default Chrome keybindings allow you to switch tabs with Alt+<TabNum>. Unfortunately, that only works for up to 9 tabs. I try not to open that many at once on general principle, but sometimes it happens. Even if you have fewer than 10 tabs open, you still have to remember the numbers for each tab (or count each time). I am bad at both remembering things and counting, so this does not scale well for me. Enter another (less slickly named) extension: Tab Title Search. A keystroke (Alt+g by default) brings up a dropdown list of your tabs with a search box. You can type in a fragment of any page title (or use a regular expression, apparently), and just hit enter. This is close enough to the buffer list in a proper text editor that I am fairly happy, finally.