Parsing the Linux kernel with Haskell: experience with Language.C

At Galois, Aaron Tomb has been experimenting with the new Haskell Language.C libraries recently (a Summer of Code project by Benedikt Huber, mentored by a Galois engineer, Iavor Diatchki), and he’s been impressed by what it can do. Here are his thoughts on parsing the Linux kernel with Haskell, with an eye to future static analysis work:My interest in the library is for use in static analysis of very large bodies of legacy C code, which means two issues matter a lot to me: 1) rock-solid support for all of GCC’s numerous extensions, and 2) speed. I have used CIL, and tools based on CIL in the past, but have been disappointed with its speed.As a simple scalability and robustness experiment, I decided to see how well Language.C would do on the Linux source tree. It doesn’t yet have an integrated preprocessor (depending on GCC’s for now), but I happened to have an already-preprocessed set of sources for Linux 2.6.24.3 sitting around (configured with defconfig).Could Language.C handle the Linux kernel?I wrote a little wrapper around the C parser to essentially just syntax-check all of the code.

import Language.Cimport Language.C.System.GCCimport System.Environmentprocess :: String -> IO ()process file = do putStr filestream <- readInputStream fileputStr (take (20 - length file) $ repeat ' ')either print(const $ putStrLn "Pass")(parseC stream nopos)main :: IO ()main = dofiles <- getArgsmapM_ process files

It prints the filename followed by “Pass” if the parse succeeds, or details about a syntax error if the parse fails. When I ran this on the Linux code mentioned above, I was amazed to find that it processed it all successfully! All 18 million lines of pre-processed source without a hitch.Since I also care about speed, I wanted to compare it with GCC. GCC has a handy flag, -fsyntax-only, which tells it to just check the syntax of the input file and quit. I ran both the Language.C wrapper(compiled with GHC 6.8.3 and the -O2 option) and GCC on all that code, on a 2.2GHz/4GB MacBook Pro. The result: Language.C parsed all of the code in about 6 minutes, while GCC managed it in a little over 2. GCC is still faster, but I’m happy to take a 3x speed hit for the benefit of being able to write all the subsequent analysis in Haskell.The following table shows the precise time and memory statistics for Langugage.C and GCC, both on the entire source collection and on the single largest file in the tree, bnx2.i, the driver for the Broadcom NetXtreme II network adapter. For the Language.C tests, I compared the performance when the garbage collector used 2 generations (the default) to 4 generations (specified with the +RTS -G4 option). Increasing the number of generations helped slightly.

User Time System Time Memory Use
L.C, all 5:59 0:09 144MB
L.C, all, -G4 5:27 0:08 131MB
L.C, bnx2.i 0:02.15 0:01.89 133MB
L.C, bnx2.i, -G4 0:01.96 0:01.76 85MB
gcc, all 2:02 0:17 ????
gcc, bnx2.i 0:00.56 0:00.07 33MB
Read More

The bike commute challenge – status

The Bike Commute Challenge is a wonderful Oregon tradition that Galois has participated in for the past 3 years. This year was a bit disruptive for Galois commuters, because we moved offices from Beaverton to downtown Portland. I think quite a few West-siders haven’t yet figured out the best way to get downtown by bike. Nevertheless, we’re holding steady at an overall 15% commute rate, with a few folks standing out from the crowd: Sigbjorn Finne at 360 miles and Paul Heinlein at 250 miles.The challenge web site has had a few ups and downs (er, mostly downs), but it seems to be back on-line, so I’m hoping folks are able to log their trips without troubles. One weird thing is I think they’re miscomputing the commute rate – it says Paul and Sigbjorn are around 80%, but I’m pretty sure they’re both at 100%, so perhaps there are still a few kinks left to work out. (and once they are worked out, I suspect our overall rate will be well-above 15%)Anyone reading this, please commend any of your bike-commuting colleagues, and I’ll take this opportunity to thank everyone for participating (Galwegians and everyone else!)

Read More

FMCAD’08 is coming to Portland!

Formal Methods in Computer Aided Design (FMCAD’08) is the preeminent conference in formal methods for hardware and systems, and this year, it’ll be held in downtown Portland, November 17-20. The advance program has been announced, and the lineup of technical papers, invited tutorials, invited speakers, and panel discussion looks awesome.Registration is open, so be sure to get your spot soon!Galois is sponsoring this year’s conference, along with Cadence, IBM, Intel, NEC, and Synopsis.  If you attend, stop by Galois; we’re only a few blocks from the conference hotel.

Read More

Galois Open House

Please join us for an open house to celebrate our new office space in downtown Portland’s historic Commonwealth Building. Located on SW 6th Avenue between Stark and Washington streets, we’re easily accessible via MAX or TriMet buses. We’re up on the third floor.Parking will also be available in the Alder Street Star Park parking garage located at 615 SW Alder, just one block from our building; validation will be provided at the event.

What: Galois Open House
When: Thursday, September 18
Time: 4:00 pm – 6:00 pm
Where: The Commonwealth Building421 SW Sixth Ave., Ste. 300

RSVP: Anne Marie @ ph. 503.626.6616, x153 or email anne at galois.com

Read More