My open Haskell QUIC server on Linux sometimes crashed with segfaults. I saw two types of segfaults. One is a simple segfault caused by accessing an invalid address:
mighty: segmentation fault
The other is related to free():
*** Error in `mighty': corrupted double-linked list: 0x00007fcdf0008f90 ***
I guessed that a buffer allocated by malloc() had been overrun and that this segfault happened when the buffer was freed.
Many Haskellers would be surprised at this kind of segfault because it is hard to cause one in normal Haskell programming. However, if you manipulate pointers or use unsafe functions, segfaults are as common as in other programming languages.
The first type of segfault can be reproduced with Foreign.Storable.peek:
% ghci
> import Foreign.Ptr
> import Foreign.Storable
> :type peek
peek :: Storable a => Ptr a -> IO a
Let's try to access the so-called NULL pointer:
> peek nullPtr :: IO Int
sh: segmentation fault ghci
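As a minimal sketch of how such a crash can be avoided for the NULL case only, the pointer can be compared with nullPtr before it is read. The helper name peekMaybe is hypothetical (it is not part of base), and it cannot detect dangling or otherwise invalid addresses:

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (Storable, peek, poke)

-- | Hypothetical helper (not part of base): read through a pointer only
-- when it is not NULL. Other invalid addresses are NOT detected.
peekMaybe :: Storable a => Ptr a -> IO (Maybe a)
peekMaybe ptr
  | ptr == nullPtr = return Nothing
  | otherwise      = Just <$> peek ptr

-- | Store a value in temporary memory and read it back via 'peekMaybe'.
roundTrip :: Int -> IO (Maybe Int)
roundTrip n = alloca $ \p -> poke p n >> peekMaybe p

main :: IO ()
main = do
  r1 <- peekMaybe (nullPtr :: Ptr Int)
  print r1              -- Nothing, instead of a segfault
  r2 <- roundTrip 42
  print r2              -- Just 42
```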
Buffer overruns can be caused by Foreign.Storable.poke. Its type is as follows:
> :type poke
poke :: Storable a => Ptr a -> a -> IO ()
I checked all peeks and pokes in my code but could not figure out the cause of the segfaults. So, I needed to take another approach.
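To illustrate what checking the boundary before a poke can look like, here is a minimal sketch. The Buffer type and the pokeChecked helper are hypothetical names made up for this example; they are not taken from the QUIC library:

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- | Hypothetical buffer descriptor: base pointer and size in bytes.
data Buffer = Buffer (Ptr Word8) Int

-- | Write one byte at the given offset. Returns False, without writing,
-- when the offset is outside the buffer.
pokeChecked :: Buffer -> Int -> Word8 -> IO Bool
pokeChecked (Buffer ptr size) off w
  | off >= 0 && off < size = pokeByteOff ptr off w >> return True
  | otherwise              = return False

-- | Try one in-range and one out-of-range write on a 4-byte buffer,
-- then read back the byte written by the in-range write.
demo :: IO (Bool, Bool, Word8)
demo = allocaBytes 4 $ \p -> do
  let buf = Buffer p 4
  ok  <- pokeChecked buf 3 0xff   -- last valid offset
  bad <- pokeChecked buf 4 0xff   -- one past the end: rejected
  b   <- peekByteOff p 3
  return (ok, bad, b)

main :: IO ()
main = demo >>= print   -- (True,False,255)
```

A check like this only guards against overruns. It does not help when the buffer itself is no longer valid.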
The -g option of GHC
Like other compilers, GHC provides the -g option to add debug information to a compiled program. We can run the program in gdb and get a back trace if a segfault happens. To compile all dependent libraries with the -g option, I modified my Cabal wrapper, called cab, to provide a command line option (whose name is also -g) that implements this feature. I also used the sandbox feature of Cabal-v1:
% cd mighty
% cab init # creating a sandbox
% cab add ~/work/quic # adding non-Hackage deps
...
% cab install -d -f tls -f quic -g
% cab conf -f tls -f quic -g
% cab build
Then run the compiled program in gdb:
% sudo gdb --args mighty conf route
(gdb) handle SIGPIPE nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGPIPE       No        No      Yes             Broken pipe
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) run
As you can see, I needed to change how gdb handles two signals so that it neither stops on them nor prints them, and simply passes them to the program:
- SIGPIPE: This is common in network programming. You can find an example case in "Implementing graceful-close in Haskell network library"
- SIGUSR1: This signal is used to drop unnecessary privileges. For more information, see "Haskell vs Linux capabilities"
Segfault 1
When I added some test cases of QPACK to h3spec and tested the open server, gdb finally caught a segfault and showed a back trace. The cause was Data.Array.Base.unsafeAt. I did not check the boundary of an array! (My QPACK code is derived from my HPACK code, where this boundary check is not necessary.)
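For illustration, a bounds check in front of unsafeAt could look like the sketch below. The helper name safeAt and the sample table are hypothetical and are not the actual QPACK code. Note that unsafeAt takes a 0-based offset into the array rather than an index in the array's own bounds:

```haskell
import Data.Array (Array, listArray)
import Data.Array.Base (numElements, unsafeAt)

-- | Hypothetical helper: look up a 0-based offset, checking it against
-- the number of elements before delegating to 'unsafeAt'.
safeAt :: Array Int a -> Int -> Maybe a
safeAt arr i
  | i >= 0 && i < numElements arr = Just (arr `unsafeAt` i)
  | otherwise                     = Nothing

-- | A small sample table standing in for a real lookup table.
table :: Array Int Char
table = listArray (0, 2) "abc"

main :: IO ()
main = do
  print (safeAt table 1)   -- Just 'b'
  print (safeAt table 3)   -- Nothing, instead of reading past the end
```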
Segfault 2
The segfault related to free() was really mysterious because the buffer boundary is always checked when poke is used. The error message of free() on Linux is not very helpful. But when I got the same segfault on macOS, the following message was displayed:
mighty(75755,0x700009519000) malloc: Incorrect checksum for freed object 0x7fb8de80ea00: probably modified after being freed.
Eureka! Even if the boundary is checked every time, this segfault happens because a freed buffer is used.
But why is a freed buffer used? This is one of the difficulties of multi-threaded programming. Suppose thread A and thread B share a buffer. The following is an example clean-up procedure:
- Thread A sends a kill signal to thread B
- Thread A frees the buffer
- Thread A exits
This looks perfect. However, the timing of the termination of thread B depends on the scheduler. Even after thread A has freed the buffer, thread B may still be alive and can manipulate the buffer.
To prevent this contention, I gave up the approach of Foreign.Marshal.Alloc.mallocBytes and Foreign.Marshal.Alloc.free. Instead, I started using GHC.ForeignPtr.mallocPlainForeignPtrBytes. Buffers allocated by this function are garbage-collected like ByteString.
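A minimal sketch of sharing such a garbage-collected buffer between two threads follows. The buffer size, the value written and the helper name demo are assumptions made for this example, and the MVar is there only to make the demo deterministic. Neither thread frees the buffer explicitly; it remains valid while either thread still holds the ForeignPtr:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import GHC.ForeignPtr (mallocPlainForeignPtrBytes)

-- | Allocate a 16-byte buffer managed by the garbage collector, let a
-- second thread write to it, and read the byte back in this thread.
demo :: IO Word8
demo = do
  buf  <- mallocPlainForeignPtrBytes 16 :: IO (ForeignPtr Word8)
  done <- newEmptyMVar
  _ <- forkIO $ do
    -- The child thread captures 'buf', which keeps the memory alive
    withForeignPtr buf $ \p -> pokeByteOff p 0 (7 :: Word8)
    putMVar done ()
  takeMVar done   -- only to make the demo deterministic
  withForeignPtr buf $ \p -> peekByteOff p 0
  -- No explicit free: the buffer is reclaimed once it is unreachable

main :: IO ()
main = demo >>= print   -- 7
```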
Now I believe that my QUIC server is much more stable than before.