mighty: segmentation fault
The other is related to free():
*** Error in `mighty': corrupted double-linked list: 0x00007fcdf0008f90 ***
I guessed that a buffer allocated by malloc() had been overrun and that this segfault happened when the buffer was freed.
Many Haskellers would be surprised by this kind of segfault because it is hard to cause one in normal Haskell programming. However, if you manipulate pointers or use unsafe functions, segfaults are as common as in other programming languages.
For the first type of segfault, you can use Foreign.Storable.peek:
% ghci
> import Foreign.Ptr
> import Foreign.Storable
> :type peek
peek :: Storable a => Ptr a -> IO a
Let's try to access the so-called null pointer:
> peek nullPtr :: IO Int
sh: segmentation fault ghci
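A null pointer is the one bad pointer that can be detected cheaply before reading. The following is only a minimal sketch: peekMaybe is a hypothetical helper, not a function from base, and it guards against NULL only, so a dangling or out-of-range pointer can still crash the process.

```haskell
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (Storable, peek)

-- | Hypothetical helper (not part of base): read through a pointer only
-- when it is not the null pointer. Any other invalid pointer is NOT
-- detected and can still cause a segfault.
peekMaybe :: Storable a => Ptr a -> IO (Maybe a)
peekMaybe ptr
  | ptr == nullPtr = pure Nothing
  | otherwise      = Just <$> peek ptr
```

In ghci, peekMaybe (nullPtr :: Ptr Int) returns Nothing instead of killing the session.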
Buffer overruns can be caused by Foreign.Storable.poke. Its type is as follows:
> :type poke
poke :: Storable a => Ptr a -> a -> IO ()
I checked every poke in my code but could not find the cause of the segfaults. So I needed to take another approach.
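For reference, a bounds-checked write could look like the sketch below. It is an illustration, not the code from my server: pokeByteChecked and demo are hypothetical names, and the buffer size is assumed to be passed around next to the pointer.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- | Hypothetical helper: write one byte at offset 'off' into a buffer of
-- 'size' bytes. Offsets outside [0, size) are rejected and reported as
-- False instead of being written.
pokeByteChecked :: Ptr Word8 -> Int -> Int -> Word8 -> IO Bool
pokeByteChecked buf size off w
  | off < 0 || off >= size = pure False
  | otherwise              = pokeByteOff buf off w >> pure True

-- | Usage: a 4-byte buffer accepts offset 3 and rejects offset 4.
demo :: IO (Bool, Bool, Word8)
demo = allocaBytes 4 $ \buf -> do
  ok  <- pokeByteChecked buf 4 3 0xff
  bad <- pokeByteChecked buf 4 4 0xff
  w   <- peekByteOff buf 3
  pure (ok, bad, w)
```

A check like this prevents overruns, but it cannot help when the buffer itself is no longer valid.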
-g option of GHC
Like other compilers, GHC provides the -g option to add debug information to a compiled program. We can run the program in gdb and get a backtrace if a segfault happens. To compile all dependent libraries with the -g option, I modified my Cabal wrapper, called cab, to provide a command-line option (also named -g) that implements this feature. I also used the sandbox feature of Cabal-v1:
% cd mighty
% cab init # creating a sandbox
% cab add ~/work/quic # adding non-Hackage deps
...
% cab install -d -f tls -f quic -g
% cab conf -f tls -f quic -g
% cab build
Then run the compiled program in gdb:
% sudo gdb --args mighty conf route
(gdb) handle SIGPIPE nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGPIPE       No        No      Yes             Broken pipe
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) run
As you can see, I needed to tell gdb not to stop on two signals and to pass them through to the program:
- SIGPIPE: This is common in network programming. You can find an example case in "Implementing graceful-close in Haskell network library"
- SIGUSR1: This signal is used to drop unnecessary privileges. For more information, see "Haskell vs Linux capabilities"
When I added some QPACK test cases to h3spec and tested the open server, gdb finally caught a segfault and showed a backtrace. The cause was Data.Array.Base.unsafeAt: I did not check the bounds of an array! (My QPACK code is derived from my HPACK code, where this bounds check is not necessary.)
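To illustrate the kind of check that was missing, here is a minimal sketch of a guarded lookup. The name safeAt, the UArray Int Int type, and the sample array are assumptions made for the example, not the actual QPACK table code. Note that unsafeAt takes a zero-based offset rather than an index in the array's own bounds.

```haskell
import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed (UArray, bounds, listArray)
import Data.Ix (rangeSize)

-- | Hypothetical wrapper: 'unsafeAt' performs no bounds check, so test the
-- zero-based offset against the number of elements before calling it.
safeAt :: UArray Int Int -> Int -> Maybe Int
safeAt arr i
  | i >= 0 && i < n = Just (unsafeAt arr i)
  | otherwise       = Nothing
  where
    n = rangeSize (bounds arr)

-- | Sample three-element array used only for illustration.
sample :: UArray Int Int
sample = listArray (0, 2) [10, 20, 30]
```

With this wrapper, safeAt sample 2 yields Just 30, while safeAt sample 3 yields Nothing instead of reading past the end of the array.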
The segfault related to free() was really mysterious because the buffer boundary is always checked when poke is used. The error message from free() on Linux is not very helpful. But when I got the same segfault on macOS, the following message was displayed:
mighty(75755,0x700009519000) malloc: Incorrect checksum for freed object 0x7fb8de80ea00: probably modified after being freed.
Eureka! Even if the boundary is checked every time, this segfault happens because a freed buffer is used.
But why is a freed buffer used? This is one of the difficulties of multi-threaded programming. Suppose thread A and thread B share a buffer. The following is an example clean-up procedure:
- Thread A sends a kill signal to thread B
- Thread A frees the buffer
- Thread A exits
This looks perfect. However, the timing of thread B's termination depends on the scheduler. Even after thread A has freed the buffer, thread B may still be alive and can manipulate the buffer.
To prevent this contention, I gave up the approach of Foreign.Marshal.Alloc.free. Instead, I started using GHC.ForeignPtr.mallocPlainForeignPtrBytes. Buffers allocated by this function are managed by the garbage collector, so a buffer is reclaimed only after it becomes unreachable.
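A minimal sketch of this allocation style follows. newBuffer and roundTrip are hypothetical names chosen for the example and are not taken from the server code. The buffer is accessed only inside withForeignPtr, which keeps it alive for the duration of the action.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import GHC.ForeignPtr (mallocPlainForeignPtrBytes)

-- | Allocate a GC-managed buffer of the given size in bytes.
-- There is no explicit free; the memory is reclaimed once the
-- ForeignPtr is unreachable.
newBuffer :: Int -> IO (ForeignPtr Word8)
newBuffer = mallocPlainForeignPtrBytes

-- | Write one byte at 'off' and read it back. The caller is still
-- responsible for keeping 'off' inside the buffer.
roundTrip :: ForeignPtr Word8 -> Int -> Word8 -> IO Word8
roundTrip fp off w = withForeignPtr fp $ \p -> do
  pokeByteOff p off w
  peekByteOff p off
```

If every thread holds the ForeignPtr itself rather than a raw Ptr extracted from it, no thread can end up with a buffer that another thread has already released.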
Now I believe that my QUIC server is much more stable than before.