あどけない話

Internet technologies

Labeling threads in Haskell

GHC 9.6 finally provides a function to list the current threads: listThreads, exported from the GHC.Conc.Sync module. listThreads is a killer debugging tool for thread leaks.

If you have Haskell programs that run for a long time, it is well worth providing a way to monitor their threads with the following functions:

import Data.List (sort)
import Data.Maybe (fromMaybe)
import GHC.Conc.Sync (ThreadStatus, listThreads, threadLabel, threadStatus)

printThreads :: IO ()
printThreads = threadSummary >>= mapM_ (putStrLn . showT)
  where
    showT (i, l, s) = i ++ " " ++ l ++ ": " ++ show s

threadSummary :: IO [(String, String, ThreadStatus)]
threadSummary = (sort <$> listThreads) >>= mapM summary
  where
    summary t = do
        let idstr = drop 9 $ show t
        l <- fromMaybe "(no name)" <$> threadLabel t
        s <- threadStatus t
        return (idstr, l, s)

The following is an example of how printThreads displays the list of thread statuses:

1 (no name): ThreadFinished
2 IOManager on cap 0: ThreadRunning
3 TimerManager: ThreadBlocked BlockedOnForeignCall
4 main: ThreadRunning
5 accepting: ThreadBlocked BlockedOnMVar
6 server:recv: ThreadBlocked BlockedOnForeignCall
7 server:gracefulClose: ThreadRunning
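
For a long-running program, one way to use such a dump is to run it periodically from a dedicated thread. The following is a minimal sketch under that assumption; startMonitor is a hypothetical helper name, not a library function, and the dump action (for example printThreads) is passed in by the caller.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId, threadDelay)
import Control.Monad (forever)
import GHC.Conc.Sync (labelThread)

-- | Fork a labeled thread that runs @dump@ every @secs@ seconds.
-- 'startMonitor' is a hypothetical helper, not part of any library.
startMonitor :: Int -> IO () -> IO ThreadId
startMonitor secs dump = forkIO $ do
    tid <- myThreadId
    labelThread tid "thread monitor" -- so the monitor itself is identifiable
    forever $ do
        dump
        threadDelay (secs * 1000000) -- threadDelay takes microseconds
```

With the definitions above, startMonitor 10 printThreads would print the thread list every ten seconds. The returned ThreadId can be passed to killThread to stop the monitor.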

Let's label threads

Threads spawned via forkIO and similar functions have no label by default. Unlabeled threads are displayed as "(no name)" in the example above. If there are many unlabeled threads, debugging is hard, so I have already asked the GHC developers to label the threads created in the libraries shipped with GHC.

I would also like to ask all library maintainers to label the threads that their libraries fork. You can use the following code to label your threads:

import Control.Concurrent (myThreadId)
import GHC.Conc.Sync (labelThread)

labelMe :: String -> IO ()
labelMe lbl = do
    tid <- myThreadId
    labelThread tid lbl

labelThread is a very old function. So, you can use it without worrying about GHC versions.
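
At a fork site, the same idea can be wrapped into a small helper so that no thread is forked without a label. This is only a sketch; forkLabeled is a hypothetical name chosen for illustration.

```haskell
import Control.Concurrent (ThreadId, forkIO, myThreadId)
import GHC.Conc.Sync (labelThread)

-- | Fork a thread that labels itself before running @action@.
-- 'forkLabeled' is a hypothetical helper, not a library function.
forkLabeled :: String -> IO () -> IO ThreadId
forkLabeled lbl action = forkIO $ do
    tid <- myThreadId
    labelThread tid lbl -- label first, so the thread is never seen unlabeled while working
    action
```

Labeling inside the new thread, rather than from the parent after forkIO returns, means the label is set before action starts.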

labelThread overrides the current label if one exists. If you don't want to override it, use the following function:

{-# LANGUAGE CPP #-}

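import Control.Concurrent (myThreadId)
#if MIN_VERSION_base(4,18,0)
import GHC.Conc.Sync (labelThread, threadLabel)
#else
-- threadLabel does not exist before base 4.18, so it must not be imported
import GHC.Conc.Sync (labelThread)
#endif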

labelMe :: String -> IO ()
#if MIN_VERSION_base(4,18,0)
labelMe name = do
    tid <- myThreadId
    mlabel <- threadLabel tid
    case mlabel of
        Nothing -> labelThread tid name
        Just _ -> return ()
#else
labelMe name = do
    tid <- myThreadId
    labelThread tid name
#endif

Unfortunately, threadLabel first appeared in GHC 9.6 (base 4.18), so the #if is necessary.

ThreadFinished

Threads in the ThreadFinished status should be GCed quickly. If you see long-lived threads in this status, their ThreadIds are being held somewhere. Surprisingly, a ThreadId is not an integer but a reference! The following example shows a WAI timeout manager holding ThreadIds, resulting in thread leaks:

10150 WAI timeout manager (Reaper): ThreadBlocked BlockedOnMVar
10190 Warp HTTP/1.1 192.0.2.1:43390: ThreadFinished
10191 Warp HTTP/1.1 192.0.2.1:43392: ThreadFinished
10193 Warp HTTP/1.1 192.0.2.1:43404: ThreadFinished
10202 Warp HTTP/1.1 192.0.2.1:43406: ThreadFinished
10204 Warp HTTP/1.1 192.0.2.1:41256: ThreadFinished
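
Such leaks can also be looked for programmatically. The sketch below assumes that a major GC lets the runtime drop finished threads that nothing references, so whatever is still reported as ThreadFinished afterwards is probably being retained somewhere. leakedThreads is a hypothetical helper name.

```haskell
import Control.Monad (filterM)
import GHC.Conc.Sync (ThreadId, ThreadStatus (..), listThreads, threadStatus)
import System.Mem (performMajorGC)

-- | Threads that are finished yet still listed after a major GC.
-- 'leakedThreads' is a hypothetical helper, not a library function.
leakedThreads :: IO [ThreadId]
leakedThreads = do
    performMajorGC -- assumption: unreferenced finished threads are dropped here
    listThreads >>= filterM isFinished
  where
    isFinished t = (== ThreadFinished) <$> threadStatus t
```

A non-empty result that keeps growing over time is a hint, not a proof, that ThreadIds are being held.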

To prevent these thread leaks, hold a Weak ThreadId instead. It can be created via mkWeakThreadId, provided by Control.Concurrent. To convert a Weak ThreadId back to a ThreadId, use deRefWeak, exported from GHC.Weak.
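
A minimal sketch of this pattern follows. forkWeak and killWeak are hypothetical helper names introduced for illustration; mkWeakThreadId and deRefWeak are the library functions mentioned above.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, mkWeakThreadId)
import GHC.Weak (Weak, deRefWeak)

-- | Fork a thread but keep only a weak reference to its 'ThreadId',
-- so that holding the handle does not keep a finished thread alive.
forkWeak :: IO () -> IO (Weak ThreadId)
forkWeak action = forkIO action >>= mkWeakThreadId

-- | Kill the thread if its 'ThreadId' has not been garbage-collected yet.
killWeak :: Weak ThreadId -> IO ()
killWeak wtid = do
    mtid <- deRefWeak wtid
    case mtid of
        Nothing  -> return ()       -- already collected; nothing to do
        Just tid -> killThread tid  -- harmless if the thread has already finished
```

A manager that stores Weak ThreadId values in its table can still time out live threads through killWeak, while finished threads remain collectable.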

More labels

The ThreadBlocked constructor of the ThreadStatus type contains BlockReason. It has the following constructors:

  • BlockedOnMVar
  • BlockedOnBlackHole
  • BlockedOnException
  • BlockedOnSTM
  • BlockedOnForeignCall
  • BlockedOnOther
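
These constructors can be pattern-matched to find out which threads are blocked and why. The following is a small sketch; blockReason and blockedThreads are hypothetical helper names.

```haskell
import GHC.Conc.Sync
    ( BlockReason (..)
    , ThreadId
    , ThreadStatus (..)
    , listThreads
    , threadStatus
    )

-- | Extract the reason from a blocked status, if any.
blockReason :: ThreadStatus -> Maybe BlockReason
blockReason (ThreadBlocked r) = Just r
blockReason _                 = Nothing

-- | All currently blocked threads, paired with the reason they are blocked.
blockedThreads :: IO [(ThreadId, BlockReason)]
blockedThreads = do
    ts <- listThreads
    ss <- mapM threadStatus ts
    return [(t, r) | (t, Just r) <- zip ts (map blockReason ss)]
```

Filtering the result for BlockedOnMVar, for instance, narrows a suspected deadlock down to the threads waiting on an MVar.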

It would be nice if we could label an MVar via a function such as labelMVar and have BlockedOnMVar carry that label. STM data types should follow the same approach.