あどけない話

Internet technologies

Status report of dnsext

This article reports the current status of the dnsext packages in Haskell. If you don't know what dnsext is, please read "Developing network related libraries in Haskell in 2022FY" first. The purpose of this project is to provide a DNS full resolver (cache server).

bowline

Our DNS full resolver is now called bowline, named after the king of knots. (I used to climb rocks with a figure-eight knot, but I like the bowline.) Most importantly, our DNSSEC verifier has been integrated into bowline by Kei Hibino.

New features

To make bowline practical, the following features have been added:

  • Configuration: bowline.conf is in a key-value format. In particular, local-zone:, local-data:, root-hints: and trust-anchor-file: can be specified.
  • Logging: the standard way of logging for DNS servers is DNSTAP, whose format is Protocol Buffers and whose transport is Frame Streams. Instead of using other libraries, we implemented them in dnsext-utils.
  • Statistics monitoring: Prometheus is supported.
  • Web API: the recent trend in server management is containers. When servers run in containers, the traditional signal scheme is not feasible. So bowline provides a web API for reading statistics, reloading, etc.

DNS transport

To protect privacy, the transport between DNS full resolvers and stub resolvers should be encrypted. The send-receive API of tls and quic is suitable for implementing DoT (DNS over TLS) and DoQ (DNS over QUIC). However, the worker model of http2 and http3 is inefficient for DoH (DNS over HTTPS). To emulate the send-receive API, runIO is implemented and provided by the Internal module of http2. Unfortunately, I have no idea how to implement runIO for http3 at this moment.

While verifying the safety of http2 and quic, I noticed that not all cases of flow control were covered. The following should be implemented for the number of streams in a connection, the amount of sent/received data in a connection, and the amount of sent/received data in a stream:

  • Telling the peer the limit of received data at the proper timing
  • Closing the connection if the amount of received data exceeds the limit
  • Sending data while respecting the sending limit

To extract the common patterns of flow control, the network-control package was created. With network-control, http2 and quic now cover all the cases.
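The three duties listed above can be sketched as follows. This is a hypothetical illustration of receive-side accounting only, not the actual network-control API: RxWindow, receive and the half-window update policy are all made up for this sketch.

```haskell
import Data.IORef

-- Hypothetical receive-side flow-control state.
data RxWindow = RxWindow
    { rxLimit    :: Int  -- current limit advertised to the peer
    , rxConsumed :: Int  -- bytes received so far
    , rxWindow   :: Int  -- window size
    }

-- | Record n received bytes. Returns Nothing if the peer exceeded the
-- advertised limit (the connection should be closed). Otherwise returns
-- Just a possible new limit to tell the peer: once more than half the
-- window is consumed, a fresh limit is advertised.
receive :: IORef RxWindow -> Int -> IO (Maybe (Maybe Int))
receive ref n = atomicModifyIORef' ref $ \w ->
    let consumed = rxConsumed w + n
     in if consumed > rxLimit w
            then (w, Nothing)  -- flow-control violation
            else
                let w' = w { rxConsumed = consumed }
                 in if rxLimit w' - consumed < rxWindow w' `div` 2
                        then let lim = consumed + rxWindow w'
                              in (w' { rxLimit = lim }, Just (Just lim))
                        else (w', Just Nothing)
```

The real package generalizes this accounting so the same logic serves stream counts, connection data and per-stream data.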

Refactoring and testing

The code for iterative queries was huge and monolithic. So it was divided into multiple modules with the help of calligraphy, which can visualize the call graph of functions.

dnsperf is used to measure server performance and to run stress tests. We noticed that the stacks of Haskell lightweight threads consume a huge amount of memory. Their initial size is 1 KiB. When that limit is reached, they grow to 33 KiB since the next chunk size defaults to 32 KiB. In my opinion, this value is too big because threads might use only 2 KiB, for instance. So we specify -kc2k (2 KiB) as an RTS option so that the stack grows as 1 KiB, 3 KiB, 5 KiB, 7 KiB and so on.
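The chunk arithmetic above can be written down as a one-liner. stackSizes is only an illustration of the resulting numbers, not of how the RTS actually allocates stacks:

```haskell
-- Cumulative stack sizes (in KiB) after each overflow, given the chunk
-- size set by the -kc RTS option. The initial stack is 1 KiB and each
-- overflow adds one chunk.
stackSizes :: Int -> [Int]
stackSizes chunkKiB = iterate (+ chunkKiB) 1

-- With the default (-kc32k): 1, 33, 65, ...
-- With -kc2k:                1, 3, 5, 7, ...
```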

dug

dug is a command-line interface for DNS queries. Of course, it can resolve records for a target domain over UDP as a stub resolver:

% dug www.iij.ad.jp aaaa
;; 2001:a7ff:5f01:1::a#53/UDP, Tx:42bytes, Rx:196bytes, 34usec
...
;; ANSWER SECTION:
www.iij.ad.jp.  300(5 mins) IN  AAAA    2001:240:bb81::10:180

The characteristics of dug are as follows:

  • Queries can be sent over DoT, DoQ and DoH if the -d option is specified.
  • Such a transport is selected automatically by parsing SVCB RRs if -d auto is specified.
  • It can execute the iterative-query algorithm used in bowline if the -i option is specified.

The following are the new features added in 2023FY:

  • tcp has been added in addition to auto, doq, dot, etc. for the -d option.
  • The result of DNSSEC validation is displayed in colors if the --demo option is specified.
  • The query result is displayed in JSON format if the --json option is specified.

Releasing tls library version 2.0.0 in Haskell

I needed to implement the session-ticket mechanism for my project. In addition to this coding, I decided to improve the tls library in Haskell drastically. So I spent three months doing so and finally released tls version 2.0.0. This version is secure by default and its code readability is improved. This article explains what changed.

Removing insecure stuff

tls version 1.9.0 supports TLS 1.0 and TLS 1.1 in addition to TLS 1.2 and TLS 1.3. RFC 8996 deprecates TLS 1.0 and TLS 1.1, so they have been removed from tls.

TLS 1.2 is considered secure if configured correctly, while TLS 1.3 is considered secure by design. To ensure a secure configuration, the following have been removed according to "Deprecating Obsolete Key Exchange Methods in TLS 1.2":

  • CBC ciphers
  • RC4 and 3DES
  • DSS (Digital Signature Standard)

The current cipher suites for TLS 1.2 are only:

  • TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  • TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  • TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
  • TLS_ECDHE_ECDSA_WITH_AES_128_CCM
  • TLS_ECDHE_ECDSA_WITH_AES_256_CCM
  • TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8
  • TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8
  • TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
  • TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256

To prevent the triple handshake attack, the extended main secret defined by RFC 7627 is necessary. supportedExtendedMasterSec defaulted to AllowEMS. It has been renamed to supportedExtendedMainSecret and its default value is now RequireEMS.

I believe that your code can be built without any modifications if you don't customize parameters heavily. If your code cannot be built, I'm sorry, but these breaking changes are intentional, to tell you that you are using insecure parameters for TLS 1.2.

Catching up with RFC 8446bis

TLS 1.3 is defined in RFC 8446, which is being revised as RFC 8446bis. Important changes are:

  • The word "master" is renamed to "main".
  • general_error alert is defined.

tls v2.0.0 catches up with RFC 8446bis as much as possible.

Improving API

To send early data in TLS 1.3 with tls version 1.9.0, clientEarlyData had to be used. A fixed string can be passed through this interface, but that is not feasible for real usage since applications decide early data dynamically. With tls version 2.0.0, sendData can now send early data if clientUseEarlyData is set to True.

The client handshake for TLS 1.3 can now receive the alert for client-authentication failure.

The client's bye can now receive NewSessionTicket in TLS 1.3.

Refactoring

handshake was monolithic. To follow the handshake diagrams of TLS 1.2 and 1.3, its internals have been divided. The resulting code for the TLS 1.2 client looks like:

handshake cparams ctx groups mparams = do
    ...
    crand <- sendClientHello cparams ctx groups mparams pskinfo
    ...
    (ver, hss, hrr) <- receiveServerHello cparams ctx mparams
    ...
    case ver of
        TLS13 -> ...
        _ -> do
            recvServerFirstFlight12 cparams ctx hss
            sendClientSecondFlight12 cparams ctx
            recvServerSecondFlight12 ctx

The test framework has been switched from tasty to hspec, and the quality of each test case has been improved.

Also, the following modifications have been made:

  • The code is now formatted with fourmolu.
  • PatternSynonyms is introduced for extensibility.
  • The Strict and StrictData pragmas are specified.

Session Manager

tls 1.9.0 has an abstraction for session management called SessionManager:

data SessionManager = SessionManager {
    sessionResume :: SessionID -> IO (Maybe SessionData)
  , sessionResumeOnlyOnce :: SessionID -> IO (Maybe SessionData)
  , sessionEstablish :: SessionID -> SessionData -> IO ()
  , sessionInvalidate :: SessionID -> IO ()
  }

Network.TLS.SessionManager in tls-session-manager version 0.0.4 provides a SessionManager for an in-memory session DB. When implementing the session-ticket mechanism, it turned out that this abstraction is not good enough since there is no way to return tickets. So SessionManager in tls version 2.0.0 is now:

data SessionManager = SessionManager {
    sessionResume :: SessionID -> IO (Maybe SessionData)
  , sessionResumeOnlyOnce :: SessionID -> IO (Maybe SessionData)
  , sessionEstablish :: SessionID -> SessionData -> IO (Maybe Ticket)
  , sessionInvalidate :: SessionID -> IO ()
  , sessionUseTicket :: Bool
  }
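As a minimal sketch of this new interface, the following implements an in-memory manager against simplified stand-in types: plain Strings replace tls's SessionID, SessionData and Ticket, and sessionResumeOnlyOnce is omitted for brevity. A session-ID-based manager like this returns no ticket from sessionEstablish:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Simplified stand-ins for the types used by tls.
type SessionID   = String
type SessionData = String
type Ticket      = String

-- A cut-down analog of the new SessionManager record.
data SessionManager = SessionManager
    { sessionResume     :: SessionID -> IO (Maybe SessionData)
    , sessionEstablish  :: SessionID -> SessionData -> IO (Maybe Ticket)
    , sessionInvalidate :: SessionID -> IO ()
    , sessionUseTicket  :: Bool
    }

-- An in-memory session DB kept in an IORef'd Map.
newInMemoryManager :: IO SessionManager
newInMemoryManager = do
    ref <- newIORef Map.empty
    return SessionManager
        { sessionResume    = \sid -> Map.lookup sid <$> readIORef ref
        , sessionEstablish = \sid sd -> do
              atomicModifyIORef' ref $ \m -> (Map.insert sid sd m, ())
              return Nothing  -- session-ID based: no ticket to hand out
        , sessionInvalidate = \sid ->
              atomicModifyIORef' ref $ \m -> (Map.delete sid m, ())
        , sessionUseTicket = False
        }
```

A ticket-based manager would instead return Just an encrypted ticket from sessionEstablish and set sessionUseTicket to True, which is exactly the hole the old abstraction could not express.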

Network.TLS.SessionTicket is finally implemented in tls-session-manager version 0.0.5.

Interoperability test

To test interoperability with other implementations, I used tls-simpleclient and tls-simpleserver in tls-debug. Unfortunately, I don't have upload permission for tls-debug on Hackage. Also, it's very inconvenient to build them since they are in a separate package. So I imported them into the util directory of tls and renamed them client and server. To build them, specify the devel flag to cabal or your favorite command.

client and server are tested against OpenSSL and GnuTLS for both TLS 1.2 and 1.3.

Haskell-related things I was glad to learn in 2023

This is the 19th article of Haskell Advent Calendar 2023.

Formatters

I tried several formatters in the past, but none of them satisfied me. fourmolu, however, is cool. fourmolu is a fork of Ormolu, so Ormolu probably deserves the credit. I tried both and settled on fourmolu.

It is on Hackage, so install it in your favorite way.

% cabal install fourmolu

When you feed a Haskell program to fourmolu, it prints the formatted program. To rewrite a file in place, pass the -i option. It can be integrated with editors and IDEs, but to try formatting a whole project, the following works well:

% find . -name "*.hs" | xargs fourmolu -i

If you don't like how a part is formatted, surround it with {- FOURMOLU_DISABLE -} and {- FOURMOLU_ENABLE -} and fix it by hand.

Parts guarded by CPP #ifdef may or may not be formattable depending on how they are written (when they are not, fourmolu gives up on the whole file). It may be redundant, but if you rearrange the code so that the #ifdef wraps the whole function rather than a part of it, formatting will probably succeed.

Formatting can be customized with fourmolu.yaml. fourmolu searches for this file towards the root directory at start-up. There is a website where you can play with the configuration items, so have fun there. See here for the configuration file I use.

Call graphs

When splitting a module that has quietly grown huge, a call graph lets you find the seams where the call relationships are thin. It is not well known, but a cool call-graph generator is calligraphy.

% cabal install calligraphy

It builds a call graph from the IDE Info that GHC generates. See, doesn't it look cool? Usage is like this:

% cabal build --ghc-options=-fwrite-ide-info
% calligraphy Network.TLS.Handshake.Client --output-png out.png

In this case, out.png looks like this.

There are many options, so try them out. Because it uses IDE Info, the GHC version used to build calligraphy must match the GHC version used for your project.

Serializers

To serialize Haskell data, binary and cereal have historically been used. With these libraries, you have to hand-write the serializer and deserializer for each data type you define.

Recent libraries can generate serializers and deserializers automatically. Here is how to use one of them, serialise.

% cabal install serialise

Basic types are handled by default.

% ghci    
ghci> import Codec.Serialise
ghci> serialise (1 :: Int)
"\SOH"

Serialization worked. Now let's try deserialization.

ghci> deserialise $ serialise (1 :: Int)
*** Exception: DeserialiseFailure 0 "expected null"

The generated binary carries no type information. (You could have guessed that from the one-byte output, right?) Let's supply the type.

ghci> deserialise $ serialise (1 :: Int) :: Int
1

Yay! Now let's make our own type. To derive a serializer automatically, the DeriveGeneric language extension is needed.

ghci> :set -XDeriveGeneric 
ghci> import GHC.Generics
ghci> data Tree = Leaf | Node Int Tree Tree deriving (Generic)

Then, casting the following spell auto-generates the serializer and deserializer:

ghci> instance Serialise Tree
ghci> serialise Leaf
"\129\NUL"
ghci> serialise $ Node 1 Leaf Leaf
"\132\SOH\SOH\129\NUL\129\NUL"

Hooray!

Stacks of lightweight threads

When a lightweight thread is created, the initial size of its stack is 1 KiB. The stack grows automatically as it is consumed, but it grows by 32 KiB at a time. That is, after 1 KiB comes 33 KiB. So if you use many lightweight threads, stacks come to dominate memory usage.

In practice, 2 KiB of stack may well be enough. A grown stack never shrinks. The RTS option that controls the growth size is -kc. The minimum value you can specify appears to be 2 KiB (-kc2k).

Since learning this, I have put the following line into the cabal files of the servers I develop. (Look up the meaning of the other RTS options yourself.)

 ghc-options: -Wall -threaded -rtsopts "-with-rtsopts=-qn1 -A32m -kc2k"

Developing network related libraries in Haskell in 2022FY

This article is my annual report for 2022FY (fiscal year in Japan; from April 2022 to March 2023). My mission at IIJ is to contribute to the standardization of new network protocols through their implementations. As you may know, I maintain several network-related libraries in Haskell.

HTTP/2

Background: the http2 library originally provided HTTP/2 frames and an HPACK encoder/decoder only. It was integrated into Warp to provide the HTTP/2 server functionality. This functionality was then extracted into the server side of http2, and the client side was implemented afterwards.

Versions 3.0.x were released in 2021FY to fix the vulnerabilities reported in "HTTP/2 implementations do not robustly handle abnormal traffic and resource exhaustion".

Version 4.0.0

RFC 9113 was published in June 2022, obsoleting RFC 7540. The main purpose of this version is to catch up with the new RFC. The major version bump was due to a lot of breaking changes.

Version 4.1.0

The server side has been tested well in the real world, but the client side had not. Evgeny Poberezkin kindly reported some bugs relating to streaming on the client side. This version fixes those bugs. I thank him a lot.

This was a major version bump again because an internal data type was changed, breaking the build of http3. A new version of http3 catching up with this change was also released.

QUIC

Background: the quic and http3 libraries have been developed since 2019 and were released after RFC 9000 was published in 2021.

Versions 0.0.x adopt the Fusion crypto engine for performance reasons. After releasing version 0.0.x, I noticed that it supports the Intel architecture only. I should have worked around this quickly, but my interest went to QUIC version 2 and version negotiation. After implementing these new technologies, I integrated Fusion and cryptonite to let the quic library run on all platforms.

At this moment, I had a dilemma: I could not release a new version of quic since the numbers for QUIC version 2 and the transport parameter for version negotiation would change. (The value of QUIC version 2 was 0x709a50c4 in drafts and is finally 0x6b3343cf. It's not 0x00000002 at all!)

Meanwhile, I spent time supporting Windows. The UDP implementation of Windows is really awkward. See the following blog article for more information.

Based on this experience, I released the network-udp library, which provides best current practice for UDP clients and servers. (This library is also used in the dnsext-do53 library described later.)

Though the RFCs for QUIC version 2 and version negotiation had not been published, the numbers had been fixed. So I finally released quic v0.1.0 in February 2023.

DNS

Background: to implement anti-spam technologies such as SPF and DKIM, I started implementing the dns library purely in Haskell in 2010. Thanks to GHC's concurrency, the stub resolver functionality is highly concurrent. Fortunately, this library is widely used, but unfortunately two downsides turned out:

  1. Resource records are not extensible: resource records are implemented as a sum type. Third-party libraries cannot extend them. The only way to extend them is to send a pull request to the dns library.
  2. Resource records are not friendly to caching: some resource records use ByteString internally. So, if they are cached for a long time, fragmentation happens.

dnsext

It appeared impossible to maintain backward compatibility with the dns library. So new libraries prefixed with dnsext- were created.

  • dnsext-types: basic types with encoders/decoders. To solve 1), I introduced typeclasses. To fix 2), ShortByteString is used as the internal representation.

The following is the definition of extensible resource records:

class (Typeable a, Eq a, Show a) => ResourceData a where
    resourceDataType :: a -> TYPE
    putResourceData  :: CanonicalFlag -> a -> SPut ()

-- | A type to unify 'ResourceData' 'a'.
data RData = forall a . (Typeable a, Eq a, Show a, ResourceData a) => RData a

A basic type Domain is defined using ShortByteString instead of ByteString:

data Domain = Domain {
    -- The representation format. Case-sensitive, escaped.
    representation  :: ShortByteString
    -- Labels in wire format. Case-sensitive, not escaped.
  , wireLabels      :: [ShortByteString]
  -- | Eq and Ord key for Canonical DNS Name Order.
  --   Lower cases, not escaped.
  --   https://datatracker.ietf.org/doc/html/rfc4034#section-6.1
  , canonicalLabels :: ~[ShortByteString]
  }
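The reason ShortByteString is cache-friendly can be seen from the conversion functions in Data.ByteString.Short: toShort copies the bytes into a compact, unpinned heap object, so a long-lived cache entry never retains a slice of a larger pinned buffer. A tiny sketch (cacheKey and restore are hypothetical helper names, not dnsext API):

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Short (ShortByteString, fromShort, toShort)

-- Copy the bytes into an unpinned, compact heap object. Unlike a
-- ByteString slice, the result holds no reference to a larger pinned
-- buffer, so caching it for a long time does not fragment the heap.
cacheKey :: B.ByteString -> ShortByteString
cacheKey = toShort

-- Convert back to an ordinary ByteString when the bytes are needed
-- for I/O or parsing.
restore :: ShortByteString -> B.ByteString
restore = fromShort
```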
  • dnsext-do53: DNS over UDP port 53

The following is a typical usage of the stub resolver:

> withLookupConf defaultLookupConf $ \env -> lookupA env "www.iij.ad.jp"
Right [202.232.2.180]

My current interest is in SVCB (Service Binding)/HTTPS resource records and DNS over new transport protocols.

  • dnsext-svcb: SVCB related resource records. This is an extension example of dnsext-types.
  • dnsext-dox: DNS over HTTP/2, HTTP/3, TLS and QUIC

After Kei Hibino joined IIJ, he focused on implementing a DNSSEC verifier and a full resolver.

  • dnsext-dnssec: DNSSEC verifier
  • dnsext-full-resolver: a full resolver (aka a cache server)

He wrote three articles on this topic in Japanese (article 1, article 2, article 3).

dnsext-full-resolver provides a command-line interface called dug. It shows how iterative resolution works if the -i option is specified:

% dug -i www.iij.ad.jp
resolve-just: dc=0, ("www.iij.ad.jp.",A)
    "a.root-servers.net." ["198.41.0.4","2001:503:ba3e::2:30"]
    "b.root-servers.net." ["199.9.14.201","2001:500:200::b"]
    "c.root-servers.net." ["192.33.4.12","2001:500:2::c"]
        ...
iterative: selected addrs: (198.41.0.4,"jp.",A)
iterative: selected addrs: (2001:503:ba3e::2:30,"jp.",A)
iterative: selected addrs: (199.9.14.201,"jp.",A)
...
    "a.dns.jp." ["203.119.1.1","2001:dc4::1"]
    "b.dns.jp." ["202.12.30.131","2001:dc2::1"]
    "c.dns.jp." ["156.154.100.5","2001:502:ad09::5"]
        ...
iterative: selected addrs: (203.119.1.1,"ad.jp.",A)
iterative: selected addrs: (2001:dc4::1,"ad.jp.",A)
iterative: selected addrs: (202.12.30.131,"ad.jp.",A)
...
    "a.dns.jp." ["203.119.1.1","2001:dc4::1"]
    "b.dns.jp." ["202.12.30.131","2001:dc2::1"]
    "c.dns.jp." ["156.154.100.5","2001:502:ad09::5"]
        ...
iterative: selected addrs: (203.119.1.1,"iij.ad.jp.",A)
iterative: selected addrs: (2001:dc4::1,"iij.ad.jp.",A)
iterative: selected addrs: (202.12.30.131,"iij.ad.jp.",A)
...
    "dns0.iij.ad.jp." ["210.130.0.5","2001:240::105"]
    "dns1.iij.ad.jp." ["210.130.1.5","2001:240::115"]
iterative: selected addrs: (210.130.0.5,"www.iij.ad.jp.",A)
iterative: selected addrs: (2001:240::105,"www.iij.ad.jp.",A)
iterative: selected addrs: (210.130.1.5,"www.iij.ad.jp.",A)
...
    "dns0.iij.ad.jp." ["210.130.0.5","2001:240::105"]
    "dns1.iij.ad.jp." ["210.130.1.5","2001:240::115"]
resolve-just: selected addrs: (210.130.0.5,"www.iij.ad.jp.",A)
resolve-just: selected addrs: (2001:240::105,"www.iij.ad.jp.",A)
resolve-just: selected addrs: (210.130.1.5,"www.iij.ad.jp.",A)
resolve-just: selected addrs: (2001:240::115,"www.iij.ad.jp.",A)
--------------------
;; HEADER SECTION:
;Standard query, NoError, id: 8338
;Flags: Authoritative Answer


;; OPTIONAL PSEUDO SECTION:
;UDP: 1232, Data:[]

;; QUESTION SECTION:
;www.iij.ad.jp.     IN  A

;; ANSWER SECTION:
www.iij.ad.jp.  300(5 mins) IN  A   202.232.2.180

If the -d auto option is specified, dug first resolves the SVCB RR and selects DNS over X according to its response.

% dug @94.140.14.140 -d auto www.iij.ad.jp
;; 2a10:50c0::2:ff#443/HTTP/3, Tx:42bytes, Rx:58bytes, 132usec

;; HEADER SECTION:
;Standard query, NoError, id: 43730
;Flags: Recursion Desired, Recursion Available

;; OPTIONAL PSEUDO SECTION:
;UDP: 0, Data:[]

;; QUESTION SECTION:
;www.iij.ad.jp.     IN  A

;; ANSWER SECTION:
www.iij.ad.jp.  300(5 mins) IN  A   202.232.2.180

We don't have a release plan at this moment. We should concentrate on field-testing the full resolver in 2023FY.

The history of atomicModifyIORef'

The history of atomicModifyIORef', the secret weapon of concurrent programming in Haskell. The content may be revised without notice.

atomicModifyIORef' takes an IORef and a function as arguments. The function receives the "old value" the IORef currently points to, and returns a tuple of the "new value" the IORef should point to next and the "return value".

atomicModifyIORef' :: IORef a -> (a -> (a, b)) -> IO b

The new value is swapped with the old value atomically at the moment its thunk is created, and is evaluated afterwards. Evaluation may be heavy, but creating a thunk is cheap, so the swap succeeds in most cases.
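A minimal usage example: a counter bump that returns the previous value. The first component of the pair becomes the new contents of the IORef; the second is handed back to the caller:

```haskell
import Data.IORef

-- Atomically increment a counter and return its previous value.
bumpCounter :: IORef Int -> IO Int
bumpCounter ref = atomicModifyIORef' ref $ \old -> (old + 1, old)
```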

The problem with atomicModifyIORef

I had completely forgotten, but during the development of Mighty I discovered that atomicModifyIORef leaks space, and wrote up a solution in Monad.Reader issue 19 in 2011.

The following code creates a time string once per second and caches it:

atomicModifyIORef ref (\_ -> (tmstr, ()))

The thunk is never collapsed, resulting in a space leak. As a solution, I came up with the following.

x <- atomicModifyIORef ref (\_ -> (tmstr, ()))
x `seq` return ()

I wrote this up in Monad.Reader. Later, it turned out that bang patterns allow a nicer formulation.

!_ <- atomicModifyIORef ref (\_ -> (tmstr, ()))

Careful readers may wonder: we only evaluate the right element of the tuple, so why does the space leak caused by the new value on the left go away?

This code cannot collapse the thunk of the newly created value. However, evaluating the right element materializes the tuple. It then becomes apparent that the thunk on the left does not reference the old value, so the GC can reclaim it.

Since this code uses neither the old value the IORef points to nor the return value, atomicWriteIORef can be used for it nowadays.
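For completeness, a sketch of that replacement (cacheTimeString is a hypothetical helper name for the cache-update step):

```haskell
import Data.IORef

-- Replace the cached time string. atomicWriteIORef swaps in the new
-- value atomically; no old value and no return value are involved.
cacheTimeString :: IORef String -> String -> IO ()
cacheTimeString ref tmstr = atomicWriteIORef ref tmstr
```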

The advent of atomicModifyIORef'

Prompted by Monad.Reader issue 19, Joey Adams proposed atomicModifyIORef'. The following is from "Add strict versions of modifyIORef and atomicModifyIORef", merged in 2011:

atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef' ref f = do
    b <- atomicModifyIORef ref
            (\x -> let (a, b) = f x
                    in (a, a `seq` b))
    b `seq` return b

atomicModifyIORef' became available in base 4.6.0 and later; that is, starting with GHC 7.6.1, released in 2012.

Simon Marlow introduced this pattern on page 284 of Parallel and Concurrent Programming in Haskell, published in 2013, without mentioning atomicModifyIORef' itself. On page 363 of the Japanese translation, published in 2014, I added a translator's note about atomicModifyIORef'.

The reference is swapped at the moment the thunk is created, and after atomicModifyIORef' returns, the thunk is evaluated as follows:

  • Outside atomicModifyIORef, before b is returned, b is forced towards WHNF
  • To evaluate b, a is also brought to WHNF by the seq planted inside
  • After that, b reaches WHNF

In those days the primitive operation was atomicModifyMutVar#, and atomicModifyIORef was a wrapper around it. The following pseudo-code emulates the behavior of atomicModifyIORef, minus the atomicity.

pseudoAtomicModifyIORef :: IORef a -> (a -> (a, b)) -> IO b
pseudoAtomicModifyIORef ref f = do
  x <- readIORef ref
  case f x of
    fx -> do  -- thunk 'fx'
      -- only CAS success case
      writeIORef ref $ fst fx -- thunk 'fst fx'
      putStrLn "swap"
      return $ snd fx -- thunk 'snd fx'

Using this, let's emulate Joey Adams's implementation:

joey :: IORef a -> (a -> (a, b)) -> IO b
joey ref f = do
    b <- pseudoAtomicModifyIORef ref
            (\x -> let (a, b) = f x
                    in (a, a `seq` b))
    b `seq` return b

Also, to observe the evaluation order, prepare an f loaded with trace calls:

f :: Int -> (Int, ())
f n = (trace "tupple" ((trace "new" n + 1), (trace "ret" ())))

Let's try it in GHCi.

> ref <-newIORef (0::Int)
> joey ref f
swap
tupple
new
ret

"new" よりも前に "swap" が表示されているのが分かる。

No matter how strict a function you give to atomicModifyIORef', the thunk is stored into the IORef first and the new value is computed afterwards. Consider the following strict function:

f' :: Int -> (Int, ())
f' n = let !n' = trace "new" n + 1 in (trace "tupple" (n', (trace "ret" ())))

Let's use it.

> joey ref f'
swap
new
tupple
ret

"tupple" よりも "new" の方が早く生成されるが、それでも "swap" の後だと分かる。

Performance improvement

The original atomicModifyIORef' pattern-matches on the tuple and then builds a new tuple. This is wasteful. parcs proposed improving performance with a selector thunk:

atomicModifyIORef' ref f = do
    b <- atomicModifyIORef ref $ \a ->
            case f a of
                v@(a',_) -> a' `seq` v
    b `seq` return b

This code creates no tuple. The case expression is used to extract the fields. These become selector thunks, which the GC can reportedly handle quickly.

But does this really swap before the new value is evaluated? It looks as if a' is evaluated before v is returned!

No, no. That interpretation is wrong. If you think Joey Adams's implementation works because it exploits the laziness of tuples, you are mistaken. What atomicModifyMutVar# defers is the application of the whole function (the lambda); it does not rely on tuple construction.

Recall that it worked fine even when we passed the strict function f' earlier.

Emulating parcs's implementation:

parcs ref f = do
    b <- pseudoAtomicModifyIORef ref $ \a ->
            case f a of
                v@(a',_) -> a' `seq` v
    b `seq` return b

Let's try it.

> ref <-newIORef (0::Int)
> parcs ref f'
swap
new
tupple
ret

Again, "swap" happens first.

atomicModifyMutVar2#

David Feuer submitted ghc-proposals 0149, proposing a new primitive, atomicModifyMutVar2#, to eliminate the space leak of atomicModifyIORef.

atomicModifyMutVar# created thunks for both the first and the second field. atomicModifyMutVar2# creates a thunk only for the first field. The second field is evaluated strictly, so the waste of "allocating a thunk and collapsing it right away" disappears.

atomicModifyMutVar2# :: MutVar# s a -> (a -> c) -> State# s -> (# State# s, a, c #)

You can tell from the type of the function argument that no tuple is assumed. There are two wrappers that turn this into a function callable from ordinary Haskell code.

atomicModifyIORef2Lazy :: IORef a -> (a -> (a,b)) -> IO (a, (a, b))
atomicModifyIORef2Lazy (IORef (STRef r#)) f =
  IO (\s -> case atomicModifyMutVar2# r# f s of
              (# s', old, res #) -> (# s', (old, res) #))

atomicModifyIORef2 :: IORef a -> (a -> (a,b)) -> IO (a, (a, b))
atomicModifyIORef2 ref f = do
  r@(_old, (_new, _res)) <- atomicModifyIORef2Lazy ref f
  return r

Why there are two is unclear to me. The current implementation of atomicModifyIORef' is this:

atomicModifyIORef' :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef' ref f = do
  (_old, (_new, !res)) <- atomicModifyIORef2 ref $
    \old -> case f old of
       r@(!_new, _res) -> r
  pure res

The following is verification code written by Kei Hibino:

pseudoModifyIORef2Lazy :: IORef a -> (a -> (a, b)) -> IO (a, (a, b))
pseudoModifyIORef2Lazy ref f = do
  x <- readIORef ref
  case f x of
    fx -> do
      -- only CAS success case
      writeIORef ref (fst fx)
      putStrLn "swap"
      return (x, fx)

pseudoModifyIORef2 :: IORef a -> (a -> (a,b)) -> IO (a, (a, b))
pseudoModifyIORef2 ref f = do
  r@(_old, (_new, _res)) <- pseudoModifyIORef2Lazy ref f
  return r

pseudoModifyIORef :: IORef a -> (a -> (a,b)) -> IO b
pseudoModifyIORef ref f = do
  (_old, ~(_new, res)) <- pseudoModifyIORef2 ref f
  pure res

david :: IORef a -> (a -> (a,b)) -> IO b
david ref f = do
    (_old, (_new, !res)) <- pseudoModifyIORef2 ref $
        \old -> case f old of
            r@(!_new, _res) -> r
    pure res

Let's examine the order of operations:

> ref <-newIORef (0::Int)
> david ref f'
swap
new
tupple
ret

You can see that "swap" is printed first of all.

Integrating Fusion and cryptonite in Haskell quic

While Haskell quic version 0.0.1 or earlier supports only the x86_64 architecture, version 0.0.2 or later will support non-x86_64 architectures as well.

cryptonite

When I started implementing the quic library in Haskell, I used cryptonite as the crypto engine. Since cryptonite takes the functional style, where all results are newly generated without side effects, it is safe to use among multiple threads.

I defined the following data type for payload encryption and decryption:

data Coder = Coder {
    encrypt :: PlainText  -> AssDat -> PacketNumber -> [CipherText]
  , decrypt :: CipherText -> AssDat -> PacketNumber -> Maybe PlainText
  }

Key and Nonce are partially applied at initialization time, and the resulting functions are stored in Coder. encrypt returns two ByteStrings in a list: one is a protected header and the other is an encrypted payload. PlainText, CipherText and AssDat (associated data) are essentially ByteStrings.

For header protection, I defined the following:

data Protector = Protector {
    protect   :: Sample -> Mask
  , unprotect :: Sample -> Mask
  }

Sample is taken from an encrypted payload, and the resulting Mask is used for the XOR operation on a header.

Fusion

The maximum record size of TLS is 16 KiB. After a 16 KiB plain text is encrypted and sent from the TLS layer, the TCP layer fragments the cipher text into multiple IP packets. Since 16 KiB is large enough, the cost of initialization and finalization is invisible.

However, the maximum packet size of QUIC is 1,472 bytes. There was no crypto engine that could encrypt and decrypt such small data efficiently. Kazuho Oku invented the Fusion crypto engine to solve this problem. I decided to switch from cryptonite to Fusion in March 2021. The new Coder is as follows:

data Coder = Coder {
    encrypt :: Buffer -> BufferSize -> Buffer -> BufferSize -> PacketNumber -> Buffer -> IO Int
  , decrypt :: Buffer -> BufferSize -> Buffer -> BufferSize -> PacketNumber -> Buffer -> IO Int
  }

The three buffers are the input, the associated data and the output, in that order. The output buffer is modified in place. The result is the output size. Fusion encryption also calculates the mask for header protection (but not for unprotection). Protector was changed as follows:

data Protector = Protector {
    setSample :: Buffer -> IO ()
  , getMask   :: IO Buffer
  , unprotect :: Sample -> Mask
  }

setSample is called before encrypt and getMask after it.

Integration

I had misunderstood that Fusion falls back to a slow crypto engine on non-x86_64 architectures. After releasing quic version 0.0.0, I noticed this was not the case. So I started integrating Fusion and cryptonite this December. That is, Fusion should be used on the x86_64 architecture while cryptonite should be used on non-x86_64 architectures.

This project was tough since I needed to integrate the functional style with the imperative style. The key idea is to use cryptonite in the imperative style: the bodies of the resulting ByteStrings are copied into a buffer. This buffer approach should be complemented by GSO (Generic Segmentation Offload) in the near future.

The resulting Coder and Protector are as follows:

data Coder = Coder {
    encrypt    :: Buffer -> PlainText  -> AssDat -> PacketNumber -> IO Int
  , decrypt    :: Buffer -> CipherText -> AssDat -> PacketNumber -> IO Int
  , supplement :: Maybe Supplement
  }

data Protector = Protector {
    setSample  :: Buffer -> IO ()
  , getMask    :: IO Buffer
  , unprotect :: Sample -> Mask
  }

I'm satisfied with these signatures.

Bugs found

A single buffer is used for decryption in a QUIC connection, so this buffer should be owned exclusively by the receiver thread. However, the dispatcher thread of a server calls decrypt with this buffer during the migration procedure. Migration is rare, but this is a possible race. In the current code, receiver takes care of migration to eliminate the race.

In the integration process, I sometimes used undefined as initial values in tentative data structures of Coder. After initializing Coder, the undefineds should never be touched. However, the test suite was trapped by them. This revealed that keys (whose slots store the initial values again) were being dropped too early. The current code delays the key-drop timing.

When cryptonite is used, the test suite sometimes got stuck. Visualizing the packet sequences showed that a client was waiting forever for Fin because the server never sent it. On the server side, the server thread and the sender thread were racing: since encrypt called by sender is slow, the server finished before sender actually sent a packet. So I modified shutdownStream and closeStream to wait for the completion of Fin packet sending.

A self-serving introduction to my "introduction to functional programming" materials

This is an article written to fill day 2 of Haskell Advent Calendar 2021. In fact, it is simply an advertisement for the "introduction to functional programming" materials I have made.

Incidentally, my definition of functional programming is "immutable data programming", probably the strictest definition there is. So if you understand this material, I'd say you can claim to have been properly introduced to functional programming.

First steps in functional programming

Every year, I teach functional programming to third-year students at Okayama University in two class periods. The material is "First steps in functional programming for C programmers". Since students of the information-systems department of Okayama University's Faculty of Engineering have learned C, it explains functional programming with a made-up syntax similar to C.

[Introduction] Functional programming

"[Introduction] Functional programming" is an article that appeared in WEB+DB PRESS Vol.67. By courtesy of the editors, it has been made available online. Functional languages are so expressive that, within this small page count, it gets as far as implementing red-black trees and a parser. The language used is Haskell, so it also serves as an introduction to Haskell.

Recursion drills

"Recursion drills" is the material for an intensive course at Miyazaki University. It explains recursion, which is inseparable from functional programming, in drill form. I think it is easy to understand and comprehensive. The language used is Haskell. By reading the code, you may also pick up how to write tests in Haskell.