あどけない話

Internet technologies

Integrating Fusion and cryptonite in Haskell quic

While Haskell quic version 0.0.1 or earlier supports the x86_64 architecture only, version 0.0.2 or later will supports non-x86_64 architectures.

cryptonite

When I started implementing the quic library in Haskell, I used cryptonite as crypto engine. Since cryptonite takes the functional style where all results are generated newly without side-effects, it is safe to use it among multiple threads.

I defined the following data for payload encryption and decryption:

data Coder = Coder {
    encrypt :: PlainText  -> AssDat -> PacketNumber -> [CipherText]
  , decrypt :: CipherText -> AssDat -> PacketNumber -> Maybe PlainText
  }

Key and Nonce are partially applied at the initialization period and resulting functions are stored in Code. encrypt returns two ByteString in a list, one is a protected header and the other is an encrypted payload. PlainText, CipherText and AssDat (associative data) are ByteString essentially.

For header protection, I defined the following code:

data Protector = Protector {
    protect   :: Sample -> Mask
  , unprotect :: Sample -> Mask
  }

Sample is taken from an encrypted payload and resulting Mask is used for the xor operation on a header.

Fusion

The maximum record size of TLS is 16KiB. After 16KiB plain text is encrypted and sent from the TLS layer, the TCP layer fragments the cipher text into multiple IP packets. Since 16KiB is large enough, cost of initialization and finalization is invisible.

However, the maximum packet size of QUIC is 1,472 bytes. There was no crypto engine to encrypt and decrypt such small data efficiently. Kazuho Oku invented Fusion crypto engine to solve this problem. I decided switch from cryptonite to Fusion in March 2021. New Coder is as follows:

data Coder = Coder {
    encrypt :: Buffer -> BufferSize -> Buffer -> BufferSize -> PacketNumber -> Buffer -> IO Int
t
  , decrypt :: Buffer -> BufferSize -> Buffer -> BufferSize -> PacketNumber -> Buffer -> IO Int
t
  }

The three buffers are input, associative data and output in order. The output buffer is to be modified. The result is the output size. Fusion encryption also calculates the mask of header protection (but not for unprotection). Protector is changed as follows:

data Protector = Protector {
    setSample :: Buffer -> IO ()
  , getMask   :: IO Buffer
  , unprotect :: Sample -> Mask
  }

setSample and getMask are called before and after encrypt, respectively.

Integration

I misunderstood that Fusion uses a slow crypto engine on non-x86_64 architectures. After releasing quic version 0.0.0, I noticed this is not the case. So, I started integrating Fusion and cryptonite in this December. That is, Fusion should be used on the x86_64 architecture while cryptonite should be used on non-x86_64 architectures.

This project was tough since I needed to integrate the functional style and the imperative style. The key idea is to use cryptonite in the imperative style. The body of resulting ByteStrings are copied into a buffer. This buffer approach should be cerebrated by GSO(Generic Segmentation Offload) in the near future.

The resulting Coder and Protector are as follows:

data Coder = Coder {
    encrypt    :: Buffer -> PlainText  -> AssDat -> PacketNumber -> IO Int
  , decrypt    :: Buffer -> CipherText -> AssDat -> PacketNumber -> IO Int
  , supplement :: Maybe Supplement
  }

data Protector = Protector {
    setSample  :: Buffer -> IO ()
  , getMask    :: IO Buffer
  , unprotect :: Sample -> Mask
  }

I'm satisfied with this signature.

Bugs found

One single buffer is used for decryption in a QUIC connection. So, this buffer should be occupied by the receiver thread. However, the dispatcher thread of a server calls decrypt with this buffer in the migration procedure. Migration is rare but this is a possible race. In the current code, receiver takes care of migration to kick the race out.

In the integration process, I sometime used undefineds as initial values in tentative data structures of Coder. After initializing Coder, we cannot touch the undefineds. However, the test suite was trapped. This reveals that dropping keys (which stores the initial values again) is too early. The current code delays the key drop timing.

If the cryptonite is used, the test suite sometime gets stuck. Visualizing packet sequences illustrates that a client is waiting for Fin forever since a server does not send Fin. In the server side, the server thread and sender thread are racing. Since encrypt called by sender is slow, the server is finished before sender actually sends a packet. So, I modified that shutdownStream and closeStream wait for the completion of Fin packet sending.