あどけない話

Internet technologies

Releasing QUIC and HTTP/3 libraries

As I described in The Current Plan for Haskell QUIC, I have released the following:

tls

tls v1.5.5 provides the Network.TLS.QUIC module. If you are interested in how this module has been improved, please read Improving QUIC APIs of the TLS library in Haskell.

http2

As I explained in Implementing HTTP/3 in Haskell, http2 v3.0.0 or later provides both client and server libraries with the abstraction for HTTP requests and responses. These versions are resistant to some HTTP/2 DoS attacks.

quic

quic provides the QUIC APIs based on Haskell's lightweight threads. The architecture described in Implementation status of QUIC in Haskell is still valid. Runner modules for client and server are divided into Network.QUIC.Client and Network.QUIC.Server, respectively.

Following a private discussion after Migration API for QUIC clients, Network.QUIC.Client provides both automatic migration and manual migration.

As explained in Developing QUIC Loss Detection and Congestion Control in Haskell, this library implements the congestion control defined in RFC9002.

http3

This library provides HTTP/3 (on QUIC). The encoder does not use the dynamic table at this moment.

warp-quic

The warp-quic library provides the Web Application Interface (WAI) for HTTP/3. In other words, this is a QUIC wrapper for Warp.

mighttpd2

mighttpd2 version 4.0.0 now provides the HTTP/3 (on QUIC) functionality based on warp-quic. Also, the configuration is now based on Dhall.

To create connected UDP sockets on Linux, mighttpd2 drops all capabilities except CAP_NET_BIND_SERVICE as described in Haskell vs Linux capabilities.

IIR

I wrote an article about the implementation of QUIC in Haskell in Internet Infrastructure Review (IIR) Vol. 52. This article is written in Japanese but will be translated into English within a month.

Migration API for QUIC clients

If I understand correctly, most QUIC implementations of clients and servers use unconnected UDP sockets with sendto()/sendmsg() and recvfrom()/recvmsg(). For the server side, this is probably because they adopt event-driven programming. The event loop calls recvfrom()/recvmsg() and dispatches a received packet according to the peer address.

As I explained in "Implementation status of QUIC in Haskell", the quic library in Haskell made use of connected sockets on both the client and server sides. This is probably a good choice for the server side, first because lightweight thread programming is common in Haskell, and second because dispatching is done in the kernel.

But what about the client side? Since the RFCs relating to QUIC have been published, I'm trying to fix the API of the quic library for the first official release. What kind of migration API should be provided for clients?

Let's consider this typical scenario:

  1. A QUIC client is using a 5G network.
  2. The client moves to a place where WiFi is available.
  3. The client migrates the connection from the 5G network to the WiFi network.

If the client uses a connected socket, this migration can be implemented as follows:

  1. The client needs to detect the event that the WiFi network interface is available.
  2. The client creates a new connected socket. When connect() is called, the kernel sets the remote address/port according to the argument. Then it looks up the routing table with the server's IP address. Since the WiFi network interface is resolved, the kernel sets the local address of the socket to the IP address of the network interface. A local port number is chosen randomly.
  3. The client starts sending packets through the new socket with send().

It's easy for the quic library to provide the API for item 2. But how should item 1 be implemented? Do major OSes provide such an API for network interfaces? Should we prepare a watchdog thread for network interface events?

I have been wondering why other implementors do not talk about this issue. I finally realized that I had gone the wrong way. That is, unconnected sockets should be used on the client side. If sendto()/sendmsg() is used with an unconnected socket, connection migration is done automatically:

  • When sendto()/sendmsg() is called, the kernel sets the destination address/port of the packet according to the argument. Then it looks up the routing table with the server's IP address. Since a proper network interface is resolved, the kernel sets the source address of the packet to the IP address of the network interface. A local port number is chosen randomly at the first sendto()/sendmsg().

Conclusion: connection migration can be done without any specific API.
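The unconnected-socket approach can be sketched with the network package as follows. This is a minimal illustration, not the quic library's code: the loopback receiver stands in for a QUIC server, and roundTrip is a hypothetical helper name. The client never calls bind or connect, so the kernel picks the source address from the routing table and a local port at the first sendTo.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as C8
import Network.Socket
import Network.Socket.ByteString (recvFrom, sendTo)

-- | Send one datagram through an unconnected UDP socket and return what
-- the receiver saw. The receiver is a stand-in for a QUIC server.
roundTrip :: C8.ByteString -> IO C8.ByteString
roundTrip payload = do
    -- Receiver bound to an ephemeral loopback port.
    server <- socket AF_INET Datagram defaultProtocol
    bind server (SockAddrInet 0 (tupleToHostAddress (127, 0, 0, 1)))
    serverAddr <- getSocketName server
    -- Unconnected client socket: no bind, no connect.
    client <- socket AF_INET Datagram defaultProtocol
    -- The destination is given per packet; the kernel resolves the
    -- outgoing interface and source address at send time.
    _ <- sendTo client payload serverAddr
    (received, _peer) <- recvFrom server 1500
    close client
    close server
    return received

main :: IO ()
main = roundTrip (C8.pack "ping") >>= C8.putStrLn
```

Because the destination is supplied on every call, a change of the default route (for example from 5G to WiFi) affects the next packet without any action by the application.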

Seeking the reasons for segfaults of a Haskell program

My open server of Haskell QUIC on Linux sometimes segfaulted. I saw two types of segfaults. One is a simple segfault caused by accessing a wrong address:

mighty: segmentation fault

The other is related to free():

*** Error in `mighty': corrupted double-linked list: 0x00007fcdf0008f90 ***

I guessed that a buffer overrun had occurred in a buffer allocated by malloc() and that this segfault happened when the buffer was freed.

Many Haskellers would be surprised at this kind of segfault because it is hard to cause segfaults in normal Haskell programming. However, if you manipulate pointers or use unsafe functions, segfaults are as common as in other programming languages.

The first type of segfault can be reproduced with Foreign.Storable.peek:

% ghci
> import Foreign.Ptr
> import Foreign.Storable
> :type peek
peek :: Storable a => Ptr a -> IO a

Let's try to access so-called NULL:

> peek nullPtr :: IO Int
sh: segmentation fault  ghci

Buffer overruns can be caused by Foreign.Storable.poke. Its type is as follows:

> :type poke
poke :: Storable a => Ptr a -> a -> IO ()

I checked all peeks and pokes in my code but I could not figure out the reasons for the segfaults. So, I needed to take another approach.
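For illustration, a boundary check around poke and peek could look like the sketch below. The Buffer type and the pokeChecked/peekChecked helpers are hypothetical names for this example, not functions from my code or from base.

```haskell
module Main (main) where

import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- | A raw buffer paired with its size in bytes (hypothetical helper type).
data Buffer = Buffer (Ptr Word8) Int

-- | Write one byte only if the offset is inside the buffer.
-- Returns False instead of overrunning.
pokeChecked :: Buffer -> Int -> Word8 -> IO Bool
pokeChecked (Buffer ptr size) off w
    | off < 0 || off >= size = return False
    | otherwise = pokeByteOff ptr off w >> return True

-- | Read one byte with the same check.
peekChecked :: Buffer -> Int -> IO (Maybe Word8)
peekChecked (Buffer ptr size) off
    | off < 0 || off >= size = return Nothing
    | otherwise = Just <$> peekByteOff ptr off

demo :: IO (Bool, Bool, Maybe Word8)
demo = bracket (mallocBytes 4) free $ \ptr -> do
    let buf = Buffer ptr 4
    ok <- pokeChecked buf 3 0xff  -- last valid offset: accepted
    bad <- pokeChecked buf 4 0xff -- one past the end: rejected
    v <- peekChecked buf 3
    return (ok, bad, v)

main :: IO ()
main = demo >>= print
```

A check like this rules out overruns, but as the rest of this article shows, it does not protect against touching a buffer that has already been freed.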

The -g option of GHC

Like other compilers, GHC provides the -g option to add debug information to a compiled program. We can run the program in gdb and get a backtrace if a segfault happens. To compile all dependent libraries with the -g option, I modified my Cabal wrapper, called cab, to provide a command line option (whose name is also -g) to implement this feature. I also used the sandbox feature of Cabal-v1:

% cd mighty
% cab init             # creating a sandbox
% cab add ~/work/quic  # adding non-Hackage deps
...
% cab install -d -f tls -f quic -g
% cab conf -f tls -f quic -g
% cab build

Then run the compiled program in gdb:

% sudo gdb --args mighty conf route
(gdb) handle SIGPIPE nostop noprint pass
Signal        Stop  Print   Pass to program Description
SIGPIPE       No    No  Yes     Broken pipe
(gdb) handle SIGUSR1 nostop noprint pass
Signal        Stop  Print   Pass to program Description
SIGUSR1       No    No  Yes     User defined signal 1
(gdb) run

As you can see, I needed to tell gdb not to stop on two signals, SIGPIPE and SIGUSR1, and to pass them through to the program.

Segfault 1

When I added some test cases of QPACK to h3spec and tested the open server, gdb finally caught a segfault and showed a backtrace. The cause was Data.Array.Base.unsafeAt. I did not check the boundary of an array! (My QPACK code is derived from my HPACK code where this boundary check is not necessary.)
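A minimal sketch of the missing check, assuming a zero-based array so that the index and the offset expected by unsafeAt coincide. The safeAt helper and the table contents are made up for this example and are not the actual QPACK code.

```haskell
module Main (main) where

import Data.Array (Array, bounds, listArray)
import Data.Array.Base (unsafeAt)
import Data.Ix (inRange)

-- | Look up a zero-based array with an explicit boundary check before
-- calling unsafeAt, which performs no check at all.
safeAt :: Array Int a -> Int -> Maybe a
safeAt arr i
    | inRange (bounds arr) i = Just (arr `unsafeAt` i)
    | otherwise = Nothing

-- | A tiny stand-in for a static table (hypothetical contents).
table :: Array Int String
table = listArray (0, 2) [":authority", ":path", ":method"]

main :: IO ()
main = do
    print (safeAt table 1)  -- in range
    print (safeAt table 99) -- out of range: Nothing instead of a wild read
```

With an index taken from the wire, as in QPACK, the out-of-range case is attacker-controlled, so the check cannot be skipped.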

Segfault 2

The segfault related to free() was really mysterious because the buffer boundary is always checked when poke is used. The error message of free() on Linux is not very helpful. But when I got the same segfault on macOS, the following message was displayed:

mighty(75755,0x700009519000) malloc: Incorrect checksum for freed object 0x7fb8de80ea00: probably modified after being freed.

Eureka! Even if the boundary is checked every time, this segfault happens because a freed buffer is used.

But why is a freed buffer used? This is one of the difficulties of multi-threaded programming. Suppose thread A and thread B share a buffer. The following is an example clean-up procedure:

  • Thread A sends a kill signal to thread B
  • Thread A frees the buffer
  • Thread A exits

This looks perfect. However, the timing of the termination of thread B depends on the scheduler. Even after thread A has freed the buffer, thread B may still be alive and can manipulate the buffer.

To prevent this contention, I gave up the approach of Foreign.Marshal.Alloc.mallocBytes and Foreign.Marshal.Alloc.free. Instead, I started using GHC.ForeignPtr.mallocPlainForeignPtrBytes. Buffers allocated by this function are GCed like ByteString.
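The following sketch shows the idea with a buffer shared between two threads. It is an illustration under simplified assumptions (a 16-byte buffer and an MVar for synchronization), not code from the quic library; sharedBufferDemo is a hypothetical name.

```haskell
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import GHC.ForeignPtr (mallocPlainForeignPtrBytes)

-- | Allocate a GC-managed buffer, let another thread write to it, and
-- read the result. Nobody calls free: the buffer stays alive as long as
-- any thread still holds the ForeignPtr.
sharedBufferDemo :: IO Word8
sharedBufferDemo = do
    buf <- mallocPlainForeignPtrBytes 16 :: IO (ForeignPtr Word8)
    done <- newEmptyMVar
    _ <- forkIO $ do
        -- withForeignPtr keeps the buffer alive while the pointer is used.
        withForeignPtr buf $ \p -> pokeByteOff p 0 (42 :: Word8)
        putMVar done ()
    takeMVar done
    withForeignPtr buf $ \p -> peekByteOff p 0

main :: IO ()
main = sharedBufferDemo >>= print
```

Since no thread is responsible for freeing the buffer, the order in which threads terminate no longer matters.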

Now I believe that my QUIC server is much more stable than before.

Haskell vs Linux capabilities

I found an elegant solution for the problem of Haskell vs Linux capabilities explained in "QUIC and Linux capabilities". To know why the CAP_NET_BIND_SERVICE capability is necessary, please read this article in advance.

On Linux, the following is the procedure to boot a secure multi-threaded server with CAP_NET_BIND_SERVICE:

  • Executed by root.
  • Reading a TLS private key.
  • Setting SECBIT_KEEP_CAPS by prctl(2) -- Without this, all capabilities are lost after setuid(2).
  • Switching the root user to nobody (or something) by setuid(2).
  • Dropping capabilities except CAP_NET_BIND_SERVICE by capset(2).
  • Spawning native threads. CAP_NET_BIND_SERVICE is inherited by all native threads.

GHC RTS executes Haskell code after spawning native threads. So, there are two problems in implementing a secure multi-threaded server with CAP_NET_BIND_SERVICE in Haskell.

  1. How to set SECBIT_KEEP_CAPS to all native threads?
  2. How to drop capabilities except CAP_NET_BIND_SERVICE of all native threads?

For 1), by reading the source code of GHC RTS, I finally found a C level hook called FlagDefaultsHook(). The user manual has the section of Hooks to change RTS behaviour, but this hook is not documented there, sigh. GHC RTS executes this hook before spawning native threads. So, if the following code is linked into your Haskell program, all native threads keep all capabilities after setuid(2), yay!

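#include <unistd.h>           /* geteuid */
#include <sys/prctl.h>        /* prctl, PR_SET_SECUREBITS */
#include <linux/securebits.h> /* SECBIT_KEEP_CAPS */

/* Called by GHC RTS before it spawns native threads. */
void FlagDefaultsHook () {
  if (geteuid() == 0) {
    prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS, 0L, 0L, 0L);
  }
}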

For 2), I considered that signals can be used. On Linux, we can get the thread IDs of all native threads in a process by scanning /proc/<process id>/task/. And Linux provides tgkill(2) to send a signal to the native thread specified by a thread ID.
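Scanning the task directory can be sketched as follows (Linux only). This is a simplified illustration using the directory package, not the code in mighttpd2; the function names are hypothetical, and /proc/self/task is used as a shortcut for the calling process's own /proc/<process id>/task.

```haskell
module Main (main) where

import Data.Maybe (mapMaybe)
import System.Directory (listDirectory)
import Text.Read (readMaybe)

-- | Keep only the directory entries that parse as numeric thread IDs.
parseThreadIds :: [FilePath] -> [Int]
parseThreadIds = mapMaybe readMaybe

-- | List the native thread IDs of the current process (Linux only).
nativeThreadIds :: IO [Int]
nativeThreadIds = parseThreadIds <$> listDirectory "/proc/self/task"

main :: IO ()
main = nativeThreadIds >>= print
```

Each ID obtained this way is what tgkill(2) expects as its thread ID argument.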

I first tried to use installHandler of Haskell to install a signal handler. But it turned out that the wrong native thread catches the signal from tgkill(2), sigh. So, I used sigaction(2) in FlagDefaultsHook() instead.

The following is the procedure to implement a secure multi-threaded server with CAP_NET_BIND_SERVICE in Haskell:

  • Executed by root.
  • GHC RTS executes FlagDefaultsHook():
    • Setting SECBIT_KEEP_CAPS by prctl(2).
    • Setting a signal handler to drop capabilities except CAP_NET_BIND_SERVICE by sigaction(2).
  • GHC RTS spawns native threads.
  • GHC RTS executes Haskell code:
    • Reading a TLS private key.
    • Switching the root user to nobody (or something) by setuid(2).
    • Sending signals to all native threads to drop capabilities except CAP_NET_BIND_SERVICE by tgkill(2).

You can see a concrete implementation in this commit.

One awkward thing is that the capabilities of the process itself remain at a wrong value. It seems to me that capset(2) for a process is not permitted if it is multi-threaded. However, if I understand correctly, there is no way to access or inherit the capabilities of the process in GHC RTS. So, I don't worry about it much.

QUIC and Linux capabilities

For security reasons, the typical boot process of HTTPS servers is as follows:

  • Executed by root.
  • Reading a TLS private key and opening a listen socket on TCP port 443.
  • Switching the root user to nobody (or something).

Since accept() can create connected sockets bound to TCP port 443 even with non-root privilege, servers can accept connections.

Let's consider the case of QUIC servers, which use UDP. Chrome does not allow Alt-Svc to go across the privileged port boundary (i.e. 1024). For instance, `Alt-Svc: h3=":4433"` provided on TCP port 443 does not work. QUIC servers should provide QUIC on UDP port 443.

Some QUIC servers make use of connected UDP sockets. As I described in Implementation status of QUIC in Haskell, the following procedure can be used to create a connected UDP socket when a packet is received on a kind of listen socket of 192.0.2.1:443:

  1. Create a new UDP socket with SO_REUSEADDR
  2. Bind it to *:443
  3. Connect it to 203.0.113.0:3456. This also binds the local address to 192.0.2.1.
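The three steps above can be sketched with the network package. To stay runnable without privileges, this example uses an ephemeral port on the loopback address instead of 192.0.2.1:443. The function names are hypothetical, and the sketch assumes Linux-like SO_REUSEADDR behavior for UDP, where two sockets that both set the option may share a port.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as C8
import Network.Socket
import Network.Socket.ByteString (recvFrom, sendTo)

-- | Steps 1-3: a new UDP socket with SO_REUSEADDR, bound to the wildcard
-- address on the listen port, then connected to the peer.
connectedSocket :: PortNumber -> SockAddr -> IO Socket
connectedSocket port peer = do
    s <- socket AF_INET Datagram defaultProtocol
    setSocketOption s ReuseAddr 1
    bind s (SockAddrInet port 0) -- 0 is the wildcard address
    connect s peer               -- also fixes the local address
    return s

-- | Returns whether the connected socket ended up with the listen
-- address as its local address and the client as its peer.
demo :: IO (Bool, Bool)
demo = do
    let loopback = tupleToHostAddress (127, 0, 0, 1)
    listener <- socket AF_INET Datagram defaultProtocol
    setSocketOption listener ReuseAddr 1
    bind listener (SockAddrInet 0 loopback)
    SockAddrInet port _ <- getSocketName listener
    -- A client sends its first packet to the listen socket.
    client <- socket AF_INET Datagram defaultProtocol
    _ <- sendTo client (C8.pack "initial") (SockAddrInet port loopback)
    (_, peer) <- recvFrom listener 1500
    conn <- connectedSocket port peer
    local <- getSocketName conn
    remote <- getPeerName conn
    mapM_ close [conn, client, listener]
    return (local == SockAddrInet port loopback, remote == peer)

main :: IO ()
main = demo >>= print
```

With port 443 in place of the ephemeral port, the bind call in connectedSocket is the step that needs the privilege discussed next.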

For 2), the root privilege or the CAP_NET_BIND_SERVICE capability is necessary on Linux. The easiest way to implement secure QUIC servers is to use the setcap command:

% sudo setcap "CAP_DAC_READ_SEARCH,CAP_NET_BIND_SERVICE+epi" quic-server
% sudo -u nobody -g nobody ./quic-server

CAP_DAC_READ_SEARCH is necessary to read a TLS private key. Since the capability is not dropped, this server can read any file. Yes, still insecure.

To keep only CAP_NET_BIND_SERVICE, the following code should be run after reading the private key:

  /* root */
  /* inherits all capabilities */
  prctl(PR_SET_SECUREBITS, SECBIT_KEEP_CAPS, 0L, 0L, 0L);

  setuid(99);
  /* nobody */

  /* drop capabilities except CAP_NET_BIND_SERVICE */
  cap_t caps = cap_from_text("cap_net_bind_service=ipe");
  cap_set_proc(caps);
  cap_free(caps);

This probably works for most QUIC servers. However, this is not the case for Haskell. Linux capabilities are per-thread. GHC threaded RTS spawns some native threads and then runs Haskell programs. If I understand correctly, there is no way to set SECBIT_KEEP_CAPS for all native threads.

The manual page of capabilities says:

Neither glibc, nor the Linux kernel honors POSIX semantics for setting capabilities and securebits in the presence of pthreads. That is, changing capability sets, by default, only affect the running thread. To be meaningfully secure, however, the capability sets should be mirrored by all threads within a common program because threads are not memory isolated. As a workaround for this, libcap is packaged with a separate POSIX semantics system call library: libpsx. If your program uses POSIX threads, to achieve meaningful POSIX semantics capability manipulation, you should link your program with:

ld ... -lcap -lpsx -lpthread --wrap=pthread_create

or,

gcc ... -lcap -lpsx -lpthread -Wl,-wrap,pthread_create

This workaround cannot be applied to Haskell. In my opinion, the securebits of Linux should be per-process.

Testing QUIC servers with h3spec

h2spec is an excellent test tool to check if HTTP/2 servers can handle error cases correctly. When I was developing the HTTP/2 server library in Haskell, I used Firefox and Chrome for normal cases and h2spec for error cases. h2spec helped me greatly to improve the quality of the library. What is surprising is that the author, Moto Ishizawa, is still enhancing h2spec!

Since the QUIC library in Haskell supports both the client side and the server side, normal cases are tested by itself. I wanted to test error cases, too. After thinking about it for two weeks, it turned out that QUIC error cases can be created easily. The key idea is providing hooks to convert data structures:

data Hooks = Hooks {
    onPlainCreated  :: EncryptionLevel -> Plain -> Plain
  , onTransportParametersCreated :: Parameters -> Parameters
  , onTLSExtensionCreated :: [ExtensionRaw] -> [ExtensionRaw]
  }

If we want to test an error case where an unknown frame is included in a QUIC packet, we can provide the following function for onPlainCreated:

unknownFrame :: EncryptionLevel -> Plain -> Plain
unknownFrame lvl plain
  | lvl == RTT1Level = plain { plainFrames = UnknownFrame 0x20 : plainFrames plain }
  | otherwise        = plain
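To make the mechanism concrete, here is a self-contained model of the hook idea. All types are simplified stand-ins invented for this sketch (the real Plain, Hooks and EncryptionLevel in the quic library have more fields and constructors), and defaultHooks and buildPlain are hypothetical names.

```haskell
module Main (main) where

-- Simplified stand-ins for the library's types (assumptions of this sketch).
data EncryptionLevel = InitialLevel | HandshakeLevel | RTT1Level
    deriving (Eq, Show)

data Frame = Ping | UnknownFrame Int
    deriving (Eq, Show)

newtype Plain = Plain { plainFrames :: [Frame] }
    deriving (Eq, Show)

newtype Hooks = Hooks
    { onPlainCreated :: EncryptionLevel -> Plain -> Plain
    }

-- | Hooks that change nothing: normal operation.
defaultHooks :: Hooks
defaultHooks = Hooks { onPlainCreated = \_ plain -> plain }

-- | Inject an unknown frame into 1-RTT packets only.
unknownFrame :: EncryptionLevel -> Plain -> Plain
unknownFrame lvl plain
    | lvl == RTT1Level = plain { plainFrames = UnknownFrame 0x20 : plainFrames plain }
    | otherwise = plain

-- | Where a sender would apply the hook just before encoding a packet.
buildPlain :: Hooks -> EncryptionLevel -> [Frame] -> Plain
buildPlain hooks lvl frames = onPlainCreated hooks lvl (Plain frames)

main :: IO ()
main = do
    let testHooks = defaultHooks { onPlainCreated = unknownFrame }
    print (buildPlain defaultHooks RTT1Level [Ping])
    print (buildPlain testHooks RTT1Level [Ping])
    print (buildPlain testHooks InitialLevel [Ping])
```

Since the default hooks are identity functions, production code pays almost nothing for them, while a test can corrupt exactly one data structure at one encryption level.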

Hooks is passed to the client/server runners via configuration. I used our lovely hspec to run the test:

transportSpec :: ClientConfig -> SpecWith a
transportSpec cc0 = do
    describe "QUIC servers" $ do
        it "MUST send FRAME_ENCODING_ERROR if a frame of unknown type is received [Transport 12.4]" $ \_ -> do
            let cc = addHook cc0 $ setOnPlainCreated unknownFrame
            runC cc waitEstablished `shouldThrow` transportError FrameEncodingError

transportError :: TransportError -> QUICError -> Bool
transportError te (TransportErrorOccurs te' _) = te == te'
transportError _  _                            = False

The notation of `shouldThrow` transportError FrameEncodingError is really cool, isn't it?
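For readers unfamiliar with hspec, the second argument of shouldThrow is just a predicate on the exception. The sketch below shows roughly what such a check does, using only base. The error types are simplified stand-ins for this example and throwsMatching is a hypothetical helper, not part of hspec or the quic library.

```haskell
module Main (main) where

import Control.Exception (Exception, throwIO, try)

-- Simplified stand-ins for the library's error types (assumptions).
data TransportError = FrameEncodingError | ProtocolViolation
    deriving (Eq, Show)

data QUICError
    = TransportErrorOccurs TransportError String
    | OtherError
    deriving (Show)

instance Exception QUICError

transportError :: TransportError -> QUICError -> Bool
transportError te (TransportErrorOccurs te' _) = te == te'
transportError _ _ = False

-- | Run an action and report whether it threw an exception accepted by
-- the predicate. An action that returns normally counts as a failure.
throwsMatching :: Exception e => IO a -> (e -> Bool) -> IO Bool
throwsMatching action p = do
    r <- try action
    return $ case r of
        Left e -> p e
        Right _ -> False

main :: IO ()
main = do
    let failing = throwIO (TransportErrorOccurs FrameEncodingError "unknown frame") :: IO ()
    throwsMatching failing (transportError FrameEncodingError) >>= print
    throwsMatching failing (transportError ProtocolViolation) >>= print
```

Partially applying transportError to the expected error code is what makes the test line read almost like the sentence in the specification.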

After adding some error cases, I hit upon the idea of a command line tool that reuses these error cases by adding another main function. Moto agreed to my using the name h3spec. Here is an example of execution of h3spec:

% h3spec -v
h3spec 0.0.4
% h3spec mew.org 443

QUIC servers

  MUST send TRANSPORT_PARAMETER_ERROR if initial_source_connection_id is missing [Transport 7.3]
  MUST send TRANSPORT_PARAMETER_ERROR if original_destination_connection_id is received [Transport 18.2]
  MUST send TRANSPORT_PARAMETER_ERROR if preferred_address, is received [Transport 18.2]
  MUST send TRANSPORT_PARAMETER_ERROR if retry_source_connection_id is received [Transport 18.2]
  MUST send TRANSPORT_PARAMETER_ERROR if stateless_reset_token is received [Transport 18.2]
  MUST send TRANSPORT_PARAMETER_ERROR if max_udp_payload_size is invalid [Transport 7.4 and 18.2]
  MUST send FRAME_ENCODING_ERROR if a frame of unknown type is received [Transport 12.4]
  MUST send PROTOCOL_VIOLATION on no frames [Transport 12.4]
  MUST send PROTOCOL_VIOLATION if reserved bits in Handshake are non-zero [Transport 17.2]
  MUST send PROTOCOL_VIOLATION if reserved bits in Short are non-zero [Transport 17.2]
  MUST send PROTOCOL_VIOLATION if NEW_TOKEN is received [Transport 19.7]
  MUST send STREAM_STATE_ERROR if MAX_STREAM_DATA is received for a non-existing stream [Transport 19.9] FAILED [1]
  MUST send PROTOCOL_VIOLATION if HANDSHAKE_DONE is received [Transport 19.20]
  MUST send no_application_protocol TLS alert if no application protocols are supported [TLS 8.1]
  MUST the send missing_extension TLS alert if the quic_transport_parameters extension does not included [TLS 8.2]

Failures:

  Transport.hs:60:13:
  1) QUIC servers MUST send STREAM_STATE_ERROR if MAX_STREAM_DATA is received for a non-existing stream [Transport 19.9]
       did not get expected exception: QUICError

  To rerun use: --match "/QUIC servers/MUST send STREAM_STATE_ERROR if MAX_STREAM_DATA is received for a non-existing stream [Transport 19.9]/"

Randomized with seed 1914918977

Finished in 0.7035 seconds
15 examples, 1 failure

I'm planning to add some more error cases. Enjoy!

The Current Plan for Haskell QUIC

QUIC has been a hot topic recently. As the following blog posts describe, the standardization of QUIC is now in the final stage and h3-29 (HTTP/3 over QUIC of draft 29) is out there:

These blog posts motivated me to explain my plan for "Haskell QUIC" to the Haskell community. As I described in "Implementing HTTP/3 in Haskell" and "Developing QUIC Loss Detection and Congestion Control in Haskell", Haskell QUIC servers can communicate with Firefox Nightly and Chrome via HTTP/3. Also, as you can see in "IETF QUIC Interop Matrix", the interoperability has been tested with many other QUIC implementations.

After the QUIC drafts are published as a set of RFCs, I will release several Haskell libraries in the following order:

  • tls to provide the Network.TLS.QUIC module
  • http2 to provide necessary functions for QPACK and HTTP/2 client library
  • quic to provide QUIC core features
  • http3 to provide HTTP/3 client/server libraries and QPACK
  • warp-quic to glue QUIC with WAI (Web Application Interface)
  • mighttpd2 to provide an HTTP/3 server

I expect that this will happen in April 2021. For the 2nd half of Japan fiscal year of 2020 (Oct 2020 - Mar 2021), I'm planning to tackle the following (in no particular order):

Stay tuned.