あどけない話

Technical notes on the Internet and related topics

Haskell network library version 3.0

Brief history

Simon Marlow made the first commit to Haskell's network library in 2001. Its message says:

Package 'net' moved over. URI & CGI still missing because they have dependencies on other bits that haven't made it over yet.

So I guess that the code existed before that day. In fact, I have heard that network is one of the oldest packages in Haskell. When Johan Tibell, the previous maintainer, took over, network was already messy. Before he started refactoring, he added many test cases. Thank you, Johan!

In 2015, he passed the baton to Evan Borden and me before any drastic refactoring had taken place. After that, we concentrated only on bug fixes for a while. I can't speak for Evan, but in my case this was because I had no ideas for improving the package.

In December 2017, I decided to resolve as many issues as possible. During this work, I realized what was important for network:

  • The code was messy, like other long-lived code. We should clean it up for maintainability.
  • The build system was terrible. It was impossible to tell what depended on what. We should clean it up, too.
  • Believe it or not, Socket could not be GCed. This is a shame. Socket should be GCed.
  • SockAddr was not extensible. If users wanted to add a new address type, they had to send a PR. Once it was merged, the maintainers had to maintain it even if they did not know it well. Other packages should be able to extend SockAddr without modifying network.

I divided the jumbo Network.Socket module into small sub-modules. Also, I cleaned up the build system. This work was painful because I don't know Windows well. Luckily, we welcomed Tamar Christina as a new maintainer for Windows.

I explain the last two items in detail in the next section. In short, they forced us to change the signatures of two APIs.

In network v2.6:

fdSocket :: Socket -> CInt
mkSocket :: CInt -> Family -> SocketType -> ProtocolNumber -> SocketStatus -> IO Socket

But in network v3.0:

fdSocket :: Socket -> IO CInt
mkSocket :: CInt -> IO Socket

To provide a migration path, we made the following changes step by step:

v2.6

  • Making SockAddrCan deprecated

v2.7

  • Making Network deprecated
  • Making Network.BSD deprecated
  • Making MkSocket deprecated
  • Making many APIs deprecated

v2.8

  • Stopping the export of the PortNum constructor of PortNumber

v3.0

  • Removing Network
  • Removing Network.BSD
  • Removing SockAddrCan
  • Changing the internal structure of Socket
  • Making addresses extensible
  • Removing EOF errors

Just as Network.URI lives in the network-uri package, Network.BSD now lives in the network-bsd package, which Herbert Valerio Riedel kindly released.

The main work for v3.0 was done in December 2017, and v3.0 was released in January 2019. I'm very sorry for breaking backward compatibility, but we did wait at least one year before releasing it.

GC and extensibility

Recall the signature of the old API:

mkSocket :: CInt -> Family -> SocketType -> ProtocolNumber -> SocketStatus -> IO Socket

To make a Socket, we needed to supply Family, SocketType and ProtocolNumber. Since they are sum types, they cannot be extended without modifying their definitions. But the CInt, a socket descriptor, is created by the socket() system call, which already takes the protocol family, the socket type and the protocol number. Why should we specify them again?

See the old definition of Socket:

data Socket = MkSocket CInt Family SocketType ProtocolNumber (MVar SocketStatus)

I don't know why they were included. Let's remove them for extensibility. So, what about MVar SocketStatus? A good question! The MVar is the reason why Socket could not be GCed. We tried two approaches, mkWeakMVar and addFinalizer, but it turned out that neither solved the problem.

So, let's remove MVar, too:

data Socket = Socket CInt

But without status control, unexpected things would happen. Consider this scenario:

  • Haskell thread (A) creates a Socket with a socket descriptor and closes it.
  • The socket descriptor is re-used in another Haskell thread (B).
  • Haskell thread (C), still holding the old Socket, can close it again.
  • At this point, Haskell thread (B) suffers from unexpected behavior because its descriptor has been closed by someone else.
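
The scenario can be replayed in a toy model. The sketch below is not network's code: the descriptor table and the names NaiveSocket, openNaive, closeNaive, isOpen and scenario are all hypothetical, the "kernel" is just a list of open descriptors, and the three threads are replayed sequentially so that the outcome is deterministic.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.List (delete)
import Foreign.C.Types (CInt)

-- Toy stand-in for the kernel's descriptor table: the open descriptors.
type Table = IORef [CInt]

-- A socket without any status, as in "data Socket = Socket CInt".
newtype NaiveSocket = NaiveSocket CInt

-- Like socket(): hand out the lowest unused descriptor, starting at 3.
openNaive :: Table -> IO NaiveSocket
openNaive table = do
    used <- readIORef table
    let fd = until (`notElem` used) (+ 1) 3
    writeIORef table (fd : used)
    return (NaiveSocket fd)

-- Like close(): release the descriptor number, whoever owns it now.
closeNaive :: Table -> NaiveSocket -> IO ()
closeNaive table (NaiveSocket fd) = modifyIORef' table (delete fd)

-- Is the descriptor behind this socket currently open?
isOpen :: Table -> NaiveSocket -> IO Bool
isOpen table (NaiveSocket fd) = elem fd <$> readIORef table

-- Replays the scenario and reports whether B's socket is still open.
scenario :: IO Bool
scenario = do
    table <- newIORef []
    a <- openNaive table   -- thread (A) creates a socket ...
    closeNaive table a     -- ... and closes it
    b <- openNaive table   -- thread (B) is handed the same descriptor
    closeNaive table a     -- thread (C) closes the stale socket again
    isOpen table b         -- B's descriptor is gone
```

In this model scenario returns False: the second close on the stale value silently releases the descriptor that B believes it owns.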

The key idea for solving this problem was provided by Viktor Dukhovni. He suggested using an IORef:

data Socket = Socket (IORef CInt)

When a Socket is closed, we set the value in the IORef to -1 for safety. Unfortunately, this means that extracting the file descriptor from a Socket requires IO:

fdSocket :: Socket -> IO CInt

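This is the reason why the signature changed. With this definition, however, we would need unsafePerformIO to make Socket an instance of Show, because the descriptor can only be read in IO. To avoid that, Socket also keeps an immutable copy of the original descriptor. So, the final definition of Socket is: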

data Socket = Socket (IORef CInt) CInt -- for Show
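
To make the design concrete, here is a minimal sketch that uses only base. It is a model of the idea, not network's implementation: ToySocket, mkToySocket, toyFd and invalidate are hypothetical names, the output format of show is an assumption, and no real close() is called.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt)

-- Hypothetical stand-in for Socket: a mutable descriptor plus an
-- immutable copy that is used only for Show.
data ToySocket = ToySocket (IORef CInt) CInt

mkToySocket :: CInt -> IO ToySocket
mkToySocket fd = do
    ref <- newIORef fd
    return (ToySocket ref fd)

-- Reading the current descriptor needs IO because it may have changed.
toyFd :: ToySocket -> IO CInt
toyFd (ToySocket ref _) = readIORef ref

-- Atomically swap the descriptor for -1. Returns the descriptor that
-- should be passed to close(), or Nothing if it was already invalid.
invalidate :: ToySocket -> IO (Maybe CInt)
invalidate (ToySocket ref _) = do
    old <- atomicModifyIORef' ref (\cur -> (-1, cur))
    return (if old == (-1) then Nothing else Just old)

-- Show uses the immutable copy, so no unsafePerformIO is needed.
instance Show ToySocket where
    show (ToySocket _ fd) = "<socket: " ++ show fd ++ ">"
```

With this model, a second invalidate on the same value returns Nothing, so a stale close can never touch a descriptor that has since been re-used elsewhere. The price is that show reports the original descriptor even after the socket has been closed.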

Final note

If you want to extend socket addresses, see the new Network.Socket.Address module.

I hope that the reasons for the breaking changes are now clearer.

I thank Lars Petersen for showcasing a design for extensibility in his socket package.