TIL about Zero-sized Types in Rust.

I was reading my timeline on BlueSky and stumbled upon this article about a Rust library that got rewritten. While the hpke-ng library is nice and interesting in itself, it was not the main thing I got out of it.

It was a nice explanation of a concept I’ve seen before in various Rust libraries: zero-sized types, and the traits that go with them.

Traits, interfaces, etc.

Rust programmers are familiar with traits (called “interfaces” in other languages like Java or Golang). They are a way to specify the behaviour that structs or enums must provide. This is similar to what you find in OOP languages, with a few differences that are specific to Rust.

Take Go for example. You can implement an interface called Block that you would use for a library implementing cryptographic systems (this is a real example, see my own cipher library implementing the interface).

type Block interface {
	// BlockSize returns the cipher's block size.
	BlockSize() int

	// Encrypt encrypts the first block in src into dst.
	// Dst and src must overlap entirely or not at all.
	Encrypt(dst, src []byte)

	// Decrypt decrypts the first block in src into dst.
	// Dst and src must overlap entirely or not at all.
	Decrypt(dst, src []byte)
}

To ease the rewrite from Go to Rust in old-crypto-rs, I used a similar trait, defined like this:

pub trait Block {
    fn block_size(&self) -> usize;
    fn encrypt(&self, dst: &mut [u8], src: &[u8]) -> usize;
    fn decrypt(&self, dst: &mut [u8], src: &[u8]) -> usize;
}

One of the fundamental differences between the two languages is that where interface implementations are implicit in Go, they have to be explicitly defined in Rust.

I must specify that I want to implement said trait:

impl Block for NullCipher {
    /// BlockSize is part of the interface
    fn block_size(&self) -> usize {
        1
    }

    /// Encrypt is part of the interface
    fn encrypt(&self, dst: &mut [u8], src: &[u8]) -> usize {
        dst[..src.len()].copy_from_slice(src);
        src.len()
    }

    /// Decrypt is part of the interface
    fn decrypt(&self, dst: &mut [u8], src: &[u8]) -> usize {
        dst[..src.len()].copy_from_slice(src);
        src.len()
    }
}
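
To make the explicit `impl` concrete, here is a minimal sketch of how code can be written against the trait rather than against one cipher. The trait and `NullCipher` are repeated so the snippet compiles on its own; `round_trip` is a hypothetical helper invented for this illustration, not something taken from old-crypto-rs.

```rust
/// Same trait as above, repeated so the sketch is self-contained.
pub trait Block {
    fn block_size(&self) -> usize;
    fn encrypt(&self, dst: &mut [u8], src: &[u8]) -> usize;
    fn decrypt(&self, dst: &mut [u8], src: &[u8]) -> usize;
}

/// A cipher that copies its input unchanged.
pub struct NullCipher;

// The explicit opt-in: without this block, NullCipher is not a Block.
impl Block for NullCipher {
    fn block_size(&self) -> usize {
        1
    }

    fn encrypt(&self, dst: &mut [u8], src: &[u8]) -> usize {
        dst[..src.len()].copy_from_slice(src);
        src.len()
    }

    fn decrypt(&self, dst: &mut [u8], src: &[u8]) -> usize {
        dst[..src.len()].copy_from_slice(src);
        src.len()
    }
}

/// Hypothetical helper: encrypts then decrypts `msg` with any type that
/// implements `Block`, and returns the recovered bytes.
pub fn round_trip<B: Block>(cipher: &B, msg: &[u8]) -> Vec<u8> {
    let mut ct = vec![0u8; msg.len()];
    let n = cipher.encrypt(&mut ct, msg);
    let mut pt = vec![0u8; n];
    let m = cipher.decrypt(&mut pt, &ct[..n]);
    pt.truncate(m);
    pt
}
```

The `B: Block` bound is resolved at compile time, so passing a type without the `impl` is rejected before the program ever runs.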

Runtime checks

While the migration from Go to Rust was not complicated, it carried certain Go habits over into the Rust code. The two languages are quite different, though, with Rust insisting on more compile-time checks. As a result, the Rust code ended up checking too many things at runtime instead of at compile time and, worse, passing parameters around just to select behaviour.

#[test]
fn test_new_cipher() {
    let c = SquareCipher::new("PORTABLE", "ADFGVX");
    assert!(c.is_ok());
}

#[test]
fn test_new_cipher_empty_key() {
    let c = SquareCipher::new("", "012345");
    assert!(c.is_err());
}

The second parameter allows you to change the character set used to encode each plaintext letter. So, the first test is meant to create an ADFGVX-like square, whereas the second one is for a more classical Polybius square. And the ADFGVX cipher itself did the same:

pub fn new(key1: &str, key2: &str) -> Result<Self, String> {
    let sqr = SquareCipher::new(key1, "ADFGVX")?;
    // ...
}
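
To illustrate the kind of work this pushes to runtime, here is a small sketch. It assumes the symbol set arrives as a plain string, as in the constructor above; `check_symbols` and its rules are hypothetical and only stand in for whatever validation the real constructor performs.

```rust
/// Hypothetical runtime check: because the coordinate symbols are passed
/// as a string, their validity can only be verified while the program runs.
/// Returns the side length of the square on success.
pub fn check_symbols(symbols: &str, alphabet_len: usize) -> Result<usize, String> {
    let n = symbols.len();
    if n == 0 {
        return Err("empty symbol set".to_string());
    }
    // An n x n square can address at most n * n letters.
    if n * n < alphabet_len {
        return Err(format!(
            "{} symbols cannot address {} letters",
            n, alphabet_len
        ));
    }
    Ok(n)
}
```

Every caller has to handle the error case, even when the symbols are a constant that could never be wrong.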

Compile-time checks

But what if these checks and this parameterization happened at compile time? What if they could even come at no cost?

Enter Zero-sized Types or ZST.

As the name implies, these types have no runtime cost and no associated storage; they are only there to enforce compile-time checks and parameterization.

Here is an example. Let us define a Coordinates trait to represent the ciphertext letters in a square-like cipher:

pub trait Coordinates {
    const SYMBOLS: &'static [u8];
}

and the Alphabet trait too:

pub trait Alphabet {
    const SIZE: usize;
    const ALPHABET: &'static [u8];

    fn normalize(ch: u8) -> Option<usize>;
    fn denormalize(idx: usize) -> u8;
}

Alphabet is more complex as it includes not only the two constants, SIZE and ALPHABET, but also functions to convert from letters to numbers and vice versa.
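
As a rough idea of what an implementation could look like, here is a sketch of a 25-letter alphabet. The trait is repeated so the snippet compiles on its own. `DemoLatin25` is a hypothetical type, and folding J into I is the classical Polybius convention that I assume here, not necessarily what the library's Latin25 does.

```rust
/// Same trait as above, repeated so the sketch is self-contained.
pub trait Alphabet {
    const SIZE: usize;
    const ALPHABET: &'static [u8];

    fn normalize(ch: u8) -> Option<usize>;
    fn denormalize(idx: usize) -> u8;
}

/// Hypothetical 25-letter alphabet: A-Z without J (assumption: J is
/// folded into I). It is a unit struct and carries no data.
pub struct DemoLatin25;

impl Alphabet for DemoLatin25 {
    const SIZE: usize = 25;
    const ALPHABET: &'static [u8] = b"ABCDEFGHIKLMNOPQRSTUVWXYZ";

    /// Letter -> index, or None for bytes outside the alphabet.
    fn normalize(ch: u8) -> Option<usize> {
        let up = ch.to_ascii_uppercase();
        let up = if up == b'J' { b'I' } else { up };
        Self::ALPHABET.iter().position(|&c| c == up)
    }

    /// Index -> letter. Panics if `idx` is not below SIZE.
    fn denormalize(idx: usize) -> u8 {
        Self::ALPHABET[idx]
    }
}
```

Note that neither function takes `self`: they are called on the type itself, as in `DemoLatin25::normalize(b'k')`, so no value of the type ever needs to exist.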

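In these ciphers, each plaintext letter is replaced by a pair of coordinate symbols, so a plaintext A is represented by something like GV or 34, depending on the coordinate set. As you can see, the Coordinates trait has no associated functions, only a constant. A trait does not need to contain a function. You might ask, what is the point here?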

It allows us to define a SquareCipher type that is parameterized by the ciphertext letters (and its alphabet too) through constraints with the traits:

#[derive(Debug)]
pub struct SquareCipher<C: Coordinates, A: Alphabet> {
    key: String,
    alpha: Vec<u8>,
    enc_table: [EncEntry; 256],
    dec_table: Box<[u8; 256 * 256]>,
    _marker: PhantomData<(C, A)>,
}

/// The most used square cipher is the Polybius one.
/// 5x5, and thus using the restricted 25-letter alphabet
///
pub type PolybiusCipher = SquareCipher<Numeric5, Latin25>;

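Thus PolybiusCipher is a SquareCipher with Numeric5 coordinates and the Latin25 alphabet. How do you link the traits with the type? Through the PhantomData marker: the struct stores no value of type C or A, so PhantomData<(C, A)> tells the compiler that the type parameters are nonetheless in use.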
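
The mechanism can be shown in isolation with a much smaller type. This is a sketch under assumptions: `Grid`, `DemoNumeric5` and `DemoLetters5` are hypothetical names, and the symbol sets are my guesses, not the library's definitions.

```rust
use std::marker::PhantomData;

/// Same trait as above, repeated so the sketch is self-contained.
pub trait Coordinates {
    const SYMBOLS: &'static [u8];
}

/// Hypothetical marker types; the symbol sets are assumptions.
pub struct DemoNumeric5;
impl Coordinates for DemoNumeric5 {
    const SYMBOLS: &'static [u8] = b"12345";
}

pub struct DemoLetters5;
impl Coordinates for DemoLetters5 {
    const SYMBOLS: &'static [u8] = b"ADFGX";
}

/// Hypothetical, heavily simplified stand-in for SquareCipher. It holds
/// no value of type C, so PhantomData<C> is what ties C to the struct.
pub struct Grid<C: Coordinates> {
    _marker: PhantomData<C>,
}

impl<C: Coordinates> Grid<C> {
    pub fn new() -> Self {
        Grid { _marker: PhantomData }
    }

    /// Maps a cell index to its (row, column) symbols, read from the
    /// constant attached to C rather than from a runtime parameter.
    pub fn coords(&self, idx: usize) -> [u8; 2] {
        let n = C::SYMBOLS.len();
        [C::SYMBOLS[idx / n], C::SYMBOLS[idx % n]]
    }
}

pub type NumericGrid = Grid<DemoNumeric5>;
pub type LetterGrid = Grid<DemoLetters5>;
```

`NumericGrid` and `LetterGrid` are distinct types, so a function expecting one will not accept the other, and `new` no longer needs a symbol-set argument at all.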

What’s better?

pub fn new(key1: &str, key2: &str) -> Result<Self, String> {
    let sqr = SquareCipher::new(key1, "ADFGVX")?;
    let transp = Transposition::new(key2)?;

    Ok(ADFGVX {
        sqr,
        transp,
        buf: RefCell::new(Vec::new()),
    })
}

or?

pub fn new(key1: &str, key2: &str) -> Result<Self> {
    let sqr = ADFGVXSquare::new(key1)?;
    let transp = Transposition::new(key2)?;

    Ok(ADFGVXCipher {
        sqr,
        transp,
        buf: RefCell::new(Vec::new()),
    })
}

The latter is more concise and precise. It specifies intent and purpose. The Coordinates trait restricts the possible values of the ciphertext letters, while the Alphabet trait restricts the character set.

#[derive(Debug)]
pub struct ADFGVX;

impl Coordinates for ADFGVX {
    const SYMBOLS: &'static [u8] = b"ADFGVX";
}

pub type ADFGXSquare = SquareCipher<ADFGX, Latin25>;
pub type ADFGVXSquare = SquareCipher<ADFGVX, Latin36>;

The name zero-sized types comes from the fact that they are empty. ADFGVX is a unit struct with no fields, so a value of that type occupies zero bytes:

#[derive(Debug)]
pub struct ADFGVX;

impl Coordinates for ADFGVX {
    const SYMBOLS: &'static [u8] = b"ADFGVX";
}
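
The "zero-sized" claim is easy to verify with `std::mem::size_of`. A minimal sketch, where `Marker` and `Tagged` are hypothetical types made up for this check:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Unit struct: no fields, hence nothing to store.
pub struct Marker;

/// Hypothetical wrapper tagged with a marker type it never stores.
pub struct Tagged<T> {
    pub value: u32,
    pub _marker: PhantomData<T>,
}

/// Returns the sizes, in bytes, of the marker, of PhantomData over the
/// marker, and of a struct that carries the marker only as a tag.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Marker>(),
        size_of::<PhantomData<Marker>>(),
        size_of::<Tagged<Marker>>(),
    )
}
```

Both the marker and the PhantomData field take no space, so `Tagged<Marker>` is exactly as large as the `u32` it wraps.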

Final thoughts

ZSTs are a great way to enforce compile-time checks and parameterization, but be careful not to overuse them. Some things still need to happen at runtime, because that is how these types and functions are meant to be used.
Do not try to force everything into compile time.

ZSTs are elegant, convey intent more precisely, and help ensure proper behaviour and API usage earlier.