Ordinateur

TIL: Phantom Types and Typestates in Rust

March 27, 2025

I’ve recently learned some common patterns in Rust to enforce runtime properties at compile time.

Phantom types

Adding a PhantomData<T> field to your type tells the compiler that your type acts as though it stores a value of type T, even though it doesn’t really. This information is used when computing certain safety properties.
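As a small illustration of that point, here is a sketch with a hypothetical `Id<T>` wrapper (the names `Id`, `Customer`, `Order`, and `describe_customer` are made up for this example, not taken from any library). The marker adds no bytes to the value, yet `Id<Customer>` and `Id<Order>` are different types to the compiler:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical marker types; they are never instantiated.
pub struct Customer;
pub struct Order;

// An identifier tagged with the kind of thing it identifies.
pub struct Id<T> {
    raw: u64,
    _kind: PhantomData<T>,
}

impl<T> Id<T> {
    pub fn new(raw: u64) -> Self {
        Id { raw, _kind: PhantomData }
    }

    pub fn raw(&self) -> u64 {
        self.raw
    }
}

// Only accepts customer ids; passing an Id<Order> is a compile error.
pub fn describe_customer(id: &Id<Customer>) -> String {
    format!("customer #{}", id.raw())
}

fn main() {
    let customer_id: Id<Customer> = Id::new(7);
    let order_id: Id<Order> = Id::new(7);

    // PhantomData is zero-sized, so the wrapper is as big as the u64 inside.
    assert_eq!(size_of::<Id<Customer>>(), size_of::<u64>());

    println!("{}", describe_customer(&customer_id));
    // describe_customer(&order_id); // does not compile: expected Id<Customer>, found Id<Order>
    let _ = order_id.raw();
}
```

Both ids hold the same number, but only one of them can be passed to `describe_customer`.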

Phantom types let you create subtle markers to distinguish objects that may be structurally identical but semantically different. This could be used to add safety when passing objects through layers of an application. When developing an API, it is conventional to validate data when processing a request at the boundary before passing it down into the domain or persistence layer.

I previously worked on a product that stored entity data. Users would send entities into our service as a serialized JSON-based blob. The service would then validate basic properties of the data and make sure that it was compatible with schemas provided by users in a previous step. The structure of an entity is the same before and after validation, but we would only want to store validated ones in our database. With phantom types, it could have looked like this:

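use std::marker::PhantomData;
use serde_json::Value;

// Marker types for the validation state; they carry no data.
// `Error` and `Id` below are application types not shown here.
pub struct Unvalidated;
pub struct Validated;

pub struct Entity<T> {
    data: Value,
    _validation: PhantomData<T>,
}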

pub fn read_entity(json: &str) -> Entity<Unvalidated> {
    Entity {
        data: serde_json::from_str(json).expect("Failed to parse entity"),
        _validation: PhantomData,
    }
}

impl Entity<Validated> {
    pub fn parse(unvalidated: Entity<Unvalidated>) -> Result<Self, Error> {
        ...
    }
}

// The function to store entities only accepts the validated form of the type
pub fn store_entity(entity: Entity<Validated>) -> Result<Id, Error> {
    ...
}
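To make the flow concrete without the elided bodies, here is a self-contained sketch that runs with the standard library alone. It swaps the JSON `Value` for a plain `String`. The validation rule (data must not be blank), the `Error` enum, and the in-memory `Vec` standing in for the database are all assumptions for illustration, not what our service did:

```rust
use std::marker::PhantomData;

// Marker types for the two validation states.
pub struct Unvalidated;
pub struct Validated;

#[derive(Debug, PartialEq)]
pub enum Error {
    Empty,
}

pub struct Entity<S> {
    data: String, // stand-in for serde_json::Value
    _validation: PhantomData<S>,
}

pub fn read_entity(raw: &str) -> Entity<Unvalidated> {
    Entity { data: raw.to_string(), _validation: PhantomData }
}

impl Entity<Validated> {
    // Placeholder rule: an entity is valid if its data is not blank.
    pub fn parse(unvalidated: Entity<Unvalidated>) -> Result<Self, Error> {
        if unvalidated.data.trim().is_empty() {
            return Err(Error::Empty);
        }
        Ok(Entity { data: unvalidated.data, _validation: PhantomData })
    }
}

// Stand-in for the persistence layer: the id is the index in the Vec.
pub fn store_entity(db: &mut Vec<String>, entity: Entity<Validated>) -> usize {
    db.push(entity.data);
    db.len() - 1
}

fn main() {
    let mut db = Vec::new();
    let raw = read_entity("{\"name\": \"widget\"}");
    // store_entity(&mut db, raw); // does not compile: expected Entity<Validated>
    let valid = Entity::<Validated>::parse(raw).expect("entity should validate");
    let id = store_entity(&mut db, valid);
    println!("stored entity {id}");
}
```

Because `parse` takes the unvalidated entity by value, the old value cannot be used again after validation, and the only route to `store_entity` goes through `parse`.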

Typestate

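The typestate pattern is an API design pattern that encodes information about an object’s run-time state in its compile-time type. In particular, an API using the typestate pattern will have operations that are only available when the object is in certain states, a way of encoding those states at the type level so that using an operation in the wrong state fails to compile, and state transitions that change the object’s type so that the previous state’s operations are no longer possible.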

This pattern uses phantom types and is an extension of what I showed above. Recently, I was writing an LLM-based application with a guided chat feature for getting key information from a user. It was difficult to collect a structured set of attributes from the user through a free-form chat. As an exercise, I’ll show how this could be done with typestate in Rust.

Imagine trying to determine what product a user wants to order. You may need the details on name, brand, and quantity. The user might not provide all of this information in one shot, so the application needs to be able to handle partial information.

use serde::Deserialize;

pub trait QuestionAnswer {
    fn generate_question(&self) -> String;
    fn parse_answer(&self, response: &str) -> AnswerState;
}

#[derive(Deserialize)]
pub enum AnswerState {
    MissingAttributes(MissingAttributes),
    AllInformation(AllInformation),
}

#[derive(Default, Deserialize)]
pub struct MissingAttributes {
    name: Option<String>,
    brand: Option<String>,
    quantity: Option<u64>,
}

#[derive(Deserialize)]
pub struct AllInformation {
    name: String,
    brand: String,
    quantity: u64,
}

impl QuestionAnswer for MissingAttributes {
    fn generate_question(&self) -> String {
        "You are a QA bot trying to get product information from a user...".to_string()
    }

    fn parse_answer(&self, response: &str) -> AnswerState {
        serde_json::from_str(response).expect("Failed to parse entity")
    }
}

// `LLM` and `User` are application types not shown here;
// `get_response` is assumed to return a `String`.
pub fn run_session(model: LLM, user: User) {
    let mut state = AnswerState::MissingAttributes(MissingAttributes::default());
    // Keep asking until the answer parses into the AllInformation variant
    while let AnswerState::MissingAttributes(missing) = &state {
        let question = missing.generate_question();
        let response = user.get_response(question);
        state = missing.parse_answer(&response);
    }
}
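The enum above tracks the state at run time. For comparison, here is a sketch of the same order flow in a stricter typestate style, where each step is a separate type-level state and `finish` only exists once everything is set. The `OrderBuilder` type and the `NeedsName`, `NeedsBrand`, `NeedsQuantity`, and `Complete` markers are hypothetical names, and asking for the attributes in a fixed order is a simplifying assumption:

```rust
use std::marker::PhantomData;

// Marker types, one per state of the conversation.
pub struct NeedsName;
pub struct NeedsBrand;
pub struct NeedsQuantity;
pub struct Complete;

#[derive(Debug, PartialEq)]
pub struct AllInformation {
    pub name: String,
    pub brand: String,
    pub quantity: u64,
}

pub struct OrderBuilder<S> {
    name: String,
    brand: String,
    quantity: u64,
    _state: PhantomData<S>,
}

impl<S> OrderBuilder<S> {
    // Moves the collected fields into the next state. Private, so callers
    // can only change state through the methods below.
    fn into_state<T>(self) -> OrderBuilder<T> {
        OrderBuilder {
            name: self.name,
            brand: self.brand,
            quantity: self.quantity,
            _state: PhantomData,
        }
    }
}

impl OrderBuilder<NeedsName> {
    pub fn new() -> Self {
        OrderBuilder { name: String::new(), brand: String::new(), quantity: 0, _state: PhantomData }
    }

    pub fn name(mut self, name: &str) -> OrderBuilder<NeedsBrand> {
        self.name = name.to_string();
        self.into_state()
    }
}

impl OrderBuilder<NeedsBrand> {
    pub fn brand(mut self, brand: &str) -> OrderBuilder<NeedsQuantity> {
        self.brand = brand.to_string();
        self.into_state()
    }
}

impl OrderBuilder<NeedsQuantity> {
    pub fn quantity(mut self, quantity: u64) -> OrderBuilder<Complete> {
        self.quantity = quantity;
        self.into_state()
    }
}

impl OrderBuilder<Complete> {
    // Only reachable after all three attributes have been provided.
    pub fn finish(self) -> AllInformation {
        AllInformation { name: self.name, brand: self.brand, quantity: self.quantity }
    }
}

fn main() {
    let order = OrderBuilder::new().name("notebook").brand("Acme").quantity(3).finish();
    println!("{order:?}");
    // OrderBuilder::new().name("notebook").finish(); // does not compile: no `finish` in this state
}
```

The trade-off is flexibility: a free-form chat can supply attributes in any order, which the run-time enum handles naturally, while this version rules out an incomplete order at compile time.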

Further reading: I recently came across this old conference talk about “Making Impossible States Impossible”, which has some interesting data structures for making a similar Q&A interface safer. While the presentation is in Elm, the concepts would translate to many different languages.