Lifetimes: a complex case

Warning: in this page, we will touch on several areas of Rust that have not yet been explained in class. Do not hesitate to read the Rust documentation or to take things for granted for now.

Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.

Problem statement

Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.

In short, we will need a type which can hold:

  • either a proper String if we had to build it;
  • or a reference to another string (as a &str) if we could just reuse an existing one.
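
To make the problem concrete, here is a minimal sketch of the naive approach, which always returns a freshly allocated String. The function name naive_lowercase is hypothetical and only serves this illustration:

```rust
// Hypothetical naive version: always returns a freshly allocated String,
// even when the input is already lowercase.
fn naive_lowercase(s: &str) -> String {
    s.to_lowercase()
}

fn main() {
    let input = "already lowercase";
    let output = naive_lowercase(input);
    // The content is identical...
    assert_eq!(output, input);
    // ...but it lives in a different buffer: the string has been copied.
    assert_ne!(output.as_ptr(), input.as_ptr());
    println!("same content, different buffers");
}
```

The copy is what we would like to avoid when the input needs no change.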

Building such a type

We will build a type, named StringOrRef, which can either hold an owned String or store a borrowed &str reference.

Enumerated types

Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):

pub enum StringOrRef {
    Owned(String),
    Borrowed(&str),
}

A StringOrRef object will be either a StringOrRef::Owned or a StringOrRef::Borrowed variant. When such an object is destroyed, it will destroy its content:

  • If it is a StringOrRef::Owned, it will destroy the String it owned (and thus free the heap memory associated with the String).
  • If it is a StringOrRef::Borrowed, it will destroy the &str it holds, which does nothing since destroying a reference does not destroy the object it points to (as the reference does not own it).

Fixing the type

Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:

pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

We say that the type StringOrRef is parameterized by the generic lifetime parameter 'a.

We can create an owned value by using StringOrRef::Owned(String::from("Hello")), or make a reference to a string s with StringOrRef::Borrowed(&s). In the former case, the lifetime 'a can be anything since the StringOrRef object owns the string. In the latter case, 'a will be set to the lifetime of the referenced string s.
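
Both constructions can be sketched as follows. The helper describe is hypothetical and only exists to observe which variant was built:

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

// Hypothetical helper, used only to observe which variant was built.
fn describe(x: &StringOrRef<'_>) -> String {
    match x {
        StringOrRef::Owned(s) => format!("owned: {s}"),
        StringOrRef::Borrowed(s) => format!("borrowed: {s}"),
    }
}

fn main() {
    // The value owns its String, so 'a is not constrained by anything here.
    let owned = StringOrRef::Owned(String::from("Hello"));

    // The value borrows from s, so 'a cannot exceed the time during which s is valid.
    let s = String::from("World");
    let borrowed = StringOrRef::Borrowed(&s);

    println!("{}", describe(&owned));
    println!("{}", describe(&borrowed));
}
```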

Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s has a lifetime 's, StringOrRef::Borrowed(s) is an object that cannot live longer than 's. This is intuitively sound: since we store a reference to s (which has lifetime 's) inside our StringOrRef, the StringOrRef cannot survive the disappearance of s as that would leave us with a dangling pointer.
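
The rule can be observed with nested scopes. This is a sketch only: the helper length is hypothetical, and the rejected variant is left in comments so that the example still compiles:

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

// Hypothetical helper: length in bytes of the string held by either variant.
fn length(x: &StringOrRef<'_>) -> usize {
    match x {
        StringOrRef::Owned(s) => s.len(),
        StringOrRef::Borrowed(s) => s.len(),
    }
}

fn main() {
    let s = String::from("hello");
    {
        // r borrows from s, so r must disappear before s does.
        let r = StringOrRef::Borrowed(&s);
        println!("length while s is alive: {}", length(&r));
    } // r is gone here, s is still alive
    drop(s); // accepted: nothing borrows from s anymore

    // The opposite nesting is rejected at compile time, because the
    // StringOrRef would outlive the string it refers to:
    //
    // let dangling;
    // {
    //     let t = String::from("short-lived");
    //     dangling = StringOrRef::Borrowed(&t);
    // } // t is destroyed here while dangling still refers to it
    // println!("{}", length(&dangling));
}
```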

Exploring the type

Our type can be used through pattern-matching:

fn display(x: &StringOrRef<'_>) {  // '_ means that the lifetime has no importance here
    match x {
        StringOrRef::Owned(s) => println!("owned string: {s}"),
        StringOrRef::Borrowed(s) => println!("borrowed string: {s}"),
    }
}

We can also write a function which returns a &str from our object:

pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str {
    match x {
        StringOrRef::Owned(s) => &s,
        StringOrRef::Borrowed(s) => s,
    }
}

Note how we didn't have to name the generic lifetime parameter of StringOrRef and used '_ instead, which means "any lifetime": since the reference to the StringOrRef has a lifetime 'a which is necessarily shorter than or equal to the generic lifetime parameter (see the "Important note" above), we know that the returned reference does not outlive the one used as a generic parameter.

Implementing as_str() as a method

Rather than using a standalone function, we can implement as_str() as a method on StringOrRef objects. Methods are implemented in an impl block. In an impl block, Self designates the type itself. In method parameters, self in first position means that the method receives the current object by value (it is a shortcut for self: Self), &self is a shortcut for self: &Self and &mut self is a shortcut for self: &mut Self.
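
The three receiver forms can be compared on a small example. The Counter type below is hypothetical and unrelated to StringOrRef; it is only a sketch to show each form once:

```rust
// Hypothetical type, used only to illustrate the three receiver forms.
struct Counter {
    count: u32,
}

impl Counter {
    // No receiver: an associated function, called as Counter::new().
    fn new() -> Self {
        Self { count: 0 }
    }

    // &self (self: &Self): read-only access to the object.
    fn get(&self) -> u32 {
        self.count
    }

    // &mut self (self: &mut Self): the method may modify the object.
    fn increment(&mut self) {
        self.count += 1;
    }

    // self (self: Self): the method takes ownership of the object.
    fn into_inner(self) -> u32 {
        self.count
    }
}

fn main() {
    let mut c = Counter::new();
    c.increment();
    c.increment();
    println!("current value: {}", c.get());
    let n = c.into_inner(); // c cannot be used after this line
    println!("final value: {n}");
}
```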

Let us rewrite as_str() as a method:

impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => &s,
            StringOrRef::Borrowed(s) => s,
        }
    }
}

You can note some interesting points about lifetimes:

  • We used <'_> in our impl block: our method defined in this block works with any generic lifetime parameter, as explained above.
  • We didn't explicitly write in the as_str() signature that the returned &str has the same lifetime as &self. This mechanism is called "lifetime elision": when a method has a &self parameter, by default all output lifetimes which are not explicit will have the same lifetime as &self. This is a shortcut for pub fn as_str<'a>(&'a self) -> &'a str.
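
For comparison, here is a sketch of the same method with every lifetime written out. The name 'b is chosen here for the generic lifetime parameter that impl StringOrRef<'_> leaves anonymous, and the Owned arm calls String::as_str() rather than relying on a coercion:

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

// 'b names the generic lifetime parameter that <'_> leaves anonymous.
impl<'b> StringOrRef<'b> {
    // 'a names the lifetime of the &self borrow; elision would copy it
    // to the returned reference automatically.
    pub fn as_str<'a>(&'a self) -> &'a str {
        match self {
            StringOrRef::Owned(s) => s.as_str(),
            StringOrRef::Borrowed(s) => s,
        }
    }
}

fn main() {
    let text = String::from("world");
    let owned = StringOrRef::Owned(String::from("hello"));
    let borrowed = StringOrRef::Borrowed(&text);
    println!("{} {}", owned.as_str(), borrowed.as_str());
}
```

Both spellings describe the same signature; the elided one is simply shorter to write and to read.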

Using our StringOrRef type

We can now use our StringOrRef type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:

// The lifetime of s will be copied into the generic lifetime parameter of StringOrRef.
// Again, this is because of elision rules: if there is only one lifetime parameter in
// the input, it will be copied into all non-explicit lifetime parameters in the output.
pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        // All characters in the string are lowercase already, return a reference
        StringOrRef::Borrowed(s)
    } else {
        // We need to create a new String with a lowercase version
        StringOrRef::Owned(s.to_lowercase())
    }
}
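
Note that the test used above is conservative: is_lowercase() is false for characters such as spaces or digits, so a string like "hello world" is copied even though lowercasing leaves it unchanged. The result stays correct, only the allocation is unnecessary. As a sketch of a possible stricter test (the helper is_already_lowercase is hypothetical), each character can be compared with its own lowercase mapping:

```rust
// Hypothetical helper: true if lowercasing would leave every character unchanged.
fn is_already_lowercase(s: &str) -> bool {
    s.chars().all(|c| c.to_lowercase().eq(std::iter::once(c)))
}

fn main() {
    // The check used in to_lowercase() rejects the space...
    println!("{}", "hello world".chars().all(|c| c.is_lowercase()));
    // ...while the per-character comparison accepts it.
    println!("{}", is_already_lowercase("hello world"));
    // A string containing an uppercase letter is still detected.
    println!("{}", is_already_lowercase("Hello world"));
}
```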

We can now use it in our main program and see that it works:

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
}

This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches! macro which can test if some expression matches a pattern, as in a match expression:

fn variant(x: &StringOrRef<'_>) -> &'static str {
    if matches!(x, StringOrRef::Owned(_)) {
        "owned"
    } else {
        "borrowed"
    }
}

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
    println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2));
}

The output is now

s1 = hello, s2 = world
s1 is owned, s2 is borrowed

as expected. Neat eh?
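
Another way to check that no copy took place is to compare addresses. The sketch below repeats the definitions from this page so that it is self-contained (with String::as_str() in the Owned arm) and adds a hypothetical helper, same_buffer:

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => s.as_str(),
            StringOrRef::Borrowed(s) => s,
        }
    }
}

pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        StringOrRef::Borrowed(s)
    } else {
        StringOrRef::Owned(s.to_lowercase())
    }
}

// Hypothetical helper: true if both slices start at the same address.
fn same_buffer(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let mixed = "HeLlO";
    let lower = "world";
    let s1 = to_lowercase(mixed);
    let s2 = to_lowercase(lower);
    // s1 was rebuilt on the heap, s2 still points into the original string.
    println!("s1 shares its buffer with the input: {}", same_buffer(s1.as_str(), mixed));
    println!("s2 shares its buffer with the input: {}", same_buffer(s2.as_str(), lower));
}
```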

Adding a destructor

When a StringOrRef object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned variant, the String contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.

We can visualize what happens by adding a destructor on StringOrRef. It is done by implementing the Drop trait:

impl Drop for StringOrRef<'_> {
    fn drop(&mut self) {
        print!(
            "Destroying the StringOrRef containing {} which is {}: ",
            self.as_str(),
            variant(self),
        );
        if matches!(self, StringOrRef::Owned(_)) {
            println!("memory on the heap will be freed");
        } else {
            // Dropping a reference doesn't free memory on the heap
            println!("no memory on the heap will be freed");
        }
    }
}

If we execute our program, we will now read:

s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed

s2 and s1 are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for the string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built on the heap while lowercasing the string "HeLlO" and needs to be freed: this happens automatically when dropping s1.
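
The reverse destruction order can be observed in isolation. In the sketch below, the Noisy type and the drop_order function are hypothetical: each value records its name in a shared log when it is dropped, so the order can be inspected afterwards:

```rust
use std::cell::RefCell;

// Hypothetical type which records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Creates two values in order and returns the order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let _second = Noisy { name: "second", log: &log };
    } // _second is dropped here, then _first
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order()); // prints ["second", "first"]
}
```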

Conclusion

Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require extra bookkeeping (a field recording whether the string is owned or borrowed, and deallocation code written by hand). We will later see even more powerful type manipulation in Rust.