This website contains teaching material for the laboratory sessions of the course Safe System Programming (NET7212), at Télécom Paris / Polytechnique Institute of Paris.

© 2022-2024 Samuel Tardieu and Stefano Zacchiroli

Lab 01 — Unsafe System Programming

In the first lab session of the course, you will mainly work on:

  1. Setting up your development environment for the rest of the course, and
  2. Experimenting with unsafe system programming, leading to buffer overflows and other nasty stuff.

Let's get to it.

Development setup

Throughout the lab sessions of this course you are going to use a number of tools, for the most part related to development in the C and Rust programming languages. As the first task of this lab session, you are hence going to install the relevant toolchains and a development environment you are comfortable with.

Operating system requirements: we are going to assume that your work machine runs Linux as operating system (OS). If that is not the case, you should fix this before proceeding. Any Linux-based distribution would do, but in the following we are going to rely on package names from Debian and Debian-based distributions (e.g., Ubuntu); you should translate them to your distro if need be.

Work environment

You have various options in terms of which "machine" to work on:

  1. work on your real (hardware) machine: laptop or desktop computer;
  2. work in a Docker container;
  3. work in a virtual machine.

Option 1: Work on your machine

We recommend option (1) as it is the most flexible, but it requires more manual setup on your side—setup which we are not going to detail here as it heavily depends on your OS choice.

We only list below the tools (and corresponding Debian package names) that you should install on your machine to be able to complete the exercises of this course.

Option 2: Work in a Docker container

If you would like to work in a Docker container, we provide a Dockerfile that you can use to create locally a Docker image that contains all the tools used for lab sessions. Download the Dockerfile locally and build a matching image like this:

$ docker build -t net7212 - < Dockerfile

(It will take a few minutes to complete, depending mostly on download and I/O speeds.)

Then, create a shell script to easily create and run a fresh container based on that image every time you need it. It can be something like the following:

#!/bin/sh
# The two paths below are assumptions matching the examples later in this
# page; adjust them to your liking:
host_labdir="$HOME/net7212-lab"      # lab directory on your real machine
cont_labdir="/home/net7212/lab"      # where it will appear in the container

docker run -ti \
       --net host \
       -e DISPLAY="unix${DISPLAY}" \
       -v /dev/shm:/dev/shm \
       -v /etc/hosts:/etc/hosts \
       -v /etc/localtime:/etc/localtime:ro \
       -v /tmp/.X11-unix:/tmp/.X11-unix \
       -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket \
       -v $HOME/.Xauthority:/home/net7212/.Xauthority \
       -v "${host_labdir}:${cont_labdir}" \
       --rm \
       --name net7212 \
       net7212

Note that host_labdir should point to a directory on your real machine that will be mounted within the container, so that you can share files between the two worlds. All your lab session material will need to be in this directory if you want to access it from within the container.

You can then enter a fresh container by running the script whenever you need it, and check that the toolchain is available inside:

net7212@machine:~$ rustc --version
rustc 1.65.0 (897e37553 2022-11-02)

Note also that if you create a file in your home directory under ~/net7212-lab, you can access it from within the container, e.g.:

# outside the container
$ echo "// hello.c" > ~/net7212-lab/hello.c
net7212@machine:~$ cat ~/lab/hello.c
// hello.c

Option 3: Work in a virtual machine

If you prefer to fully control your work environment (like with option (1)), but not impact your day-to-day work machine, you have the option to install a virtual machine and configure it manually. Just install in there the same tools that you would have installed, following the instructions below, on your real machine. We are not going to detail this step further, as it heavily depends on your choice of virtualization technology.

C toolchain

If you have chosen to work on your (virtual) machine, you should install the following tools:

| Tool | Debian package name |
|------|---------------------|
| gcc | gcc |
| clang | clang |
| clang-tidy | clang-tidy |
| gdb | gdb |
| ltrace | ltrace |
| strace | strace |
| make | make |
| valgrind | valgrind |
| bats | bats |
| Check | check |
| pkg-config | pkg-config |

If you have chosen the Docker option, they are all already installed in the container.

Exercise 0.a: Install all the above dependencies in the work environment of your choice.
When ready, make sure you can compile and run the following Hello World program with both gcc and clang:

#include <stdio.h>

int main(void) {
	printf("Hello, World!\n");
	return 0;
} // end of hello.c

like this:

$ gcc -Wall hello.c 
$ ./a.out 
Hello, World!

$ clang -Wall hello.c 
$ ./a.out 
Hello, World!

Now make sure that all of the following commands execute properly, without failing, on your setup (the exact output might vary depending on version, configuration, etc.; that's OK):

$ bats -v
Bats 1.8.2
$ clang-tidy --version
Debian LLVM version 14.0.6
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: skylake
$ gdb --version
GNU gdb (Debian 12.1-4) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ ltrace --version
ltrace version 0.7.3.
Copyright (C) 1997-2009 Juan Cespedes <>.
This is free software; see the GNU General Public Licence
version 2 or later for copying conditions.  There is NO warranty.
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ strace --version
strace -- version 5.10
Copyright (c) 1991-2020 The strace developers <>.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Optional features enabled: stack-trace=libunwind stack-demangle m32-mpers mx32-mpers
$ valgrind --version

Rust toolchain

If you have chosen to work on your (virtual) machine, you should install the following tools:

| Tool | Debian package |
|------|----------------|
| rustup | none (install by hand) |

If you have chosen the Docker option, they are all already installed in the container.

Exercise 0.b: Install all the above dependencies in the work environment of your choice.
Once installed, verify that you can successfully run the following basic tools of the Rust toolchain, like this (your output might vary, depending on the installed versions):

$ rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.72.0 (5680fa18f 2023-08-23)`
$ cargo --version
cargo 1.72.0 (103a7ff2e 2023-08-15)
$ rustc --version
rustc 1.72.0 (5680fa18f 2023-08-23)

Editor / IDE

You are also going to need a development editor or an IDE (Integrated Development Environment). As developers tend to be opinionated about this stuff we are not going to force you to use any specific one: pick your own!

We recommend either a flexible editor with IDE-like features (e.g., Emacs or Vim with LSP support for Rust) or a full-featured IDE like Visual Studio Code or NetBeans (do make sure it has support for Rust, though, as that will be needed later in the semester).

Exercise 0.c: Install the editor or IDE you want in the work environment of your choice.
When done, open the hello.c file you used previously and make sure you can comfortably edit/compile/run it, ideally from within the editor/IDE itself.

Good! You are now all set with the developer setup!
Let's now move to the other matters of the day.

Memory unsafety

Out-of-bounds write

Exercise 1.a: write C programs (new, different from those seen in lecture 01) that exhibit weakness CWE 787: Out-of-bounds write. Try out-of-bound writes in various memory segments, writing one program for each of the following cases:

  • out-of-bound write in dynamically allocated memory on the heap (e.g., with malloc),
  • out-of-bound write in statically allocated memory (i.e., global variable, preinitialized or not),
  • out-of-bound write in automatic memory (i.e., variables allocated on the stack).

Bonus point: make sure your programs do not emit any warnings when compiled invoking the compiler with the -Wall flag (which asks the compiler to enable all common compile-time warnings and that you should always use anyway!).

Try to determine experimentally what each program was doing just before the segfault (tip: try with gdb, nm, ltrace, strace).
What helped you determine this?

Exercise 1.b: consider the following local variable declarations at the beginning of a function (possibly main):

char s1[16];
char s2[16];

Write a program that does not segfault, but that, with a single memory (or string) copy operation to either of those variables, fills the content of both variables with content of your choice. Verify the result experimentally (e.g., with printf or strcmp).

Warning: do not forget about string \0 terminators!

Out-of-bounds read

Exercise 1.c: continuing with the idea from the previous exercise, write a program that exhibits at least one variant of CWE 125: Out-of-bounds read. The program should not segfault, but by performing a memory read operation on a given variable, it should be able to read the content of another variable, correctly and in full.

Bonus point: make your program read the content of an out-of-scope local variable, e.g., one declared in a calling function.

Smashing the stack

Segmentation fault

Consider the following program:

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}

int main() {
    function(1, 2, 3);
    return 0;
}

Exercise 2.a: compile the program to assembly language like this:

$ gcc -S -o example1.s example1.c

and identify in the generated assembly code where function is called. How are arguments passed to the function? (The answer might vary depending on the compiler and architecture, but the assembly code will tell you!)

Exercise 2.b: compile, execute, and debug the following program:

#include <string.h>

void function(char *str) {
    char buffer[16];
    strcpy(buffer, str);
}

int main() {
    char large_string[256];
    int i;

    for(i = 0; i < 255; i++) {
        large_string[i] = 'A';
    }
    large_string[255] = '\0';

    function(large_string);
    return 0;
}

Explain what's happening and why.

Subverting the control flow

Historically (on 32 bit architectures and with less built-in defenses in C compilers) subverting the control flow via stack overflows was pretty easy. You can get an idea of how easy it was by reading the famous tutorial article Smashing The Stack For Fun And Profit by Aleph One (1996).

Nowadays it has become a little bit more complicated, but still very doable (and regularly done, as we have seen in the course examples).

Exercise 2.c: go through the modern incarnation of Aleph One's tutorial, namely: Smashing the Stack in the 21st Century by Jon Gjengset (archived copy), reproducing the reported experiences on your machine. (Next week we will go through some of the defenses that you will have to manually disable to exploit a stack overflow, and discuss what they are good for and what they are not.)

Lab 02 — Hardening System Programming

In this lab session we will take a practical tour of existing techniques to secure system programs implemented in traditional system languages. We will experiment with: testing, fuzzing, code instrumentation (of both binary and source code), and static analysis.

By the end of the session you will have a better understanding of the state-of-the-art of hardening system programming, including its limitations.


The exercises of this lab session contain material and ideas reused with permission from week 1 assignments of Stanford's course CS 110L (2022) by Ryan Eberhardt, Armin Namavari, Will Crichton, Julio Ballista, and Thea Rossman.

Unit testing


Consider the initial C implementation of a simple currency type (a pair of a currency name and a currency amount) given in money.h and money.c.

We want to reassure ourselves that the code thus far is correct, using unit testing.

In check_money_tiny.c you can find the skeleton of a C program that uses money.c as a library and unit tests it with a single test case. It uses the Check unit testing framework for C, which you should have already installed as part of the first lab session. It can be built and run like this:

$ gcc -Wall -c -o money.o money.c
$ gcc -Wall -c -o check_money_tiny.o check_money_tiny.c
$ gcc -Wall -o check_money_tiny check_money_tiny.o money.o `pkg-config --libs check`
$ ./check_money_tiny 
Running suite(s): Money
100%: Checks: 1, Failures: 0, Errors: 0
check_money_tiny.c:17:P:Core:test_money_create_amount:0: Passed

Exercise 1.a: read through the implementation of money.c and check_money_tiny.c to make sure you understand what the code does. Help yourself with the Check documentation as needed.

Code coverage

The notion of code coverage is a measure of how much of your implementation code is exercised during the execution of your test suite. It can be measured in various ways; we will focus on the simplest one: which lines of code are (not) executed while the test suite runs (i.e., line coverage). You can measure the line coverage of your current test suite using gcov, which is part of gcc, as follows:

  1. compile/link your project passing the -fprofile-arcs -ftest-coverage options to compiler and linker,
  2. execute your test suite (./check_money_tiny in this case),
  3. inspect code coverage of a source code file you care about, e.g.:
$ gcov money.c 
File 'money.c'
Lines executed:75.00% of 16
Creating 'money.c.gcov'

Lines executed:75.00% of 16

The summary tells you the percentage of lines of code exercised during execution. The mentioned file money.c.gcov is an annotated version of money.c that shows precisely which lines have been executed (or not).

Exercise 1.b: measure the code coverage of the current test suite for money.c. Then, add all the test cases you need to maximize code coverage. Can you reach 100% line coverage? Why or why not?

Exercise 1.c: (optional and tricky, feel free to postpone this one to the end of the session) add a memory unsafety issue to the money.c implementation, and specifically one that crashes the execution of your test case. For example, you can implement a new money_add function that performs an out-of-bound read or write. Then add a test case that runs the buggy code, crashing its execution. Does the crash of that test case cause the crash of the entire test suite? (i.e., do other test cases supposed to run after it keep running or not?) Why or why not?

Stack overflow detection


Retrieve the program uppercase.c, compile it (recommended compiler options for this session are: -g -O0 -Wall -Wextra), and run it, e.g., like this:

$ gcc -g -O0 -Wall -Wextra -o uppercase uppercase.c
$ ./uppercase 'Hello, World!'

Now take some time to study the code of the program.

Exercise 2.a: the program is affected by a memory-related bug (EEEK!). Explain where the bug is, what its consequences might be, and how to fix it.

Static analysis with Clang-Tidy

Clang-Tidy (which you have installed already, as part of last week's lab session) performs some simple static analysis checks, ranging from basic linting to more advanced techniques. From the tool documentation:

clang-tidy is a clang-based C++ “linter” tool. Its purpose is to provide an extensible framework for diagnosing and fixing typical programming errors, like style violations, interface misuse, or bugs that can be deduced via static analysis.

Exercise 2.b: go (briefly) through the documentation of clang-tidy and learn how to use it to analyze C source code files. Then use clang-tidy on uppercase.c to check if it detects the bug you have identified in the previous exercise.
Does clang-tidy catch the problem? Based on what you have learned in the last lecture, explain why clang-tidy is or is not capable of identifying the problem.

Dynamic analysis with Valgrind

Clang-Tidy performs static analysis; Valgrind, on the other hand, performs dynamic analysis based on dynamic binary recompilation, as we have seen in the last lecture. Let's see how Valgrind fares in detecting the bug in this program. After compiling your program, we can give it a try like this:

$ valgrind [--tool=memcheck] ./uppercase 'Hello, World!'

(Note: --tool=memcheck is optional because Memcheck is the default tool that Valgrind uses unless otherwise requested.)

Exercise 2.c: does Valgrind detect the issue? Based on what you have learned in the last lecture, explain why or why not.

Exercise 2.d: benchmark the time required to execute the program with and without Valgrind. (Tip: as a simple way to do that you can use the standard UNIX time command. If you want to be fancier, this is your chance to learn about the amazing benchmarking tool Hyperfine and use it for a more scientifically accurate comparison.)

Dynamic analysis with LLVM sanitizers

As we have seen in the last lecture, dynamic analysis can also be performed via source code instrumentation, which happens at compile time and adds checks executed at runtime. As we have seen, LLVM sanitizers (look for the various entries with "Sanitizer" in their name in the Clang documentation) are a set of plugins for the Clang compiler that do exactly that.
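As a generic illustration of the mechanism (using a throwaway demo.c invented for this sketch, unrelated to the lab's uppercase.c), compiling with -fsanitize=address makes the resulting binary check every memory access at runtime:

```shell
cat > demo.c <<'EOF'
#include <stdlib.h>
int main(void) {
    int *p = malloc(4 * sizeof(int));
    int x = p[4];   /* heap out-of-bounds read: ASan aborts with a report */
    free(p);
    return x != x;  /* not reached when ASan triggers */
}
EOF
clang -g -O0 -fsanitize=address -o demo demo.c
./demo || echo "ASan reported an error (non-zero exit status)"
```

Without -fsanitize=address the same program would most likely run to completion silently; with it, the out-of-bounds read is caught at the exact faulty access.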

Exercise 2.e: change the compiler you are using to build from gcc to clang and rebuild the program with it. (You can pass the same options suggested above, as they are compatible between the two compilers.) Then, enable all of the following sanitizers (check the documentation for how to do that) and rebuild the program: AddressSanitizer, LeakSanitizer, UndefinedBehaviorSanitizer.
Compare the size of the executables built with and without sanitizers. How different are they, and why?

Exercise 2.f: run the instrumented program. Do sanitizers detect the issue? Based on what you have learned in last lecture, explain why or why not.

Let's streamline our build process, as it is becoming a bit of a mess. Retrieve the generic Makefile we make available for building the programs associated with this lab session. If you are not familiar with Makefiles, go (briefly) through the GNU make documentation to make sure you know the basics. In particular, read through the introductory material and make sure you understand the parts of Makefile syntax used in the given Makefile, in particular: rules, variables, and static pattern rules. (If you know about Makefiles already, feel free to skip this documentation study part.)

Exercise 2.g: change the retrieved Makefile so that:

  1. it only builds the uppercase.c program (other programs mentioned in there will arrive later in this lab session!), and
  2. passes the sanitizer flags to the compiler when compiling.

Recheck the result of the previous exercise about sanitizers to make sure they are enabled when building with make.

Memory leak detection

Linked lists

Now retrieve the program linkedlist.c, add it to the Makefile, and compile it as before.

Take some time to study the program and understand what it does.

Exercise 3.a: this program is affected by (at least) two memory-related bugs this time, but of a different nature than before. Identify both of them, explain what their consequences might be, and how to fix them.

Exercise 3.b: try linting/static analysis with Clang-Tidy on this program, as you did before.
Does it detect the (right) issues? (Tip: beware of false positives!)

Exercise 3.c: try source code instrumentation with LLVM Sanitizers on this program, as you did before.
Do they detect the issues?

In both cases feel free to speculate about why the used approaches did or did not identify the memory issues affecting this program.

String parsing

You know the drill!

The next program is bracket-parser.c, which extracts a "substring [within brackets]" from a larger string.

Exercise 4.a: read the program, find the memory problem with it, try Clang-Tidy and LLVM sanitizers on it, explaining what you are seeing.

Exercise 4.b: on most inputs the LLVM sanitizers will not detect a memory issue. Based on your understanding of the bug and your reading of the code (= white-box inspection), can you craft a specific input for bracket-parser that will make the LLVM sanitizer spot the bug at runtime?
(And while we are at it: what does this tell you about the reliability of dynamic analysis to detect memory issues?)

Fuzzing with LibFuzzer

Let's now see if fuzzing can help us find automatically both the memory issue affecting bracket-parser and a test input that proves the issue is there.

Read the introductory documentation of LibFuzzer. Make sure to understand how to add fuzz targets to your code, what the required API is, and the fact that you should not have a main function when using LibFuzzer (because it adds its own main).

Exercise 4.c: add a fuzz target to your code that allows you to fuzz the input of the parse function of bracket-parser.c. Tip: you can take inspiration from the example seen in lecture, but remember that in this case you should pass well-formed C strings (trailing \0!) to avoid triggering bugs other than the one you are looking for.
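For reference, a fuzz target for a string-parsing function typically looks like the sketch below. The parse prototype and its stub body here are hypothetical placeholders so the snippet is self-contained; in the lab you would instead link against the real parse from bracket-parser.c (and drop the stub):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real parse() from bracket-parser.c;
   its actual prototype may differ. */
static void parse(const char *input) { (void)input; }

/* LibFuzzer entry point: called repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Turn the raw bytes into a well-formed, NUL-terminated C string,
       as recommended in the tip above. */
    char *str = malloc(size + 1);
    if (str == NULL) return 0;
    memcpy(str, data, size);
    str[size] = '\0';
    parse(str);
    free(str);
    return 0;
}
```

Note that there is no main here: LibFuzzer supplies its own when you build with -fsanitize=fuzzer.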

Exercise 4.d: build the fuzzer (adding fuzzer to the list of -fsanitize=... sanitizers) and run it.
Does it find the issue? Does it produce a test case with a sample input that proves the issue is there? Explain why.

(Pretty cool, huh?)

Lab 03: Rust and references

In this lab session you will build and run your first Rust program. You will manipulate references and object lifetimes.

By the end of the session you will have a better understanding of the lifecycle of Rust objects.

Your first Rust program

In this section, you will learn how to create and compile your first Rust program using tools such as cargo.

Anatomy of a Rust program

A basic Rust program is organized as follows:

myprogram
├── Cargo.lock
├── Cargo.toml
├── src
│   └──
└── target
    └── debug
        └── myprogram
  • The file Cargo.toml describes your program (name, version, authors, …).
  • The file Cargo.lock contains information about the version of the other packages used when compiling your program.
  • src contains the source of your program — src/ is the file containing the main function.
  • target is a directory created by the compiler; it will contain the output of the compilation, including the executable produced from your source files, in debug/myprogram (if myprogram is the name of your program as described in Cargo.toml). The target directory should never be checked in your version control software. It should be ignored completely (e.g., by adding it to your .gitignore if you are using git).

Creating your first Rust program: intro

Exercise 0.a: Create the skeleton of your first Rust program by using the command cargo new:

$ cargo new intro
     Created binary (application) `intro` package

This will create an intro directory with a complete skeleton of a Rust program:

$ tree intro
├── Cargo.toml
└── src
    └──

(you might not have the tree command on your computer – never mind, we will not need it)

Change your current directory to intro and display the content of src/

$ cd intro
$ cat src/
fn main() {
    println!("Hello, world!");
}

As you can see, the program is pretty basic. You can compile it by using cargo build:

$ cargo build
   Compiling intro v0.1.0 (/tmp/intro)
    Finished dev [unoptimized + debuginfo] target(s) in 0.54s

Your program has compiled without errors. You can execute it by using cargo run:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/intro`
Hello, world!

Note that you can execute cargo run without first executing cargo build: the application will be built automatically.

Editing the Rust program

Exercise 0.b: Using your editor of choice (for example Visual Studio Code or NetBeans), edit src/ so that the main program now displays a count from 1 to 5 inclusive:

fn main() {
    for i in 1..=5 {
        println!("i = {i}");
    }
}

Ensure that your editor is Rust-aware, or install the necessary extensions if needed. For Visual Studio Code, we suggest the installation of the "rust-analyzer" extension.

You can now execute your code. If everything went well, you should see the following output:

$ cargo run
   Compiling intro v0.1.0 (/tmp/intro)
    Finished dev [unoptimized + debuginfo] target(s) in 0.20s
     Running `target/debug/intro`
i = 1
i = 2
i = 3
i = 4
i = 5

You can also compile your program and run the executable directly. As the cargo run output indicates, it is located in target/debug/intro:

$ cargo build
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ ls -l target/debug/intro
-rwxr-xr-x 2 sam sam 4203904 Jan 16 15:02 target/debug/intro
$ target/debug/intro
i = 1
i = 2
i = 3
i = 4
i = 5

Using a Rust tutorial

An interactive Rust tutorial is available online. While reading this tutorial, you will be able to take quizzes and to leave annotations that can be re-read later on.

You will not have to read the full tutorial at this time. We suggest that you ensure that you understand the concepts (and pass the quizzes) in the following chapters:

Of course, you are invited to read more chapters and take more quizzes as the lecturers describe new Rust concepts.

If you want to do more

Another way to familiarize yourself with Rust programming is to use the rustlings repository and program. Start by installing the rustlings program and cloning the rustlings repository containing the exercises:

$ cargo install rustlings
$ git clone
$ cd rustlings
$ rustlings watch
The source for this exercise is in `exercises/intro/`. Have a look!
Going forward, the source of the exercises will always be in the success/failure output.


You can keep working on this exercise,
or jump into the next one by removing the `I AM NOT DONE` comment
[…waits forever…]

The last command, rustlings watch, will watch your progress while you edit the exercises in the exercises directory of the repository.

For example, following the instructions given by rustlings watch, edit the file exercises/intro/ with your favourite editor. As an exception, this exercise already compiles successfully. All you have to do is remove the line // I AM NOT DONE so that rustlings watch will go to the next exercise in the series, exercises/intro/

This one will not compile. You have to edit the code so that it compiles successfully, then when you see that it does you can remove the // I AM NOT DONE line, and so on.

You can stop doing the exercises whenever you want (just stop the rustlings watch program). When you run rustlings watch again it will start from where you left off.


Mutable references

The function std::mem::swap() takes two mutable references to values of the same type as parameters and swaps them.

Swapping values

Exercise 1.a: Complete the following function (and call it from main() to see its effect) so that it has the expected effect:

fn do_swap() {
    let mut a = String::from("apple");
    let mut b = String::from("bread");
    println!("Before swap: a = {a}, b = {b}");
    todo!(); // REPLACE ME with something using std::mem::swap()
    println!("After swap: a = {a}, b = {b}");
}
The output should look like:

Before swap: a = apple, b = bread
After swap: a = bread, b = apple

You may realize that swapping two String values does not move the various characters composing the strings in the heap. Only the three fields making up a String (capacity, pointer, and length) are swapped.
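You can check this claim experimentally by comparing the address of a String's heap buffer before and after a swap (a small self-contained check, separate from the exercise above):

```rust
fn main() {
    let mut a = String::from("apple");
    let mut b = String::from("bread");
    let heap_a = a.as_ptr(); // address of a's heap buffer before the swap
    std::mem::swap(&mut a, &mut b);
    // The heap data did not move: b now points at the buffer that backed a.
    assert_eq!(b.as_ptr(), heap_a);
    assert_eq!((a.as_str(), b.as_str()), ("bread", "apple"));
    println!("only the (pointer, length, capacity) triples were swapped");
}
```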

Smallest first

Exercise 1.b: Write a function smallest_first with the following prototype, which ensures that its first argument is smaller than its second argument, swapping them otherwise:

fn smallest_first(a: &mut i32, b: &mut i32) {
    todo!(); // REPLACE ME
}

fn main() {
    let mut a = 42;
    let mut b = 10;
    println!("Before calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first: a = {a}, b = {b}");
    smallest_first(&mut a, &mut b);
    println!("After calling smallest_first again: a = {a}, b = {b}");
}

should display

Before calling smallest_first: a = 42, b = 10
After calling smallest_first: a = 10, b = 42
After calling smallest_first again: a = 10, b = 42


Sorting an immutable collection of strings

Let us assume that we want to write a function which sorts an immutable collection of String elements. Here, "immutable" means that we cannot modify either the strings or the collection itself. We will have to return a Vec containing the result.

Exercise 2.a: write a function sorted which takes a reference to a slice of String instances and returns a Vec<String> with the sorted strings.

The function will have the following signature:

fn sorted(names: &[String]) -> Vec<String> {
    todo!() // Write the code here
}

You can use the following constructs to help you do so:

  • vec![] creates an empty Vec (the type of the elements will be deduced from the context).
  • If s is a String (or a &String), s.clone() will produce a new String with the same content as s.
  • You can iterate over the content of a slice names of type &[String] with for name in names { … }. name will have type &String within the loop.
  • You can use .sort() on a mutable slice: it will sort your elements in the natural order.

You can test your function with the following main program:

fn main() {
    let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
    let in_order = sorted(&people);
    for n in in_order {
        println!("- {n}");
    }
}

This should print "Jane Doe", "Jane Eyre", and "John Doe" in order.

Avoiding cloning the strings

While the previous program works, you had to clone the immutable strings. This is unfortunate, as it consumes more memory. Why not work directly with a Vec of &String and sort it?

Exercise 2.b: modify your sorted function so that it returns a Vec<&String> instead of a Vec<String>.

Note that .sort() will work on a collection of &String: it will sort them the same way it sorted String elements.

Testing the guarantees of lifetimes

The signature of your sorted function is:

fn sorted(names: &[String]) -> Vec<&String>

Since you have not indicated any explicit lifetime (remember, lifetimes start with a single tick character '), lifetime elision kicked in: the lifetime of the &[String] slice was copied over to the &String elements inside the Vec.

It means that the result of sorted() cannot live longer than its argument names.
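To make the elided lifetimes concrete, here is the same signature written out explicitly, together with one possible body (a sketch, not necessarily the one you wrote for exercise 2.b):

```rust
// Fully explicit version of the elided signature: the result borrows
// from `names`, so it cannot outlive it.
fn sorted<'a>(names: &'a [String]) -> Vec<&'a String> {
    let mut refs: Vec<&String> = names.iter().collect();
    refs.sort(); // sorts the &String elements in natural (lexicographic) order
    refs
}

fn main() {
    let people = [String::from("Charlie"), String::from("Alice"), String::from("Bob")];
    let in_order = sorted(&people);
    assert_eq!(in_order[0].as_str(), "Alice");
    assert_eq!(in_order[2].as_str(), "Charlie");
    println!("sorted: {in_order:?}");
}
```

The compiler infers exactly these lifetimes when you write the elided form; spelling them out changes nothing except readability.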

Exercise 2.c: experiment with the following modified main program, which will not compile, and make sure you understand why the compiler is telling you that you cannot store the output of sorted(&people) into in_order. Note how it prevents you from referencing strings that have been destroyed when people went out of scope. You would not have had any such guarantee in C.

fn main() {
    let in_order = {
        let people = [String::from("John Doe"), String::from("Jane Eyre"), String::from("Jane Doe")];
        sorted(&people)
    };
    for n in in_order {
        println!("- {n}");
    }
}

Lifetimes: a complex case

Warning: in this page, we will touch on several Rust areas that have not yet been explained in class. Do not hesitate to read the Rust documentation, or to take things for granted for now.

Exercise 3.a: follow the code described below, implement it, make sure you understand how it works.

Problem statement

Sometimes, we would like to manipulate a string, for example to make it lowercase. However, the string we manipulate might already be in lowercase: in this situation, we would like to avoid copying the string and return a reference to the string we got as a parameter.

In short, we will need a type which can hold:

  • either a proper String if we had to build it;
  • or a reference to another string (as a &str) if we could just reuse an existing one.

Building such a type

We will build a type, named StringOrRef, which can either hold an owned String or store a borrowed &str reference.

Enumerated types

Rust has support for enumerated types with values, which means that we can write something like (incorrect for now):

pub enum StringOrRef {
    Owned(String),
    Borrowed(&str),
}

A StringOrRef object will be either a StringOrRef::Owned or a StringOrRef::Borrowed variant. When such an object is destroyed, it will destroy its content:

  • If it is a StringOrRef::Owned, it will destroy the String it owned (and thus free the heap memory associated with the String).
  • If it is a StringOrRef::Borrowed, it will destroy the &str it holds, which does nothing: destroying a reference does not destroy the object it points to (the reference does not own it).

Fixing the type

Our type will not compile because we haven't told the compiler how long the reference in StringOrRef::Borrowed is supposed to be valid. Since we do not know that ourselves in advance, we have to make it a generic parameter of the type:

pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

We say that the type StringOrRef is parameterized by the generic lifetime parameter 'a.

We can create an owned value by using StringOrRef::Owned(String::from("Hello")), or make a reference to a string s with StringOrRef::Borrowed(&s). In the former case, the lifetime 'a can be anything since the StringOrRef object owns the string. In the latter case, 'a will be set to the lifetime of the referenced string s.

Important note: when a type is parameterized by a generic lifetime parameter, an object of this type can never live longer than this lifetime. For example, if s has a lifetime 's, StringOrRef::Borrowed(s) is an object that cannot live longer than 's. This is intuitively sound: since we store a reference to s (which has lifetime 's) inside our StringOrRef, the StringOrRef cannot survive the disappearance of s, as that would leave us with a dangling pointer.

Exploring the type

Our type can be used through pattern-matching:

fn display(x: &StringOrRef<'_>) {  // '_ means that the lifetime has no importance here
    match x {
        StringOrRef::Owned(s) => println!("owned string: {s}"),
        StringOrRef::Borrowed(s) => println!("borrowed string: {s}"),
    }
}

We can also write a function which returns a &str from our object:

pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str {
    match x {
        StringOrRef::Owned(s) => &s,
        StringOrRef::Borrowed(s) => s,
    }
}

Note how we didn't have to give the lifetime of the StringOrRef generic parameter and used '_, which means "any lifetime": since the StringOrRef reference has a lifetime 'a which is necessarily shorter than or equal to the generic lifetime parameter (see the "Important note" above), we know that the returned reference lives no longer than the one used as a generic parameter.
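Putting the snippets above together gives a self-contained sketch you can compile and run:

```rust
pub enum StringOrRef<'a> {
    Owned(String),
    Borrowed(&'a str),
}

// '_ in the parameter means the generic lifetime does not matter here;
// the returned &str only has to live as long as the reference x.
pub fn as_str<'a>(x: &'a StringOrRef<'_>) -> &'a str {
    match x {
        StringOrRef::Owned(s) => s,
        StringOrRef::Borrowed(s) => s,
    }
}

fn main() {
    let s = String::from("world");
    let owned = StringOrRef::Owned(String::from("hello"));
    let borrowed = StringOrRef::Borrowed(&s);
    assert_eq!(as_str(&owned), "hello");
    assert_eq!(as_str(&borrowed), "world");
}
```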

Implementing as_str() as a method

Rather than using a standalone function, we can implement as_str() as a method on StringOrRef objects. Methods are implemented in an impl block; inside it, Self designates the type itself. In method parameters, self in first position designates the receiver, i.e. the current object (it is a shortcut for self: Self); &self is a shortcut for self: &Self, and &mut self is a shortcut for self: &mut Self.

Let us rewrite as_str() as a method:

impl StringOrRef<'_> {
    pub fn as_str(&self) -> &str {
        match self {
            StringOrRef::Owned(s) => &s,
            StringOrRef::Borrowed(s) => s,
        }
    }
}

You can note some interesting points about lifetimes:

  • We used <'_> in our impl block: the methods defined in this block work with any generic lifetime parameter.
  • We didn't explicitly write in the as_str() signature that the returned &str has the same lifetime as &self. This mechanism is called "lifetime elision": when a method has a &self parameter, by default all output lifetimes which are not explicit will be the same as that of &self. This is a shortcut for pub fn as_str<'a>(&'a self) -> &'a str.

Using our StringOrRef type

We can now use our StringOrRef type. For example, let us write a function which returns a lowercase version of a string, but allocates memory on the heap only when the string is not lowercase already:

// The lifetime of s will be copied into the generic lifetime parameter of StringOrRef.
// Again, this is because of elision rules: if there is only one lifetime parameter in
// the input, it will be copied into all non-explicit lifetime parameters in the output.
pub fn to_lowercase(s: &str) -> StringOrRef<'_> {
    if s.chars().all(|c| c.is_lowercase()) {
        // All characters in the string are lowercase already, return a reference
        StringOrRef::Borrowed(s)
    } else {
        // We need to create a new String with a lowercase version
        StringOrRef::Owned(s.to_lowercase())
    }
}

We can now use it in our main program and see that it works:

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
}

This will display "s1 = hello, s2 = world". Nothing indicates that "world" has not been copied. Let's enhance the program with the matches! macro which can test if some expression matches a pattern, as in a match expression:

fn variant(x: &StringOrRef<'_>) -> &'static str {
    if matches!(x, StringOrRef::Owned(_)) {
        "owned"
    } else {
        "borrowed"
    }
}

fn main() {
    let s1 = to_lowercase("HeLlO");
    let s2 = to_lowercase("world");
    println!("s1 = {}, s2 = {}", s1.as_str(), s2.as_str());
    println!("s1 is {}, s2 is {}", variant(&s1), variant(&s2));
}

The output is now

s1 = hello, s2 = world
s1 is owned, s2 is borrowed

as expected. Neat eh?

Adding a destructor

When a StringOrRef object is dropped (goes out of scope), it will get destroyed: the destructor for every field will be called (if any). For example, if it holds a StringOrRef::Owned variant, the String contained in this variant will be dropped and its destructor will be called, freeing memory on the heap.

We can visualize what happens by adding a destructor on StringOrRef. It is done by implementing the Drop trait:

impl Drop for StringOrRef<'_> {
    fn drop(&mut self) {
        print!(
            "Destroying the StringOrRef containing {} which is {}: ",
            self.as_str(),
            variant(self)
        );
        if matches!(self, StringOrRef::Owned(_)) {
            println!("memory on the heap will be freed");
        } else {
            // Dropping a reference doesn't free memory on the heap
            println!("no memory on the heap will be freed");
        }
    }
}

If we execute our program, we will now read:

s1 = hello, s2 = world
s1 is owned, s2 is borrowed
Destroying the StringOrRef containing world which is borrowed: no memory on the heap will be freed
Destroying the StringOrRef containing hello which is owned: memory on the heap will be freed

s2 and s1 are destroyed in the reverse order of their creation when they go out of scope. No memory on the heap was ever allocated for the string "world", which comes from a read-only memory area and has only been referenced. However, the string "hello" has been built on the heap while lowercasing the string "HeLlO" and needs to be freed: this happens automatically when dropping s1.


Types in Rust are powerful and allow easy memory management without needing a garbage collector. Doing the same thing in C would require extra bookkeeping (a field recording whether the string is owned or borrowed) and hand-written deallocation code. We will later see even more powerful type manipulation in Rust.

Lab 04: Error handling

In this lab session you will:

  • manipulate APIs that can return errors, such as network or disk ones;
  • use external crates (i.e. Rust libraries);
  • write regular expressions;
  • determine which French people are trending in the news;
  • display errors with colors.

By the end of the session you will have a better understanding of the way errors are typically handled in Rust.

Web page retrieval

In this part, you will have to retrieve a web page from a given URL and return its content as a String or return a proper error if the page retrieval fails.


But first of all, ensure that your Rust installation is up-to-date by typing:

$ rustup update

If you get an error, it means that you have not installed Rust through rustup. In this case, make sure that your system is up-to-date.

For this lab, you will create a new Rust program named "lab4". Create a new binary (executable) project by typing

$ cargo new --bin lab4
$ cd lab4

Everything you will do today will happen in the lab4 directory, in particular in the src/ subdirectory.

Adding a dependency

You could make HTTP requests "by hand" by opening a TCP socket to the right host and port, and by sending HTTP commands and parsing the responses. However, you can take advantage of existing libraries written in Rust that can do that already.

The library ("crate" in Rust terminology) you will want to use is called reqwest (this is not a typo). In order to use it in your program, you have to add it to Cargo.toml in the [dependencies] section.

Rather than adding it by hand, you can use the cargo add command. Moreover, we want to use the "blocking" API of reqwest which is the simplest one at this stage. This "blocking" API is enabled by requesting (ah ah) the "blocking" feature of reqwest:

$ cargo add reqwest --features blocking

If you look at your Cargo.toml, you should see something like:

reqwest = { version = "0.11.22", features = ["blocking"] }

It indicates that your Rust program will use version 0.11.22 of the "reqwest" crate with the "blocking" feature enabled. 0.11.22 was the latest published version of the crate at the time this page was written.

Adding the "reqwest" crate as a dependency means that you will be able to use its types and functions by prefixing them with reqwest::.

Fetching a web page

Your first function will retrieve a web page from its URL using the blocking API of reqwest, whose documentation is accessible online.

Exercise 1.a: write a function with signature fn get(url: &str) -> Result<String, reqwest::Error> which returns the content of the web page located at url (use code similar to the one in the documentation).

Use the following main() program to test it:

fn main() -> Result<(), reqwest::Error> {
    println!("{}", get("")?);
    Ok(())
}

Note how main() can return a Result<(), E> instead of returning nothing (which is written () in Rust and is the equivalent of void in C-like languages). Either get("") returns an error and it will be propagated to main() by the ? operator, or it returns a String which will be displayed. At the end of the main() program, Ok(()) ensures that () is returned in an Ok.
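The same propagation mechanism can be seen with the standard library alone; in this sketch the add helper is made up for illustration, and ? forwards the ParseIntError to the caller:

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse two integers and add them.
// On a parse failure, `?` returns the ParseIntError immediately.
fn add(a: &str, b: &str) -> Result<i32, ParseIntError> {
    Ok(a.parse::<i32>()? + b.parse::<i32>()?)
}

fn main() {
    assert_eq!(add("1", "2"), Ok(3));
    assert!(add("1", "x").is_err());
}
```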

Returning a better error

Now, try changing the URL with one returning a 404 (not found) code. You can use "", which does not exist.

Note how your get() function returns without an error: it returns the content of the error page. This is not a good idea: in your program, you only want to return pages which were successfully found.

However, you will not be able to return a reqwest::Error to indicate that the page was not found, as it is not an existing error condition for reqwest::Error. You will have to write your own Error type with two variants for now:

#[derive(Debug)]
enum Error {
    Reqwest(reqwest::Error),
    BadHttpResult(u16),
}

This Error type can be a Error::Reqwest and encapsulate a reqwest::Error, or it can be a Error::BadHttpResult and encapsulate the HTTP error code returned by the web server (for example 404 for "not found").

The #[derive(Debug)] will be explained in a later class. For the time being, you just need to know that it will allow an Error object to be displayed by the main() program if needed. Also, you can display an Error yourself by using the {:?} placeholder:

fn main() {
    let my_error = Error::BadHttpResult(404);
    println!("The error is {my_error:?}.");
}

would display

The error is BadHttpResult(404).

Also, in order to take advantage of the automatic conversion performed by the ? operator, you want to implement From<reqwest::Error> for your Error type:

impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Error {
        // Encapsulate the error into an Error::Reqwest variant
        Error::Reqwest(e)
    }
}
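To see this mechanism in isolation, here is a stdlib-only sketch (the Negative variant and the positive function are invented for illustration) of how ? uses a From implementation to convert errors automatically:

```rust
#[derive(Debug)]
enum Error {
    Parse(std::num::ParseIntError),
    Negative(i32),
}

impl From<std::num::ParseIntError> for Error {
    fn from(e: std::num::ParseIntError) -> Error {
        Error::Parse(e)
    }
}

// `?` converts the ParseIntError into Error::Parse thanks to From.
fn positive(s: &str) -> Result<i32, Error> {
    let n: i32 = s.parse()?;
    if n < 0 {
        return Err(Error::Negative(n));
    }
    Ok(n)
}

fn main() {
    assert!(matches!(positive("42"), Ok(42)));
    assert!(matches!(positive("x"), Err(Error::Parse(_))));
    assert!(matches!(positive("-1"), Err(Error::Negative(-1))));
}
```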

Exercise 1.b: update your get() function so that it checks the status code of the response before reading its text. Your get() function will return a Result<String, Error>, and your main() function will return a Result<(), Error>, in order to accommodate both error conditions.

Look at the documentation for the reqwest::blocking::get() function: what is its return type? What method can be called on a Response to get a StatusCode and compare it with StatusCode::OK? How can you get the numerical u16 code of a StatusCode?

Note: instead of typing qualified type names such as reqwest::StatusCode, you can add a use reqwest::StatusCode; at the beginning of your program: this will import StatusCode in your namespace, and you will be able to use StatusCode instead of the longer reqwest::StatusCode.

Check that your new version of get() works by ensuring that an error is displayed when trying to print the content of "". You should see a 404 error.

Name identification

Now that you are able to retrieve a web page, you will try to identify names written in it. A name looks like "John Doe": an uppercase letter followed by lowercase letters followed by a space followed by an uppercase letter followed by lowercase letters.

Rather than looking for such a pattern yourself, you might want to use the "regex" crate which implements regular expressions.

Using the "regex" crate

Add the "regex" crate to your Cargo.toml:

$ cargo add regex

From now on, you are able to use the entities defined in this crate, in particular the regex::Regex type. To be able to refer to it as Regex instead of regex::Regex, you might want to put a use regex::Regex; near the beginning of your program.

Of course we want our regular expression to match French names containing accented characters. You can use the following regular expression patterns to identify uppercase and lowercase letters:

  • \p{Uppercase} will match any Unicode uppercase letter (for example it would match the Greek delta letter "Δ")
  • \p{Lowercase} will match any Unicode lowercase letter, such as the Greek delta letter "δ"
  • + after a pattern means "one or more"

You can then deduce that a name component can be represented as \p{Uppercase}\p{Lowercase}+. This would match "John", "Doe", "François" or "Strauß". The whole regular expression can then be \p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+ (two components separated by a space).

Creating the Regex

A regular expression is created using Regex::new(), which returns a Result. Since we know that our expression is valid, we can call unwrap() on the result:

fn main() {
    let re = Regex::new("\p{Uppercase}\p{Lowercase}+ \p{Uppercase}\p{Lowercase}+").unwrap();
}

However, if you do this, you will get an error: \p in the string is interpreted as an escaped p. Just as \n represents a "line feed", \p represents… nothing known: it is an invalid escape sequence in a string.

Fortunately, Rust has raw strings, in which no escape character is recognized. Also, you can put anything in a raw string, including the " character, as long as Rust can determine the end of the string. The syntax of a raw string is r#"…"#: the start is r#" and the end is "#:

fn main() {
   let s = r#"This is a string with some "quotes" in it"#;
}

But what if you need to put a quote followed by a pound sign ("#) in the raw string? This is easy, you can change the start and end marker and increase the number of pound signs provided they match:

fn main() {
   let s = r###"You need a quote + 3 pound signs to end the string"###;
   let t = r###"You can put "## inside without any problem!"###;
}
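These rules can be checked with a few assertions (a runnable sketch):

```rust
fn main() {
    // No escape processing in raw strings: r"\n" is a backslash and an 'n'.
    assert_eq!(r"\n".len(), 2);
    assert_eq!(r"\n", "\\n");
    // Quotes may appear verbatim inside r#"…"#.
    let s = r#"This is a string with some "quotes" in it"#;
    assert!(s.contains('"'));
    // With three pound signs, a quote followed by two pound signs fits inside.
    let t = r###"You can put "## inside without any problem!"###;
    assert!(t.contains("\"##"));
}
```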

Writing the extract_names() function

Using the .find_iter() method on a regex::Regex object, you can iterate over the matches found in the input string as shown in the following example which displays all sequences of uppercase characters (with at least two of them) found in the file "text.txt":

fn main() {
    // Read the content of the "text.txt" file into variable s
    let s: String = std::fs::read_to_string("text.txt").unwrap();
    // Match at least two consecutive uppercase characters
    let re = regex::Regex::new(r#"\p{Uppercase}{2,}"#).unwrap();
    println!("All uppercase sequences found:");
    for m in re.find_iter(&s) {
        // Inside the loop m is a regex::Match, which has an .as_str() method
        println!("  - {}", m.as_str());
    }
}

Exercise 2.a: write a function with signature fn extract_names(s: &str) -> Vec<String> which returns all plausible names in a string.

In this function, you will (those are mere suggestions, you are free to do it any other way you see fit):

  • create a re variable containing a Regex with the pattern seen above
  • create an empty vector (using vec![]) and store it into a mutable variable (this is the vector you will return at the end of the function)
  • iterate over re.find_iter() to find all the matches
  • on every match m you can call .as_str() to get a &str representing the text of the match (for example "John Doe"), that you can transform into a string using .to_owned() or String::from() (as usual)
  • push the String in the Vec<String> that you plan to return (.push() is the method you want to use)
  • return the vector

Try your function with the following main() function:

fn main() {
    let names = extract_names("Yesterday, John Doe met François Grüß in a tavern");
    println!("{names:?}");
}

Since String implements Debug, Vec<String> also implements Debug and can be displayed using the {:?} placeholder which is handy for debugging.

Deduplicating the names

Unfortunately, we may end up with duplicates. If a text contains the same name several times, it will be present several times in the output.

A vector is not really the best structure to represent a set of objects, like a set of names. We do not care about the order, only about the presence of a name.

Rust has a std::collections::HashSet<T> type in its standard library (std). Provided that you added a use std::collections::HashSet; near the beginning of your program, you can:

  • create a new HashSet<T> with HashSet::new()
  • insert an element in a set h with h.insert(element); nothing happens if the element is already present in the set
  • iterate over the elements of a set h like you did with a vector: for element in h { … } (if h is a HashSet<T>, element will be a &T in the loop)
  • display a HashSet<T> using {:?} as long as its element type T implements Debug
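These operations can be tried out with a short stdlib-only example:

```rust
use std::collections::HashSet;

fn main() {
    let mut h = HashSet::new();
    h.insert(String::from("John Doe"));
    h.insert(String::from("Jane Doe"));
    h.insert(String::from("John Doe")); // already present: the set is unchanged
    assert_eq!(h.len(), 2);
    assert!(h.contains("Jane Doe"));
}
```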

Exercise 2.b: make your extract_names() function return a HashSet<String> (instead of a Vec<String>).

Using the following main() function, you will see that there can be no duplicates:

fn main() {
    let names = extract_names("John Doe, François Grüß, John Doe");
    println!("{names:?}");
}

Returning the names in a page

Exercise 2.c: add a function with signature fn names(url: &str) -> Result<HashSet<String>, Error> which returns all plausible names in a web page.

Check your function by displaying the names present on "".

Extra: if you have more time

If you have more time, you can fix the name detector such that it accepts multipart names, such as "Jean-Pierre Elkabbach".

Sets intersection

If you have two HashSet<T> h1 and h2, you can iterate over their intersection which is denoted by h1.intersection(&h2). This will iterate over all common elements using a reference of type &T.
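For example (a small sketch with made-up names):

```rust
use std::collections::HashSet;

fn main() {
    let h1: HashSet<String> =
        ["Jane Eyre", "John Doe"].iter().map(|s| s.to_string()).collect();
    let h2: HashSet<String> =
        ["John Doe", "Jane Doe"].iter().map(|s| s.to_string()).collect();
    // intersection() yields &String references to the common elements.
    let mut common: Vec<&String> = h1.intersection(&h2).collect();
    common.sort();
    assert_eq!(common, ["John Doe"]);
}
```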

Exercise 3.a: write a function with signature fn trending() -> Result<Vec<String>, Error> which returns a vector with the names present in both "" and "". The vector will be sorted before being returned.

Print the trending names from your main() function:

fn main() -> Result<(), Error> {
    println!("Trending names:");
    for name in trending()? {
        println!("  - {name}");

File system

We want to store the list of trending people in a file on a disk.

Exercise 4.a: write a function with signature fn write_to_disk(path: &str, data: Vec<String>) -> Result<(), io::Error> which writes every string in data on its own line in the file located at path.

The method join("\n") applied to a Vec<String> returns a new String with the \n character (line feed) inserted between consecutive strings. Do not forget to add an extra '\n' character at the end of the returned string.
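For example, here is how join and the extra trailing newline combine:

```rust
fn main() {
    let data = vec![String::from("Jane Doe"), String::from("John Doe")];
    let mut content = data.join("\n");
    content.push('\n'); // terminate the last line too
    assert_eq!(content, "Jane Doe\nJohn Doe\n");
}
```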

To write a string content into a file file_name, you can use std::fs::write(file_name, content) which returns a Result<(), std::io::Error>.

Exercise 4.b: modify your main() function such that you write the trending names into a file named "trending.txt".

You will have to add a new variant to your Error enum in order to encapsulate the std::io::Error if there is a problem when writing a file.

Check that the error detection and reporting works fine by attempting to write in a file you cannot write to (such as "/trending.txt", you most likely don't have the rights to write at the root of your file system).

Easier errors

You may have noticed that it is rather cumbersome to write all those From implementations to convert existing errors into our own variants. Fortunately, a crate exists to make this job easier: thiserror.

After adding thiserror to your Cargo.toml

$ cargo add thiserror

you will be able to define your Error type as follows:

#[derive(Debug, thiserror::Error)]
enum Error {
    #[error("Network error: {0}")]
    ReqwestError(#[from] reqwest::Error),
    #[error("Wrong HTTP status code: {0}")]
    BadHttpResult(u16),
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
}

Our Error type looks the same as before with a few additions:

  • It derives thiserror::Error. This is the heart of the thiserror crate and makes the other things below work. Behind the scenes, it uses "procedural macros" which is a way to transform some Rust code into another Rust code during the compilation.
  • The #[error] attribute indicates how to display the error. {0} refers to the first field. The fields could also be named, a field xyz would use {xyz} in the #[error] attribute.
  • The #[from] attribute on the field of a variant indicates that a From implementation should be generated from that field's type to Error.
  • Your Error type will implement the std::error::Error trait which allows proper error chaining: the system will know (and will be able to display) that the cause of an Error::ReqwestError is a reqwest::Error, which might itself be caused by another deeper error, and so on.

Exercise 5.a: modify your Error type to use thiserror and remove your existing From implementations.

Check that everything still works, including when you make errors happen by changing the URLs or the file name.

Prettier messages

You might notice that the error messages printed by default by main are quite ugly. This is due to the default Debug implementation for the error types.

Fortunately we can do better by using the anyhow crate. This crate adds an anyhow::Error type which is compatible with all errors implementing the std::error::Error trait.

Exercise 6.a: after adding the anyhow crate to your Cargo.toml (you should know how to do it by then), modify your main() function so that it returns a Result<(), anyhow::Error>.

Note that in anyhow you will find the following definition

type Result<T> = std::result::Result<T, anyhow::Error>;

(std::result::Result being the "default" Result type that we use)

It means that you can use anyhow::Result<T> everywhere you would be using Result<T, anyhow::Error>, as a shortcut.

Exercise 6.b: make your main() function return an anyhow::Result<()>.

Look at the produced errors. Aren't they beautiful, compared to what we had before?

Even prettier messages

There exists an even better crate derived from anyhow, which is called eyre. The idea of eyre is the same as anyhow: the error type is called eyre::Report (instead of anyhow::Error), and eyre::Result<T> can be used instead of anyhow::Result<T>.

In addition, reporters can be installed to display the eyre::Report errors in a particular way. For example, the color-eyre crate will display the messages with colors to facilitate error reading. All you have to do is call color_eyre::install()?; at the beginning of your main() function.

Exercise 7.a: add the eyre and color-eyre crates to your Cargo.toml, initialize color_eyre as explained above, and make your main() function return an eyre::Result<()>.

Now make some errors happen and see how beautifully they are rendered.

Some thoughts about error handling

In our case, we never use the fact that we can match on the various variants of our Error type. We could have used Result<T, eyre::Report> (or the equivalent eyre::Result<T>) as the return type of all our functions that can fail.

However, this is not always the case: if you are developing a library, your users might want to get precise information about an error you give them back. In this case, it is important to use an Error enumeration allowing the library user to match the error against the various cases.

But here, let's simplify our code:

Exercise 7.b: remove your Error type, and make every function that can fail return an eyre::Result<T> (where T is the type of a successful result).

However, you still have the problem of BadHttpResult, which no longer corresponds to an existing error. Don't worry: you can return prematurely with a textual error by using the eyre::bail! macro, which works like format!() or println!():

fn main() {
   // Return an error immediately with the given formatted text
   eyre::bail!("This returns an eyre::Report with {}", 42);
}

Adding context

You may have noticed that the error message describes what the error is but not what you were trying to do. For example, when a web page fails to be retrieved, the URL is not shown anywhere, and when the file cannot be created, the file name is nowhere to be seen.

By adding use eyre::Context; near the beginning of a file, you import an extension trait of the Result type which allows some contextual information to be added.

Let us take the example of the write_to_disk() function:

use std::io::Write; // needed for writeln! on a File

fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> {
    let mut f = std::fs::File::create(path)?;
    for l in data {
        writeln!(f, "{l}")?;
    }
    Ok(())
}

Every intermediate Result can be enriched with a context if it contains an error before the error is returned:

fn write_to_disk(path: &str, data: Vec<String>) -> eyre::Result<()> {
    let mut f = std::fs::File::create(path).context(format!("Unable to create {path}"))?;
    for l in data {
        writeln!(f, "{l}")
            .context(format!("Error while writing {path}"))?;
    }
    Ok(())
}

In case of an error, the context will be displayed:

$ cargo run
   0: Unable to create /trending.txt
   1: Permission denied (os error 13)



Exercise 7.c: add a context to all errors before returning them.

Check by using a bad URL or a bad file name that the context is correctly printed and contains all the useful information.

Congratulations, you are now a master of Rust error handling.

Lab 05: Complex data structures

In this lab session you will learn about:

  • Implementing recursive data structures in Rust;
  • Implementing standard traits for both your own and foreign data types;
  • Implementing your own iterators, to refactor iteration logic;
  • Making all of the above generic, with appropriate trait bounds.

Implementing a mutable linked list

We want to implement a singly linked list in Rust, capable of storing i32 values in its nodes. As you certainly know, such a data structure is recursive, made of two types of nodes, with each non-empty node pointing to the next node until the final (empty) node is reached, like this:

Rust enums are the right tool for the job, so as a first approximation we can start from:

pub enum List {
    Cons(i32, List),
    Nil,
}

Exercise 1.a: Try the above definition to see if it works. If not, explain in your own words why it doesn't, building upon the knowledge of some of the traits we have learned about in today's lecture.

Exercise 1.b: The Box type that you have learned about allows you to put any value into a box of a fixed size. Use it to refine your List type definition so that it compiles. Then, create by hand (using Nil, Cons, and Box methods) sample list values:

  • the empty list,
  • a list containing one element,
  • a list containing two elements,
  • up to the list depicted in the figure above.

To help debugging the values you are building, implement the Debug trait for List, so that you can print your lists with the {:?} format string.
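Without giving away the exercise, the role of Box can be seen on a different toy type (Nested is made up for illustration): each variant stores a pointer of known size instead of a value of unbounded size.

```rust
#[derive(Debug)]
enum Nested {
    More(Box<Nested>), // a Box is a fixed-size pointer, so Nested has a known size
    Done,
}

fn main() {
    let n = Nested::More(Box::new(Nested::More(Box::new(Nested::Done))));
    assert_eq!(format!("{n:?}"), "More(More(Done))");
}
```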

Exercise 1.c: Let's now use the linked list to build a proper abstract data type: a stack. Create a struct Stack that uses List internally to store its elements but also has a separate size field to keep track of the list size. All stack manipulations must respect the invariant that size always corresponds to the total number of non-Nil cells in the list. Add to your Stack an impl block that implements at least the following methods:

  • new that creates a fresh empty stack,
  • len that returns the stack length,
  • push that adds an i32 on top of the stack,
  • pop that removes and returns the top element (if it exists).

Make sure push and pop modify the stack list in place. (But feel free to implement an alternative purely functional version of the stack—returning fresh versions of the stack at each method call—as a separate exercise!)

Tip: due to memory ownership rules, implementing push and pop might be trickier than you would expect based on experience with other programming languages. If you get stuck, have a look at the std::mem::replace stdlib function and ponder how it can help.
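A tiny demonstration of std::mem::replace (on a Vec rather than on your list) shows why it helps: it moves the old value out from behind a mutable reference while leaving a valid value in its place.

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    // Swap an empty vector in, get the old contents out.
    let old = std::mem::replace(&mut v, vec![]);
    assert_eq!(old, [1, 2, 3]);
    assert!(v.is_empty());
}
```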

Standard traits

Implementing standard traits for your own types makes them work more nicely (and more idiomatically) with the rest of the Rust ecosystem. So let's make the data types you have defined well-behaved ecosystem citizens!

Exercise 2.a: Implement the Display trait for your Stack (and possibly also for List, if it helps), so that stacks can be pretty-printed like this:

fn main() {
    let mut s = Stack::new();
    s.push(10);
    s.push(20);
    s.push(30);
    println!("stack content: {s}");
}

This should print:

stack content: [ 30 20 10 ]

Exercise 2.b: Your lists and stacks, after all, are just glorified integer vectors. It makes sense to ease conversions between them and Vec<i32>, and we have seen how the From trait is the idiomatic Rust way of supporting type conversions. Implement conversions in both directions, from Vec<i32> to List and from List to Vec<i32>. Once you have done this, you can do the same for Stack, by simply calling into the conversions you have already implemented for List (with some little extra work to precompute the stack length).

At the end, something like the following should work for your types:

fn main() {
    // Vec<i32> -> List
    let v = vec![1, 2, 3];
    assert_eq!(
        List::from(v),
        Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))))
    );

    // List -> Vec<i32>
    let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    assert_eq!(
        Vec::from(l),
        vec![1, 2, 3],
    );
}

Note that this is the first time we implement a trait for a type in a different crate than our own, specifically we are implementing From<List> for Vec<i32>, which is a generic type defined in the standard library. Doing so is permitted by Rust provided that you respect the orphan rule, i.e.: either the trait or the type for which you are implementing the trait must be defined in the same crate as the implementation.
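The orphan rule can be illustrated with a hypothetical local type Meters: since Meters is defined in our crate, we may implement the foreign From trait in both directions.

```rust
struct Meters(f64);

// OK: the implementing type Meters is local, even though From is foreign.
impl From<f64> for Meters {
    fn from(x: f64) -> Meters {
        Meters(x)
    }
}

// Also OK: the trait parameter From<Meters> mentions a local type.
impl From<Meters> for f64 {
    fn from(m: Meters) -> f64 {
        m.0
    }
}

fn main() {
    let m = Meters::from(3.5);
    let x: f64 = m.into();
    assert_eq!(x, 3.5);
}
```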

Exercise 2.c: Implement the FromStr trait for your Stack (and possibly List), so that stacks can be parsed from string like this:

fn main() {
    let mut s1 = Stack::new();
    s1.push(10);
    s1.push(20);
    s1.push(30);

    let s2: Stack = "[ 30 20 10 ]".parse().unwrap();

    assert_eq!(s1, s2);
}
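As a reminder of how FromStr powers .parse(), here is a sketch on an unrelated, made-up Point type (so as not to give away the Stack parser):

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl FromStr for Point {
    type Err = String;

    fn from_str(s: &str) -> Result<Point, String> {
        let (x, y) = s.split_once(',').ok_or("missing comma")?;
        Ok(Point {
            x: x.trim().parse().map_err(|_| "bad x coordinate")?,
            y: y.trim().parse().map_err(|_| "bad y coordinate")?,
        })
    }
}

fn main() {
    // .parse() calls Point::from_str under the hood.
    let p: Point = "3, 4".parse().unwrap();
    assert_eq!(p, Point { x: 3, y: 4 });
    assert!("nope".parse::<Point>().is_err());
}
```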

Refactoring with iterators

By this point, your code probably contains a number of places where you iterate over the elements of a list to, e.g., print its content, count its elements, fill an empty vector with them, etc. As fans of the DRY principle, we cannot stand this situation, so let's refactor our way out of it! Obviously, the situation calls for the introduction of an iterator over list elements.

Exercise 3.a: Add to your List data type an iter() method that returns a type implementing the Iterator trait. When done, the following should work out of the box:

fn main() {
    let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    let mut i = l.iter();
    assert_eq!(, Some(1));
    assert_eq!(, Some(2));
    assert_eq!(, Some(3));
    assert_eq!(, None);
}

Exercise 3.b: Make your List implement IntoIterator, so that your lists can work out of the box with for loops, like this:

fn main() {
    let l = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    for e in &l {
        println!("list element: {e}");
    }
}

Exercise 3.c: Refactor previous code you have written for List (and/or Stack) to use iterators so that you can avoid repeating the iteration logic in multiple places.

Exercise 3.d: (Bonus) Try to make your iterator(s) work like those of stdlib collections, specifically:

for element in &collection { ... }      // produce shared refs (&i32 in our case), do not consume collection
for element in &mut collection { ... }  // produce mutable refs (&mut i32), do not consume collection
for element in collection { ... }       // produce items (i32), consume the collection
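For the shared-reference case, the key is implementing IntoIterator for &List rather than for List itself, so that `for e in &l` desugars into a call to into_iter() on the reference. A sketch (the List type and an iter()-style iterator are assumptions carried over from the previous exercise):

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

struct Iter<'a> {
    next: &'a List,
}

impl<'a> Iterator for Iter<'a> {
    type Item = &'a i32;

    fn next(&mut self) -> Option<&'a i32> {
        match self.next {
            List::Cons(x, rest) => {
                self.next = &**rest;
                Some(x)
            }
            List::Nil => None,
        }
    }
}

// `for e in &l` calls <&List as IntoIterator>::into_iter(&l).
impl<'a> IntoIterator for &'a List {
    type Item = &'a i32;
    type IntoIter = Iter<'a>;

    fn into_iter(self) -> Iter<'a> {
        Iter { next: self }
    }
}

fn main() {
    let l = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    let mut sum = 0;
    for e in &l {
        sum += e; // e: &i32, the list is not consumed
    }
    assert_eq!(sum, 3);
    println!("sum = {sum}");
}
```

The &mut and by-value variants follow the same pattern, with IntoIterator implemented for &mut List and List respectively.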

Going generic

Wait, your data structures are not specific to i32 values at all! It is perfectly legitimate to have lists and stacks containing different types. Let's make them generic data structures that can do so.

Exercise 4.a: Generalize your implementation of List so that it takes a type parameter T and can hold values of as many different types as possible.

Tip: in the process you will probably face a dilemma: either give up on some of the features and traits you have implemented in the past for i32 lists, or add trait bounds to restrict the set of types your lists can contain. It's up to you to decide what to do, as long as you understand the implications.
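For reference, the generic shape can be as small as this (a sketch; the derives are one possible answer to the trait-bound dilemma above, not the required one):

```rust
#[derive(Debug, PartialEq)]
enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

fn main() {
    // The same definition now holds i32s ...
    let ints: List<i32> = List::Cons(1, Box::new(List::Nil));
    // ... or Strings, without any change to the type.
    let strings: List<String> = List::Cons("hello".to_string(), Box::new(List::Nil));
    assert_ne!(ints, List::Cons(2, Box::new(List::Nil)));
    println!("{:?} / {:?}", ints, strings);
}
```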

Exercise 4.b: Your lists are nice because they are uniform: each list contains values of a single type. In some cases it is desirable to create non-uniform lists, containing values of different types. Using trait objects, create a new data structure that can contain values of different types that all share a trait (e.g., Display, so that you can easily show what your lists contain).

Try filling your non-uniform lists with values of different types and verify that you can still pretty-print them to stdout with println!("{}", ...).

Now try to add a .len() method to your non-uniform lists. Can you?
How about a .min() method, analogous to the iterator consumer we have seen in class?
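A minimal sketch of such a non-uniform container, using boxed Display trait objects (the Vec-based representation and the names are illustrative choices):

```rust
use std::fmt::Display;

// Each element is a boxed trait object: any type implementing Display fits.
struct DisplayList {
    items: Vec<Box<dyn Display>>,
}

impl DisplayList {
    fn new() -> Self {
        DisplayList { items: Vec::new() }
    }

    fn push(&mut self, item: impl Display + 'static) {
        self.items.push(Box::new(item));
    }

    // .len() is fine: it needs to know nothing about the elements.
    fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut l = DisplayList::new();
    l.push(42);
    l.push("hello");
    l.push(3.14);
    for item in &l.items {
        println!("{}", item);
    }
    assert_eq!(l.len(), 3);
    // A .min() method is another story: comparing two `dyn Display` values
    // would require an ordering, and Ord is not part of the trait object.
}
```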

PS: No! You don't need to implement linked lists by hand in Rust: doubly-linked lists are already available in the stdlib. Also, for most practical purposes, vectors are much better anyway.

Lab 06 — Packaging, testing, and fuzzing

In this lab session you will:

  • create packages and crates;
  • test them in various ways (unit, integration, doc);
  • track and maximize the code coverage of your tests;
  • property test and fuzz Rust code to find unexpected bugs.

By the end of the session you will be familiar with the Rust tools and techniques used to perform "serious sanity checks" on production Rust code.

Packaging Rust code

Consider the following structure representing user accounts. Each account has two String fields: a public username and a private password.

pub struct Account {
    pub username: String,
    password: String,
}

Exercise 1.a: implement the following associated functions to create an Account from a pair of string (slice)s and to parse one from an existing string:

use std::str::FromStr;

impl Account {
    /// Create a new account with a given username and password.
    fn new(username: &str, password: &str) -> Self {
        todo!()
    }
}

impl FromStr for Account {
    /// Create a new account by parsing its data from a string in the form "USERNAME:PASSWORD".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        todo!()
    }
}

fn main() {
    let user1 = Account::new("zack", "cne3bfvkw5eerv30jy3990yz0w");
    let user2 = "sam:bf6ddgabh6jghr89beyg3gn7e8".parse::<Account>().unwrap();
    let user1_name = &user1.username;
    let user2_name = &user2.username;
    println!("let's welcome our new users: {user1_name} and {user2_name}");
    // let user1_pwd = &user1.password; // error[E0616]: field `password` of struct `Account` is private
}

A few points worth noting:

  • The FromStr trait is a variant of the From trait you have already seen, specifically meant to parse data from strings. When implemented, it enables the str::parse() method, used here in main().
  • To implement FromStr you will have to pick an Err type.
  • The todo! macro is very useful for code prototyping; it will make your code panic at runtime, but pass type checking. Have a look at its documentation for details.

When done, make sure that the above main function runs without panicking.

Exercise 1.b: put your code into a new package called accountmgr containing two crates: one library crate (implementing the type alias and the functions new_account and parse_account) and one binary crate containing the main function. Make sure your crates build and that the binary runs without panicking.

Now try to uncomment the last line of main, to make sure the module system does not allow you to access private fields from outside the defining module.

Remember: you now have two completely separate crates: a library crate with a public API that does not export the password field of the Account structure; and a binary crate using it as client code via that API.

Exercise 1.c: (optional) in the next few sections you will test the functionalities of your accountmgr crate, which for now is quite bare-bones. To make things more interesting you can extend it by adding the following functionalities, with whatever API you please:

  1. Add a separate constructor function that randomizes password creation instead of taking one as input. For this you can use the rand crate and implement various "strong password" strategies to produce high-entropy passwords.
  2. Add a function that returns the hashed version of an account's password. You can use the sha1 crate for this.
  3. Add a batch constructor fn from_file(filename: &Path) -> Vec<Self> to read account data from file, one account per line. Here is a file with many accounts (~20k) that you can use for testing purposes.

Later on, when asked to test accountmgr in the remainder of this lab session, also add tests for these additional functionalities! If you decide not to do this exercise right now, but finish early with the rest of the lab assignments, come back to this one later, add missing functionalities above, add corresponding tests, etc., until done.

Testing Rust code

Unit testing

Unit tests are shipped together with the implementation code of a crate. They have access to all code entities in the implementing module, no matter their visibility from other modules.

Exercise 2.a: add a sub-module tests to the library crate of your accountmgr package. Add to it basic unit tests to convince yourself that the functionalities you have implemented thus far are correct. It is up to you what test cases you want to use, but follow the usual rules of thumb of testing weird/edge cases as well as some easy ones. Make sure that cargo test passes. Make sure that you also test the internal private state of your values (e.g., that a password field is properly set to the same value passed to the constructor), because unit tests are your only chance to do so.
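As a reminder of the layout, a unit-test sub-module looks like the following sketch (with a hypothetical stand-in function, since your tests will target Account instead):

```rust
// Stand-in for the library code under test (hypothetical example function).
fn double(x: i32) -> i32 {
    x * 2
}

#[cfg(test)]
mod tests {
    // The sub-module lives in the same file, so it sees private items too.
    use super::*;

    #[test]
    fn doubles_positives() {
        assert_eq!(double(2), 4);
    }

    #[test]
    fn doubles_edge_cases() {
        assert_eq!(double(0), 0);
        assert_eq!(double(-3), -6);
    }
}

fn main() {
    // On a plain run the tests above are compiled out;
    // they only execute under `cargo test`.
    assert_eq!(double(21), 42);
}
```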

Integration testing

Integration tests are shipped in separate files located under the top-level tests/ directory and are compiled to independent binary crates linked to the main library crate. As such, they can only access the public API of your library crate and not its private entities.

Exercise 2.b: move all the tests you can from the tests sub-module to integration tests located under tests/ and make sure that cargo test still passes. It should be possible to turn most of them into integration tests, but not all of them. Which tests couldn't you turn into integration tests, and why?

Documentation tests

Documentation tests are integration tests that reside alongside the implementation code, in docstrings.

Exercise 2.c: Add documentation to your code in the form of docstrings. At the very minimum, for each public entity in the module provide a succinct explanation of what it does. Check the documentation of rustdoc on how to write documentation and the details about the docstring markup language. Verify by running cargo doc that documentation is properly generated under target/doc/ for all the crates related to your package.

Exercise 2.d: Either add new or turn some of your existing integration tests into documentation tests integrated into your docstrings. Try to come up with doctests that are actually useful as documentation, because that is one of the key points of doctests. For example, have you documented what should happen when trying to parse (1) a well-formed account string, (2) an empty account string, (3) a non-empty account string that lacks the colon separator? If not, this is your chance to do so! Make sure that cargo test still passes.

Code coverage

In a previous lab session you have seen how to use code coverage in C/C++ to guide the addition of test cases, so that all your code is exercised during the execution of a test suite. Code coverage tooling is available for Rust as well. You are going to try Tarpaulin, a code coverage tool for Rust that can trace exercised code during test suite execution via either ptrace on Linux/x86_64 (the default) or LLVM's SanitizerCoverage (by passing the --engine llvm flag; supported on platforms other than Linux/x86_64).

Exercise 2.e: Read the documentation of the tarpaulin crate and install it so that the cargo tarpaulin command becomes available in your Rust installation.

When you run cargo tarpaulin in a Rust package, it will run the same tests as cargo test (in fact, Tarpaulin even strives to offer a fully compatible CLI), but under code coverage instrumentation. At the end of the execution Tarpaulin will emit a report about covered/non-covered lines for each module, together with a coverage "diff" w.r.t. the previous run. Note that Tarpaulin currently only offers line coverage; branch coverage is not implemented yet.

Exercise 2.f: Run cargo tarpaulin on your package and check your current line coverage. By default, only a succinct textual report is returned; try adding the --out html flag to obtain a browsable version of your code, with covered/non-covered lines highlighted.

If your line coverage is not 100%, try to reach that percentage by adding new test cases that cover the missing lines. If it's already 100%, go back to Exercise 1.c, add a missing functionality, verify that it decreases line coverage, and then make it reach full coverage again.

Property testing

Recall from lecture that:

Property testing is a testing technique [...] where we:

  1. randomly select test inputs,
  2. assert general properties that are automatically verifiable on test input/output pairs,
  3. verify that properties hold on the obtained test input/output pairs.

You are going to use the proptest crate to verify some properties about the accountmgr crate.

Add proptest as a development dependency to your crate with cargo add --dev proptest; doing so will make your package depend on proptest only conditionally, specifically when compiling tests, examples, and benchmarks. Verify within Cargo.toml that the dependency has indeed been added under the [dev-dependencies] section.

Exercise 3.a: Add a new unit test to those already present in your module, which randomizes the parse string so that only valid account strings are formed (e.g., you can start with a simple regex like r"[a-z]*:[a-z]*" representing letters followed by a colon followed by letters, and then improve it to be more "nasty" by adding other special characters). Note that the new test will need to be put within a proptest! environment. Add an assertion to verify that the account string is properly parsed (i.e., that parse_account does not return an error).

Here are some useful references about proptest: crate, documentation, book.

Let's now strengthen our parsing requirements, because accounts with empty usernames and/or passwords are no good, right?

Exercise 3.b: Change your parse_account function so that it refuses account strings containing an empty username and/or password. Run your unit tests again. proptest tests should catch the fact that empty usernames/passwords are no longer correctly parsed, and give you examples of account strings that fail. Improve your proptest regex so that only (new) valid account strings are produced. Add specific (non-random) unit tests to handle the cases of empty usernames/passwords. Make sure your line coverage is still 100%.

Fuzzing Rust code

And now for something completely different.

ELF is the file format used by most executable binaries on Unix systems, including Linux. For $reasons you want to implement a command-line tool to check whether a file is a valid ELF binary or not. When invoked on an existing file it should return 0 (and output a nice explanation message) if the file is a valid ELF binary, 1 otherwise (and output another nice explanation message).

Oh, look! There is a handy elf_rs crate on; let's use that as plumbing for implementing your tool.

Exercise 4.a: To begin with:

  1. create a package check_elf containing a library crate,
  2. add to it as a dependency version 0.1.3 of elf_rs (e.g., by running: cargo add elf_rs@0.1.3) and verify in Cargo.toml that the dependency has been added correctly,
  3. in src/, implement the following functions by filling in their todo!()-s:

pub fn is_valid_elf(content: &[u8]) -> bool {
    todo!()
}

pub fn is_valid_elf_file<P: AsRef<Path>>(filename: P) -> bool {
    todo!()
}

  • To check whether a file is valid ELF with elf_rs, just build an Elf structure using Elf::from_bytes on the bytes (Vec<u8>) read from a file. Iff you obtain an Ok(_) result, the file is valid ELF.
  • Use the former function to implement the latter.
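Before wiring in elf_rs, it can help to see what the check boils down to at the byte level. This std-only sketch validates only the four-byte ELF magic number (elf_rs checks much more, so this is an illustration, not a replacement; the function names are hypothetical):

```rust
use std::fs;
use std::path::Path;

// Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
fn has_elf_magic(content: &[u8]) -> bool {
    content.starts_with(&[0x7f, b'E', b'L', b'F'])
}

// Mirrors the exercise's structure: the file-based check
// reuses the byte-based one.
fn has_elf_magic_file<P: AsRef<Path>>(filename: P) -> bool {
    fs::read(filename)
        .map(|bytes| has_elf_magic(&bytes))
        .unwrap_or(false)
}

fn main() {
    assert!(has_elf_magic(b"\x7fELF\x02\x01\x01"));
    assert!(!has_elf_magic(b"not an elf"));
    assert!(!has_elf_magic(b""));
    assert!(!has_elf_magic_file("/hopefully/not/an/existing/path"));
    println!("magic-number checks ok");
}
```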

In addition to integration tests, Rust packages can contain examples located under the top-level examples/ directory. Each *.rs file in there is a standalone binary crate, with a main() function as its entry point. Individual examples can be built and run using, respectively, cargo build --example=NAME and cargo run --example=NAME. Examples are useful both as sample usage of a library crate and as standalone executables that do something useful.

Exercise 4.b: Create an example file examples/ by completing the following skeleton:

fn main() {
    let filename = std::env::args().nth(1).unwrap();  // get the first argument on the CLI or panic
    todo!()
}

where you can already find an example of how to read the first CLI argument. Instead of todo!() you should use your library crate functions to check if the specified executable is a valid ELF or not. If it is, output a message saying so and exit with code 0 (check out the std::process::exit function from the stdlib for that); if it isn't, print the opposite message and exit with code 1.

When done, test your example program to make sure it behaves like this:

$ cargo run --example=check_elf /bin/ls
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/examples/check_elf /bin/ls`
/bin/ls is a valid ELF file
$ echo $?
0

$ cargo run --example=check_elf ./Cargo.toml 
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/examples/check_elf ./Cargo.toml`
./Cargo.toml is NOT a valid ELF file
$ echo $?
1

Looks like your tool is working! Or is it? Let's use fuzzing to check how robust it is against malformed inputs.

First, you need to install the cargo fuzz sub-command and initialize fuzzing in your check_elf package:

$ rustup toolchain add nightly
$ cargo +nightly fuzz init
$ ls fuzz/fuzz_targets/

Note that you need to use the "nightly" Rust toolchain (and hence install it beforehand if you haven't yet), because cargo fuzz is not fully stable yet. The +nightly flag passed to cargo ensures that the subsequent command (fuzz init in this case) is run using that toolchain.

Exercise 4.c: Edit fuzz/fuzz_targets/ to fuzz the input of your is_valid_elf function. (It should be straightforward to do so, as the type of the fuzzed payload and what that function expects are very close.) Now run cargo +nightly fuzz run fuzz_target_1 to fuzz is_valid_elf. It will (most likely) identify a pretty serious issue. What is it? How do you explain an issue like that arising in a Rust program? (Spoiler: you will learn a lot more about this in the next lecture.)

Exercise 4.d: The problem you have encountered has been reported as a bug to the elf_rs maintainers a while ago. Since then, the issue has been fixed and the most recent version of elf_rs at the time of writing (0.3.0) is no longer affected by this nasty problem. Update the version of elf_rs you are using in Cargo.toml and try fuzzing again. Is the problem still there? (Phew!)

Lab 07 - Unsafe and FFI, and procedural macros

This lab session is made of two parts: unsafe/FFI first, macros second.

Part 1: unsafe and FFI

You will learn how to:

  • write a build script;
  • interface with a C package using bindgen;
  • write a higher level interface for your bindings.

By the end of this part you will be familiar with the Rust tools and techniques used to safely interface with a foreign language.

(This part has been designed by Guillaume Duc. The lab instructions have been translated from French to English. All translation mistakes are ours.)

Part 2: writing a procedural macro

You will learn how to:

  • write a procedural macro which works directly on the TokenStream;
  • write a procedural macro which uses a visitor pattern;
  • write a derive macro that adds new capabilities to a structure.

You are not expected to complete all three macro exercises in the allotted time. Each exercise is self-contained; you can choose what you want to work on after reading each exercise text.

By the end of this part you will be able to write procedural macros to enhance the Rust language capabilities.

Unsafe programming and FFI: setup

In this lab, we will create a Rust interface for the zstd library.


Check that the zstd library and its development files (headers and .so link) are installed on your computer (libzstd1 and libzstd-dev packages on Debian-based installations, zstd on Arch Linux).

Creating a workspace

For this lab, we will need a cargo workspace containing two library crates that you will create later: zstd-sys (low-level interface) and zstd (high-level interface).

Exercise 0.a: Create the cargo workspace with an empty members list.

Automatic creation of the bindings using bindgen

Exercise 1.a: Create a new library crate named zstd-sys inside the cargo workspace.

Don't forget to add the name of the library to the members list in your workspace definition file.

⚠️ For the rest of this page, everything happens in the zstd-sys crate.

Exercise 1.b: Add bindgen to the build-dependencies section of Cargo.toml. You can use cargo add with the appropriate options to do so.

Exercise 1.c: Create a src/header.h file containing

#include <zstd.h>

Exercise 1.d: Create a file next to Cargo.toml. Using the example given in class, add the code to generate the bindings. The library you want to link is zstd, and all types and functions start with ZSTD_.

Exercise 1.e: In src/, include the file generated by bindgen.

Since the zstd library uses global variables whose names are not in uppercase, and some function names are not in snake case (words_separated_by_underscores), you can add this near the top of your src/ file:

#![allow(non_upper_case_globals)]
#![allow(non_snake_case)]

Check that your library builds without errors using cargo build.

Exercise 1.f (optional): Install cargo-expand (cargo install cargo-expand), and look at the code generated by bindgen using cargo expand --lib. To use it you need the nightly Rust toolchain. If you don't want to do that, you may also look for the generated file in the target directory and display its content (assuming you are in the zstd-sys crate root):

$ find ../target -name -exec cat {} \;

Testing the low-level interface

Exercise 2.a: Add a test that checks that ZSTD_compress() and ZSTD_decompress() work as expected. To do so, generate a buffer containing random data (for example 1024 bytes of random data, see the rand crate), compress it, decompress it, and check that the decompressed data match the original data.

This test should be created in a new file located in the tests directory (that you must create) of the zstd-sys crate.

You will need to use the zstd API documentation. You will also have to use the following auxiliary functions:

  • ZSTD_isError() to check the values returned by ZSTD_compress() and ZSTD_decompress();
  • ZSTD_compressBound() to get the maximum size of the compressed data;
  • ZSTD_getFrameContentSize() to get the size of the decompressed data.

High-level interface

It is now time to design a higher-level interface to the zstd library. Our goal is to no longer expose any unsafe function to the user of our library.

Exercise 3.a: Create a new library crate named zstd inside the cargo workspace.

⚠️ For the rest of this page, everything happens in the zstd crate.

Exercise 3.b: Make zstd-sys a dependency of the zstd crate. You may have to look up how to define a dependency by its path (which will be relative to the zstd crate).

Exercise 3.c: Write a compress_buffer() function with the following signature:

pub fn compress_buffer(src: &[u8], dst: &mut [u8], level: i32) -> Result<usize, ZstdError> {
    // ...
}

ZstdError is an error type that you will define yourself. You may use the thiserror crate if you wish.

Exercise 3.d: Write a decompress_buffer() function with the following signature:

pub fn decompress_buffer(src: &[u8], dst: &mut [u8]) -> Result<usize, ZstdError> {
    // ...
}

Exercise 3.e: Write a max_compressed_size() function with the following signature:

pub fn max_compressed_size(size: usize) -> usize {
    // ...
}

Exercise 3.f: Write a decompressed_size() function with the following signature:

pub fn decompressed_size(src: &[u8]) -> Result<usize, ZstdError> {
    // ...
}

Exercise 3.g: Add one or more tests, as you did for the zstd-sys crate, to check that those functions work as expected.

Procedural Macros

During these practical exercises, we will be building some procedural macros. We will need to write new ones during the section dedicated to asynchronous programming.

Exercise 5.a: Create a crate called macros that specifies in its Cargo.toml that it is the source of procedural macros:

[lib]
proc-macro = true

Exercise 5.b: Add the dependencies that we will use to manipulate tokens and the abstract syntax tree, as well as to report errors:

  • proc-macro2 for token manipulation
  • quote for code generation with templates
  • syn with full, visit-mut, and parsing features
  • proc-macro-error for better error messages
  • trybuild to test the errors returned by our macros

Throughout this section, you are encouraged to use sub-modules to store your utility functions. Only the definitions of the procedural macros themselves should be at the top level of the crate.

Translating strings from English to French

We want to write a #[translate] attribute macro that will translate English strings to French for string literals that represent numbers. For example, the following code:

#[translate]
fn main() {
  let res = "forty two";
  println!("12 + 30 = {}", res);
}

will display:

12 + 30 = quarante-deux

Only integer strings within an arbitrary range (e.g., 0..=100) will be translated.

We will use the following two crates to implement this functionality:

Exercise 5.a: Add these crates as dependencies to the macros crate.

Preloading Strings

The english_numbers crate does not provide a way to recognize an English number and retrieve its numeric value. Therefore, we will build a dictionary to store the string representation and its associated numeric value.

Exercise 5.b: Create a Translate struct that contains a dictionary associating a string with an i64, the type used by the english_numbers crate.

Exercise 5.c: Create an associated function new() that returns a Translate object with a preloaded dictionary. We will only enable the spaces formatting option and leave the other options disabled.

Choosing the String Replacement Technique

We could choose to use a mutable visitor to rewrite LitStr nodes that correspond to an English number and replace them with the corresponding French term. However, this technique, which seems to work at first glance, will fail on simple tests like:

#[translate]
#[test]
fn test_translate() {
  assert_eq!("trois", "three");
}

The visitor will visit the Macro node when analyzing this function and encountering assert_eq!. It will correctly visit the path and delimiter fields, but it will not visit the tokens field (available as a proc_macro2::TokenStream), which is the content of the macro, as it may not be valid Rust code at this stage.

Therefore, we need to also intercept the visit of Macro nodes to replace the literal tokens we are interested in. Since our procedural macro already works with TokenStream, why not directly implement this solution? We don't need a visitor.

Transforming the Token Stream

Exercise 5.d: Write a method that substitutes the tokens corresponding to a string literal representing an English number in our dictionary with the corresponding French number. Be sure to recursively call this method when encountering a delimited group of tokens.

impl Translate {

  fn substitute_tokens(stream: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
    // ...
  }
}

Note that the literal representation we have access to is the one in the source code, enclosed in double quotes (we can ignore string literals using other delimiters like r#""#). Instead of removing these quotes, it may be easier to add them to the dictionary for direct comparison.

Exercise 5.e: Write a procedural macro #[translate] that constructs a Translate object and uses it to transform the TokenStream. Remember that conversions with From and Into are implemented between proc_macro::TokenStream (at the macro interface) and proc_macro2::TokenStream (used inside the macro).

Exercise 5.f: Write tests for your macro. It may be useful to define a str!(a, b) macro with macro_rules! that dynamically constructs a string from a and b, without the concatenated ab string appearing in the source code:

// Check that out-of-range (1..=100) values are not translated
assert_eq!(str!("one h", "undred and one"), "one hundred and one");

Determining the Positive or Zero Bounds

We want to optionally specify the bounds for the numbers to be translated using an attribute. The following notations should be accepted:

#[translate] fn f() { ... }         // Default bounds (0..=100)
#[translate(0..10)] fn f() { ... }
#[translate(0..=10)] fn f() { ... }

However, we want to reject incorrect constructions with clear error messages:

error: unexpected end of input, expected `..=` or `..`
 --> tests/ui/
3 | #[translate(10)]
  | ^^^^^^^^^^^^^^^^
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
 --> tests/ui/
6 | #[translate(..10)]
  |             ^^

error: unexpected end of input, expected integer literal
 --> tests/ui/
9 | #[translate(10..)]
  | ^^^^^^^^^^^^^^^^^^
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
  --> tests/ui/
12 | #[translate(x)]
   |             ^

To achieve this, we will build a structure on which we can implement the syn::parse::Parse trait:

struct Bounds { low: i64, high: i64 }

Exercise 5.g: Implement the Parse trait on Bounds. You have to read an integer with type LitInt (syn handles the unary minus sign), look for one of ..= and .., read the higher bound and build the Bounds object. You might want to use Lookahead1 to make things easier.
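To make the parsing steps concrete outside of syn, here is a std-only sketch of the same grammar (an integer, then `..=` or `..`, then an integer); the Parse implementation follows the same sequence of steps, but with LitInt and Lookahead1 doing the token-level work:

```rust
#[derive(Debug, PartialEq)]
struct Bounds {
    low: i64,
    high: i64,
}

// Parse "LOW..HIGH" (exclusive) or "LOW..=HIGH" (inclusive) into
// inclusive Bounds, mirroring what the syn-based parser must do.
fn parse_bounds(s: &str) -> Result<Bounds, &'static str> {
    let (low_s, rest) = s.split_once("..").ok_or("expected `..=` or `..`")?;
    let low: i64 = low_s.trim().parse().map_err(|_| "expected integer literal")?;
    let (inclusive, high_s) = match rest.strip_prefix('=') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let high: i64 = high_s.trim().parse().map_err(|_| "expected integer literal")?;
    Ok(Bounds { low, high: if inclusive { high } else { high - 1 } })
}

fn main() {
    assert_eq!(parse_bounds("0..=10"), Ok(Bounds { low: 0, high: 10 }));
    assert_eq!(parse_bounds("0..10"), Ok(Bounds { low: 0, high: 9 }));
    assert!(parse_bounds("10").is_err());   // no `..` at all
    assert!(parse_bounds("..10").is_err()); // missing low bound
    assert!(parse_bounds("x..1").is_err()); // not an integer
    println!("bounds parsing ok");
}
```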

Exercise 5.h: Add specific tests to check that you can read the various intervals. To avoid exporting private types, you may add the tests in a submodule which is defined only in testing mode:

#[cfg(test)]
mod tests {
    // ...
}

You can parse strings with parser T using syn::parse_str::<T>(s), this might be handy in your tests.

Exercise 5.i: Update the translate macro so that it reads the bounds from its attribute if it is not empty, and initialize the Translate object appropriately.

Exercise 5.j: Add tests. For example, this test must pass:

#[translate(-10..=10)]
#[test]
fn test_negative_bounds() {
  assert_eq!("moins dix", "negative ten");
  assert_eq!("dix", "ten");
  assert_eq!(str!("neg", "ative eleven"), "negative eleven");
  assert_eq!(str!("ele", "ven"), "eleven");
}


We have seen that several methods might be combined to implement a macro. Here, we wrote a dedicated parser to read bounds, and also worked with the token stream directly.

Chaining Computations

The Elixir language allows for chaining computations using its |> operator, which inserts the left-hand argument as the first argument of the function call on the right-hand argument:

iex> "Elixir" |> String.graphemes() |> Enum.frequencies()
%{"E" => 1, "i" => 2, "l" => 1, "r" => 1, "x" => 1}

We would like to achieve the same thing in Rust using a visitor. Since we want to use a visitor while relying on syn's built-in parsing capabilities, we must take a valid Rust tree as input. Therefore, we will sacrifice the | operator (pipe, which normally represents "bitwise or") for this purpose.

#[pipe]
fn pipe_example() {
  let f = "Rust" | str::chars | Vec::from_iter;
  assert_eq!(f, vec!['R', 'u', 's', 't']);
}

This code is equivalent to:

fn pipe_example() {
  let f = Vec::from_iter(str::chars("Rust"));
  assert_eq!(f, vec!['R', 'u', 's', 't']);
}

Our #[pipe] macro and its | operator should also support function calls with multiple arguments. The expression a | f(b, c) is equivalent to f(a, b, c).


Exercise 6.a: Use a syn::visit_mut::VisitMut visitor to intercept the analysis of expressions and replace a binary expression using the | operator followed by a function call or a path with a function call that uses the left-hand argument as the first argument of the call. Remember to recurse to visit the new node (or the sub-nodes if no transformation was made).

Consider what type of node you want to visit in order to modify it. For example, if you are modifying an ExprBinary node, which represents a binary expression, you cannot replace it with a node that is not a binary expression. If you are modifying a less specific Expr node, you are allowed to replace one expression with another.

Also, remember that you can use syn::parse_quote!() to generate nodes of any type from a template.

Exercise 6.b: Create a procedural macro attribute pipe that uses this visitor to transform any item using the visitor.

Exercise 6.c: Add tests to verify the correct behavior of this macro, which should be applicable to functions, modules, and implementations.

Protected Field Access

For pedagogical purposes rather than practical ones, we want to implement a derive-like macro called Opaque. This macro allows us to define secure accessors for the specified fields of a structure:

#[derive(Opaque)]
struct SecureSettings {
  #[key] secret_key: u64,
  regular_field: String,
  #[opaque] protected_field: String,
  #[opaque] other_protected_field: u32,
}

Our macro will automatically add an accessor for fields marked with #[opaque]. This accessor will take a key parameter, which should be of the same type as the field marked with the #[key] attribute, and will return an Option containing the requested field only if the passed key is correct. The generated code will look like this:

impl SecureSettings {
  fn get_protected_field(&self, key: &u64) -> Option<&String> {
    (key == &self.secret_key).then(|| &self.protected_field)
  }
  fn get_other_protected_field(&self, key: &u64) -> Option<&u32> {
    (key == &self.secret_key).then(|| &self.other_protected_field)
  }
}

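The generated accessors rely on bool::then, which turns a condition into an Option: `cond.then(|| value)` is Some(value) when cond is true, and None otherwise. A hand-written, runnable equivalent of the generated code (a sketch with a reduced SecureSettings, before any macro is involved):

```rust
struct SecureSettings {
    secret_key: u64,
    protected_field: String,
}

impl SecureSettings {
    // What the derive macro would generate, written by hand:
    // the field is returned only when the caller knows the key.
    fn get_protected_field(&self, key: &u64) -> Option<&String> {
        (key == &self.secret_key).then(|| &self.protected_field)
    }
}

fn main() {
    let settings = SecureSettings {
        secret_key: 42,
        protected_field: "top secret".to_string(),
    };
    assert_eq!(
        settings.get_protected_field(&42),
        Some(&settings.protected_field)
    );
    assert_eq!(settings.get_protected_field(&7), None);
    println!("accessor ok");
}
```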

Exercise 7.a: Write a skeleton for the Opaque derive-like macro in the macros crate. This macro should take additional key and opaque attributes to mark fields. For now, don't return anything useful.

Exercise 7.b: Verify that the argument passed to the macro is indeed a structure containing named fields, and provide an appropriate error message if not.

Exercise 7.c: Identify the field marked with #[key], which should be unique, as well as the fields marked with #[opaque]. The field containing the key cannot also be #[opaque].

Exercise 7.d: After storing the name, type, and identifier to be used for each opaque field in vectors, generate the code. You will also need to have variables accessible during expansion for the name and type of the field containing the key.

Exercise 7.e: Test, including tests with trybuild to verify the accuracy of error messages.

Lab 08

In this lab session, you will learn about:

  • Dividing a problem into smaller pieces.
  • Parallelizing each subproblem.
  • Benchmarking your solution.

Don't spend too much time on the first two parts, which don't use threads. Don't hesitate to ask faculty or other students for help. The interesting part starts when you begin using threads ("Parallel computing").

Counting characters

Today, we are interested in counting the number of occurrences of every character in a text. For example, in the string "foobar!!", the characters 'f', 'b', 'a', and 'r' appear once each, while the characters 'o' and '!' each appear twice.

Exercise 0.a: Create a new Rust binary project named char-counter in which our code will be placed.

The count_chars() function

A text is usually made of several lines. We need a function which takes those lines and returns the occurrences of the various characters.

Exercise 0.b: Write a function with the following prototype:

fn count_chars(input: &[&str]) -> HashMap<char, usize>

This function will build a HashMap whose keys are the various characters and the values are the number of occurrences of each character. To do so, it will loop over the &str of input, and over the characters of each &str.

Tips for implementing count_chars()

The str::chars() method may be useful: it returns an iterator over the characters of a string.
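Note that chars() iterates over Unicode characters (char values), not bytes, which matters for non-ASCII text. A quick illustration:

```rust
fn main() {
    let s = "héllo";
    // chars() yields Unicode scalar values (char), not bytes
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars.len(), 5); // 5 characters…
    assert_eq!(s.len(), 6);     // …even though "héllo" takes 6 bytes in UTF-8
    assert_eq!(chars[1], 'é');
}
```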

Also, if h is a HashMap<K, V> and V implements Default, the following idiomatic construct may be useful:

  h.entry(some_key).or_default()

It returns a &mut reference to the value for key some_key; if the entry for some_key does not exist in the map, it is created with V::default(), the default value for type V, and a mutable reference to the newly inserted value is returned.

So when implementing a counter, the following construct can often be found, since all integer types have a default value of 0:

  *h.entry(some_key).or_default() += 1;

This increments the value corresponding to some_key, or sets it to 1 if it didn't exist in the map before (0, the default value, plus 1).

Note: do not get stuck on this one. If after 15 minutes you feel that you will take too much time to write this function, you can get inspiration from this code and go on with the lab.

Testing the function

Exercise 0.b: Print the occurrences of every character in the text made from the two lines "Hello world!" and "I am happy to be here.".

Check by hand that you get the expected result.

Given that HashMap<K, V> implements the Debug trait if K and V do, you can use (:#? pretty-prints the debug information):

fn main() {
    let text = ["Hello world!", "I am happy to be here."];
    let freq = count_chars(&text);
    println!("{freq:#?}");
}

You should obtain something like:

{
    'H': 1,
    'b': 1,
    'y': 1,
    'a': 2,
    ' ': 6,
    '!': 1,
    't': 1,
    'l': 3,
    'd': 1,
    'I': 1,
    'p': 2,
    'o': 3,
    'r': 2,
    'w': 1,
    'm': 1,
    'e': 4,
    '.': 1,
    'h': 2,
}

You will probably agree that this is not very pleasant to read. We can do better.

Prettier display

Note: if you feel that you do not have the time to implement this part, feel free to get some code from here (this is not the point of this lab).

In order to make our program useful, we want to:

  • sort the characters by occurrences;
  • display only the 10 most frequent characters (with these two lines the choice among ties will be arbitrary; with larger texts it will get interesting).

How can we do that? Learn or remember that:

  • Iterating over a HashMap<K, V> gives you an iterator over pairs (K, V). Those pairs could be collected into a Vec<(K, V)>.
  • Vectors can be sorted. By using the sort_by_key() method, we can sort by the key we choose. For example, vec.sort_by_key(|&(_k, v)| v) will sort the vector elements according to the second element v of each pair.
  • Iterating over a vector can be reversed: vec.into_iter().rev() gives an iterator over the vector from the end to the beginning.
  • An iterator can be limited to its first N elements using the .take(N) method.
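Putting these pieces together on the "foobar!!" string from earlier, a minimal sketch (here limited to the top 2 instead of 10) might look like:

```rust
use std::collections::HashMap;

fn main() {
    let mut freq: HashMap<char, usize> = HashMap::new();
    for c in "foobar!!".chars() {
        *freq.entry(c).or_default() += 1;
    }
    // Collect the pairs, sort by count, then iterate from the largest
    let mut pairs: Vec<(char, usize)> = freq.into_iter().collect();
    pairs.sort_by_key(|&(_c, n)| n);
    let top: Vec<(char, usize)> = pairs.into_iter().rev().take(2).collect();
    // 'o' and '!' both appear twice, so they are the two most frequent
    assert!(top.iter().all(|&(_c, n)| n == 2));
    for (c, n) in top {
        println!("- {c:?}: {n} occurrences");
    }
}
```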

Exercise 0.c: display only the 10 most frequent characters in the text. Keeping our example, it should print something like:

  - ' ': 6 occurrences
  - 'e': 4 occurrences
  - 'l': 3 occurrences
  - 'o': 3 occurrences
  - 'a': 2 occurrences
  - 'r': 2 occurrences
  - 'h': 2 occurrences
  - 'p': 2 occurrences
  - 'y': 1 occurrences
  - 'w': 1 occurrences

And no, we do not care about the extra "s" in "1 occurrences". Fix it if you want but this has 0 importance.

Getting big

Now that our program looks fine, we want to go bigger and consider larger texts.

Exercise 1.a: Download the novel Moby-Dick; or The Whale, by Herman Melville, and place it in the same directory as your Cargo.toml.

This novel, which totals 22316 lines, will be more interesting than our hand-crafted two lines.

Reading the file

Exercise 1.b: Since this lab is not about how to read files, copy the following use statements and function into your program (the easiest exercise ever):

use std::fs::File;
use std::io::{self, BufRead};

fn load_file(name: &str) -> Result<Vec<String>, io::Error> {
    // Read the file line by line into a vector of strings
    io::BufReader::new(File::open(name)?).lines().collect()
}

Have you noticed that load_file() returns a Vec<String>? This is not directly the &[&str] we need to count characters, so we will need some adapting.

Adapting count_chars()

We want to adapt count_chars() so that it accepts a slice of &str, as it did before, but also a slice of String. In fact, we would like to accept a slice of any type which can be easily seen as a &str.

The trait AsRef<T> means exactly that: when implemented on a type U, it means that, without any extra copy, an object of type U can be viewed as a &T. For example, String implements AsRef<str>: calling .as_ref() on a String will return a &str pointing to the data owned by the String.

Also, every type T implements AsRef<T>, as seeing a T as a &T is trivial.

Exercise 1.c: Change the signature of count_chars() to the following one, accepting a slice of any type that can be seen as a &str. Also, use .as_ref() on the provided data (in the inner loop) to convert the real type S to a &str.

fn count_chars<S: AsRef<str>>(input: &[S]) -> HashMap<char, usize>

As soon as you have done that, you are able to pass either a &[&str] or a &[String] to count_chars(), and of course a &Vec<String> thanks to Deref which allows a reference to a vector to be seen as a slice.
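To convince yourself that this works, here is a toy generic function (total_len() is an illustrative name, not part of the lab) that accepts both kinds of slices:

```rust
// Accepts a slice of anything that can be viewed as a &str
fn total_len<S: AsRef<str>>(input: &[S]) -> usize {
    input.iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    let strs: [&str; 2] = ["ab", "cde"];
    let strings: Vec<String> = vec!["ab".to_string(), "cde".to_string()];
    assert_eq!(total_len(&strs), 5);    // &[&str; 2] coerces to &[&str]
    assert_eq!(total_len(&strings), 5); // &Vec<String> derefs to &[String]
}
```

Your count_chars() follows exactly the same pattern, with as_ref() called in the inner loop.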

Exercise 1.d: Change the main() function signature so that it returns Result<(), io::Error>, and make it load and analyze the character frequency of Moby Dick.

Have you noticed that it takes more time than when using our two lines? Let's parallelize this!

Parallel computing

On our computers, we usually have multiple cores which can work in parallel. In this section, we will use several threads. Each thread will analyze a subset of the input data, and we will later aggregate the results.

Here is how it will work for n threads:

  • The number of lines (let's call it l) will be divided into n chunks. The goal is to give about l/n lines to each thread.
  • A MPSC (multiple producers single consumer) channel will be created.
  • Each thread will compute the statistics on the lines it got and will put the map containing the result on the channel.
  • The main thread will aggregate the various statistics into a single map and return it.

We will use scoped threads to alleviate lifetime constraints.

Creating the parallel version

Exercise 2.a: Create a function with the following signature which uses up to n threads to compute the number of occurrences of each character:

fn count_chars_parallel<S: AsRef<str> + Sync>(input: &[S], n: usize) ->
    HashMap<char, usize>

You will notice that we have an extra constraint on S: since objects of type S will be referenced from other threads (the ones we will be creating), it must be Sync in addition to being AsRef<str>.

Breakdown of the count_chars_parallel() function

This function must do the following things, in order.

First it must create a (tx, rx) pair corresponding to a MPSC channel.

It must then create a scope in which new threads will be spawned.

For each chunk of about l/n lines (the last chunk may be smaller):

  • tx must be cloned into a new variable that will be captured by the new thread;
  • a new thread should be spawned from the current scope, capturing tx and the current chunk.

This thread will compute the characters' occurrences (using count_chars()) and send the result on the MPSC channel using its captured tx clone.

It is now time to close the thread scope and to concentrate on receiving data from the spawned threads. Since we are after the thread::scope() call, at this stage we know that all analyses have been sent on the MPSC channel.

How does the reception part work?

  • rx.recv() will wait until data is available. As long as there is data available on the channel, it will return Ok(some_data).
  • If all senders are dead (tx and all its clones have been dropped) and all data has been read, rx.recv() will return Err(…).
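These two rules can be checked with a tiny standalone example:

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();
    let tx2 = tx.clone();
    tx.send(1).unwrap();
    tx2.send(2).unwrap();
    // Drop every sender: once the queued data is drained, recv() errors out
    drop(tx);
    drop(tx2);
    assert_eq!(rx.recv(), Ok(1));
    assert_eq!(rx.recv(), Ok(2));
    assert!(rx.recv().is_err()); // all senders dropped and queue drained
}
```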

At this stage, all threads have terminated their work. We can read data from the MPSC channel through rx: we will get a bunch of Ok(…) containing data from the threads, then Err(…) saying that everything has terminated. Right? No… We have to get rid of the tx variable which is still present. We have cloned it each time we have created a thread, but since tx still exists the use of the channel is never considered complete.

How do we get rid of tx? std::mem::drop(tx) is intended for that usage.

Now that we have done that, we can collect every map which comes from the MPSC channel. Since you already know how to do that, let us share the code to build the aggregated hash map and save some time:

    let mut freq = HashMap::default();
    while let Ok(fr) = rx.recv() {
        for (c, n) in fr {
            *freq.entry(c).or_default() += n;
        }
    }

Returning freq is the last thing to do.
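Putting the whole breakdown together, here is one possible sketch (the counting loop is inlined so the example stands alone; in your program you would call your count_chars() instead):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

fn count_chars_parallel<S: AsRef<str> + Sync>(input: &[S], n: usize) -> HashMap<char, usize> {
    let (tx, rx) = mpsc::channel();
    // Give roughly l/n lines to each of the n threads
    let chunk_size = ((input.len() + n - 1) / n).max(1);
    thread::scope(|s| {
        for chunk in input.chunks(chunk_size) {
            let tx = tx.clone();
            s.spawn(move || {
                // Inlined counting; you would call count_chars(chunk) here
                let mut freq: HashMap<char, usize> = HashMap::new();
                for line in chunk {
                    for c in line.as_ref().chars() {
                        *freq.entry(c).or_default() += 1;
                    }
                }
                tx.send(freq).unwrap();
            });
        }
    });
    // Drop the original sender so that rx.recv() eventually returns Err(…)
    drop(tx);
    let mut freq = HashMap::default();
    while let Ok(fr) = rx.recv() {
        for (c, count) in fr {
            *freq.entry(c).or_default() += count;
        }
    }
    freq
}

fn main() {
    let lines = ["foo", "barfoo"];
    let freq = count_chars_parallel(&lines, 2);
    assert_eq!(freq[&'o'], 4);
    assert_eq!(freq[&'f'], 2);
    assert_eq!(freq[&'b'], 1);
}
```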

Calling the parallel version

Exercise 2.b: Update your main() function so that it uses count_chars_parallel() with, for example, 4 threads, so that it processes Moby Dick text faster.


Benchmarking

Is it really faster to use multiple threads? How many threads should we be using? Rather than guessing, let us benchmark the performances.

Benchmarking consists in executing a task a certain number of times and averaging its execution time over those executions.

Rust has two handy types for dealing with time: Instant, which represents a point in time, and Duration, which represents, well, a duration. Instant::now() is the current instant in time, and start.elapsed() returns the Duration elapsed since the instant start.

Also, a Duration can be divided by an integer (of type u32), yielding a Duration. It means that if you do something like:

    let start = Instant::now();
    for _ in 0..times {   // Here times is a u32
        do_something();
    }
    let mean_duration = start.elapsed() / times;

you will get the average time spent to execute do_something() in mean_duration. Also, a Duration implements Debug and will print something like "100ms" or "3.7s".

Now that you know all this…

Exercise 3.a: Implement a benchmark() function which counts the characters' occurrences over n threads by executing the count_chars_parallel() function times times and returns the average duration:

fn benchmark<S: AsRef<str> + Sync>(input: &[S], n: usize, times: u32) -> Duration

Exercise 3.b: Implement a benchmark_all() function which calls benchmark() with 1 thread to max threads and prints the results:

fn benchmark_all<S: AsRef<str> + Sync>(input: &[S], max: usize, times: u32)

Exercise 3.c: Call the benchmark_all() function in your main program and look at the result.

See how the durations decrease as you add threads, and how the gains stop once you go over the number of cores on your machine.

If you have more time

If you happen to have more time, you can use the clap crate to configure the file to read, the maximum number of threads, and the number of times you will run the benchmarks. Also, you might want to print a summary of the most frequent characters used, with a configurable number of characters printed:

$ cargo run -- --help
Usage: char-counter [OPTIONS] <FILE>

  <FILE>  The file to count characters from

  -m, --max <MAX>      The maximum level of parallelism to achieve [default: 8]
  -t, --times <TIMES>  The number of times to run each test [default: 100]
  -s, --stats <rank>   Display statistics
  -h, --help           Print help

Exercise 4.a: Do that, and make sure you send a mail to the faculty telling them you did it so that you may earn extra points (and put the code in your git repository so that they can check).

Lab 09

In this lab session you will

  • learn how to build an asynchronous HTTP client
  • see the benefit of doing asynchronous requests
  • learn how to build an asynchronous HTTP server

An asynchronous client

The setup

A proxy server reproducing Wikipedia page data for programming languages has been set up for this lab. For example, by requesting its "Rust" page, you can get the raw HTML page corresponding to the Rust page on Wikipedia.

The issue with this server is its slowness: requesting a page takes at least 2 seconds (on purpose). Since in this lab we will make several requests to this server, we want them to be done in parallel (asynchronously) rather than sequentially.

Using reqwest

reqwest is a popular crate used to build HTTP clients. It supports asynchronous requests, and we will build on this. It uses tokio as its runtime.

Exercise 0.a: Create a new binary Cargo project client with reqwest as a dependency, as well as tokio with the feature flag full. We will also add anyhow to handle errors, or color_eyre (don't forget to initialize it) as you prefer.

Make an asynchronous reqwest to get the Rust page

Exercise 0.b: In your main() async function, build a Client. With this object, do a get() request to fetch the "Rust" Wikipedia page through the proxy server and retrieve the body text (see the reqwest documentation for how to do that). Display the number of bytes in the body.

⚠️ Don't forget to use the tokio::main attribute on your main() async function so that it gets executed in an asynchronous context:

#[tokio::main]
async fn main() -> anyhow::Result<()> {   // or color_eyre::Result<()>

Time your main program

We want to see how long our main program takes to execute.

Exercise 0.c: Using the Instant type, measure and display the time taken by your main() program.

You may be interested in Instant::now() as well as Instant::elapsed(). Also, note that a Duration can be printed in Debug mode and will print something like "29.8ms" or "2s". By using the floating point notation, you can even limit the number of digits after the decimal point with a format string like {:.3?}.

Make several asynchronous requests

Exercise 0.d: Reorganize your code to create this function which returns information about a programming language and use it from main():

async fn retrieve(client: &Client, language: &str) -> anyhow::Result<String>

(or eyre::Result<String> if you used eyre)

For example, calling retrieve(client, "Rust") should return the Rust page.

Exercise 0.d: In addition to information about the number of bytes in the "Rust" page body, print information about the "C" programming language as well.

You should notice that it takes twice the time, as the requests are serialized. We will now make them parallel.

Exercise 0.e: Import the futures crate as a dependency. Using the join! macro, make both requests in parallel then print the results.

Note how the time is back to around 2s.

Take requests from the command line

Exercise 0.f: For every programming language name found on the command line, print the number of bytes in its page body.

For example, running cargo run --release -- C Rust Java should print something like

- C: 401467 bytes
- Rust: 461307 bytes
- Java: 362262 bytes
Elapsed time: 2.008s

You can either use clap or more simply std::env::args() to get the list of languages (in which case, do not forget to skip the first item). The join_all() function from the futures crate may be useful to do this.

You can use zip on an iterator to unite items of two iterators by pairs.
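For instance, once join_all() has given you a vector of results, zip lets you pair each language name with its result (the byte counts below are just the sample values from the output above):

```rust
fn main() {
    let languages = ["C", "Rust", "Java"];
    let sizes = [401467usize, 461307, 362262];
    // zip pairs up the items of the two iterators
    for (lang, size) in languages.iter().zip(sizes.iter()) {
        println!("- {lang}: {size} bytes");
    }
}
```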

An asynchronous server

This part builds a proxy server similar to the one you have been using for this lab.

This part may take a little more time to do, but we are talking about some 60-70 lines of code tops. Do not worry if you don't finish in the allotted time.

The project

We will use the axum crate to build an HTTP server. We will accept requests with a path like "/wikipedia/LANGUAGE", and forward them to the corresponding Wikipedia page. When the response arrives, we will forward it to the caller, along with the status code and most headers.

Exercise 1.a: Create a new cargo binary project server, with dependencies on crates axum, tokio with feature flag full, http and reqwest.

Answering a simple HTTP request

Exercise 1.b: Create an axum server, listening on port 3000, accepting a route of /wikipedia/:language and routing it to a wikipedia() function. For now, return just the language name as a String.

async fn wikipedia(Path(language): Path<String>) -> String

Returning the Wikipedia page

Exercise 1.c: Change the signature of the wikipedia() function to return a tuple of StatusCode, HeaderMap and a String, or just a StatusCode if there is an internal error:

async fn wikipedia(Path(language): Path<String>) -> Result<(StatusCode, HeaderMap, String), StatusCode>

In this function, query the right Wikipedia page using reqwest, and return the status code, headers and body from the Wikipedia response. The "content-length" header must be removed from the headers as it will be recomputed by the HTTP middleware used by axum¹.

If there is any error, remap it to StatusCode::INTERNAL_SERVER_ERROR (using .map_err()) and exit prematurely from the function.

Exercise 1.d: Run your server; it should listen on port 3000. Modify your client so that it uses http://localhost:3000/wikipedia/LANGUAGE as the URL to query, and check that everything works well.

If it does, it means that your server is properly running things in parallel. However it probably answers too fast and does not add the extra delay.

Slowing down the proxy server

Exercise 1.e:

Before returning the answer, sleep for 2 seconds. Of course an asynchronous sleep function (such as tokio::time::sleep()) must be used: using std::thread::sleep() would block the whole server thread and prevent other requests from progressing in parallel.

Sharing resources

The same Client can be used to perform concurrent requests. Using the same client to do multiple requests to the same site (here Wikipedia) will automatically reuse the connection and reduce the load on the server.

Extension lets you add a layer to your application. This layer will be available in any request handler.

Exercise 1.f: Add an Arc<Client> extension to your application, and use this client in your wikipedia() function.

Adding some cache

Every time you want to access a programming language page, you reload it from Wikipedia. Even though you reduced the load by sharing the client through an extension, if a client requests the same page several times you may end up requesting a page from Wikipedia several times.

You can add a cache, whose type would be:

type Cache = HashMap<String, (StatusCode, HeaderMap, String)>;

The idea here is to share a mutable Cache object and look for the result here first. If it is not available, Wikipedia will be queried, and the result will be stored in the cache then returned to the client.

You will need some synchronization here. Rust's RwLock seems like a good candidate.

Exercise 1.g: Share a Arc<RwLock<Cache>> resource, in order to avoid doing multiple identical requests to Wikipedia.

Be cautious though: you must never keep a resource locked (be it in read or write mode) across an await point. Limit locking the cache to the shortest possible period.


¹ The communication with the Wikipedia server through reqwest uses content negotiation and retrieves compressed pages. The Content-Length corresponds to the compressed data, which is neither the original data size nor necessarily the size of the data we would obtain if we used compression ourselves (unless we used the very same algorithm with the very same parameters). It is best to leave the Content-Length computation to the web framework.