Web page retrieval

In this part, you will retrieve a web page from a given URL and return its content as a String, or return a proper error if the retrieval fails.

Initialization

First of all, make sure that your Rust installation is up to date by typing:

$ rustup update

If you get an error, you probably did not install Rust through rustup. In this case, make sure that your system packages are up to date.

For this lab, you will create a new Rust program named "lab4". Create a new binary (executable) project by typing:

$ cargo new --bin lab4
$ cd lab4

Everything you will do today will happen in the lab4 directory, in particular in the src/main.rs file.

Adding a dependency

You could make HTTP requests "by hand" by opening a TCP socket to the right host and port, and by sending HTTP commands and parsing the responses. However, you can take advantage of existing libraries written in Rust that can do that already.
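To get a feeling for what such a library does for you, here is a minimal sketch, using only the standard library, of two pieces of the "by hand" approach: formatting an HTTP/1.1 GET request and extracting the status code from the first line of a response. The helper names build_request and parse_status_line are hypothetical. The sketch deliberately leaves out the socket itself, TLS (required for https:// URLs), redirects and body decoding, which is the work reqwest handles for you.

```rust
// Hypothetical helper: builds a minimal HTTP/1.1 GET request for `path` on `host`.
// Each header line ends with CRLF, and an empty line terminates the headers.
// "Connection: close" asks the server to close the connection after replying.
fn build_request(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

// Hypothetical helper: extracts the numeric status code from a status line
// such as "HTTP/1.1 404 Not Found". Returns None if the line is malformed.
fn parse_status_line(line: &str) -> Option<u16> {
    let mut parts = line.split_whitespace();
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    // The second field is the three-digit status code.
    parts.next()?.parse().ok()
}

fn main() {
    print!("{}", build_request("rfc1149.net", "/nonexistent"));
    println!("{:?}", parse_status_line("HTTP/1.1 404 Not Found"));
}
```

For a plain http:// site, the request string would be written to a std::net::TcpStream connected to port 80, and parse_status_line would be applied to the first line read back.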

The library ("crate" in Rust terminology) you will want to use is called reqwest (this is not a typo). In order to use it in your program, you have to add it to Cargo.toml in the [dependencies] section.

Rather than adding it by hand, you can use the cargo add command. Moreover, we want to use the "blocking" API of reqwest, which is the simplest one at this stage. This "blocking" API is enabled by requesting (ah ah) the "blocking" feature of reqwest:

$ cargo add reqwest --features blocking

If you look at your Cargo.toml, you should see something like:

[dependencies]
reqwest = { version = "0.11.22", features = ["blocking"] }

It indicates that your Rust program will use version 0.11.22 of the "reqwest" crate with the "blocking" feature enabled. 0.11.22 was the latest published version of the crate when this page was written; cargo add selects the latest version, so the number you see may differ.

Adding the "reqwest" crate as a dependency means that you will be able to use its types and functions by prefixing them with reqwest::.

Fetching a web page

Your first function will retrieve a web page from its URL using the blocking API of reqwest, whose documentation is accessible online.

Exercise 1.a: write a function with signature fn get(url: &str) -> Result<String, reqwest::Error> which returns the content of the web page located at url (use code similar to the example in the documentation).

Use the following main() program to test it:

fn main() -> Result<(), reqwest::Error> {
    println!("{}", get("https://rfc1149.net/")?);
    Ok(())
}

Note how main() can return a Result<(), E> instead of returning nothing (which is written () in Rust and is the equivalent of void in C-like languages). Either get("https://rfc1149.net/") returns an error, which the ? operator propagates out of main(), or it returns a String, which is then displayed. At the end of main(), Ok(()) returns the unit value () wrapped in Ok.
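You can experiment with this propagation without touching the network. In the sketch below, the hypothetical function parse_port() stands in for get(), and the standard library's ParseIntError stands in for reqwest::Error; the behavior of ? and of a main() returning a Result is the same.

```rust
use std::num::ParseIntError;

// Hypothetical stand-in for get(): either succeeds with a value
// or fails with the standard library's ParseIntError.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    // `?` returns early from parse_port() with the Err if parsing fails.
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

fn main() -> Result<(), ParseIntError> {
    // On success the value is printed; on failure `?` makes main() return the Err.
    println!("{}", parse_port("8080")?);
    Ok(())
}
```

With "8080", the program prints 8080 and exits successfully. If you replace it with "eighty", the ? operator makes main() return the error, and Rust prints it using its Debug representation (Error: ParseIntError { kind: InvalidDigit }) before exiting with a non-zero status.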

Returning a better error

Now, try replacing the URL with one that returns a 404 (not found) status code. You can use "https://rfc1149.net/nonexistent", which does not exist.

Note how your get() function returns without an error: it returns the content of the error page. This is not a good idea: in your program, you only want to return pages which were successfully found.

However, you cannot create a reqwest::Error yourself to indicate that the page was not found, as reqwest::Error has no public constructor. You will have to write your own Error type, with two variants for now:

#[derive(Debug)]
enum Error {
    Reqwest(reqwest::Error),
    BadHttpResult(u16),
}

This Error type can be an Error::Reqwest and encapsulate a reqwest::Error, or it can be an Error::BadHttpResult and encapsulate the HTTP status code returned by the web server (for example 404 for "not found").

The #[derive(Debug)] attribute will be explained in a later class. For the time being, you just need to know that it allows an Error value to be displayed by the main() program if needed. You can also display an Error yourself by using the {:?} placeholder:

fn main() {
    let my_error = Error::BadHttpResult(404);
    println!("The error is {my_error:?}.");
}

would display

The error is BadHttpResult(404).

Also, in order to take advantage of the automatic conversion performed by the ? operator, you want to implement From<reqwest::Error> for your Error type:

impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Error {
        // Wrap the reqwest error into an Error::Reqwest variant
        Error::Reqwest(e)
    }
}
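Here is a self-contained sketch of this mechanism. It assumes the standard library's ParseIntError in place of reqwest::Error, so that it runs without any dependency or network access, and it renames the first variant to a hypothetical Parse accordingly. The hypothetical function check_status() parses a status code from a string: a parse failure is turned into Error::Parse by ? through the From implementation, while any code other than 200 is reported as Error::BadHttpResult.

```rust
use std::num::ParseIntError;

// Mirrors the lab's Error type, with ParseIntError standing in for reqwest::Error.
#[derive(Debug)]
enum Error {
    Parse(ParseIntError),
    BadHttpResult(u16),
}

// Lets `?` convert a ParseIntError into an Error automatically.
impl From<ParseIntError> for Error {
    fn from(e: ParseIntError) -> Error {
        Error::Parse(e)
    }
}

// Hypothetical helper: parses a status code and rejects anything but 200.
fn check_status(s: &str) -> Result<u16, Error> {
    // On failure, `?` calls Error::from() on the ParseIntError and returns it.
    let code: u16 = s.parse()?;
    if code != 200 {
        return Err(Error::BadHttpResult(code));
    }
    Ok(code)
}

fn main() {
    println!("{:?}", check_status("200")); // Ok(200)
    println!("{:?}", check_status("404")); // Err(BadHttpResult(404))
    println!("{:?}", check_status("abc")); // Err(Parse(ParseIntError { kind: InvalidDigit }))
}
```

In your lab, reqwest::Error plays the role of ParseIntError, and the status code comes from the server's response rather than from a string.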

Exercise 1.b: update your get() function so that it checks the status code of the response before reading its text. Your get() function will return a Result<String, Error> and your main() function will return a Result<(), Error>, in order to accommodate both error conditions.

Look at the documentation for the reqwest::blocking::get() function: what is its return type? What method can be called on a Response to get a StatusCode that you can compare with StatusCode::OK? How can you get the numerical u16 code of a StatusCode?

Note: instead of typing qualified type names such as reqwest::StatusCode, you can add use reqwest::StatusCode; at the beginning of your program: this brings StatusCode into scope, and you will be able to write StatusCode instead of the longer reqwest::StatusCode.

Check that your new version of get() works by ensuring that an error is displayed when trying to print the content of "https://rfc1149.net/nonexistent". You should see a 404 error.