Getting big

Now that our program looks fine, we want to go bigger and consider larger texts.

Exercise 1.a: Download the novel Moby-Dick; or The Whale, by Herman Melville, and place it in the same directory as your Cargo.toml.

This novel, which totals 22316 lines, will be more interesting than our hand-crafted two lines.

Reading the file

Exercise 1.b: Since this lab is not about how to read files, copy the following use statements and function into your program (the easiest exercise ever):

use std::fs::File;
use std::io::{self, BufRead};

/// Reads the file `name` and returns its lines, or the first I/O error encountered.
fn load_file(name: &str) -> Result<Vec<String>, io::Error> {
    io::BufReader::new(File::open(name)?).lines().collect()
}

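Have you noticed that load_file() returns a Vec<String>? A Vec<String> cannot be passed as the &[&str] that count_chars() expects: a String and a &str do not have the same layout in memory, so one slice cannot simply be reinterpreted as the other. We will need some adapting.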
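To make the mismatch concrete, here is a minimal sketch of what a manual conversion would look like. The functions total_len() and borrow_all() are hypothetical stand-ins introduced for illustration only; total_len() plays the role of a function that, like the earlier count_chars(), only accepts a &[&str].

```rust
// Hypothetical stand-in for a function that only accepts `&[&str]`.
fn total_len(input: &[&str]) -> usize {
    input.iter().map(|s| s.len()).sum()
}

// Hypothetical helper: borrows every String as a &str.
// This allocates a second vector holding the borrowed views.
fn borrow_all(lines: &[String]) -> Vec<&str> {
    lines.iter().map(String::as_str).collect()
}

fn main() {
    let owned: Vec<String> = vec![String::from("Hello"), String::from("world")];

    // total_len(&owned); // would not compile: `&[&str]` expected, `&Vec<String>` found

    let borrowed = borrow_all(&owned);
    println!("{}", total_len(&borrowed)); // prints 10
}
```

This works, but it builds an extra vector just to satisfy the signature. The next section takes a different route and changes the signature instead.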

Adapting count_chars()

We want to adapt count_chars() so that it accepts a slice of &str, as it did before, but also a slice of String. In fact, we would like to accept a slice of any type which can be easily seen as a &str.

The trait AsRef<T> means exactly that: when it is implemented on a type U, a reference to a U can be cheaply viewed as a &T, without copying the underlying data. For example, String implements AsRef<str>: calling .as_ref() on a String returns a &str pointing to data owned by the String.

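Also, str itself implements AsRef<str>, and a reference &U implements AsRef<T> whenever U does. This is why &str (and &String) can be seen as a &str too. Note that the standard library does not provide a blanket implementation of AsRef<T> for every type T.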
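The following short sketch illustrates AsRef<str> on owned and borrowed strings. The function byte_len() is a hypothetical helper, not part of the lab; it only exists to show a generic bound on AsRef<str>.

```rust
// Hypothetical helper: accepts anything that can be viewed as a &str.
fn byte_len<S: AsRef<str>>(s: S) -> usize {
    let view: &str = s.as_ref();
    view.len()
}

fn main() {
    let owned = String::from("whale");

    // The &str returned by as_ref() points into the String's own buffer:
    // both start at the same address, so nothing was copied.
    let view: &str = owned.as_ref();
    assert_eq!(view.as_ptr(), owned.as_ptr());

    println!("{}", byte_len("whale")); // a &str
    println!("{}", byte_len(&owned));  // a &String
    println!("{}", byte_len(owned));   // a String, moved into the call
}
```

All three calls print 5, the length in bytes of "whale".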

Exercise 1.c: Change the signature of count_chars() to the following one, accepting a slice of any type that can be seen as a &str. Also, use .as_ref() on the provided data (in the inner loop) to convert the real type S to a &str.

fn count_chars<S: AsRef<str>>(input: &[S]) -> HashMap<char, usize>

Once this is done, you can pass either a &[&str] or a &[String] to count_chars(), and of course a &Vec<String>, thanks to deref coercion, which lets a reference to a vector be used where a slice is expected.
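As a point of comparison for your own solution, here is one possible shape for the generic function. The signature is the one given above; the body is an assumption, since the earlier version of count_chars() is not reproduced in this section, and yours may differ.

```rust
use std::collections::HashMap;

// Signature from the exercise; the body is one possible implementation.
fn count_chars<S: AsRef<str>>(input: &[S]) -> HashMap<char, usize> {
    let mut freq = HashMap::new();
    for line in input {
        // as_ref() views the concrete type S as a &str without copying.
        for c in line.as_ref().chars() {
            *freq.entry(c).or_insert(0) += 1;
        }
    }
    freq
}

fn main() {
    let borrowed: [&str; 2] = ["ab", "a"];
    let owned: Vec<String> = vec![String::from("ab"), String::from("a")];

    let from_borrowed = count_chars(&borrowed); // array of &str, seen as &[&str]
    let from_owned = count_chars(&owned);       // &Vec<String>, seen as &[String]

    assert_eq!(from_borrowed, from_owned);
    println!("{:?}", from_owned.get(&'a')); // prints Some(2)
}
```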

Exercise 1.d: Change the signature of main() so that it returns Result<(), io::Error>, and make it load Moby-Dick and analyze its character frequency.
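Returning a Result from main() lets you use the ? operator directly on load_file(). The sketch below shows only that mechanism. To stay runnable without the novel, it reads a small sample file that it writes itself; write_sample() and the file name count_chars_sample.txt are hypothetical and not part of the lab.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead};

fn load_file(name: &str) -> Result<Vec<String>, io::Error> {
    io::BufReader::new(File::open(name)?).lines().collect()
}

// Hypothetical helper: writes a two-line sample file into the system
// temporary directory and returns its path.
fn write_sample() -> Result<String, io::Error> {
    let path = std::env::temp_dir().join("count_chars_sample.txt");
    fs::write(&path, "Call me Ishmael.\nSome years ago\n")?;
    Ok(path.to_string_lossy().into_owned())
}

fn main() -> Result<(), io::Error> {
    let path = write_sample()?;
    // With `?`, an I/O error makes main() return early with that error.
    let lines = load_file(&path)?;
    println!("loaded {} lines", lines.len());
    Ok(())
}
```

In your program, the path would be that of the downloaded novel, and the loaded lines would then be passed to count_chars().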

Have you noticed that it takes more time than when using our two lines? Let's parallelize this!