Translating strings from English to French

We want to write a #[translate] attribute macro that translates string literals representing English numbers into French. For example, the following code:

#[translate]
fn main() {
  let res = "forty two";
  println!("12 + 30 = {}", res);
}

will display:

12 + 30 = quarante-deux

Only strings representing integers within a configurable range (0..=100 by default) will be translated.

We will use the following two crates to implement this functionality:

english_numbers for English numbers
french_numbers for French numbers

Exercise 5.a: Add these crates as dependencies to the macros crate.

Preloading Strings

The english_numbers crate does not provide a way to recognize an English number and retrieve its numeric value. We will therefore build a dictionary that maps each English representation to its numeric value.

Exercise 5.b: Create a Translate struct that contains a dictionary associating a string with an i64, the type used by the english_numbers crate.

Exercise 5.c: Create an associated function new() that returns a Translate object whose dictionary is preloaded with the numbers of the default range. Enable only the spaces formatting option and leave all other options disabled.
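A minimal sketch of exercises 5.b and 5.c follows. To keep it independent of the crate, it assumes the English conversion is passed in as a closure; with_converter and value_of are hypothetical names, and in the lab the closure would wrap the english_numbers conversion configured with only the spaces option. Following the later note about literal representations, the keys keep their surrounding double quotes.

```rust
use std::collections::HashMap;

/// Sketch of the Translate struct (exercise 5.b): maps the source-code form
/// of an English number (double quotes included) to its numeric value.
struct Translate {
    dict: HashMap<String, i64>,
}

impl Translate {
    /// Preloads the dictionary with every integer in `low..=high`.
    /// `english` is a hypothetical parameter standing in for the
    /// english_numbers conversion (spaces option only).
    fn with_converter(low: i64, high: i64, english: impl Fn(i64) -> String) -> Self {
        let dict = (low..=high)
            .map(|n| (format!("\"{}\"", english(n)), n))
            .collect();
        Translate { dict }
    }

    /// Returns the value of a quoted English number, if it is in the dictionary.
    fn value_of(&self, quoted: &str) -> Option<i64> {
        self.dict.get(quoted).copied()
    }
}
```

new() could then simply call with_converter(0, 100, ...) with the real conversion closure; passing the bounds explicitly also prepares the ground for exercise 5.i.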

Choosing the String Replacement Technique

We could use a mutable visitor (syn::visit_mut::VisitMut) to rewrite the LitStr nodes that correspond to an English number, replacing them with the corresponding French term. However, although this technique seems to work at first glance, it fails on tests as simple as:

#[test]
#[translate]
fn test_translate() {
  assert_eq!("trois", "three");
}

When analyzing this function, the visitor reaches a Macro node for assert_eq!. It correctly visits the path and delimiter fields, but it does not visit the tokens field (a proc_macro2::TokenStream holding the content of the macro), because that content need not be valid Rust code at this stage.

We would therefore also have to intercept the visit of Macro nodes in order to replace the literal tokens we are interested in. Since our procedural macro already receives a TokenStream, we can apply this token-level substitution to the whole input directly: no visitor is needed.

Transforming the Token Stream

Exercise 5.d: Write a method that replaces each token that is a string literal representing an English number from our dictionary with the corresponding French number. Be sure to call this method recursively whenever you encounter a delimited group of tokens.

impl Translate {

  // Takes &self so that the method can access the dictionary
  fn substitute_tokens(&self, stream: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
    todo!()
  }

}

Note that the representation of a literal token we have access to (for instance through to_string()) is the one in the source code, enclosed in double quotes; string literals using other delimiters, such as r#""#, can be ignored. Rather than removing these quotes, it may be easier to include them in the dictionary keys so that tokens can be compared directly.
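The recursion required by exercise 5.d can be sketched on a simplified, std-only token model. The Tree enum and the substitute function below are hypothetical: Tree only mirrors the shape of proc_macro2::TokenTree (literal, delimited group, anything else), and the French conversion is passed in as a closure. With proc_macro2 itself, a group would be rebuilt with Group::new(group.delimiter(), new_stream) from the substituted group.stream(), and a replacement literal could be created with Literal::string, which adds the quotes and escaping itself (the sketch adds the quotes by hand instead). You may also want to copy the original span onto new tokens with set_span so that error messages still point to the right place.

```rust
use std::collections::HashMap;

/// Simplified, std-only stand-in for proc_macro2::TokenTree (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    /// A literal as written in the source, quotes included, e.g. "\"three\"".
    Literal(String),
    /// A delimited group: its opening delimiter and its inner tokens.
    Group(char, Vec<Tree>),
    /// Any other token (identifier, punctuation, ...).
    Other(String),
}

/// Replaces every literal found in `dict` by its French form, recursing
/// into delimited groups such as the arguments of `assert_eq!`.
fn substitute<F: Fn(i64) -> String>(
    dict: &HashMap<String, i64>,
    french: &F,
    stream: Vec<Tree>,
) -> Vec<Tree> {
    stream
        .into_iter()
        .map(|tree| match tree {
            Tree::Literal(repr) => match dict.get(&repr) {
                // Re-add the quotes so that the result is still a string literal.
                Some(&n) => Tree::Literal(format!("\"{}\"", french(n))),
                None => Tree::Literal(repr),
            },
            // Recurse into the group and keep its delimiter.
            Tree::Group(delim, inner) => Tree::Group(delim, substitute(dict, french, inner)),
            other => other,
        })
        .collect()
}
```

Because the recursion does not care whether a group is a function body or a macro argument list, the literals inside assert_eq!(...) are rewritten just like any other literal, which is exactly what the visitor approach missed.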

Exercise 5.e: Write a procedural macro #[translate] that constructs a Translate object and uses it to transform the TokenStream. Remember that conversions with From and Into are implemented between proc_macro::TokenStream (at the macro interface) and proc_macro2::TokenStream (used inside the macro).

Exercise 5.f: Write tests for your macro. It may be useful to define, with macro_rules!, a str!(a, b) macro that builds a string from a and b at run time, so that the concatenated string ab never appears as a literal in the source code (where #[translate] could rewrite it):

// Check that values outside the default range (0..=100) are not translated
assert_eq!(str!("one h", "undred and one"), "one hundred and one");
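A minimal sketch of the str! helper, assuming format! is used to join the two fragments. Since #[translate] runs before str! is expanded, it only sees the two separate fragments, neither of which is in the dictionary.

```rust
// Concatenates two fragments at run time, so that the joined text never
// appears as a single string literal that #[translate] could rewrite.
macro_rules! str {
    ($a:expr, $b:expr) => {
        format!("{}{}", $a, $b)
    };
}
```

The result is a String, which can be compared directly with a &str in assert_eq!. Remember that a macro_rules! macro must be defined before its first use in the test file.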

Specifying the Bounds

We want to be able to specify the bounds of the numbers to translate through an optional attribute argument. The following notations should be accepted:

#[translate] fn f() { ... }         // Default bounds (0..=100)
#[translate(0..10)] fn f() { ... }
#[translate(0..=10)] fn f() { ... }

However, incorrect constructions must be rejected with clear error messages:

error: unexpected end of input, expected `..=` or `..`
 --> tests/ui/translate.rs:3:1
  |
3 | #[translate(10)]
  | ^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
 --> tests/ui/translate.rs:6:13
  |
6 | #[translate(..10)]
  |             ^^

error: unexpected end of input, expected integer literal
 --> tests/ui/translate.rs:9:1
  |
9 | #[translate(10..)]
  | ^^^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
  --> tests/ui/translate.rs:12:13
   |
12 | #[translate(x)]
   |             ^

To achieve this, we will build a structure on which we can implement the syn::parse::Parse trait:

struct Bounds { low: i64, high: i64 }

Exercise 5.g: Implement the Parse trait for Bounds. Read the lower bound as a LitInt (syn handles the unary minus sign), look for either ..= or .., read the upper bound, and build the Bounds object. Lookahead1 can make this easier.
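Before writing the syn version, it can help to pin down the grammar and the meaning of .. with a plain string parser. The std-only sketch below is not the syn implementation: parse_bounds and take_int are hypothetical helpers, and storing high inclusively (so that 0..10 becomes 0..=9) is an assumption about how Bounds is interpreted. It follows the same order as the exercise: integer, range operator, integer. In the syn version, each integer would be read with input.parse::<LitInt>()? followed by base10_parse::<i64>(), and the operator would be chosen with input.lookahead1() and lookahead.peek(Token![..=]) before lookahead.peek(Token![..]), since .. can also match the beginning of ..=. When neither matches, lookahead.error() produces the "expected `..=` or `..`" message shown above.

```rust
/// Bounds of the numbers to translate (assumption: `high` is inclusive).
#[derive(Debug, PartialEq)]
struct Bounds {
    low: i64,
    high: i64,
}

/// Reads an optional minus sign followed by decimal digits at the start of
/// `s` (hypothetical helper, loosely mimicking how a LitInt is read).
fn take_int(s: &str) -> Result<(i64, &str), String> {
    let start = if s.starts_with('-') { 1 } else { 0 };
    let end = s[start..]
        .find(|c: char| !c.is_ascii_digit())
        .map_or(s.len(), |i| start + i);
    if end == start {
        return Err("expected integer literal".to_string());
    }
    let value = s[..end].parse::<i64>().map_err(|e| e.to_string())?;
    Ok((value, &s[end..]))
}

/// Parses `low..=high` or `low..high`: integer, range operator, integer.
fn parse_bounds(input: &str) -> Result<Bounds, String> {
    let (low, rest) = take_int(input.trim())?;
    let rest = rest.trim_start();
    // Test `..=` first, because `..` is also a prefix of `..=`.
    let (rest, inclusive) = if let Some(r) = rest.strip_prefix("..=") {
        (r, true)
    } else if let Some(r) = rest.strip_prefix("..") {
        (r, false)
    } else {
        return Err("expected `..=` or `..`".to_string());
    };
    let (high, rest) = take_int(rest.trim_start())?;
    if !rest.trim().is_empty() {
        return Err(format!("unexpected input: {}", rest.trim()));
    }
    // Normalize to an inclusive upper bound: `0..10` becomes 0..=9.
    let high = if inclusive {
        high
    } else {
        high.checked_sub(1).ok_or("empty range")?
    };
    Ok(Bounds { low, high })
}
```

The cases from the expected error messages above (10, ..10, 10.. and x) are all rejected, and the same inputs make good candidates for the tests of exercise 5.h.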

Exercise 5.h: Add specific tests to check that the various intervals are read correctly. To avoid exporting private types, you can put these tests in a submodule that is compiled only in test mode:

#[cfg(test)]
mod tests {
    …
}

You can parse a string s with a parser T using syn::parse_str::<T>(s), which may be handy in your tests.

Exercise 5.i: Update the translate macro so that it reads the bounds from its attribute when the attribute is not empty (keeping the default 0..=100 otherwise), and initializes the Translate object accordingly.

Exercise 5.j: Add tests. For example, the following test must pass:

#[test]
#[translate(-10..=10)]
fn test_negative_bounds() {
  assert_eq!("moins dix", "negative ten");
  assert_eq!("dix", "ten");
  assert_eq!(str!("neg", "ative eleven"), "negative eleven");
  assert_eq!(str!("ele", "ven"), "eleven");
}

Conclusion

We have seen that several techniques can be combined to implement a macro: here, we wrote a dedicated parser to read the bounds, and we also worked with the token stream directly.

NET7212 Lab