Parsing input data

It is now time to extract frames. So that frames can later be handled in parallel as much as possible, the input must be parsed in two steps:

  1. Do the minimum work needed to extract a frame from the input (parsing phase).
  2. Later, once frames have been separated, do the rest of the work to decode each frame's compressed content (decoding phase).

Since zstd-compressed input does not contain a table of contents indicating where each frame starts, a frame can only be parsed once all the frames preceding it have been parsed.

Forward byte parser

The first parser you need is a forward byte parser, which delivers bytes from the input in order. The parser must also remember which bytes it has already delivered so that it never delivers them twice. A slice of bytes is enough for this: it references a region of memory, and replacing it with a shorter slice that starts after the consumed bytes records the progress.

✅ Create a parsing module (in file parsing.rs) and create a ForwardByteParser type in it.

⚠️ Do not forget to declare this module in lib.rs with public visibility (pub mod parsing;).

Since a slice does not own its content, it must be accompanied by a lifetime parameter, for example:

pub struct ForwardByteParser<'a>(&'a [u8]);

Initializing a forward byte parser from an existing slice is straightforward:

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }
}

Consuming a byte from the input

Consuming a byte from the input means returning the first byte if there is one (the input may be empty) and keeping a slice with that first byte removed:

impl<'a> ForwardByteParser<'a> {
    pub fn u8(&mut self) -> Option<u8> {
        let (first, rest) = self.0.split_first()?;
        self.0 = rest;
        Some(*first)
    }
}

While returning an Option<u8> seems handy, it will not be very useful: when you need a byte from the input, not obtaining it should be an error that can be propagated further.

✅ Create an Error type and a Result alias in the parsing module as described in the general principles. Modify your u8() method so that it returns an error if the input is empty.

This error variant (for example NotEnoughBytes { requested: usize, available: usize }) must be general enough to be reused in other methods. When displayed, the error should print: not enough bytes: 1 requested out of 0 available.
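As an illustration of the requirement above, here is a minimal sketch using only the standard library. It assumes a hand-written Display implementation; the general principles may prescribe another way to build the error type (for example a derive crate), in which case follow them instead. The Result alias with a defaulted error parameter is one possible shape, not necessarily the expected one.

```rust
use std::fmt;

/// Parsing errors (only the variant discussed above is sketched here).
#[derive(Debug)]
pub enum Error {
    NotEnoughBytes { requested: usize, available: usize },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotEnoughBytes { requested, available } => write!(
                f,
                "not enough bytes: {requested} requested out of {available} available"
            ),
        }
    }
}

impl std::error::Error for Error {}

/// Result alias for the parsing module (assumed shape).
pub type Result<T, E = Error> = std::result::Result<T, E>;

pub struct ForwardByteParser<'a>(&'a [u8]);

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }

    /// Consume and return the first byte, or fail if the input is empty.
    pub fn u8(&mut self) -> Result<u8> {
        // split_first() returns None only when the slice is empty,
        // so exactly 0 bytes are available in the error case.
        let (first, rest) = self.0.split_first().ok_or(Error::NotEnoughBytes {
            requested: 1,
            available: 0,
        })?;
        self.0 = rest;
        Ok(*first)
    }
}
```

Because u8() now returns a Result, callers can propagate a missing byte with the ? operator instead of handling an Option at every call site.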

Writing a unit test

Does the parser work as expected? A unit test should be written along with the code.

✅ Create a tests/parsing.rs file in your repository and include the following tests.

use net7212::parsing::{self, ForwardByteParser};

#[test]
fn forward_byte_parser_u8() {
    // Check that bytes are delivered in order
    let mut parser = ForwardByteParser::new(&[0x12, 0x23, 0x34]);
    assert_eq!(0x12, parser.u8().unwrap());
    assert_eq!(0x23, parser.u8().unwrap());
    assert_eq!(0x34, parser.u8().unwrap());
    assert!(matches!(
        parser.u8(),
        Err(parsing::Error::NotEnoughBytes {
            requested: 1,
            available: 0,
        })
    ));
}

Running the tests with cargo test should succeed.

💡 Notice that we use matches!() instead of assert_eq!() to compare the error, as the parsing::Error type does not implement PartialEq (and does not need to).

Adding more methods

Parsing a frame will require reading a 4-byte unsigned integer (u32) in little-endian format, or extracting an arbitrary number of bytes. You can add utility methods to your parser now, such as the following (more will be needed later):

impl<'a> ForwardByteParser<'a> {
    /// Return the number of bytes still unparsed
    pub fn len(&self) -> usize { todo!() }

    /// Check if the input is exhausted
    pub fn is_empty(&self) -> bool { todo!() }

    /// Extract `len` bytes as a slice
    pub fn slice(&mut self, len: usize) -> Result<&'a [u8]> { todo!() }

    /// Consume and return a u32 in little-endian format
    pub fn le_u32(&mut self) -> Result<u32> { todo!() }
}

Tests must also be added for those methods to ensure they act as expected. Those tests will also act as non-regression tests, allowing us to later modify the body of those methods without fear.
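For reference once you have written your own version, here is one possible way to fill in the methods listed above, together with a test for them. This is a sketch under assumptions, not the expected solution: the Error type is reduced to a single variant without a Display implementation, slice() is assumed to consume nothing when it fails, and the test lives in the same file rather than in tests/parsing.rs.

```rust
#[derive(Debug)]
pub enum Error {
    NotEnoughBytes { requested: usize, available: usize },
}

pub type Result<T, E = Error> = std::result::Result<T, E>;

pub struct ForwardByteParser<'a>(&'a [u8]);

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }

    /// Return the number of bytes still unparsed
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Check if the input is exhausted
    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Extract `len` bytes as a slice; nothing is consumed on error
    pub fn slice(&mut self, len: usize) -> Result<&'a [u8]> {
        if len > self.0.len() {
            return Err(Error::NotEnoughBytes {
                requested: len,
                available: self.0.len(),
            });
        }
        let (head, rest) = self.0.split_at(len);
        self.0 = rest;
        Ok(head)
    }

    /// Consume and return a u32 in little-endian format
    pub fn le_u32(&mut self) -> Result<u32> {
        let bytes = self.slice(4)?;
        Ok(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
    }
}

#[test]
fn forward_byte_parser_le_u32_and_slice() {
    let mut parser = ForwardByteParser::new(&[0x78, 0x56, 0x34, 0x12, 0xaa, 0xbb]);
    assert_eq!(6, parser.len());
    // Little-endian: the least significant byte comes first
    assert_eq!(0x1234_5678, parser.le_u32().unwrap());
    assert_eq!(&[0xaa_u8, 0xbb][..], parser.slice(2).unwrap());
    assert!(parser.is_empty());
    assert!(matches!(
        parser.le_u32(),
        Err(Error::NotEnoughBytes {
            requested: 4,
            available: 0,
        })
    ));
}
```

Building le_u32() on top of slice() keeps the bounds check in a single place, so the NotEnoughBytes error is reported consistently by both methods.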