Jamie Skipworth


Technology Generalist | Software & Data


Adventures in Rust

I think having “Adventures” in the title is pretty misleading. “Mildly interesting nuggets, if you’re into this sort of thing” is probably more accurate.

So a while ago I wrote a little bit about Rust’s system of ownership, and how you throw things around using references and borrowing. This helped prepare me for writing my first proper program.

So guess what it is? That’s right, it’s another word counter!! Yeh yeh I know it’s boring, but it’s a good way to get familiar with a new language quickly when you don’t have oodles of free time. I wrote one in Go a while ago when I was going through my Go phase, too.

Honestly it took longer than I’d like to write, mostly due to fighting the borrow checker and misusing some of the features. I got there in the end with a lot of help from Google, though.

Cargo

Let’s get into it.

One of the first things you encounter using Rust is Cargo. This is Rust’s very own build and dependency management tool. It is very nice to use, especially compared to Go, which (at the time of writing) had no official dependency manager.

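Want to start a new project? cargo new. Want to build the project? cargo build. How about running the thing? cargo run. Need a library? List it under [dependencies] in Cargo.toml and Cargo fetches it on the next build. (cargo install is for installing Rust binaries such as command-line tools.) It’s very easy to use.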

I’ve created a project called “rust-wc-serial” by running cargo new rust-wc-serial. Cargo creates a simple project structure that looks like this:

$ cargo new rust-wc-serial     
note: package will be named `wc-serial`; use --name to override
     Created library `rust-wc-serial` project

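Cargo doesn’t like it when you include the word “rust” in a package name, so it strips it out. Note that this version of Cargo created a library project; because this program has a main() function, its code belongs in src/main.rs, which is what cargo new --bin creates (newer versions of Cargo create binary projects by default). The directory structure looks like this: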

rust-wc-serial/
├── Cargo.toml
└── src
    └── lib.rs

Getting Started

Let’s not waste any time here. I want to count the words in files, and I have a basic idea of how I want to structure my program:

  • fn main() - main entry point. Takes file names from the command-line.
  • fn process_file() - Takes a file, opens it and calls counter().
  • fn counter() - Takes a stream of data, reading lines and counting words.
  • fn count_words() - Takes a line of text and counts the words within it.

Function: main

So the first thing I want to be able to do is get file names from the command line in main(). For this I’ll need the std::env module from the standard library.

use std::env; // Use this module to access command line arguments.

fn main() {

    // Get command line arguments into 'args' variable.
    let args: Vec<String> = env::args().collect();

    // Make sure we have at least 1 argument
    if args.len() < 2 {
        panic!( "Program arguments missing. Please provide a file name" );            
    } 

    println!( "Arguments: {:?}", args );
}

The above code fetches the arguments from the command line, and ensures there’s at least one. Here’s what compiling and running with cargo run outputs:

$ cargo run foobar
   Compiling wc-serial v0.1.0 (file:///tmp/rust/rust-wc-serial)
    Finished dev [unoptimized + debuginfo] target(s) in 0.73 secs
     Running `target/debug/wc-serial`
Arguments: ["target/debug/wc-serial", "foobar"]     

Success! The first 3 lines are output from cargo. The last one with Arguments is my program output.

I want to be able to count the words in multiple files. I’ll modify the code above to read everything from env::args() and output the file names using an iterator.

use std::env;

fn main() {

    // Get command line arguments
    let args: Vec<String> = env::args().collect();

    // Determine if we have any arguments.
    if args.len() < 2 {
        panic!( "Program arguments missing. Please provide a file name" );            
    } 

    // Get arguments from the command line, skipping the program name
    let files: Vec<String> = Vec::from( &args[1..] );

    // Iterate through file names
    for file_name in files.iter() {
        // Print each file name
        println!( "Got file name: {}", file_name );
    }
}

Running this gives me each file on a new line:

$ cargo run foo bar spam eggs
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/wc-serial foo bar spam eggs`
Got file name: foo
Got file name: bar
Got file name: spam
Got file name: eggs
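As a possible refinement (a sketch, not the code from the full program), the program name can be dropped with the iterator adapter skip(1) instead of collecting everything and slicing, and a missing argument can be reported with a non-zero exit code rather than a panic. The helper name file_names() is hypothetical.

```rust
use std::env;
use std::process;

// Hypothetical helper: drop the program name (the first argument) with
// skip(1) and collect the rest. Returns an error message if nothing is left.
fn file_names<I: Iterator<Item = String>>(args: I) -> Result<Vec<String>, String> {
    let files: Vec<String> = args.skip(1).collect();
    if files.is_empty() {
        return Err("Program arguments missing. Please provide a file name".to_string());
    }
    Ok(files)
}

fn main() {
    let files = match file_names(env::args()) {
        Ok(files) => files,
        Err(msg) => {
            // Report on stderr and exit with a non-zero status instead of panicking.
            eprintln!("{}", msg);
            process::exit(1);
        }
    };

    for file_name in &files {
        println!("Got file name: {}", file_name);
    }
}
```

Taking any Iterator<Item = String> rather than calling env::args() inside the helper also makes it easy to test with a hand-built list of arguments.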

Function: process_file

Super stuff. I can now get file names from the command line. Now I actually want to open them. For this I’ve written process_file().

use std::fs::File;
use std::io::BufReader;
use std::path::Path;

// This function takes a Path, returning a Result.
// Ok() holds an (i32, i32) tuple of (lines, words). Err() holds a String.
fn process_file( file_path: &Path ) -> Result< (i32,i32), String> {

    // Attempt to open the file path
    let file_handle = match File::open( file_path ) {
        // Match on the result of open(), returning early with an Err()
        // on failure, or keeping the File handle if Ok()
        Err( why ) => return Err( why.to_string() ),
        Ok( file_handle ) => file_handle
    };

    // On successful opening of the file, wrap it in a buffered reader
    let mut reader = BufReader::new( file_handle );

    // Call the counter. The ? operator returns early with counter()'s
    // Err(), otherwise it unpacks the Ok() counts.
    let ( lines, words ) = counter( &mut reader )?;

    // Return the counts
    Ok( ( lines, words ) )

}

It takes a Path (representing a file) and attempts to open it. If the file opens OK, the match gives us a File handle; otherwise we return early with the error converted to a String. We then wrap the handle in a buffered reader and pass it to counter() to start the counting.
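As a hedged sketch, the match on File::open() can be shortened with map_err() and the ? operator, which together do the same early return. The function name open_buffered() and the file names used are made up for illustration.

```rust
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

// Hypothetical helper: open a file and wrap it in a BufReader.
// map_err() converts the io::Error into a String, and `?` returns
// that Err early, just like the explicit match in process_file().
fn open_buffered(file_path: &Path) -> Result<BufReader<File>, String> {
    let file_handle = File::open(file_path).map_err(|e| e.to_string())?;
    Ok(BufReader::new(file_handle))
}

fn main() {
    match open_buffered(Path::new("no-such-file.txt")) {
        Ok(_) => println!("Opened OK"),
        Err(err) => println!("Error - {}", err),
    }
}
```

Both forms behave the same; the map_err() version just keeps the happy path on one line.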

There’s no need to explicitly close files (File doesn’t even have a close() method), because Rust closes them automatically when they go out of scope.
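Under the hood, a File releases its handle through its Drop implementation, which runs when the value goes out of scope. The sketch below uses a hypothetical Handle type (not part of std) whose drop() records when it runs, to show that this happens exactly at the end of the enclosing block.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a File: instead of closing an OS handle,
// drop() appends a message to a shared log so we can see when it runs.
struct Handle<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Handle<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("closed {}", self.name));
    }
}

fn main() {
    let log = RefCell::new(Vec::new());
    {
        let _h = Handle { name: "a.txt", log: &log };
        log.borrow_mut().push("using a.txt".to_string());
    } // _h goes out of scope here, so drop() runs
    log.borrow_mut().push("after scope".to_string());

    // Prints ["using a.txt", "closed a.txt", "after scope"]
    println!("{:?}", log.borrow());
}
```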

Now I’ll modify the for loop in main() so that it calls process_file() and passes it a Path:

// Iterate through file names
for file_name in files.iter() {
    
    // Turn into a Path
    let path = Path::new( &file_name ); 

    // Execute process_file() on it, matching on the result.
    match process_file( path ) {

        Ok( ( lines, words ) ) => {
            println!("{}\t{} lines\t{} words.", path.display(), lines, words );
        },
        Err( err ) => {
            panic!("Error - {}", err );
        }

    };
}

OK, we’re halfway there.

Function: counter

Now I need to write counter(), which takes the reader, reads lines from it and counts the words.

use std::io::BufRead;

fn counter<R: BufRead> ( reader: &mut R ) -> Result<( i32, i32 ), String> {

    // Define our line and word count variables
    let mut total_lines: i32 = 0;
    let mut total_words: i32 = 0;

    // Create an empty String. This will be where each line is read to
    let mut line = String::new();

    // Start a loop
    loop {

        // read_line() appends the next line (including its '\n') to 'line'
        // and returns the number of bytes it read
        match reader.read_line( &mut line ) {

            // Zero bytes means we've reached the end of the input.
            // (A blank line still returns 1, for its '\n'.)
            Ok( 0 ) => break,

            // We read a line: trim it and increment lines & words
            Ok( _ ) => {
                line = line.trim().to_string();
                total_lines += 1;
                total_words += count_words( &line );
                // Clear the buffer, since read_line() appends to it
                line.clear();
            },

            // If an error occurred, return it early
            Err( why ) => return Err( why.to_string() )
        };
    }

    // Return the counts if everything went ok.
    Ok( ( total_lines, total_words ) )
}

I got stuck here for a while trying to pass the BufReader directly into counter(), when I should’ve been more aware of how to use traits. It turns out it’s better to make the function generic over a trait and let the compiler use static dispatch: fn counter<R: BufRead>.

This is the best I can explain it: data can be read from many kinds of source, such as files, network sockets and in-memory buffers, and each has its own type (File, TcpStream, &[u8] and so on). What makes each of them a reader is that it implements the Read trait. So rather than accepting one concrete type, our function should accept any type, as long as it’s a reader.

More precisely, counter() accepts any type that implements the BufRead trait, which builds on Read and provides read_line(). Our BufReader<File> is one such type.
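To illustrate the static dispatch idea, here is a small sketch with a hypothetical count_lines() function (not part of the program) that is generic over BufRead. It is called with two different concrete reader types, neither of which is a File, and it also shows that read_line() returns 0 only at the end of the input, not on a blank line.

```rust
use std::io::{BufRead, BufReader, Cursor};

// Generic over any R that implements BufRead. The compiler generates a
// separate copy of this function for each concrete R it is called with
// (static dispatch), so no runtime lookup is needed.
fn count_lines<R: BufRead>(reader: &mut R) -> usize {
    let mut line = String::new();
    let mut lines = 0;
    // read_line() returns Ok(0) only at end of input; a blank line still
    // returns Ok(1) because the '\n' is counted. For brevity, a read error
    // is treated like end of input here.
    while reader.read_line(&mut line).unwrap_or(0) > 0 {
        lines += 1;
        line.clear();
    }
    lines
}

fn main() {
    // A byte slice implements BufRead directly...
    let mut bytes: &[u8] = b"one\n\nthree\n";
    // ...and so does a BufReader wrapped around an in-memory Cursor.
    let mut buffered = BufReader::new(Cursor::new("a\nb"));

    println!("{}", count_lines(&mut bytes)); // 3
    println!("{}", count_lines(&mut buffered)); // 2
}
```

Being able to pass an in-memory buffer like this is also handy for testing counter() without touching the filesystem.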

Anyway, counter() reads lines from the reader and passes them to count_words(). When read_line() reads zero bytes we’ve reached the end of the input, so we break and return the counts; if there’s an error we return an Err early. On success it returns a tuple of (lines, words) as i32s.
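A possible alternative (a sketch, not the author’s version) is BufRead::lines(), which yields one io::Result<String> per line with the trailing newline removed and stops at the end of input, so the manual end-of-file check and buffer clearing go away. The name counter_lines() is hypothetical, and count_words() is repeated here using the same whitespace-counting approach so the block compiles on its own.

```rust
use std::io::{BufRead, Cursor};

// Same whitespace-counting approach as count_words(), taking &str.
fn count_words(s: &str) -> i32 {
    s.chars().filter(|c| c.is_whitespace()).count() as i32 + 1
}

// Hypothetical alternative to counter(): lines() yields one
// io::Result<String> per line with the trailing "\n" (or "\r\n") removed,
// and ends at end of input.
fn counter_lines<R: BufRead>(reader: R) -> Result<(i32, i32), String> {
    let mut total_lines: i32 = 0;
    let mut total_words: i32 = 0;
    for line in reader.lines() {
        // map_err() turns the io::Error into a String so `?` can return it
        let line = line.map_err(|e| e.to_string())?;
        total_lines += 1;
        total_words += count_words(line.trim());
    }
    Ok((total_lines, total_words))
}

fn main() {
    let input = Cursor::new("hello world\nfoo\n");
    println!("{:?}", counter_lines(input)); // Ok((2, 3))
}
```

One trade-off: lines() allocates a new String for every line, whereas the read_line() loop reuses a single buffer. Since &mut R also implements BufRead whenever R does, counter_lines(&mut reader) works with the reader from process_file() too.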

Function: count_words

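Next, I’ll write my count_words() function. This is the simplest one. It takes a string, iterates through its characters, counts the occurrences of whitespace and returns that count plus one (n gaps separate n + 1 words). This is an approximation: an empty line still counts as one word, and a run of consecutive whitespace (a double space, say) is counted as several gaps.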

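// Takes &str rather than &String; a &String still coerces to &str.
fn count_words( s: &str ) -> i32 {

    // Count the whitespace characters (the gaps between words)
    let mut words: i32 = 0;

    for c in s.chars() {
        if c.is_whitespace() {
            words += 1;
        }
    }

    // n gaps separate n + 1 words
    words + 1
}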
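For comparison, here is a hedged sketch using str::split_whitespace(), which yields only the non-empty runs of non-whitespace characters, so empty lines and repeated spaces are handled correctly. Both function names are hypothetical, and swapping this in would likely change the counts shown in the next section, since those came from the whitespace-counting version.

```rust
// Hypothetical alternative: count the non-empty runs between whitespace.
fn count_words_split(s: &str) -> i32 {
    s.split_whitespace().count() as i32
}

// The whitespace-counting approach above, for comparison.
fn count_words_chars(s: &str) -> i32 {
    s.chars().filter(|c| c.is_whitespace()).count() as i32 + 1
}

fn main() {
    for s in ["one two", "", "one  two", "tab\tseparated words"] {
        println!(
            "{:?}: split_whitespace = {}, whitespace + 1 = {}",
            s,
            count_words_split(s),
            count_words_chars(s)
        );
    }
}
```

The two agree on simple single-spaced text but differ on an empty string (0 vs 1) and on a double space (2 vs 3).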

Putting it all together

So our program is complete! Now I can run it to see what happens.

$ time cargo run ../data/comments.txt ../data/posts.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/wc-serial ../data/comments.txt ../data/posts.txt`
../data/comments.txt  4700 lines  1624970 words.
../data/posts.txt 22094 lines 4301338 words.

real  0m4.644s
user  0m4.558s
sys 0m0.049s

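That’s about 34MB of data read in around 4.6 seconds. Note that this was an unoptimized debug build (see the dev [unoptimized + debuginfo] line); building with cargo run --release enables optimizations and would likely be noticeably faster.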

The full program can be found on GitHub here.