Jamie Skipworth


Technology Generalist | Software & Data


Getting Rusty

I like systems software development. I’m probably not very good at it, but I still like playing around with the nuts and bolts of systems. Most people think of software as just the applications they use, but there’s a ton of other magic happening beneath them.

Software can be divided into 3 fat layers. From bottom to top, they can be loosely defined as:

  • Operating system (OS) software (provides hardware services. E.g. I/O)
  • System software (provides services to other software. E.g. a file-system)
  • Application software (provides services to users. E.g. a file manager)

[Figure: Software stack]

Generally speaking, the closer you are to the metal (like an OS) the more you’ll need to use lower-level languages to twiddle its bits manually 😉. The further up the stack you go, the more often you’ll use a language offering higher-level abstractions and friendlier syntax (despite some developers’ best efforts) to get things done faster.

And there are a lot of languages. Usually when a new one pops up (which is often!), it’s designed to solve a particular problem.

  • C was created to write systems and operating systems (hello Unix); it is small, fast and simple.
  • Go came out of Google to make concurrency and scaling easier.
  • Java was born as a “write once, run anywhere” language; a portable way to write applications.
  • Perl was designed for sadists who wanted the power of C, with none of the readability.
  • Haskell was created to give functional programmers something to talk about.

It goes on and on. They all have a particular purpose with their particular benefits and drawbacks. Anyway, enough rambling.


Rust

I’ve started playing with Rust, a newer systems language squeezed out of Mozilla and most notably used in the Firefox Quantum project. Rust is a compiled, statically-typed language that has a strong focus on memory safety and speed.

Older languages like C let you do pretty much anything with memory, which isn’t necessarily bad, but is very hard to get right, resulting in lots of bugs and security nightmares.

Rust interests me because a) I’m a geek, b) it seems to have learned a lot of lessons from other languages, and c) it has an interesting approach to memory management. It can be used to write pretty much anything from OSs to browsers.

Rust manages memory using a system of ownership, which is what I’m going to quickly whiz through because I found it hardest to grasp. This post is just me putting my thoughts down in a way I understand. It’s really just a quick how-to for my own benefit. I’ll gloss over some of the basics, and instead recommend reading the docs.

Ownership & Scopes

Top-tip: Temporarily forget everything you’ve previously learned about pointers. I found myself trying to compare C-like pointers to those in Rust. Don’t do this, because it’s just confusing.

Ownership in Rust is actually quite simple - there are only 3 rules:

  • Each value in Rust has a variable that’s called its owner.
  • A value can only have one owner at a time.
  • When the owner goes out of scope, the value will be dropped.

When you define a variable you’re binding a name to a value, like let my_int = 42. In Rust this means the value 42 is owned by my_int. This binding exists only for the lifetime of the block of code within which it’s defined, so variables have block scope.

This example demonstrates this behaviour. If I bind a variable within a block, once the block ends it is freed.

fn main() {

  // Define a string within a block. Remember everything is immutable
  // by default in Rust.
  {
      let hello: String = String::from( "Hello, world!" );
  }
  // Our block scope has ended. 'hello' is free!

  // Try to print 'hello'. This will fail. 
  // This will only work if println is moved to within the above block
  println!( "{}", hello );
}

Once that block exits, the binding goes out of scope and Rust drops the value. This example won’t compile because hello is out of scope when println! tries to print it.
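To actually see the drop happen at the end of a block, here’s a rough sketch of my own. The Noisy type, its log field and the scope_demo() function are all made up for illustration; the only real library pieces are the Drop trait and RefCell. Noisy writes a line to a shared log when it gets dropped, so the order of events is visible.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records a message in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Noisy<'a> {
    // Rust calls drop() automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Returns the events in the order they happened.
fn scope_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _hello = Noisy { name: "hello", log: &log };
        log.borrow_mut().push(String::from("inside block"));
    } // `_hello` goes out of scope here, so its value is dropped.
    log.borrow_mut().push(String::from("after block"));
    log.into_inner()
}

fn main() {
    for event in scope_demo() {
        println!("{}", event);
    }
}
```

The "dropped hello" line lands between "inside block" and "after block", which is the third ownership rule doing its thing.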

A value can only have one owner. If I try to give a value more than one binding like this:

fn main() {

  // Define an immutable string.
  let hello1 = String::from( "Hello, world!" );

  // Create another binding to hello1. Naughty!
  let hello2 = hello1;

  // Try to print the variable out. This will fail.
  println!( "{} - {}", hello1, hello2 );
}

The compiler will tell me I’m an idiot.

$ rustc ex2.rs
error[E0382]: use of moved value: `hello1`
  --> ex2.rs:10:24
   |
7  |   let hello2 = hello1;
   |       ------ value moved here
...
10 |   println!( "{} - {}", hello1, hello2 );
   |                        ^^^^^^ value used here after move
   |
   = note: move occurs because `hello1` has type `std::string::String`, which does not implement the `Copy` trait

error: aborting due to previous error

But why? Doesn’t the code above seem reasonable? The compiler says I’ve “moved” the value, which essentially means we’ve transferred ownership.

First, a little bit about how memory allocation works. Any data whose length is unknown at compile-time or can change (like the more complex type String) will be stored on the heap, with its fixed-length metadata on the stack, like this:

[Figure: Memory]

What happened in the above example is that when we did let hello2 = hello1, Rust made a shallow copy of the variable (the stack data only). If both variables stayed valid, we’d have two of them pointing to the same heap data, so the data would essentially have 2 owners. This violates the single-ownership rule.

Two owners would cause memory-safety issues (both would try to free the same heap data when they go out of scope), so Rust invalidates the previous variable hello1. Ownership has moved to hello2. Now we only have one variable with a pointer to the heap data.
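If I really do want two usable strings, one way (a minimal sketch; the function names clone_demo and copy_demo are just mine) is to ask for a deep copy with clone(), which duplicates the heap data so each variable owns its own. The compiler note about the Copy trait also hints at why simple types like integers don’t have this problem: they live entirely on the stack, so assignment copies them instead of moving them.

```rust
// Each returned String owns its own heap data.
fn clone_demo() -> (String, String) {
    let hello1 = String::from("Hello, world!");

    // clone() makes a deep copy: a new heap allocation with a new owner.
    let mut hello2 = hello1.clone();
    hello2.push_str(" Again!");

    // hello1 is still valid because it was never moved.
    (hello1, hello2)
}

// i32 implements Copy, so `let b = a` copies the value instead of moving it.
fn copy_demo() -> (i32, i32) {
    let a = 42;
    let b = a;
    (a, b)
}

fn main() {
    let (hello1, hello2) = clone_demo();
    println!("{} - {}", hello1, hello2);

    let (a, b) = copy_demo();
    println!("{} - {}", a, b);
}
```

Changing hello2 leaves hello1 untouched, because after the clone they no longer share anything.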

There’s a whole chapter in the Rust book that explains how this works in much more detail.

References & Borrowing

So if values can only have one owner, then how the hell do I pass stuff around or change things? If I pass a String to a function by value, ownership moves into the function, so I can’t use my variable afterwards unless the function hands the value back.
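Here’s a small sketch of what passing by value looks like, assuming a made-up function called measure() that takes a String and returns it along with its length. It works, but having to hand everything back gets clumsy fast.

```rust
// Takes ownership of the String, then hands it back along with its length.
fn measure(s: String) -> (String, usize) {
    let len = s.len();
    (s, len)
}

fn main() {
    let hello = String::from("Hello, world!");

    // `hello` is moved into measure(). The returned String is bound to a
    // new `hello`, which is the only reason the println! below compiles.
    let (hello, len) = measure(hello);

    println!("{} ({} bytes)", hello, len);
}
```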

Instead I can use references to these values. When I use a reference to a value, I can use it without taking ownership of it, so the value isn’t dropped when the reference goes out of scope. The Rust documentation has a better description here. Here’s an example.

fn say_hi( s: &String ) {

    // This function prints a string and returns.
    // It accepts a reference to a string.
    println!( "{}", s ); 

}

fn main() {

  // Define a string 
  let hello = String::from( "Hello, world!" );

  // Borrow the string by passing a reference to say_hi()
  say_hi( &hello );
  
}

What if I wanted to modify the string? Bindings and references in Rust are immutable by default, so I have to use a mutable reference instead. These allow me to modify the value of something I’m referencing. Rust only allows one mutable reference to a value at a time, no more! And while that mutable reference is in use, no other references to the value are allowed.

fn say_bye( s: &mut String ) {

    // This function appends to a string and returns.
    // Because it modifies an object, it is passed a mutable reference.
    s.push_str( " Goodbye!" ); 

}

fn main() {

  // Define a string we can modify (using mut)
  let mut hello = String::from( "Hello, world!" );

  // Pass a mutable reference to the string to say_bye()
  say_bye( &mut hello );

  // Print the variable out.
  println!( "{}", hello );
  
}

Passing references into a function is called borrowing. The function never takes ownership of the values; it only gets temporary access to do some work, and the borrow ends when it returns.

In the above example the say_bye() function appends Goodbye! to a given string. The s: &mut String in the function signature says that the parameter s is a mutable reference (&mut) to a String.

My Hello, world! string is now defined as mutable using mut, and I pass a mutable reference to the function using say_bye( &mut hello );. This allows me to change the value of the string.
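The borrowing rules are easier to see side by side. This is only a sketch (borrow_demo() is a name I made up): lots of read-only references are fine together, but a mutable reference has to be the only one in use.

```rust
// Returns the final string and the combined length seen through r1 and r2.
fn borrow_demo() -> (String, usize) {
    let mut hello = String::from("Hello");

    // Any number of shared (read-only) references can exist together.
    let r1 = &hello;
    let r2 = &hello;
    let total = r1.len() + r2.len();
    // r1 and r2 are not used again, so their borrows end here.

    // Now a single mutable reference is allowed.
    let m = &mut hello;
    m.push_str(", world!");

    // Uncommenting the next line makes the compiler complain (E0499),
    // because `m` is still used on the line after it:
    // let m2 = &mut hello;
    m.push_str(" Goodbye!");

    (hello, total)
}

fn main() {
    let (text, total) = borrow_demo();
    println!("{} ({})", text, total);
}
```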

Smart Pointers

What if you have data that has multiple owners? How do you pass data around which needs modifying by many different parts of a program? Bingo - smart pointers.

A common smart pointer in Rust is a Box, which stores data on the heap. It’s a generic container for any type or struct, so is ideal for complex types or recursive structures. A Box still has exactly one owner, though; shared ownership is the job of other smart pointers like Rc and Arc.
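As a quick sketch of the recursive case, here’s a tiny linked list of integers. The List type and sum() function are hypothetical names of mine. Without the Box the compiler can’t work out how big a List is, because it would contain itself; a Box has a known size, which breaks the cycle.

```rust
// A hypothetical singly linked list of integers.
enum List {
    // Each node holds a value and a boxed pointer to the rest of the list.
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

// Walk the list recursively, adding up the values.
fn sum(list: &List) -> i32 {
    match list {
        Cons(value, rest) => value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list));
}
```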

Here’s a horrible-looking example where I have a struct wrapped in a Box. There’s lots of stuff in here I’m not talking about like Option, Some, traits, etc:


// Define a struct to contain a sentence and word-count
struct Sentence {
  sentence: String,
  words: Option<i32>
}

// Implement the Display trait so we can print the struct nicely
impl std::fmt::Display for Sentence {

  // fmt requires a Formatter and returns a Result
  fn fmt( &self, f: &mut std::fmt::Formatter ) -> std::fmt::Result {

    // write! the output to the formatter.
    // words is an Option. It is set to Some or None. To extract 
    // the value we need to unwrap it and display a default if None.
    write!( f, "Sentence (length {}): \"{}\"", self.words.unwrap_or( 0 ), self.sentence )
  }
}


fn main() {

  // Create our struct
  let hello_world = Sentence { 
    sentence: String::from( "Hello, world!" ),
    words: None
  };
  
  // Box the struct
  let mut world_box = Box::new( hello_world ) ;
  
  // Set the number of words using Some because
  // the type is an Option.
  world_box.words = Some( 2 );

  // Print the struct value
  println!( "{}", world_box );
  
}

All that fluff does this:

$ ./foo
Sentence (length 2): "Hello, world!"
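A Box on its own doesn’t really answer the multiple-owners question from the start of this section. As far as I understand it, the usual single-threaded approach is to combine Rc (a reference-counted pointer) with RefCell (which checks the borrowing rules at run time instead of compile time). Treat this as a sketch; shared_demo() is just a name I picked.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns the final string and how many owners it has.
fn shared_demo() -> (String, usize) {
    // Rc gives shared ownership; RefCell allows mutation through it.
    let first = Rc::new(RefCell::new(String::from("Hello, world!")));

    // Rc::clone() doesn't copy the String, it just adds another owner.
    let second = Rc::clone(&first);

    // Modify the string through the second owner...
    second.borrow_mut().push_str(" Goodbye!");

    // ...and read the change back through the first.
    let text = first.borrow().clone();
    (text, Rc::strong_count(&first))
}

fn main() {
    let (text, owners) = shared_demo();
    println!("{} ({} owners)", text, owners);
}
```

The String is only freed once the last Rc pointing at it goes out of scope.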

Honestly there’s waay better documentation & examples on the Rust site so I’d recommend that. I’m not sure I can really make it any easier to understand. Like I said, this is really for my own benefit. Hopefully typing all this out will cement it for me.

I might try to build something more useful later. Hopefully I’ll still remember all this by then.