Serve it up with Rust

2021-08-20

Crab Cakes

A fortunate find...

I recently stumbled on some wonderful material by a guy named Greg Wilson. While I was reading, I saw that he had contributed a chapter to a book (you know, the physical object with paper pages?!) that I happened to have on my shelf. It's called 500 Lines or Less: Experienced programmers solve interesting problems.

I enjoyed Greg's material so much that I decided to see what his chapter was all about. It turns out that he wrote a chapter on building a web server using Python. I thought this might be a good way for me to continue my learning journey with the Rust programming language.

Breaking down the chapter

Some things I know going in:

I'm confident in my knowledge of what a web server does
I have a firm understanding of TCP/IP i.e. I understand connections, network addressing and ports.
I know what a raw HTTP request looks like
I know the parlance of HTTP: e.g. GET, PUT, POST, DELETE, Request, Response, URL etc.

Things I'm less sure of going in:

I'm still a little fuzzy on how memory works in Rust, the ownership and borrowing stuff. I think this is because most of my experience is with dynamically typed, interpreted, and garbage collected languages.

With a focus on Rust specifically, here's what I will learn

How to use some free code (a dependency) to avoid writing code that parses HTTP and handles network connections
How to make it easier to add features later, a.k.a extensibility
How to create a set of functions that solve the general problem of handling HTTP requests

Where I'm at with Rust

This post is the first time I've tried to get some of my experience working with Rust down on paper. At this point in my journey with Rust, I have read parts of "The Rust Programming Language" - but I haven't read it cover-to-cover. I have also written some small, menial programs.

Bart learns rust

The questions

Lately I've also been absorbing some knowledge about how to learn. Another person who has contributed a lot on this topic is Julia Evans. In particular, she has written about using questions to home in on what you don't understand. In thinking about how to apply some of those techniques to my own learning, I'm making note of my questions throughout this mini-project.

One interesting thing about some of these questions is that they came to mind while working with Rust in the code editor. The compiler nudged me in various directions that led me to some of these questions.

How to write less code? Crates!

It is "500 lines or less" after all.

I know that HTTP is a solved problem. Receiving HTTP requests from HTTP clients, de-serializing those requests, and sending the responses back down the wire are common to every interaction a client has with a server. Writing a parser to turn an HTTP request into an "object" in my program is biting off more than I want to chew for this mini-project.
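To get a feel for what a crate would be saving me, here is a minimal sketch that parses only the first line of a request, using nothing but the standard library. The names RequestLine and parse_request_line are made up for illustration. A real parser also has to deal with headers, bodies, encodings and malformed input, which is exactly the part I'd rather not write.

```rust
/// The three parts of an HTTP request line, e.g. "GET /foo HTTP/1.1".
#[derive(Debug, PartialEq)]
struct RequestLine {
    method: String,
    target: String,
    version: String,
}

/// Hypothetical helper: split a request line on whitespace.
/// Returns None unless there are exactly three parts.
fn parse_request_line(line: &str) -> Option<RequestLine> {
    let mut parts = line.split_whitespace();
    let method = parts.next()?;
    let target = parts.next()?;
    let version = parts.next()?;
    if parts.next().is_some() {
        // Anything after the version means the line is malformed.
        return None;
    }
    Some(RequestLine {
        method: method.to_string(),
        target: target.to_string(),
        version: version.to_string(),
    })
}
```

Calling parse_request_line("GET /foo HTTP/1.1") gives back the method, target and version as separate strings, while a line with a missing or extra part gives None.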

In the Python programming language, the http package is included in the standard library. The solution in the book uses BaseHTTPServer. A quick check of the docs for Rust reveals that its standard library doesn't come with an http module. But I bet there are lots of Rust crates written by developers all over the world that solve this problem. So, how do I select a crate that will meet my needs?

A suitable crate will need to handle network connections to and from the client, and provide the HTTP APIs I need. With this in mind, it's time to go shopping on crates.io! Who doesn't love shopping?!

Searching for a suitable HTTP crate

Searching crates.io with keywords like http, web server, http server shows lots of options. There are a lot of crates to choose from. How to narrow it down?

When selecting any dependency, my goal is to find just the right package for the context. This mini-project is low stakes and its purpose is for me to learn some specific things.

The context is: a personal mini-project I'm using in order to learn some things about the Rust programming language.

I'm going to need to eliminate some options.

Here are some guidelines I can think of straight away:

Some of the crates solve problems I don't have
Some of the crates are small and only solve parts of my problem
Some of the crates are feature-rich and have features I don't need, and/or would be time consuming to learn

I can also think about this in terms of risk. The risk profile for this project is considerably different than the one I might think about for a more serious project. What might actually be a benefit of any given dependency in a serious project might in fact be a risk in this project. Even a project like this has risks - after all, my free time listening to music with a tasty beverage is at stake!

So what are the risks specific to this project?

Risk	Level	Mitigation
1. Written by people I don't know	Minor	Research the authors, read the source code
2. Written as components of larger frameworks *	Major	Avoid
3. Has lots of dependencies on other crates	Minor	Do nothing
4. Has "pre-1.0" dependencies	Minor	Try it out, but switch to something else if necessary

* Might mean they will be difficult to integrate into my mini-project, they might also change down the road, which might break my little web-server when I update my dependencies

To get specific:

reqwest - a "higher level HTTP client library" solves a problem I don't have - I'm building a server, not a client
http-types - too small for this project as it solves getting an object to work with but it doesn't manage the networking stuff
http-connection - also too small for this project - it solves the networking stuff but not the parsing
I know I'm not going to be doing asynchronous stuff, so even though it's incredibly cool and worth learning about, anything that depends on Tokio is not a fit for my mini-project
actix-http looks very promising but it's a component of a larger framework - the "Actix Ecosystem" (Also incredibly cool) and not a fit given risk #2
Oooh hyper looks promising too: "A fast and correct http library" - but it's asynchronous which means it depends on Tokio - not a fit for this context
web_server - getting a LOT closer, but this one includes an API for routing requests, which isn't quite what I'm looking for
webby - the API looks good but the only author is a game developer from Amsterdam who wrote it just for kicks

Or, as my son William would say...

Hmm... I want to take a look at tiny-http:

It's written to be compliant with the HTTP standard
It accepts and manages connections
It parses requests which means I will get relevant objects to work with
It has an active repository owner and more than one contributor
It has relatively stable dependencies that make sense given the problem being solved
A cursory scan of the source code itself reveals nothing strange or glaringly out of place

I think we have a winner!

Learning to use the selected crate

Time to get the example from tiny_http's README working. After initializing a new Rust project and adding tiny_http as a dependency in Cargo.toml, I got to work creating a web server.

extern crate tiny_http; // v0.8.2
use tiny_http::Server;

// Run this, and visit localhost:8000/foo with a browser, OR
// make a get request using curl or httpie
fn main() {
  let server = Server::http("127.0.0.1:8000").unwrap();

  for request in server.incoming_requests() {
    println!("{}", request.url());
  }
}

Under the hood, tiny_http takes care of creating network sockets and provides APIs with Request, Response, Header, Method and StatusCode types. This will do nicely to play the role of Python's BaseHTTPServer. That's my first question solved!

However, I don't have a BaseHTTPRequestHandler, like the program from Chapter 22 of the book. I guess I'll have to write one because I need a way to handle requests coming from clients. I'll start by doing something more interesting in main. In addition to sending some more useful info to the console via stdout, I will pass requests to a RequestHandler.

for request in server.incoming_requests() {
  println!("method: {:?}\nurl: {:?}\nuser-agent: {:?}\n",
    request.method(),
    request.url(),
    request.headers()[1].value
  );

  // Doesn't compile yet.
  let response = RequestHandler::new(request).handle_request();
}
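One caveat about request.headers()[1]: it assumes the User-Agent is always the second header, and indexing will panic if a client sends fewer than two headers. A more defensive option is to look the header up by name. The sketch below uses a stand-in Header struct and a hypothetical header_value helper rather than tiny_http's own types, so treat it as an illustration of the idea only.

```rust
/// Stand-in for a parsed header; tiny_http has its own Header type.
struct Header {
    field: String,
    value: String,
}

/// Hypothetical helper: find a header by name, ignoring ASCII case,
/// since HTTP header names are case-insensitive.
fn header_value(headers: &[Header], name: &str) -> Option<String> {
    headers
        .iter()
        .find(|h| h.field.eq_ignore_ascii_case(name))
        .map(|h| h.value.clone())
}
```

A missing header comes back as None instead of crashing the loop, so the caller can print a placeholder and carry on.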

I'm learning to read the compiler output, so I like to run cargo build a lot in order to train my eyeballs to look at the useful data coming out of the compiler. I know this won't compile yet because I haven't implemented this type yet, but it's giving me good practice - like practicing free-throws on the basketball court.

error[E0433]: failed to resolve: use of undeclared type `RequestHandler`
  --> src/main.rs:78:24
   |
78 |         let response = RequestHandler::new(request).handle_request();
   |                        ^^^^^^^^^^^^^^ use of undeclared type `RequestHandler`

So here's an implementation that might work. For now, I'm just going to print something.

use tiny_http::{Server, Request, Response};

struct RequestHandler {
  request: Request,
}

impl RequestHandler {
  fn new(request: Request) -> Self {
    RequestHandler {
      request: request
    }
  }

  pub fn handle_request() {
    println!("handling request...");
  }
}

error[E0599]: no method named `handle_request` found for struct `RequestHandler` in the current scope
  --> src/main.rs:85:53
   |
20 | struct RequestHandler {
   | --------------------- method `handle_request` not found for this
...
85 |         let response = RequestHandler::new(request).handle_request();
   |                        -----------------------------^^^^^^^^^^^^^^
   |                        |                            |
   |                        |                            this is an associated function, not a method
   |                        help: use associated function syntax instead: `RequestHandler::handle_request`
   |
   = note: found the following associated functions; to be used as methods, functions must have a `self` parameter
note: the candidate is defined in an impl for the type `RequestHandler`
  --> src/main.rs:31:5
   |
31 |     fn handle_request() {
   |     ^^^^^^^^^^^^^^^^^^^

So why isn't this method working... wait what is an "associated function"? Running a search...

From https://doc.rust-lang.org/book/ch05-03-method-syntax.html#associated-functions

... Another useful feature of impl blocks is that we’re allowed to define functions within impl blocks that don’t take self as a parameter. These are called associated functions because they’re associated with the struct.

Oh right - I forgot. Method syntax in Rust requires the first parameter of the method to be self in some form, most often &self - a reference to the instance the method is called on. A function in an impl block that omits self is called an "associated function". Associated functions have to be called with a different syntax: RequestHandler::handle_request(). Adding the &self param and moving right along...
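A small sketch of the difference, using a made-up Counter type that has nothing to do with the web server. It shows an associated function next to the three flavours of self a method can take.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: no `self` parameter, called as `Counter::new()`.
    fn new() -> Self {
        Counter { count: 0 }
    }

    // Method: borrows the instance immutably, called as `counter.current()`.
    fn current(&self) -> u32 {
        self.count
    }

    // Method: borrows the instance mutably so it can change it.
    fn increment(&mut self) {
        self.count += 1;
    }

    // Method: takes `self` by value, consuming the instance.
    fn into_count(self) -> u32 {
        self.count
    }
}
```

Methods are only sugar over the path syntax: counter.current() and Counter::current(&counter) are the same call. That is why the compiler's hint suggests RequestHandler::handle_request when self is missing.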

The simplest design I can think to start with is to have the RequestHandler return tiny_http::Response objects. It looks like the tiny_http::Request has a method called respond() which takes a Response instance as an argument. So my request handler just needs to return Response.

(Note: Leaving off the semicolon for expressions that are being returned means the return keyword can be omitted.)

// Doesn't compile - the return type ( -> Response ) isn't right
fn handle_request(&self) -> Response {
  Response {}
}

error[E0107]: missing generics for struct `Response`
  --> src/main.rs:31:37
   |
31 |     pub fn handle_request(&self) -> Response {
   |                                     ^^^^^^^^ expected 1 generic argument
   |

So I need to learn how tiny_http::Response types work. How do I create one? Looking at the docs shows a few different ways of creating a Response, and Response::from_string() looks like it will do the trick. What is the type signature of its return? Response<std::io::Cursor<Vec<u8>>>.

I know what a Vec<u8> is, but what is a std::io::Cursor? To the docs I go!

From https://doc.rust-lang.org/std/io/index.html#structs

Cursors are used with in-memory buffers, anything implementing AsRef<[u8]>, to allow them to implement Read and/or Write, allowing these buffers to be used anywhere you might use a reader or writer that does actual I/O.
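To make that concrete, here is a minimal standard-library sketch. The helper names body_reader and read_all are hypothetical. The point is that a Cursor wrapped around a Vec<u8> can be handed to any code that expects something implementing Read.

```rust
use std::io::{Cursor, Read};

/// Read everything out of any reader into a String.
fn read_all<R: Read>(mut reader: R) -> std::io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

/// Wrap an in-memory byte buffer so it can be used where a reader is expected.
fn body_reader(body: &str) -> Cursor<Vec<u8>> {
    Cursor::new(body.as_bytes().to_vec())
}
```

read_all does not know or care that the bytes live in memory rather than in a file or a socket, which is presumably why a response body can be typed this way.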

Sounds good to me. I don't think I have a compelling reason to use anything other than a buffer of bytes to respond to requests. And now I know how to return some static HTML!

fn handle_request(&self) -> Response<Cursor<Vec<u8>>> {
    Response::from_string("<html><body><h1>Hello web!</h1></body></html>")
}

So that function compiles (no red squiggles in my editor!), but back in main, the compiler is not happy.

error[E0382]: use of moved value: `request`
  --> src/main.rs:85:9
   |
77 |     for request in server.incoming_requests() {
   |         ------- move occurs because `request` has type `Request`, which does not implement the `Copy` trait
...
84 |         let response = RequestHandler::new(request).handle_request();
   |                                            ------- value moved here
85 |         request.respond(response).unwrap();
   |         ^^^^^^^ value used here after move

So... my first stab at this threw me right into learning about ownership. Which brings me to the next question. Which function actually owns the request?

Requests are sent by clients into a network socket which is managed internally by the server variable. The method incoming_requests() returns an iterator over the requests that are received - so it is creating Request objects and then handing them over.

fn main() {
    let server = Server::http("127.0.0.1:8000").unwrap();
    for request in server.incoming_requests() {
        println!("method: {:?}\nurl: {:?}\nuser-agent: {:?}\n",
            request.method(),
            request.url(),
            request.headers()[1].value
        );

        let response = RequestHandler::new(request).handle_request();
        request.respond(response).unwrap();
    }
}

So when this for loop completes an iteration, the request variable goes out of scope and gets dropped. In other words, the memory that the for loop is keeping track of for the Request objects it creates gets cleared out after request.respond() method returns.

At least for now in this program, I'm going to make a mental note and think of main as having ownership of the requests. (Although it's more like the for loop itself has ownership.)

Taking a closer look at the tiny_http::Request.respond() method, I see the function signature is:

pub fn respond<R>(mut self, response: Response<R>) -> Result<(), IoError>
  where R: Read

Because the response parameter in the function signature is not a reference (there's no & character), I can say that it consumes the response. It takes ownership of it. This makes sense because after the response is sent back to the client, there is no need to keep it hanging around in memory. The program is done with it. So it gets freed when respond() finishes its work. The same is true of the request: the receiver is mut self rather than &self, so respond() takes ownership of the request as well.
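Here is a sketch of what "consuming" looks like in practice. Request below is a stand-in struct with a made-up respond() that just builds a string; it is not tiny_http's type. What it shares with the signature above is a receiver taken by value.

```rust
/// Stand-in for an incoming request; not tiny_http's real type.
struct Request {
    url: String,
}

impl Request {
    // Borrows the request: the caller can keep using it afterwards.
    fn url(&self) -> &str {
        &self.url
    }

    // Takes `self` by value: the request is moved in and dropped when
    // this function returns, so the caller cannot use it again.
    fn respond(self, body: String) -> String {
        format!("{} -> {}", self.url, body)
    }
}

/// Borrow first, consume last.
fn serve(request: Request) -> String {
    let body = format!("you asked for {}", request.url());
    request.respond(body)
    // Using `request` here would fail with E0382 (use of moved value).
}
```

The order matters: anything that only needs to read the request has to happen before the call that takes it by value.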

Having a mental model for ownership and borrowing really helps when writing Rust. This explanation works well for me: intorust.com - Ownership - or a more detailed explanation here: youtube - Rust Ownership

A quick and dirty way to think of ownership and borrowing translated directly into code is:

Borrowing Basics

name: String: owned value. freed when it goes out of scope. "Consumed"
name: &String: borrowed (a.k.a. shared) reference. "Many readers, no writers"
name: &mut String: mutable reference. "exactly 1 writer, no outside readers"
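The three forms above, written out as function signatures in a minimal sketch. The function names are made up, and &String is used to mirror the list even though &str is the more common parameter type for read-only text.

```rust
// Owned: the String is moved in and freed when this function returns.
fn consume(name: String) -> usize {
    name.len()
}

// Shared borrow: read-only access; the caller keeps ownership.
fn peek(name: &String) -> usize {
    name.len()
}

// Mutable borrow: exclusive access for the duration of the call.
fn shout(name: &mut String) {
    name.push('!');
}
```

A caller can peek(&name) and shout(&mut name) as often as it likes, but after consume(name) the variable is gone and the compiler will reject any further use of it.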

With this in mind, I take another look at my implementation of the request handler.

struct RequestHandler {
    request: Request
}

impl RequestHandler {
    fn new(request: Request) -> Self {
        RequestHandler {
            request: request
        }
    }

    fn handle_request(&self) -> Response<Cursor<Vec<u8>>> {
        Response::from_string("<html><body><h1>Hello web!</h1></body></html>")
    }
}

There is no & character on the request. The new() function consumes (or expects to take ownership of) the request. This is a problem because back in main(), request.respond() is called next.

Since request is not borrowed by the handler, it can't be given back to main() so that request.respond() can be called.

The way it is currently written, the expectation is that the handler will be responsible for freeing request whenever it goes out of scope.

I could try to use clone() in this situation to make a byte-for-byte, deep copy of the request, but that seems like it would use more memory than necessary. It would be inefficient.

The thing to do then, is to pass a reference so that RequestHandler can borrow the request. There should be no need to change (or mutate) the data within the request itself, so it doesn't need to be mutable. That means I could pass it around to as many functions as needed. This would be the "many readers" case for borrowing.

let response = RequestHandler::new(&request).handle_request();

A reference to a value has a different type than an owned value. So the RequestHandler struct also has to be changed to accept a ref type.

struct RequestHandler {
    request: &Request,
}

error[E0106]: missing lifetime specifier
  --> src/main.rs:21:14
   |
21 |     request: &Request,
   |              ^ expected named lifetime parameter
   |
help: consider introducing a named lifetime parameter
   |
20 | struct RequestHandler<'a> {
21 |     request: &'a Request,
   |

Why thank you compiler! I think I will introduce a named lifetime parameter... just as soon as I figure out what that means. I guess you want me to do this...? A little sprinkling of <'a> and &'a?

struct RequestHandler<'a> {
    request: &'a tiny_http::Request,
}

impl<'a> RequestHandler<'a> {
    fn new(request: &'a tiny_http::Request) -> Self {
        RequestHandler {
            request: request
        }
    }

// -- snip --

What is this "lifetime parameter" thing?

Wait a minute. Before I dive into that, I just noticed something. This is the first time this program has compiled without any errors! So before I do anything else, I'm going to celebrate; I'm going to send a real-life, honest-to-goodness http request to this newly minted http server.

Using cargo run to start the server, and a lovely http client called httpie to make the request, I hereby claim a small but well deserved victory.

~ http get 127.0.0.1:8000
HTTP/1.1 200 OK
Content-Length: 45
Content-Type: text/plain; charset=UTF-8
Date: Sat, 18 Sep 2021 20:41:44 GMT
Server: tiny-http (Rust)

<html><body><h1>Hello web!</h1></body></html>


~

-- VICTORY DANCE --

At this point, I have answered 5 of my questions. I now know:

This program will speak HTTP using the tiny_http crate
Method versus associated function syntax in Rust
How to return some static HTML
That a std::io::Cursor is a chunk of memory that can be read like a file
That the main function has ownership of the request instances

I also find that taking notice of small victories throughout the learning process is helpful. This is one reason why it's helpful to keep the program runnable as I go. Making sure things compile along the way as I add new code can be tricky when I don't know exactly what I want the program to do yet. Little tricks like just printing something in a function that isn't finished yet can sometimes be helpful. Or in the case of this server, just getting out a single, hard-coded HTTP response. Setting an overly-simplified, tiny goal that uses the minimal amount of functionality possible to get to a runnable program is especially helpful when learning something new.

So. Lifetimes.

From https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html

...every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time, lifetimes are implicit and inferred, just like most of the time, types are inferred. We must annotate types when multiple types are possible. In a similar way, we must annotate lifetimes when the lifetimes of references could be related in a few different ways. Rust requires us to annotate the relationships using generic lifetime parameters to ensure the actual references used at runtime will definitely be valid.

So... because the request being handled is now a reference (&request instead of request), I have to tell the compiler how that reference relates to the object that borrows it. Lifetime annotations can usually be inferred. In many cases they are filled in for me, magically by the compiler, at compile time. But, in this case, the compiler needs a lifetime annotation. (Somewhere in the back of my mind I wonder if that can be avoided...)

I'm passing a reference to the incoming request to my RequestHandler. I have already established that main owns the request. Which means any function that needs access to the request, needs to borrow it from main. So what is the relationship between the request and the request handler? Also, what function owns RequestHandler? What should own RequestHandler?

It makes sense that the RequestHandler would need to be hanging around in memory at least as long as the request itself. Does it also make sense that when the request is freed, the request handler no longer serves a purpose, so it could also be freed?

As currently written, the lifetime annotations that were added to RequestHandler specify that the reference to the request has a lifetime of 'a. Because the struct holds onto that reference, it also needs a lifetime of 'a, and so does the impl.

In the new() function, the relationship between the request reference and the RequestHandler becomes visually very clear. The lifetime annotations are explicitly tying the two together. Those 'a annotations are saying that a RequestHandler can only be used while the request it borrows is still valid - it is never allowed to outlive that request.
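A standalone sketch of the same shape, with stand-in Request and Handler types and a hypothetical first_segment function, none of which come from tiny_http. It shows what 'a is tracking: how long the borrowed request is valid, which puts an upper bound on how long the handler may be used.

```rust
/// Stand-in for an incoming request; not tiny_http's real type.
struct Request {
    url: String,
}

/// Holds a borrowed request, so it carries the lifetime 'a of that borrow.
struct Handler<'a> {
    request: &'a Request,
}

impl<'a> Handler<'a> {
    fn new(request: &'a Request) -> Self {
        Handler { request }
    }

    // The returned &str borrows from the request, not from the handler,
    // so it is tagged with 'a as well.
    fn url(&self) -> &'a str {
        &self.request.url
    }
}

/// The handler is a temporary that is dropped at the end of the statement,
/// yet the returned slice stays valid because it borrows from `request`.
fn first_segment(request: &Request) -> &str {
    let url = Handler::new(request).url();
    url.trim_start_matches('/').split('/').next().unwrap_or("")
}
```

The handler is free to go away before the request does. What the compiler refuses is the opposite: keeping a Handler<'a> around after the request it points at has been dropped.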

But wait a minute... the web server needs to handle a whole bunch of requests. Requests will be coming in constantly - the job of the RequestHandler is to figure out how to respond to any request that gets sent its way.

So wouldn't it make more sense for the RequestHandler to outlive the request? Wouldn't it be better for it to stick around longer in memory so that it can also handle subsequent requests?

Then, ideally, RequestHandler would have the same lifetime as the server. The server already outlives the requests, and so should the RequestHandler.

Since main owns the tiny_http::Server, then, main also needs to own the RequestHandler. When the entire process exits, Server and RequestHandler will be freed.

That is starting to sound closer to what I want. Rather than specifying matching lifetimes between a Request object and a RequestHandler object, it sounds much better to have matching lifetimes between a RequestHandler and the Server. The individual requests should be freed after a response is sent, but the handler will still be alive and kicking, reading requests and returning responses.

Here's what main looks like right now:

fn main() {
    let server = Server::http("127.0.0.1:8000").unwrap();
    for request in server.incoming_requests() {
        // -- snip --

        let response = RequestHandler::new(&request).handle_request();
        request.respond(response).unwrap();
    }
}

A new RequestHandler is created inside the loop - which means a new one gets created every time a client makes a request. That's why they have the same lifetime! Also, it could be inefficient to create a brand new handler every single time a request is received. RequestHandler::new() is in the wrong spot.

The handler needs to be owned by main just like server, therefore it needs to be created outside of the loop.

This will also mean that instead of passing the &request to new(), it needs to be passed to the handle_request() method on the RequestHandler.

fn main() {
    let server = tiny_http::Server::http("127.0.0.1:8000").unwrap();
    let handler = RequestHandler::new();

    for request in server.incoming_requests() {
        // -- snip --

        let response = handler.handle_request(&request);
        request.respond(response).unwrap();
    }
}

This is much closer to how the server is initialized in the Python example from 500 Lines or Less.

In Python, using the http.server module looks like this:

def run(server_class=HTTPServer, handler_class=BaseHTTPRequestHandler):
    server_address = ('', 8000)
    httpd = server_class(server_address, handler_class)
    httpd.serve_forever()

In the above Python snippet, the handler_class parameter is passed to the server. It must exist in memory for as long as server_class exists in memory.

Now that the lifetimes of an incoming request and the handler have been separated, the RequestHandler struct no longer needs to have a request field.

The compiler can then infer lifetimes again, and the 'a annotations are no longer necessary. Woo hoo!

Here's what the implementation looks like now:

struct RequestHandler {}

impl RequestHandler {
    fn new() -> Self {
        RequestHandler {}
    }

    fn handle_request(&self, request: &Request) -> Response<Cursor<Vec<u8>>> {
        Response::from_string("<html><body><h1>Hello web!</h1></body></html>")
    }
}

At this point, the small goal of returning a hard-coded bit of HTML has been met. So now I need to move on to adding some more functionality to this server. Right now it's useless. It's time to change that!

Adapting the concepts from Python to Rust

This is where the fun begins!

In reading a bit more of the chapter of the book, I noticed a pattern. The general idea is that the server needs to respond to a handful of different scenarios. For example, in one case, an incoming request could be for a specific file. In another case, it could be a directory. In either case, the web server needs to decide what to do. Both of these cases have two things in common. The server needs to decide if the request can be handled, and then it needs to handle it by creating an appropriate response.

Each case is represented as its own class. The classes for every case all have the same two functions test() and act(). test() determines if the request can be handled and act() does the work of responding to the request. Because these two functions are common to multiple classes, this feels like a concept called "duck typing".

From https://docs.python.org/3/glossary.html#term-duck-typing

duck-typing

A programming style which does not look at an object’s type to determine if it has the right interface; instead, the method or attribute is simply called or used (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution.

So, how do I do duck-typing in Rust? The first language feature that immediately comes to mind is the trait. Rust traits are a lot like interfaces in other languages. Traits tell that ~~persnickety~~ detail-oriented compiler that an impl is required to have certain functions. Traits enable polymorphism in Rust. So I think I need a trait.

trait HandleRequest {
    fn can_handle(&self, request: &Request) -> bool; // my chosen name for test()
    fn handle(&self, request: &Request) -> Response<Cursor<Vec<u8>>>; // my chosen name for act()
}

So, to take the case: "get me a static file please", I need an impl to be in charge of handling that case. I'll use a struct and an impl to create a type for the static file case, as well as an implementation for the trait I already wrote.

Since the interface for handling a request only needs to be able to read the request, read it but not change it, the request can be borrowed as an immutable reference.

(Note: todo!() is a helpful macro when stubbing out code structure).

// Side-note, a struct declared without braces or fields is known as a 'Unit Struct'.
// 'Unit structs' are 0 bytes in size, have exactly one value (much like `()`),
// and are used when a trait needs to be implemented, but there is no need
// to store any data or fields.
struct StaticFile;

impl HandleRequest for StaticFile {
    fn can_handle(&self, _: &tiny_http::Request) -> bool {
        todo!()
    }

    fn handle(&self, _: &tiny_http::Request) -> tiny_http::Response<std::io::Cursor<std::vec::Vec<u8>>> {
        todo!()
    }
}
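The "0 bytes" claim about unit structs is easy to check with the standard library. In this sketch the Describe trait and static_file_size function are made up for illustration, and StaticFile is redeclared so the snippet stands alone.

```rust
use std::mem::size_of;

trait Describe {
    fn describe(&self) -> String;
}

// A unit struct: no fields, so there is nothing to store.
struct StaticFile;

impl Describe for StaticFile {
    fn describe(&self) -> String {
        String::from("serves a file from disk")
    }
}

/// Size in bytes of a StaticFile value.
fn static_file_size() -> usize {
    size_of::<StaticFile>()
}
```

static_file_size() reports 0, yet a StaticFile value can still have trait methods called on it. The type exists purely to carry behaviour.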

Next, I need to loop over all the various cases, calling can_handle() and handle(). While I'm at it, I'm going to change handle_request() so that it can delegate the handling of incoming requests to another function that is specific to the kind of HTTP request being handled.

I'm not thinking about PUT, POST, or DELETE requests yet, but web servers do need to know what to do with those. Delegating will be better because handling all of those different kinds of requests in one function could get messy. This will also follow more closely with the structure of the Python example from the book.

impl RequestHandler {
    // -- snip --

    fn handle_request(&self, request: &Request) -> Response<Cursor<Vec<u8>>> {
        match request.method() {
            Method::Get => self.handle_get(request),
            _ => panic!()
        }
    }

    fn handle_get(&self, request: &Request) -> Response<Cursor<Vec<u8>>> {
        for case in self.handlers.iter() {
            if case.can_handle(request) {
                let response = case.handle(request);
                return response;
            }
        }
    }
}

I enjoy using pattern matching (the match keyword) in Rust - it's a satisfying language feature.
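Part of what makes match satisfying is that the compiler insists every variant is covered. A small sketch with a stand-in Method enum and a hypothetical route function (tiny_http has its own Method type, which this does not try to reproduce):

```rust
/// Stand-in for an HTTP method enum; not tiny_http's real type.
#[derive(Debug)]
enum Method {
    Get,
    Post,
    Put,
    Delete,
}

/// Pick a handler name for each method.
fn route(method: &Method) -> &'static str {
    match method {
        Method::Get => "handle_get",
        Method::Post | Method::Put => "handle_write",
        // Wildcard arm: required here because Delete is not listed above.
        _ => "not_supported",
    }
}
```

Removing the wildcard arm without adding a Method::Delete arm turns this into a compile error, so forgetting a case is caught before the server ever runs.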

Don't send a POST request lest you crash the server! panic!() is a placeholder for now - I'll figure out what to do with that later.

I'll also need a handlers field to hold a collection of the objects for each of the cases. For now, it will just hold the StaticFile case, but there will be other cases, like file not found for example.

struct RequestHandler {
  handlers: Vec<dyn HandleRequest>
}

impl RequestHandler {
    fn new() -> Self {
        let cases: Vec<dyn HandleRequest> = vec![
            StaticFile {}
        ];

        RequestHandler {
            handlers: cases
        }
    }

  // -- snip --
}

error[E0277]: the size for values of type `(dyn HandleRequest + 'static)` cannot be known at compilation time
   --> src/main.rs:21:15
    |
21  |     handlers: Vec<dyn HandleRequest>
    |               ^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
    |

This is one crux of doing duck-typing in Rust. The compiler needs to know how much memory objects will take up. But in this case, the objects stored in handlers aren't all the same type, they just share the same interface. I wonder what the Rust book has to say on this topic... To the Book!

From https://doc.rust-lang.org/book/ch17-02-trait-objects.html

...create a trait object by specifying some sort of pointer, such as a & reference or a Box smart pointer, then the dyn keyword, and then specifying the relevant trait.

So I need to put each case in a Box. Fair enough. What exactly is a Box<T>?

From https://doc.rust-lang.org/book/ch15-01-box.html

The most straightforward smart pointer is a box, whose type is written Box<T>. Boxes allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.

You’ll use them most often in these situations:

When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type

Of the above common use-cases for Box<T>, the first and third apply to this problem.

The compiler is telling me it doesn't know the size of the type because dyn HandleRequest is a trait object. It only represents a common set of functions that are needed, it doesn't refer to a concrete type.

What I need is a collection of pointers to my handlers. Since pointers are a known, fixed size, the compiler should be happy. Vec is my go-to whenever I need a collection of something, so I will try a Vec of Box<dyn HandleRequest>.

struct RequestHandler {
    handlers: Vec<Box<dyn HandleRequest>>
}

impl RequestHandler {
    fn new() -> Self {
        let cases: Vec<Box<dyn HandleRequest>> = vec![
            Box::new(StaticFile),
        ];

        RequestHandler {
            handlers: cases
        }
    }

// -- snip --
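
To convince myself of the "pointers are a known, fixed size" argument, here is a minimal sketch that uses only the standard library. The trait below is a cut-down stand-in for the post's HandleRequest, and BigHandler, name(), build_handlers() and boxed_handler_size() are hypothetical names made up for the illustration. The two handler types differ wildly in size, yet each element of the Vec is just a boxed pointer (on current compilers, a data pointer plus a vtable pointer).

```rust
use std::mem::size_of;

// Cut-down stand-in for the post's trait (hypothetical method).
trait HandleRequest {
    fn name(&self) -> &'static str;
}

// A zero-sized handler, like `struct StaticFile;` in the post.
struct StaticFile;

// A hypothetical handler that carries a lot of data inline.
struct BigHandler {
    _scratch: [u8; 1024],
}

impl HandleRequest for StaticFile {
    fn name(&self) -> &'static str {
        "static file"
    }
}

impl HandleRequest for BigHandler {
    fn name(&self) -> &'static str {
        "big"
    }
}

/// Builds a mixed list of handlers; each one lives on the heap behind a Box.
fn build_handlers() -> Vec<Box<dyn HandleRequest>> {
    vec![
        Box::new(StaticFile),
        Box::new(BigHandler { _scratch: [0; 1024] }),
    ]
}

/// Size in bytes of one Vec element, whichever handler it points to.
fn boxed_handler_size() -> usize {
    size_of::<Box<dyn HandleRequest>>()
}
```

Comparing size_of::<StaticFile>() and size_of::<BigHandler>() with boxed_handler_size() shows the point: the concrete types have different sizes, but the boxes that go into the Vec all have the same one.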

Now that RequestHandler is happy, I have another error to sort out:

   |
42 |       fn handle_get(&self, request: &Request) -> Response<Cursor<Vec<u8>>> {
   |                                                  ------------------------- expected `Response<std::io::Cursor<Vec<u8>>>` because of return type
43 | /         for case in self.handlers.iter() {
44 | |             if case.can_handle(&request) {
45 | |                 let response = case.handle(&request);
46 | |                 return response;
47 | |             }
48 | |         }
   | |_________^ expected struct `Response`, found `()`
   |

handle_get() is supposed to return a tiny_http::Response type - which it will do when it finds the right handler to use. But what if it doesn't? If it gets all the way through this loop and has nothing to show for it, what should the server do then?

One option would be to call panic!() and crash the program, but keeping the server running would be better for clients.

So the server needs to return a response that indicates an error of some kind instead of crashing. What does HTTP have to offer for errors?

From https://httpwg.org/specs/rfc7231.html#status.codes

The first digit of the status-code defines the class of response. The last two digits do not have any categorization role. There are five values for the first digit:

1xx (Informational): The request was received, continuing process
2xx (Successful): The request was successfully received, understood, and accepted
3xx (Redirection): Further action needs to be taken in order to complete the request
4xx (Client Error): The request contains bad syntax or cannot be fulfilled
5xx (Server Error): The server failed to fulfill an apparently valid request
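
The first-digit rule in the list above can be sketched in a few lines. This is only an illustration of the classification, not something tiny_http provides: StatusClass and status_class() are hypothetical names, and treating anything outside 100..=599 as "no class" is my own assumption.

```rust
/// The five response classes from the quoted list.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StatusClass {
    Informational,
    Successful,
    Redirection,
    ClientError,
    ServerError,
}

/// Classifies a status code by its first digit.
/// Returns None for codes outside 100..=599 (an assumption of this sketch).
fn status_class(code: u16) -> Option<StatusClass> {
    // Integer division by 100 leaves just the first digit of a three-digit code.
    match code / 100 {
        1 => Some(StatusClass::Informational),
        2 => Some(StatusClass::Successful),
        3 => Some(StatusClass::Redirection),
        4 => Some(StatusClass::ClientError),
        5 => Some(StatusClass::ServerError),
        _ => None,
    }
}
```

Both 500 and 501 land in ServerError, which is why the choice between them comes down to reading their individual descriptions.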

I see two options that look appropriate: 500 Internal Server Error or 501 Not Implemented.

After reading the descriptions of these errors, I think 500 is the best choice. 501 is for request methods (e.g. GET, PUT, POST, DELETE etc.) that the server can't respond to but may be able to in the future.

fn handle_get(&self, request: &Request) -> Response<Cursor<Vec<u8>>> {
    for case in self.handlers.iter() {
        if case.can_handle(request) {
            let response = case.handle(request);
            return response;
        }
    }

    // Catch-all server error if no other handlers can handle the request.
    Response::from_string("<html><body>500 Internal Server Error</body></html>")
        .with_status_code(StatusCode::from(500))
}

And this now compiles, woo hoo! This 500 error is great because it will handle my programming mistakes with some grace. The next thing I need to do is actually flesh out the implementation for each case. That's the fun part! Now, I'll turn my attention to implementing the StaticFile case. I need to write can_handle() and handle() for requests that name a file that exists and is readable.

To figure out which file is being requested, I need to look at the request itself and return a local path to the file. I think a function on the trait HandleRequest called get_path(req) makes sense. But what should it return? I need to make sure the file exists and it can be opened. Poking around the Rust standard library shows the PathBuf type in std::path has what I need to implement these methods.

trait HandleRequest {
    // -- snip --

    fn get_path(&self, req: &Request) -> PathBuf {
        // Disregard the leading slash.
        let (_, uri) = req.url().split_at(1);

        // Use the working directory as the server root.
        let cwd = env::current_dir().unwrap();

        cwd.join(uri)
    }
}
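
The same mapping can be tried without a tiny_http Request. The sketch below is a standard-library-only variant: url_to_path() is a hypothetical free function, root stands in for the working directory, and I've swapped split_at(1) for trim_start_matches('/') so that an empty URL doesn't panic. It also shows why the leading slash has to go: joining an absolute path replaces the base path entirely.

```rust
use std::path::{Path, PathBuf};

/// Maps a request URL such as "/test_data/test_file" to a path under `root`.
/// `root` stands in for the working directory used in the post.
fn url_to_path(root: &Path, url: &str) -> PathBuf {
    // Drop leading slashes: `join` with an absolute path would discard `root`.
    let relative = url.trim_start_matches('/');
    root.join(relative)
}

/// Shows what happens when the leading slash is NOT removed.
fn naive_join(root: &Path, url: &str) -> PathBuf {
    root.join(url)
}
```

This sketch does nothing about query strings or ".." segments in the URL, so it is a learning aid rather than something to expose to the internet.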

Now that I can get a file path out of the request, I need to deal with the file itself.

struct StaticFile;

impl HandleRequest for StaticFile {
    fn can_handle(&self, req: &Request) -> bool {
        let path = self.get_path(req);
        path.is_file() && File::open(path).is_ok()
    }

    fn handle(&self, req: &Request) -> Response<Cursor<Vec<u8>>> {
        let path = self.get_path(req);
        self.handle_file(path)
    }
}

I also need to implement a handle_file() function. This will be the meat and potatoes of dealing with a static file.

trait HandleRequest {
    // -- snip --

    fn handle_file(&self, path: PathBuf) -> Response<Cursor<Vec<u8>>> {
        let mut buffer = Vec::new();
        let content_type = get_content_type(&path);
        let headers = vec![Header {
            field: "Content-Type".parse().unwrap(),
            value: content_type.parse().unwrap(),
        }];

        let mut file = File::open(path).unwrap();
        file.read_to_end(&mut buffer).unwrap();
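        // Cursor::new() cannot fail, so there is nothing to unwrap here.
        let data = Cursor::new(buffer);

        // The return type is a plain Response, not a Result, so no Ok(...).
        Response::new(
            StatusCode::from(200),
            headers,
            data,
            None,
            None,
        )
    }
}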

So that compiles, yay! But how do I know it actually works? I could test it just by running the server and making a request like before, or I could write a test. Rust has unit testing built right into the language and tool set, which is really convenient.

The trick is going to be making a "fake request" from code, instead of using the httpie utility from the command line like I did before. I'm wondering if the tiny_http crate has any testing of its own? Perhaps it provides something handy for this? Ah... TestRequest. All I need to do is have tiny_http create a test request, pass it into the request handler, and then check that the result is 200 A-Okay.

// Note that it is good style to put tests into their own module.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn serves_static_file() {
        let handler = RequestHandler::new();
        let request = tiny_http::TestRequest::new()
            .with_method(Method::Get)
            .with_path("/test_data/test_file");

        let response = handler.handle_request(&request.into());

        assert_eq!(tiny_http::StatusCode(200), response.status_code());
    }
}

Run cargo test at the command line and...

Finished test [unoptimized + debuginfo] target(s) in 0.03s
Running unittests src/main.rs (target/debug/deps/ch22-fb3f2637e9d26e83)

running 1 test
test tests::serves_static_file ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Bingo. Static file delivered.

Cowabunga!

This has been a firehose of learning, and a lot of fun to work on. I'm really happy I managed to answer all of the questions I had. And I'm even happier that I now have many more questions to dive into!

Bart Simpson

For example, what's up with all these calls to unwrap()? That seems kinda smelly to me because unwrap() will crash the program if it fails. It's like saying to the compiler, "Don't worry buddy, this can't POSSIBLY fail so just give me what I need." But what if the file doesn't exist?

In order for this code to be closer to production ready, it will need to handle errors properly. But that's a post for another day.
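
Without getting ahead of that future post, here is a small taste of the difference, using only the standard library. read_file() and status_for() are hypothetical helpers, and mapping a missing file to 404 (with everything else falling back to 500) is my assumption for the sketch, not something the server in this post does yet.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Reads a whole file, handing any I/O error back to the caller
/// instead of panicking.
fn read_file(path: &Path) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    // `?` returns early with the error where `unwrap()` would panic.
    File::open(path)?.read_to_end(&mut buffer)?;
    Ok(buffer)
}

/// Picks a status code from the outcome: 200 on success, 404 if the file
/// is missing, 500 for any other I/O failure (assumed mapping).
fn status_for(path: &Path) -> u16 {
    match read_file(path) {
        Ok(_) => 200,
        Err(e) if e.kind() == io::ErrorKind::NotFound => 404,
        Err(_) => 500,
    }
}
```

The caller now decides what a failure means, and a missing file becomes a response rather than a crashed server.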

Summing up some Rust learnings

Crates.io is an excellent resource for finding and doing homework on external packages & libraries known as crates.
A method is an associated function whose first parameter is self (lowercase s, not Self uppercase S). An associated function without self is called on the type itself, like RequestHandler::new().
In-memory I/O can be done with Cursor, which wraps a byte buffer such as a Vec<u8> so that it can be read like a file.
Lifetimes are annotations added to the code to tell the compiler how the references in a function's parameters and return value relate to each other, so that they can be checked for validity at compile time. This rules out use-after-free errors and dangling references, and together with the ownership rules it prevents data races in safe Rust.
Rust doesn't really have "Classes, Objects, and Interfaces" in the traditional sense. It has struct which holds data fields, and impl and trait which define and enforce an API.
Box is the simplest smart pointer type, and it stores stuff on the heap.
Polymorphism can be done with traits and dynamic dispatch - a bit like v-tables in C++.
Successful compilation does not guarantee correctness - so use tests. Testing is incredibly convenient in Rust because it's baked right into the standard tools that come with the language.
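
A few of these points fit in one short sketch, using only the standard library. Everything here (Counter, read_all(), longer()) is a hypothetical example rather than code from the server: it shows an associated function next to a method, a Cursor being read through the Read trait, and a lifetime annotation tying a returned reference to its inputs.

```rust
use std::io::{self, Cursor, Read};

struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: no `self`, called as `Counter::new()`.
    fn new() -> Self {
        Counter { count: 0 }
    }

    // Method: takes `self`, called as `counter.increment()`.
    fn increment(&mut self) -> u32 {
        self.count += 1;
        self.count
    }
}

/// Reads an in-memory buffer through the `Read` trait, as if it were a file.
fn read_all(bytes: Vec<u8>) -> io::Result<String> {
    let mut cursor = Cursor::new(bytes);
    let mut text = String::new();
    // Fails with an error (not a panic) if the bytes are not valid UTF-8.
    cursor.read_to_string(&mut text)?;
    Ok(text)
}

/// The lifetime `'a` ties the returned reference to both inputs:
/// the result may not outlive either of them.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}
```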

Development environment used

Rust is well supported on many platforms. Here's what I used for this mini-project:

macOS (with Xcode 12.4 & command line tools installed)
rustup for managing the Rust language itself (Rust v1.54.0, 2018 Edition)
rustup components:
- rls-x86_64-apple-darwin (deprecated in favor of rust-analyzer)
- rust-analysis-x86_64-apple-darwin
- rust-src
- rust-std-x86_64-apple-darwin
- rustc-x86_64-apple-darwin
- rustfmt-x86_64-apple-darwin
Visual Studio Code & the following extensions:
- Rust support for Visual Studio Code
- Rust Syntax
httpie for sending http requests to my web server from the command line

Note from Fall of 2022: Recommended VSCode extension for Rust is now rust-analyzer