In my last blog post, I wrote about learning Rust and implementing the RaptorQ (RFC 6330) fountain code. I had optimized the library only for small message sizes, since it was mainly a project to help me learn Rust. However, since releasing it, a number of people have started using the raptorq crate, so I’ve been working on making it more polished.
I recently decided to learn more about Rust, and wrote a high-performance RaptorQ (RFC 6330) library. RaptorQ is a fountain code, and the core of the algorithm is a lot of matrix math over GF(256), which translates into lots of XORs and reads from lookup tables. After getting the initial implementation working, I set about optimizing it. Below is a journal of the steps I took to profile and optimize the implementation. By the end, I’d achieved a 24.7x speedup!
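To make "XORs and reads from lookup tables" concrete, here is a minimal sketch of GF(256) arithmetic in Rust. It is an illustration, not the code from the raptorq crate: the function names (`build_tables`, `gf_add`, `gf_mul`, `add_scaled_row`) are made up for this example, and it assumes the field is defined by the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with generator 2, which I believe matches the octet tables in RFC 6330.

```rust
/// Build exp/log lookup tables for GF(256).
/// Assumption: field polynomial 0x11D with generator 2.
fn build_tables() -> ([u8; 510], [u8; 256]) {
    let mut exp = [0u8; 510];
    let mut log = [0u8; 256];
    let mut x: u16 = 1;
    for i in 0..255 {
        exp[i] = x as u8;
        log[x as usize] = i as u8;
        // Multiply by the generator (2) and reduce modulo the field polynomial.
        x <<= 1;
        if x & 0x100 != 0 {
            x ^= 0x11D;
        }
    }
    // Repeat the table so log[a] + log[b] (at most 508) needs no modulo.
    for i in 255..510 {
        exp[i] = exp[i - 255];
    }
    (exp, log)
}

/// Addition (and subtraction) in GF(256) is a plain XOR.
fn gf_add(a: u8, b: u8) -> u8 {
    a ^ b
}

/// Multiplication is two log lookups, an integer add, and one exp lookup.
fn gf_mul(a: u8, b: u8, exp: &[u8; 510], log: &[u8; 256]) -> u8 {
    if a == 0 || b == 0 {
        return 0;
    }
    exp[log[a as usize] as usize + log[b as usize] as usize]
}

/// Typical matrix inner loop: row[i] += scalar * other[i], all in GF(256).
fn add_scaled_row(
    row: &mut [u8],
    other: &[u8],
    scalar: u8,
    exp: &[u8; 510],
    log: &[u8; 256],
) {
    for (r, o) in row.iter_mut().zip(other) {
        *r = gf_add(*r, gf_mul(*o, scalar, exp, log));
    }
}
```

A loop shaped like `add_scaled_row` is the kind of operation that dominates this sort of matrix math, so how the XORs and table reads are arranged has a large effect on throughput.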