Since the last time I posted about ring buffers, they've been appearing in several places in my latest project. I started with a hard-coded ring buffer in a serial port driver, but then discovered I needed ring buffers in my SPI driver as well. So I refactored the code into its own library header file and saved a few bytes of flash memory in the process. Grab the code and use it in your own projects.
I want to highlight one small point that caught me by surprise when I was writing the library. Take a look at the code that declares the queue in global memory.
#include <stdint.h>
/**
* Declares a circular queue of the specified size as a global
* variable.
*/
#define CQ_DECLARE(var_name, size) \
static volatile struct _##var_name##_tag \
{ \
uint8_t rIdx; \
uint8_t wIdx; \
uint8_t buf[(size)]; \
} var_name;
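Both indexes are uint8_t, so they can only address 256 slots. A larger size would still compile, but the indexes would wrap at 256 and leave the tail of buf unreachable. A minimal sketch of a guarded declaration follows. It assumes a C11 compiler (for _Static_assert) and the common convention that one slot stays empty so a full queue can be told apart from an empty one. CQ_DECLARE_CHECKED and rx_queue are hypothetical names, not part of the library:

```c
#include <stdint.h>

/*
 * Hypothetical checked variant of CQ_DECLARE. The _Static_assert rejects
 * sizes the uint8_t rIdx/wIdx indexes cannot cover (above 256) and sizes
 * that could hold no data if one slot is kept empty (below 2).
 */
#define CQ_DECLARE_CHECKED(var_name, size)                            \
    _Static_assert((size) >= 2 && (size) <= 256,                      \
                   "circular queue size must be between 2 and 256");  \
    static volatile struct _##var_name##_tag                          \
    {                                                                 \
        uint8_t rIdx;                                                 \
        uint8_t wIdx;                                                 \
        uint8_t buf[(size)];                                          \
    } var_name;

/* Example declaration (hypothetical name). */
CQ_DECLARE_CHECKED(rx_queue, 32)

/* CQ_DECLARE_CHECKED(too_big, 300) would fail to compile. */

/* Returns the size of rx_queue's backing array in bytes. */
static unsigned rx_queue_size(void)
{
    return (unsigned)sizeof(rx_queue.buf);
}
```

Because the check happens at compile time, it costs nothing in flash or cycles on the target.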
The queue is declared volatile so the compiler won't optimize accesses to the read and write indexes into register operations. That matters if you want to poll the queue's state, for example from the main loop while an interrupt handler fills it. The catch is that every read of either index becomes a load from SRAM, and every write becomes a store. A quick attempt at writing a routine to read a byte might look like the following code.
/**
* Reads the next byte from the queue into the specified variable.
* Caller is responsible for making sure the queue is not empty
* before the call.
*/
#define CQ_READ(q, d) \
{ \
(d) = (q).buf[(q).rIdx]; \
(q).rIdx = (++(q).rIdx) % sizeof((q).buf); \
}
This gets the job done; however, it results in three unnecessary accesses to the read index, all coming from the increment operation ++(q).rIdx. Since the queue is declared volatile, the compiler must reload the read index instead of reusing the value it loaded on the previous line. It then increments the value, stores it to SRAM, and reloads it yet again for the modulus operation. All three accesses could be avoided if the index used for the array lookup were kept in a register and reused. The line has a second, subtler problem: it modifies (q).rIdx twice, once through ++ and once through the assignment, with no sequence point in between, which is undefined behavior in C.
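Simply dropping volatile to get rid of those reloads would break polling. Below is a minimal sketch of why. It assumes a receive interrupt fills the queue while the main loop waits for data, and that rIdx == wIdx means the queue is empty. rx_isr, rx_wait_nonempty and CQ_EMPTY are hypothetical names used only for illustration:

```c
#include <stdint.h>

#define CQ_DECLARE(var_name, size) \
static volatile struct _##var_name##_tag \
{ \
uint8_t rIdx; \
uint8_t wIdx; \
uint8_t buf[(size)]; \
} var_name;

/* Hypothetical helper: the queue is empty when the indexes meet. */
#define CQ_EMPTY(q) ((q).rIdx == (q).wIdx)

CQ_DECLARE(rx_queue, 16)

/* Producer. On hardware this body would live in the receive ISR.
 * The caller is assumed to have checked that the queue is not full. */
void rx_isr(uint8_t byte)
{
    uint8_t w = rx_queue.wIdx;                                  /* one volatile load */
    rx_queue.buf[w] = byte;
    rx_queue.wIdx = (uint8_t)((w + 1) % sizeof(rx_queue.buf)); /* one volatile store */
}

/* Consumer. Each pass through the loop re-reads both indexes from memory
 * because rx_queue is volatile. Without the qualifier the compiler would be
 * free to load them once, keep them in registers, and spin forever even
 * after the ISR has added data. */
void rx_wait_nonempty(void)
{
    while (CQ_EMPTY(rx_queue)) {
        /* busy-wait for rx_isr() */
    }
}
```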
Declaring the queue volatile is necessary for any polling operations to work properly, so the qualifier has to stay. Instead, we introduce another variable that's not so strict with its load and store requirements: a local, non-volatile copy of the index. Take a look at the code below.
/**
* Reads the next byte from the queue into the specified variable.
* Caller is responsible for making sure the queue is not empty
* before the call.
*/
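#define CQ_READ(q, d) \
do { \
/* cq_t_ is named so it won't clash with a caller's variable passed as d */ \
uint8_t cq_t_ = (q).rIdx; /* one volatile load of the read index */ \
(d) = (q).buf[cq_t_]; \
(q).rIdx = (uint8_t)((cq_t_ + 1) % sizeof((q).buf)); /* one volatile store */ \
} while (0)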
It copies the read index into a temporary variable that the compiler can keep in a single register, so the volatile index is loaded once and stored once. On a microcontroller, each load from or store to SRAM is a separate instruction that typically takes more cycles than a register operation, so the new version saves a few bytes of program memory and a few clock cycles as well. I wouldn't have caught this problem had I not looked at the compiler's assembly output. Lesson learned: when you're writing performance-critical code, verify that the CPU is doing what you want for each and every tick of the clock.
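Macros like these are easy to exercise with a desktop compiler before flashing anything. The minimal host-side sketch below copies CQ_DECLARE and the improved CQ_READ, wrapped in do { } while (0) so the macro behaves like a single statement. It also adds a hypothetical CQ_WRITE, which is not shown in this post, built the same way. Bytes are pushed through a 4-byte queue so that both indexes wrap several times:

```c
#include <stdint.h>

#define CQ_DECLARE(var_name, size) \
static volatile struct _##var_name##_tag \
{ \
uint8_t rIdx; \
uint8_t wIdx; \
uint8_t buf[(size)]; \
} var_name;

/* Improved read: one volatile load and one volatile store of rIdx. */
#define CQ_READ(q, d) \
do { \
    uint8_t cq_t_ = (q).rIdx; \
    (d) = (q).buf[cq_t_]; \
    (q).rIdx = (uint8_t)((cq_t_ + 1) % sizeof((q).buf)); \
} while (0)

/* Hypothetical write mirroring CQ_READ; caller must ensure the queue is not full. */
#define CQ_WRITE(q, d) \
do { \
    uint8_t cq_t_ = (q).wIdx; \
    (q).buf[cq_t_] = (d); \
    (q).wIdx = (uint8_t)((cq_t_ + 1) % sizeof((q).buf)); \
} while (0)

CQ_DECLARE(test_q, 4)

/* Writes and reads `count` bytes two at a time (count should be even),
 * so at most two slots are ever in use. Returns the number of pairs that
 * came back wrong; 0 means every byte round-tripped correctly. */
int cq_roundtrip_errors(int count)
{
    int errors = 0;
    for (int i = 0; i < count; i += 2) {
        uint8_t a, b;
        CQ_WRITE(test_q, (uint8_t)i);
        CQ_WRITE(test_q, (uint8_t)(i + 1));
        CQ_READ(test_q, a);
        CQ_READ(test_q, b);
        if (a != (uint8_t)i || b != (uint8_t)(i + 1)) {
            errors++;
        }
    }
    return errors;
}
```

After ten bytes, both indexes have advanced ten times modulo 4 and should sit at 2. A power-of-two size like this one also typically lets the compiler turn the % into a bit mask. With other sizes, a part without a hardware divider may have to call a software division routine, which is worth checking in the assembly output too.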
My favourite part about embedded development is that at the end of the day, there's always something I can point to and say "I made this." Having something real to hold is pretty satisfying. An added bonus is that you automatically get to pass the girlfriend/grandmother test. Even if they don't know too much about it, the important woman in your life can always point to something tangible when they're asked what you do. Unfortunately, making tiny cool things can get expensive.
Antoine de Saint-Exupéry is credited with the phrase...
Deleting code. Making things simple. It's one of the most refreshing aspects of software development. The great news is, you get to do a lot of it when you make cool tiny things. Sometimes it's because there isn't any space for complicated designs. But it's most fun when you realize that you didn't need something so intricate in the first place.