Deleting code. Making things simple. It's one of the most refreshing aspects of software development. The great news is, you get to do a lot of it when you make cool tiny things. Sometimes it's because there isn't any space for complicated designs. But it's most fun when you realize that you didn't need something so intricate in the first place.
Here's a simple way that you can avoid using mutexes in your code. It comes in very handy when doing things in ISR context, since locking a mutex inside an ISR is prohibited in most OSs. Consider the typical fifo. Not a dequeue that uses a linked list, but fixed size ring buffer. Choose your design right and magic happens. We can design a thread-safe data structure without using any OS thread synchronization primitives.
There are three things that need bookkeeping in ring buffers. The read position, the write position and the full/empty flag. Wikipedia has a great discussion of pros and cons of the full/empty buffer debate. In this case we'll choose to take the hit and lose a buffer position to avoid the need for a count. As for the read and write position, they'll be stored separately and explicitly.
The buffer definition looks something like this...
/**
* The number of bytes to store in the ring buffer.
*/
enum { RING_BUFFER_SIZE = 127 };
/**
* A ring buffer that stores bytes.
*/
typedef struct _ring_buffer_t
{
uint8_t rIdx;
uint8_t wIdx;
uint8_t buf[RING_BUFFER_SIZE+1];
} ring_buffer_t;
/**
* Ring buffer in memory. Volatile keyword is a MUST.
*/
static volatile ring_buffer_t rb;
Pay attention to the buffer size. We're using uint8_t's to store the positions, so the buffer can't get larger than 256 bytes. To keep loads and stores atomic, we need to make sure that the positions are represented by types that match the CPU architecture. Use uint8_t for 8-bit microcontrollers and uint32_t for ARM and 32 bit x86 CPUs. Otherwise the CPU will need more than one instruction for the memory access. Interrupts could preempt the operation and retrieve invalid data.
As follows from the design choices above, the write position will always point to an empty space. One byte of the buffer is lost, however it removes the need for a count variable. So the buffer is empty when read and write positions are equal, and full when the read is just ahead of the write. Let's add some test macros...
/**
* The buffer is empty when the read and write indicies are equal.
*/
#define RING_BUFFER_EMPTY(b) ((b).rIdx == (b).wIdx)
/**
* The buffer is full when the read index is one entry ahead of the
* write index.
*/
#define RING_BUFFER_FULL(b) \
((b).rIdx == ((b).wIdx + 1) % sizeof((b).buf))
The behaviour when adding and removing elements is as you'd expect. Whenever something is read, increment the read pointer, whenever something is written, increment the write pointer. The final rules define what happens on full and empty conditions. Trying to write to a full buffer will fail, as will trying to read from an empty buffer. All other operations succeed. The functions might look like this...
/**
* Reads a byte from the ring buffer.
*/
uint8_t ring_buffer_read(uint8_t* pByte)
{
/*---------------------------------------------------------------
Reads on an empty buffer fail.
-------------------------------------------------------------*/
if(RING_BUFFER_EMPTY(rb)) { return 1; }
/*---------------------------------------------------------------
Grab one byte from the buffer and return it.
-------------------------------------------------------------*/
*pByte = rb.buf[rb.rIdx];
rb.rIdx = (rb.rIdx + 1) % sizeof(rb.buf);
return 0;
}
/**
* Writes a byte to the ring buffer.
*/
uint8_t ring_buffer_write(uint8_t b)
{
/*---------------------------------------------------------------
Writes to a full buffer fail.
-------------------------------------------------------------*/
if(RING_BUFFER_FULL(rb)) { return 1; }
/*---------------------------------------------------------------
Room for at least one more byte. Add it.
-------------------------------------------------------------*/
rb.buf[rb.wIdx] = b;
rb.wIdx = (rb.wIdx + 1) % sizeof(rb.buf);
return 0;
}
So now for the magic! Consider the case where we have two threads, a reader and a writer. We already guaranteed that accesses to the read and write positions will be atomic with the architecture rule. Since the reader doesn't touch the write position, and the writer doesn't touch the read position, there is no mutex required. All buffer accesses are free to happen as fast as the CPU can issue them.
The single reader, single writer case is exactly what happens in embedded world much of the time. For example, your serial port driver will have a transmit and receive buffer. Every write to the transmit buffer and read from the receive buffer will happen from the main loop. The serial port ISR will handle the opposing role for each buffer. Two threads of execution but still obeying the single reader, single writer case.
Even when scaled to larger systems we can still eliminate mutexes from the data structure. We just need to add an atomic increment operation to the requirements. It is slightly more complicated because a = (a + 1) % b translates into several machine instructions. You'll find the primitives required to make this work are often supported in the CPU architecture itself. Once again I defer to Wikipedia's phenomenal reference library. ARM CPUs have LDREX, STREX instructions. x86 CPUs always have to be difficult. Or you can use Windows' InterlockedIncrement() API.
So by making a few simplifying assumptions about functionality, we can guarantee thread safety in the single reader/writer case. Furthermore, by assuming a few more things about the CPU architecture or OS environment, we can guarantee thread safety in the multiple readers/writers case. Everything can be implemented without using mutual exclusion and winning on the performance that it saves.