Friday, 14 December 2012

Nasty nasty ATOMIC

Ooo what an insidious bug, let me set the scene.

I have a function that wants to transmit data on a serial line; it's actually the start of a simple packetising feature I'm writing. So far there are two functions, used as follows.


The names are fairly self-explanatory I think except that packetSend() wraps the string in "<>" characters, so the end result is this


being transmitted.

It worked just fine straight out of the box at lower bit rates, but then I upped the bit rate to 500kbps and it failed miserably, just transmitting "<A" or "<AB" or maybe "<ABC" depending on where I placed debugging code.

The LARD serial code works like most others I guess, some code puts bytes into a buffer and some other code takes them out and puts them into the UART with the "taking out" part being interrupt driven. Both these processes involve updating a counter to indicate the number of bytes in the buffer, very important information that tells both sides how many bytes there are and where they should go.

For this reason any accesses to the buffer should be "atomic", meaning that when process A is accessing the buffer process B (or anybody else for that matter) must not also access the buffer. An atomic operation is indivisible, there must be no interrupts during the course of that operation.

In an interrupt-driven system this can be done by using semaphores, mutexes etc but in a simple system like this the easiest way is to just disable the interrupts when putting bytes into the buffer so the taking out code (which being interrupt-driven can occur at any time, even half-way through updating a counter) cannot try to take a byte out while the putting in code is putting one in. Got it?

So to facilitate this I have two nice macros called ATOMIC_START and ATOMIC_END, and to allow them to be nested they maintain their own counter and ATOMIC_END will only re-enable interrupts if that counter is 0.
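For illustration, here is a minimal sketch of how nestable macros of this kind are often written. It is not the actual LARD code. The names irqDisable(), irqEnable(), irqsEnabled and atomicDepth are assumptions, and the interrupt control is stubbed with a flag so the sketch runs on a PC; on a Cortex-M part the stubs would wrap the CMSIS __disable_irq() and __enable_irq() intrinsics.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side stand-ins for the real interrupt control (hypothetical names).
 * On a Cortex-M part these would wrap __disable_irq() / __enable_irq(). */
static bool irqsEnabled = true;
static void irqDisable(void) { irqsEnabled = false; }
static void irqEnable(void)  { irqsEnabled = true; }

/* Number of atomic sections currently open (hypothetical name). */
static unsigned atomicDepth = 0;

/* Disable first, then count, so the counter itself is never
 * modified while interrupts are enabled. */
#define ATOMIC_START  do { irqDisable(); atomicDepth++; } while (0)

/* Only the outermost ATOMIC_END re-enables interrupts. */
#define ATOMIC_END    do { assert(atomicDepth > 0);                 \
                           if (--atomicDepth == 0) irqEnable();     \
                      } while (0)

/* An inner helper that is atomic in its own right. */
static int nItems = 0;
static void bufferPut(void) {
    ATOMIC_START;
    nItems++;
    ATOMIC_END;
}

/* An outer caller that wraps two puts in one atomic section.
 * Returns true if interrupts stayed off for the whole outer section. */
static bool putTwoAtomically(void) {
    bool stayedOff;
    ATOMIC_START;
    bufferPut();
    stayedOff = !irqsEnabled;            /* inner ATOMIC_END must not re-enable */
    bufferPut();
    stayedOff = stayedOff && !irqsEnabled;
    ATOMIC_END;
    return stayedOff;
}
```

Note that this sketch always re-enables interrupts at the outermost ATOMIC_END, even if they were already off beforehand; saving and restoring the previous state is a common refinement.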

So far so good, but the symptoms I mentioned above really smacked of a race condition on the buffer's nItems counter, the transmit ISR was reading 0 items when in fact there was 1. It would therefore take no further action and even though the buffer continued to fill no more bytes would be transmitted.

Time for the logic analyser and some deep thought. It's very hard to fix a problem if you can't see it and that's where a logic analyser comes into its own because this sort of problem cannot be seen with a debugger or "printf" debugging. I find the best way is to use 2-3 spare IO pins and toggle them at critical parts of the code, this has almost no effect on the real-time nature of the program and with the pulses properly placed they can tell you a lot.

So here's the obligatory logic analyser trace pic.

The packetSend() function has 28 bytes to send so it starts the ball rolling by writing the first of them directly into the UART ('A'), after that it writes the bytes into the buffer ('B').

The first write then causes a byte to be transmitted and when that's complete an interrupt ('C') is triggered. This checks to see if there are any bytes in the buffer and if so reads one and writes it to the UART ('D'). This process continues until there are no bytes left in the buffer and as you can see the As, Bs, Cs and Ds are nicely interspersed and everything works well for 4 bytes.

Now look at what happens at around the 200uS mark, the B that has been taking about 12uS blows out to nearly 38uS and smack in the middle of it we see an ISR call (with a negative pulse I used to see which path the code took).

This is the crux of the bug. If ATOMIC_START worked properly it should not be possible to service an interrupt in the middle of packetSend(). This means that potentially both functions are trying to access the buffer's byte counter at the same time and the results are indeterminate.

In this case the ISR obviously reads the counter just before it was incremented from 0 to 1, it therefore got a value of 0 and that folks was the end of any transmission, despite the fact that packetSend() continued to write bytes into the buffer.
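The failure can be replayed deterministically on a PC. The sketch below is an assumption-laden model, not the real driver: txIsr(), bufferPut() and txRunning are hypothetical names, and the "interrupt" is simply a function call placed in the window between storing the byte and incrementing the counter.

```c
#include <stdbool.h>

static int  nItems    = 0;     /* bytes waiting in the software buffer */
static bool txRunning = true;  /* true while the interrupt chain is still alive */

/* Stand-in for the transmit ISR. If the buffer looks empty it writes
 * nothing to the UART, so no further transmit interrupt will follow. */
static void txIsr(void) {
    if (nItems == 0) {
        txRunning = false;     /* chain broken: nothing more gets sent */
        return;
    }
    nItems--;                  /* take one byte and hand it to the UART */
}

/* Put one byte in the buffer. When isrFiresMidway is true the ISR is
 * replayed inside the unprotected window, before the counter is updated. */
static void bufferPut(bool isrFiresMidway) {
    /* ... the byte itself is stored in the buffer here ... */
    if (isrFiresMidway) {
        txIsr();               /* ISR reads nItems == 0 */
    }
    nItems++;                  /* too late: the ISR has already given up */
}
```

After bufferPut(true) the buffer holds one byte but txRunning is false, which matches the symptom: the buffer keeps filling and nothing more is transmitted.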

I replaced ATOMIC_START with the standard __disable_irq() and the whole shebang burst into life with all 28 bytes being transmitted correctly.

So, another bug squashed, tomorrow I'll be having a long hard look at ATOMIC_START but for now it's 3AM so I'm off to bed.

PCBs arrive

The ArdweeNODE PCBs arrived in the US today, from what I can see in the photos they look pretty good.

From here my US mate will ship half of them and the parts to build 2 to me. Then we'll both start loading components and debugging.

I actually hate this part but it has to be done.

I just hope there aren't any major stuff ups. There's bound to be a track or two wrong or we may decide on a change, but a bad error can be a show stopper.

Wednesday, 12 December 2012

Beware cut & paste

I've just spent nearly half a day tracking down a hard fault error on my board, and all because of cut and paste and bad programming practice.

I have four interrupt handlers, one for each of the timers on the LPC1227. These ISRs have to gain access to appropriate structures in memory that are dynamically allocated and so can't be hard coded. To deal with this I have a static array of pointers to the timer structures that is filled in when the user calls the hwTimerCreate() function.

In my LARD framework the timers are known as timers 0-3 and enumerated as such.

typedef enum {
} hwTimerTypes;

And there's an array of pointers to timer structures, one for each hardware timer.

hwTimer * hwTimers[N_HWTIMERS] = {0};

Now an ISR knows of course what hardware timer it was invoked by, but it needs to find the software structure that holds other information, such as a pointer to a user-supplied function to call. So it indexes into the hwTimers array with hwTimers[i] where i is the timer's logical number.

The old ISRs looked like this (much code removed for clarity)

void TIMER16_0_IRQHandler(void) {     // 16-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_0];

void TIMER16_1_IRQHandler(void) {     // 16-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_1];

void TIMER32_1_IRQHandler (void) {    // 32-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_3];

void TIMER32_0_IRQHandler(void) {     // 32-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_2];

Note that the two 16-bit timers are first and the 32-bitters follow, and that we have the order 16_0, 16_1, 32_1, 32_0 and index into the array of pointers using HWTIMER_0, 1, 3, 2 in that order. A little odd but it works and I never rearranged the code to be more logical. Note also that the comments are wrong and designed to confuse any future programmer.

Apart from the comments so far so good, but I changed the code in each ISR, and to save retyping I got one working then cut the text and pasted it into the body of the other three, which meant they all used HWTIMER_0 as their index and // 16-bit Timer0 as the comment. So I then went down the page changing 0, 0, 0, 0, to the logical order of 0, 1, 2, 3 for the HWTIMER_x index and fixed the comments.

void TIMER16_0_IRQHandler(void) {      // 16-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_0];

void TIMER16_1_IRQHandler(void) {      // 16-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_1];

void TIMER32_1_IRQHandler (void) {     // 32-bit Timer0
    hwTimer * t = hwTimers[HWTIMER_2];

void TIMER32_0_IRQHandler(void) {      // 32-bit Timer1
    hwTimer * t = hwTimers[HWTIMER_3];

Then I guess I had dinner, watched some TV, whatever and got back to my programming to find that the two 16-bit timers work just fine but the 32-bit timers cause a hard fault.


Hours later, after looking at the index values and the comments a thousand times and telling myself that they are in the logical order, I finally looked at the function names.

TIMER32_1_IRQHandler and TIMER32_0_IRQHandler are swapped. They were before as well, but in that case so were the HWTIMER_x indexes, so although it wasn't best practice because they were out of order it did work. This time I've been nice and logical in editing the indexes and comments to be sequential and forgotten that the functions are not sequential.

A quick swap of TIMER32_1_IRQHandler and TIMER32_0_IRQHandler and all things work.

So the moral of the story: be very careful when duplicating code with cut & paste, and organize like functions that differ only in a number in a logical, numerical order.
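One way to make this kind of slip harder is to keep the shared body in a single function and reduce each ISR to one line, so the vector name and its index sit side by side in numerical order. This is only a sketch under assumptions: the enum members, the hwTimer fields and hwTimerDispatch() are hypothetical, and the name-to-index mapping is taken from the working version above (32_0 uses HWTIMER_2, 32_1 uses HWTIMER_3).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed enum layout; only the HWTIMER_x names and N_HWTIMERS
 * appear in the original code. */
typedef enum { HWTIMER_0, HWTIMER_1, HWTIMER_2, HWTIMER_3, N_HWTIMERS } hwTimerTypes;

/* Hypothetical, cut-down timer structure. */
typedef struct hwTimer {
    void (*callback)(struct hwTimer *t);  /* user-supplied function, may be NULL */
    int fired;                            /* number of times this timer expired */
} hwTimer;

hwTimer *hwTimers[N_HWTIMERS] = {0};

/* The one and only copy of the ISR body. */
static void hwTimerDispatch(hwTimerTypes id) {
    assert(id < N_HWTIMERS);
    hwTimer *t = hwTimers[id];
    if (t == NULL) {
        return;               /* timer never created: ignore instead of faulting */
    }
    t->fired++;
    if (t->callback != NULL) {
        t->callback(t);
    }
}

/* One line per vector, in numerical order. */
void TIMER16_0_IRQHandler(void) { hwTimerDispatch(HWTIMER_0); }
void TIMER16_1_IRQHandler(void) { hwTimerDispatch(HWTIMER_1); }
void TIMER32_0_IRQHandler(void) { hwTimerDispatch(HWTIMER_2); }
void TIMER32_1_IRQHandler(void) { hwTimerDispatch(HWTIMER_3); }
```

The NULL check is a design choice of the sketch: an interrupt from a timer that was never created is dropped rather than dereferencing a null pointer.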

Now I've forgotten what I was working on...that's right, I was trying to generate a 100uS break condition on the serial line.

Tuesday, 11 December 2012


Well in another WTF moment I've been tackling the problem of detecting when a UART has transmitted ALL of the bytes you sent.

Trivial right? Well not as trivial as you may think.

Why do you care? Well maybe you have to turn around an RS-485 transceiver and you do that after the last byte, not the second-last. Or maybe you are sending a command to another processor and timing the response, it's a bit unfair to start timing before the last byte has gone.

The LPC UART (or at least the one on the 1227) has no explicit flag to read to tell you that the last byte has left the TSR (Transmit Shift Register). Actually that's not true, there is the TEMT flag.
Transmitter Empty. TEMT is set when both THR and TSR are empty;
Yep, that's clear enough. But there are two issues here, one is that you don't get an interrupt so you have to poll the TEMT flag. Usable but not good. The second issue is worse though, IT DOESN'T WORK.

You can poll the TEMT bit until the cows come home but it gets set when the FIFO is empty, not when the TSR is. (YMMV but that's what I'm seeing)

So just use a timer. Well that was the non-answer provided by an NXP support person on the forum. Use an entire hardware timer for this simple function? I think not, heck you only have 4 and he wants me to tie up two of them to fix their crap design.

Back to square one. So what do you get?

You get a THRE flag and interrupt, but this only tells you that the FIFO is empty, at this point however there is still a single byte in the TSR and that may not be gone for quite some time as is shown in this trace

Here we see two bytes being sent from the UART, 'A' and 'B'. The small pulse is the time at which the THRE interrupt is fired. Note that at this point 'B' has still not been transmitted.

Fortunately there is a mechanism that is clearly and succinctly described in the data sheet.
The UARTn THRE interrupt (UnIIR[3:1] = 001) is a third level interrupt and is activated when the UARTn THR FIFO is empty provided certain initialization conditions have been met. These initialization conditions are intended to give the UARTn THR FIFO a chance to fill up with data to eliminate many THRE interrupts from occurring at system start-up. The initialization conditions implement a one character delay minus the stop bit whenever THRE = 1 and there have not been at least two characters in the UnTHR at one time since the last THRE = 1 event. This delay is provided to give the CPU time to write data to UnTHR without a THRE interrupt to decode and service. A THRE interrupt is set immediately if the UARTn THR FIFO has held two or more characters at one time and currently, the UnTHR is empty. The THRE interrupt is reset when a UnTHR write occurs or a read of the UnIIR occurs and the THRE is the highest interrupt (UnIIR[3:1] = 001). 

Got that? No, I didn't either despite reading it maybe 10 times.

Luckily one of the guys on the LPC forum is smarter than me and he explained it,
So you have only write one byte in the fifo and the THRE interrupt will occur after this byte was sent.  
Still a bit unclear so let me slightly reword it.
If you only place a single byte in the FIFO the THRE interrupt will occur after this byte was sent. 
That's right and worth repeating and rephrasing again in the hope that one of the explanations will make sense: if you only place one byte in the FIFO you get the THRE interrupt after that byte has gone. Yay, that's exactly what we need, and here it is in action

Note that I have only sent a single byte and that the interrupt pulse now occurs after that byte has completely left the TSR (not counting the stop bit).

We're getting there, trouble is you normally send more than one byte. What happens if we send 10? Well in that case unless you take extra steps you are back where you started. If you just blat 10 bytes into the FIFO the interrupt will fire after the 9th byte has been transmitted, not after the 10th.

You have to get clever and hold off with the last byte. You send 9 bytes straight away and when the last of those is in the TSR you write the 10th byte into the FIFO.

Here we have written 9 bytes ('A' thru 'I') into the FIFO, when 'I' goes into the TSR the interrupt is fired. At this point the FIFO is empty and we write the 10th byte ('J') into it, thus satisfying the "only one byte in the FIFO" criterion.

The next time we see the THRE interrupt is after the 'J' has gone. We can now set a global flag somewhere to tell the rest of the program that the data has been completely transmitted.

And here is the pseudo code for the interrupt function (actually this is my real code with a lot of unrelated stuff deleted for clarity)

Note the hwFifoCount variable, this is my workaround for the FIFOLVL bug in the hardware as documented over the last couple of days, it keeps track of the number of bytes in the FIFO.
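As a rough illustration of the hold-back-the-last-byte idea, here is a host-side sketch of the state machine. It is not the real ISR: uartWriteThr(), serialSend(), threIsr(), txLeft and txComplete are hypothetical names, the UART is replaced by an array that records what was written, and the hwFifoCount bookkeeping is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HW_FIFO_DEPTH 16

/* Stand-in for the UART: records every byte "written to the THR". */
static uint8_t wire[64];
static size_t  wireLen = 0;
static void uartWriteThr(uint8_t b) {
    assert(wireLen < sizeof wire);
    wire[wireLen++] = b;
}

static const uint8_t *txData;
static size_t txLeft = 0;               /* bytes not yet written to the FIFO */
static volatile bool txComplete = false;

/* Write up to 'room' bytes but always keep the final byte back. */
static void fillFifoHoldingLast(size_t room) {
    while (room > 0 && txLeft > 1) {
        uartWriteThr(*txData++);
        txLeft--;
        room--;
    }
}

void serialSend(const uint8_t *data, size_t len) {
    txData = data;
    txLeft = len;
    txComplete = (len == 0);            /* nothing to send: already done */
    if (len == 1) {
        uartWriteThr(*txData++);        /* a lone byte can go straight out */
        txLeft = 0;
    } else {
        fillFifoHoldingLast(HW_FIFO_DEPTH);
    }
}

/* Called on each THRE interrupt, i.e. when the FIFO has drained. */
void threIsr(void) {
    if (txLeft > 1) {
        fillFifoHoldingLast(HW_FIFO_DEPTH);  /* still bulk data to go */
    } else if (txLeft == 1) {
        uartWriteThr(*txData++);             /* last byte, alone in the FIFO */
        txLeft = 0;
    } else {
        txComplete = true;   /* THRE after the lone last byte: all sent */
    }
}
```

With a 10-byte message, serialSend() writes 'A' thru 'I', the first threIsr() call writes 'J', and the second sets txComplete.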

Phew, what a marathon, it probably took longer to document than to do :)

Monday, 10 December 2012

Well waddaya know?

You know that problem I had yesterday with the FIFOLVL register returning 0 no matter how many bytes there are in the FIFO?

Well it seems it's actually a bug in the chip. And there I was starting to doubt my brilliance, it shook my confidence to the core I don't mind telling you.

So now all is right with the world, NXP stuffed up not me, and my workaround will stand as the way to do this until further notice.

Sunday, 9 December 2012

What's with the FIFO counter?

What a drama.

On the LPC's UART you are supposed to be able to read a register (FIFOLVL) with a counter that tells you how many bytes are in the Tx FIFO, this is very important because you can use this value to have your code decide not to try and put any more bytes in the FIFO. At this point you may decide to either block until there is room for the next byte or write it to a software buffer.

So far so good. Trouble is the bloody thing doesn't work. Or at least nothing I try can coerce it to reveal the number of bytes in the FIFO.

What to do?

Well the simple way is to always write into the buffer and let the "Tx FIFO empty" interrupt read data from that buffer. But that does introduce some small inefficiencies. Meaning that if you want to quickly blat < 17 bytes out the serial port you can do so without all the enqueueing and dequeueing for those bytes, you can just write them directly into the hardware FIFO. But you need to know that there's room in the FIFO.

Hence my conundrum.

So I've decided to implement my own counter in software, and it looks like it's working well.

Here's a trace of the results of sending 20 bytes to the serialWrite() function.

The centre trace is the output from the UART. The pulses on the top trace are produced for each byte written into the software buffer, and those on the bottom trace are for each byte written into the UART's FIFO.

Because 20 is > 16 four of the bytes are enqueued into the buffer during the serialWrite() function, that's the four pulses on the top trace.

Some time later we get a "FIFO empty" interrupt and the ISR fetches the last four bytes from the buffer and writes them to the FIFO.

Job done.

So a day later and a little wiser am I. I'd still rather get the reading of the hardware counter to work but meanwhile this seems to do the job.
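For anyone wanting to try the same trick, here is a minimal host-side sketch of a software FIFO counter. It assumes a 16-byte hardware FIFO as described above; uartWriteThr(), fifoEmptyIsr(), the ring buffer and the two pulse counters are hypothetical stand-ins, and the real code would also need its buffer accesses protected from the ISR.

```c
#include <stddef.h>
#include <stdint.h>

#define HW_FIFO_DEPTH 16
#define SW_BUF_SIZE   64

static unsigned hwFifoCount = 0;  /* software shadow of the FIFO level */
static unsigned fifoWrites  = 0;  /* like the pulses on the bottom trace */
static unsigned bufWrites   = 0;  /* like the pulses on the top trace */

static uint8_t  swBuf[SW_BUF_SIZE];
static unsigned head = 0, tail = 0, nItems = 0;

/* Stand-in for writing one byte into the UART's FIFO. */
static void uartWriteThr(uint8_t b) {
    (void)b;
    hwFifoCount++;
    fifoWrites++;
}

/* Returns the number of bytes accepted. */
size_t serialWrite(const uint8_t *data, size_t len) {
    size_t accepted = 0;
    for (size_t i = 0; i < len; i++) {
        if (nItems == 0 && hwFifoCount < HW_FIFO_DEPTH) {
            uartWriteThr(data[i]);            /* room in the FIFO: go direct */
        } else if (nItems < SW_BUF_SIZE) {
            swBuf[head] = data[i];            /* otherwise queue in software */
            head = (head + 1) % SW_BUF_SIZE;
            nItems++;
            bufWrites++;
        } else {
            break;                            /* both full: caller must retry */
        }
        accepted++;
    }
    return accepted;
}

/* Called on the "Tx FIFO empty" interrupt. */
void fifoEmptyIsr(void) {
    hwFifoCount = 0;          /* the interrupt itself says the FIFO has drained */
    while (nItems > 0 && hwFifoCount < HW_FIFO_DEPTH) {
        uartWriteThr(swBuf[tail]);
        tail = (tail + 1) % SW_BUF_SIZE;
        nItems--;
    }
}
```

Bytes only go straight to the FIFO while the software buffer is empty, so the order on the wire is preserved. Sending 20 bytes gives 16 direct writes and 4 buffered ones, and the next fifoEmptyIsr() call moves those 4 into the FIFO.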

Saturday, 8 December 2012

That's odd?

So I've got the UART sending up to 16 bytes (the depth of the FIFO) but about half of the bytes have framing errors. Here's a trace showing the values 0 through 9 being transmitted.

Note the errors, values 1, 2, 4, 7 and 8 have framing errors, meaning that the stop bit was low when of course it should be high.

So here's a pop quiz, what do these values have in common.........don't feel bad if you haven't got it yet, it took me a while. They all have an odd number of 1 bits. The bytes that don't have errors have an even number of 1 bits.

Now if I didn't know better I'd say I have odd parity set on the UART. Let's look at the code.

Now you don't have to know anything about the parameters to the serialCreate() function to spot the reason for the framing errors here eh?

Problem solved.
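The pattern is easy to check in a few lines of C, assuming the cause really was an odd parity bit being read as the stop bit by a receiver set to "no parity". The function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the 1 bits in a byte. */
static unsigned onesCount(uint8_t v) {
    unsigned n = 0;
    while (v != 0) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Odd parity: the parity bit is chosen so that the total number of
 * 1 bits (data plus parity) is odd. */
static unsigned oddParityBit(uint8_t v) {
    return (onesCount(v) % 2u == 0) ? 1u : 0u;
}

/* A receiver expecting no parity samples the parity bit where the stop
 * bit should be. A low level there is reported as a framing error. */
static bool looksLikeFramingError(uint8_t v) {
    return oddParityBit(v) == 0;
}
```

For the values 0 through 9 this flags exactly 1, 2, 4, 7 and 8, the same bytes that showed framing errors in the trace.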

UART works

Yay, I've got one of the UARTs transmitting, just an incrementing number right now but that's the embedded UART equivalent of "Hello world".

Here's the code

Note the "Arduino" style. It will get more Arduino-like soon, for example delay() instead of delayMilliseconds(), serialPrint() instead of serialFifoWrite() but at present I'm working with the native LARD functions.

That said I will normally work with the LARD native API because I think it has a lot more functionality, the Arduino functions are mostly wrappers for LARD code anyway, the idea being that Arduino programmers can feel at home using LARD on an LPC straight away.

Of course all the hard work is behind the scenes in the ISR and various other pieces of code, the above is what a user would normally write.

Now I have to clean all the code up and work on receiving.

Friday, 7 December 2012

Hardware timers

I got the timers working on the LPC1227 yesterday, with a reasonable API that almost totally abstracts the underlying hardware.

Here's a screen shot of an output pin waveform generated by a timer callback function that modifies the timer value to create a ramping signal. 

And here we have the results of two callback functions, one that sets the timer to 50mS and the other that sets it to 500mS. Each function then changes the callback to the other one.

Here's the guts of the code

The first time the timer times out it calls hwTimerTest1(), that toggles a LED, changes the time value to 50 then sets the timer’s callback to hwTimerTest2(). 50mS later hwTimerTest2() gets called, it toggles the LED, changes the time value to 500 then sets the timer’s callback back to hwTimerTest1() and the cycle repeats resulting in this waveform on the LED.
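The ping-pong between the two callbacks can be sketched as follows. This is not the LARD API: the hwTimer fields, ledToggle() and hwTimerExpire() are assumptions made so the example runs on a PC; only the hwTimerTest1() and hwTimerTest2() names and the 50/500 values come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct hwTimer hwTimer;
typedef void (*hwTimerCallback)(hwTimer *t);

/* Hypothetical, cut-down timer structure. */
struct hwTimer {
    uint32_t        periodMs;   /* time until the next timeout */
    hwTimerCallback callback;   /* function to call on timeout */
};

/* Stand-in for the LED. */
static int ledState = 0;
static void ledToggle(void) { ledState = !ledState; }

static void hwTimerTest1(hwTimer *t);
static void hwTimerTest2(hwTimer *t);

/* Toggle the LED, shorten the period, hand over to the other callback. */
static void hwTimerTest1(hwTimer *t) {
    ledToggle();
    t->periodMs = 50;
    t->callback = hwTimerTest2;
}

/* Toggle the LED, lengthen the period, hand back to the first callback. */
static void hwTimerTest2(hwTimer *t) {
    ledToggle();
    t->periodMs = 500;
    t->callback = hwTimerTest1;
}

/* Stand-in for what the timer ISR does on each timeout. */
static void hwTimerExpire(hwTimer *t) {
    assert(t != NULL && t->callback != NULL);
    t->callback(t);
}
```

Starting with hwTimerTest1() as the callback, successive timeouts alternate the period between 50 and 500 and toggle the LED each time.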

Technically this is not part of the ArdweeNET code, it's part of my LARD framework (Lpc for ARDuino), you can read more about that here


Thursday, 6 December 2012

Logo & website

ArdweeNET is gaining some momentum, there's PCBs being made, code being written, and a few people actively interested so that's a start.

So I thought I'd get a blog up and running so I can post the happenings.


There's a logo for ArdweeNET

So why a gorilla? Because the chip that does all the heavy lifting for the network is called the Application and Protocol Engine, get it? APE.

Yeah I know, but a bloke's gotta have some fun.


There's a web site as well

Which like this blog doesn't have a lot of content yet but it's a start.