ADSP 21xx
Objective Real-Time Software on the ADSP21XX

Real-Time Error Handling

General

Real-time systems generally need to be utterly reliable. The problem is that you never know what the big, bad world is going to throw at your system. While the system may work perfectly in the laboratory and even at selected field trial sites, once the systems are running at hundreds of sites, you may start to get some bad problem reports and be faced with a crowd of very irate customers.

Some of these problem events can be trapped by special hardware flags, while others require a little software help, but the important thing is to reboot the processor when it can no longer handle events. The last thing you want is a system that dies once in a while, for no apparent reason. Of course, our good friend H.Acker firmly believes that his software is totally reliable and bug free. As for the rest of us...

Failure Mechanisms

Most crashes are caused by memory accesses going out of bounds. If the count stack overflows, then at some point a DO loop will pop the value 3FFFH from the stack into the CNTR register. If that loop happens to be writing to memory, the processor shoots itself in the foot: it overwrites all of Data or Program Memory (DM or PM), after which things are bound to behave rather strangely and the device may appear to be dead.

If nested calls and interrupts exceed the depth of the Program Counter stack, then at some RET/IRET, the processor will pop the value 3FFFH into the program counter, causing the processor to whizz off to the very top of PM. If that instruction happens to be an IDLE, then nothing much is going to happen anymore and your system may appear to be dead. If you are lucky, it will bungee jump down to address 0000H and restart the program. It will then keep running for a little while more, but usually not for long!
If, every once in a while, your system restarts all by itself and then eventually crashes completely, this is usually the reason why.

It is also possible for the hardware to lock up and assert a permanent interrupt, which can cause the processor to get stuck inside an ISR. This happens especially when the interrupt is level triggered, but an edge-triggered interrupt can also develop problems if the hardware oscillates. This will also cause the system to appear to be dead.

How can one prevent these problems? You can't always prevent them! At some point, something is going to get you from behind, so the best defence is to plan for it and put some error handling in the code.

Program Counter Stack Overflow

If the program counter (PC) wraps around as explained above, the processor re-enters the code at the reset vector, but getting things straightened out properly requires some defensive coding. First of all, you need to ensure that interrupts are disabled and that the primary register set is selected. Then you have to ensure that all hardware stacks are empty, then zero the Data Memory (DM) and continue startup. If you do all that, then the processor will probably run OK again. You need to put the following code at the start of your program, to ensure a proper warm boot:
Stack Overflow Traps

The processor provides a single register with some hardware flags, the SSTAT register. This register contains flags that monitor the activity of the hardware stacks. The empty flags are not very useful, but the overflow bits are very important. They are so important that ADI made them sticky: once set, they cannot be cleared, except by a reset. Now how the heck does one get the processor to reset itself? Well, the BDMA is one way. I suggest that you put the following stack overflow trap in the main execution loop (and possibly also at the start of every ISR):
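The test at the heart of such a trap is just a mask over the overflow bits of SSTAT. Here is a minimal sketch of it in portable C rather than ADSP-21xx assembly. The bit positions (overflow flags on the odd bits 1, 3, 5 and 7, empty flags on the even bits) are an assumption and should be checked against the processor manual; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED SSTAT layout: even bits = "stack empty", odd bits = "stack overflow".
   Verify these positions against the ADSP-21xx manual before relying on them. */
#define SSTAT_PC_OVERFLOW     (1u << 1)  /* program counter stack */
#define SSTAT_CNTR_OVERFLOW   (1u << 3)  /* count stack           */
#define SSTAT_STATUS_OVERFLOW (1u << 5)  /* status stack          */
#define SSTAT_LOOP_OVERFLOW   (1u << 7)  /* loop stack            */

#define SSTAT_ANY_OVERFLOW (SSTAT_PC_OVERFLOW | SSTAT_CNTR_OVERFLOW | \
                            SSTAT_STATUS_OVERFLOW | SSTAT_LOOP_OVERFLOW)

/* Hypothetical helper: returns true when any sticky overflow flag is set,
   meaning the only safe response is to reboot the processor. */
static bool stack_overflow_trapped(uint8_t sstat)
{
    return (sstat & SSTAT_ANY_OVERFLOW) != 0u;
}
```

On the real device the value would be read from SSTAT in the main loop, and a true result would branch to the reboot routine. The empty flags are deliberately ignored by the mask.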
Also, to ensure that the system wraps to address 0000H when a PC stack overflow occurs, it is prudent to explicitly initialize Program Memory (PM) address 3FFFH to a DIS INTS instruction during the startup initialization of your system, before interrupts are enabled:
Interrupt Traps

Trapping a stuck interrupt requires a two-pronged approach. One way is to initialize a counter to a large value in the main scheduling loop of your program and to decrement the counter inside the Interrupt Service Routine (ISR). If the counter reaches 0, jump from the ISR to an error handler and reboot the processor. I always use a separate counter for every active interrupt (nice for debugging), but you can use a single counter and save some code space. It is also a good idea to vector every unused interrupt to the reset handler. If they become enabled somehow, this will trap them. You can put the following code in the main scheduling loop of your program:
and put this code at the start of every ISR:
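The two halves of this trap, the re-arm in the main loop and the check at ISR entry, can be modeled in portable C as a minimal sketch. This is not ADSP-21xx assembly; the budget of 1000 and all names are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: ISR entries tolerated between two main-loop passes. */
#define IRQ_BUDGET 1000u

/* One counter per active interrupt; a single one is shown here. */
static volatile uint16_t irq_budget = IRQ_BUDGET;

/* Called on every pass of the main scheduling loop: re-arm the counter.
   If the main loop is starved by a stuck interrupt, this never runs. */
static void main_loop_rearm(void)
{
    irq_budget = IRQ_BUDGET;
}

/* Called at the start of the ISR. Returns true when the counter has run
   out, i.e. the interrupt looks stuck and the caller should jump to the
   error handler and reboot. */
static bool isr_entry_check(void)
{
    if (irq_budget == 0u) {
        return true;            /* already exhausted, do not wrap around */
    }
    irq_budget--;
    return irq_budget == 0u;
}
```

The budget has to be larger than the number of interrupts that can legitimately arrive during the slowest pass of the main loop, otherwise a healthy system would reboot itself.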
Software Stacks

If you followed the recommendations in this book, then your system will probably have a software stack, used to save the processor context during procedure calls. You can perform bounds checking on this stack if you declare some empty space around it, which should always remain zero. You can then check these areas in the main scheduling loop and, if they are not zero, reboot the processor. This is not utterly reliable, but it will trap most over/underflow cases. Declare a stack with a protection area as follows:
If all of DM RAM is zeroed during the startup initialization, then the protection areas should remain zero if all is well. You can check these areas as follows:
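The declaration and the check can be modeled together in portable C as a minimal sketch. This is not ADSP-21xx assembly; the sizes and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORDS 8    /* hypothetical size of each protection area */
#define STACK_WORDS 64   /* hypothetical size of the software stack   */

/* The stack sits between two protection areas that must stay zero. */
struct guarded_stack {
    uint16_t guard_low[GUARD_WORDS];   /* trips on underflow */
    uint16_t stack[STACK_WORDS];
    uint16_t guard_high[GUARD_WORDS];  /* trips on overflow  */
};

/* Stands in for zeroing all of DM RAM during startup initialization. */
static void guarded_stack_init(struct guarded_stack *s)
{
    memset(s, 0, sizeof *s);
}

/* Returns true when every word of one protection area is still zero. */
static bool guard_is_clean(const uint16_t *guard)
{
    for (int i = 0; i < GUARD_WORDS; i++) {
        if (guard[i] != 0u) {
            return false;
        }
    }
    return true;
}

/* Main-loop check: false means the stack ran out of bounds, so reboot. */
static bool guarded_stack_ok(const struct guarded_stack *s)
{
    return guard_is_clean(s->guard_low) && guard_is_clean(s->guard_high);
}
```

A stray write of zero into a protection area goes unnoticed, and so does a write that jumps clean over it, which is why the check traps most cases rather than all of them.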
Well, that should take care of most bad events coming your way! Have reliable fun!

Herman

Copyright © 1996-2008, Aerospace Software Ltd., GPL.