SES 3.10e trouble identifying hard fault

Eqqman · Oct 21st 2016, 2:47am

I am using SES v2.20 on Ubuntu 16.04 and running into some issues with a project for the STM32F429ZI (Cortex-M4). I copied a working set of files from one project into a new project so I could make some modifications. After completing the modifications, the project compiles but does not execute correctly. On SES v2.20 the debugger just halts with a message saying something about ending due to a vector catch, but provides no specifics. So I upgraded to SES 3.10. Now, instead of halting completely, the debugger ends up in the hard fault handler.

I've tried commenting out code to see what is causing the error, but this hasn't helped; it only changes where the error occurs. If I comment out function A, the error happens in function B, and vice versa. Also, the code executes correctly if I step through it in the disassembly window, but as soon as I stop going line by line and hit `run`, I get an error. As an example, with all of my project code in place, I get an error on this line:

Source Code

newTaskID = 0x00; // global of type unsigned char

If I put a breakpoint there and try to continue execution with `run`, `step over`, or `step into`, I get a hard fault, unless I switch to the disassembly window, in which case it executes fine. If I comment out this line of code, the error simply happens at the next statement. I suspect something is wrong with my project settings rather than my code, but I have no idea how to investigate this. The only information I could find about the vector catch came from people whose code was being placed in the wrong section of memory. If that is the case here, I'm not sure how to identify or fix the issue. I've attached source files if anyone cares to try to reproduce the issue. As stated, this code is for the STM32F429ZI Discovery Board with the default project settings.

Kenny · Oct 26th 2016, 8:27pm

Commenting and uncommenting code is likely to confuse the issue when you have something like a hard fault. I'm actually battling one myself right now on an STM32F415. The system control block has some registers that you can inspect that can be helpful. Check your stacks for overflow if you are using an RTOS (the plugins for embOS and FreeRTOS will display this information). You can find hard fault handler code out there that will pull useful information off the stack.
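
As a rough illustration of what such handler code pulls off the stack: on exception entry a Cortex-M core pushes eight words (r0-r3, r12, lr, pc, xPSR) in that order, starting at the active stack pointer, assuming no floating-point state is stacked as well. The sketch below only shows the decoding step. The type and function names are made up for illustration, and obtaining the stack pointer at handler entry (MSP or PSP, selected by bit 2 of EXC_RETURN) needs a small assembly wrapper that is not shown here.

```c
#include <stdint.h>

/* Basic exception frame pushed by a Cortex-M core on exception entry.
   Hypothetical type name; field order follows the ARMv7-M stacking order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;   /* link register of the interrupted code */
    uint32_t pc;   /* stacked return address (often the faulting instruction) */
    uint32_t psr;  /* program status register of the interrupted code */
} FaultFrame;

/* Copy the eight stacked words starting at 'sp' into a FaultFrame.
   'sp' is assumed to be the stack pointer value at handler entry. */
static FaultFrame ReadFaultFrame(const uint32_t *sp) {
    FaultFrame f;
    f.r0  = sp[0];
    f.r1  = sp[1];
    f.r2  = sp[2];
    f.r3  = sp[3];
    f.r12 = sp[4];
    f.lr  = sp[5];
    f.pc  = sp[6];
    f.psr = sp[7];
    return f;
}
```

Inspecting `pc` and `lr` from such a frame in the debugger is usually the quickest way to see which code was running when the fault was taken.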

The real tool to use for this is trace. Unfortunately the STM32F parts do not have an ETB (an on-board trace buffer that can be read by the debugger). Instead you need to run the trace lines to your debug connector and purchase something like a J-Trace. I am in the process of ordering one myself for an STM32F7 project.

Good luck.

SEGGER - Johannes · Oct 28th 2016, 11:57am

Hi,

You might want to have a look at our Application Note for more analysis possibilities of Hard Faults: segger.com/downloads/appnotes
Hard faults can have different causes, especially in "multi-tasking" applications. The hard fault handler might help you find the cause.

Best regards
Johannes

Eqqman · Nov 3rd 2016, 2:33am

I might be closer to an answer, although the exact nature of the problem eludes me. My project has several versions of the code base. In versions n and n-1, I can compile and run the code just fine, and even pause and step into it. However, if I place a breakpoint anywhere in the code, I get a hard fault the moment I move past the breakpoint. After some trial and error, I isolated the issue to this line of code:

Source Code

NVIC_EnableIRQ(TIM6_IRQ); // TIM6_IRQ defined as 0x36

This function is defined in one of the default files auto-generated by SES (core_cm4.h). If I comment out this line of code, everything works perfectly fine. If I allow the code to free-run, everything works fine. However, placing a breakpoint in the code gives a hard fault the second I attempt to advance, provided that NVIC_EnableIRQ has already been called (so I can step through code just fine until I reach this function). This still puzzles me, since in version n-2 of my code base I can step through that function call just fine in debug mode. And when I was developing version n-1 of my code, I don't recall having any issues with the debugger, so I don't understand what has happened.

Eqqman · Nov 3rd 2016, 9:38pm

Hello-

I have downloaded the recommended document and followed its advice. When I do the simple hard fault handler:

C Source Code

static volatile unsigned int _Continue;
void HardFault_Handler(void) {
  _Continue = 0u;
  //
  // When stuck here, change the variable value to != 0 in order to step out
  //
  while (_Continue == 0u);
}

the document says "If you step out of the Hard Fault handler, you will reach the first instruction after the instruction which caused the hard fault." However, this is not what happens to me. Instead, execution immediately goes back to the hard fault handler, even when I step through in disassembly mode.

When I add in the more detailed code, these are the non-zero values in the `HardFaultRegs` structure:

Source Code

bfar = 0xe000ed38 // BusFault Address Register
ufsr:INVPC = 0x01 // Exception return attempted with an invalid EXC_RETURN value (invalid PC load)
hfsr:FORCED = 0x01 // Hard fault was escalated from a bus fault, memory management fault, or usage fault
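
For readers who want to check these flags from raw register values rather than from a decoded structure, here is a minimal sketch. The helper names are invented for illustration. The bit positions follow the ARMv7-M layout: FORCED is bit 30 of HFSR, UFSR occupies the upper halfword of CFSR with INVPC as its bit 2, and BFARVALID is bit 15 of CFSR. The contents of BFAR are only meaningful while BFARVALID is set, so with every other flag reported as zero the `bfar` value above probably carries no information.

```c
#include <stdbool.h>
#include <stdint.h>

#define HFSR_FORCED      (1u << 30)  /* fault escalated to hard fault */
#define CFSR_UFSR_SHIFT  16u         /* UFSR is CFSR[31:16] */
#define UFSR_INVPC       (1u << 2)   /* invalid PC load, e.g. bad EXC_RETURN */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds a valid fault address */

/* True if the hard fault was escalated from a configurable fault. */
static bool IsForcedHardFault(uint32_t hfsr) {
    return (hfsr & HFSR_FORCED) != 0u;
}

/* True if the usage fault status reports an invalid PC load. */
static bool IsInvalidPcLoad(uint32_t cfsr) {
    return ((cfsr >> CFSR_UFSR_SHIFT) & UFSR_INVPC) != 0u;
}

/* True if BFAR can be trusted as the address of a bus fault. */
static bool IsBfarValid(uint32_t cfsr) {
    return (cfsr & CFSR_BFARVALID) != 0u;
}
```

On target, the inputs would be the values read from HFSR (0xE000ED2C) and CFSR (0xE000ED28) inside the fault handler.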

Everything else is zero, even `SavedRegs`. I'm not sure how to proceed with this information. As a reminder, the code appears to execute correctly provided I never pause on a breakpoint. This is the section giving the issue:

C Source Code

void
OSp_InitTIM6 (void) {
    volatile unsigned int wait = MAX_WAIT;
    // Enable the system clock for the TIM6 peripheral
    // [1] p.183
    RCC->APB1ENR |= RCC_APB1ENR_TIM6EN;
    // Wait for the system clock to stabilize
    for (wait = 0x00; wait < MAX_WAIT; ) {
        ++wait;
    }
    // Enable auto-reload
    // [1] p.704
    TIM6->CR1 BON(BIT_07);
    // Ensure counter is free-running (does not stop counting)
    // [1] p.705
    TIM6->CR1 BOFF(BIT_03);
    // Only over/underflow causes interrupts
    // [1] p.705
    TIM6->CR1 BON(BIT_02);
    // Interrupt on over/underflow enabled
    // [1] p.705
    TIM6->CR1 BOFF(BIT_01);
    // [1] p.706
    TIM6->DIER BON(BIT_00);
    // No prescaler used
    // [1] pp.699, 708
    TIM6->PSC = 0x00;
    // Value for auto-reload register (sets the timer duration)
    // [1] pp.699, 701, 708
    TIM6->ARR = ONE_MS;
    // Set the timer interrupt to priority 0 (lowest number, i.e. the best possible priority)
    // [4] pp.208, 214, core_cm4.h in (Proj. Dir.)/CMSIS_4/CMSIS/Include
    NVIC_SetPriority(TIM6_IRQ, 0x00);
    // Timer is ON
    // [1] p.705
    TIM6->CR1 BON(BIT_00);
    // Enable IRQ for TIM6 in the NVIC
    // [4] pp.208, 214, core_cm4.h in (Proj. Dir.)/CMSIS_4/CMSIS/Include
    NVIC_EnableIRQ(TIM6_IRQ);
} // end OSp_InitTIM6

If I comment out the line to enable the IRQ, everything functions normally with the debugger. I've also had no problems with this code as-is in other projects.

SEGGER - Johannes · Nov 4th 2016, 1:39pm

Hi,

NVIC_EnableIRQ(TIM6_IRQ); enables the TIM6 interrupt, which seems to be your system/kernel timer.
If you do not call this function your OS probably won't run with multiple tasks.

When you step (at source or instruction level), interrupts are usually disabled.
When you let your application run, interrupts are enabled.

So it might be possible that there is a problem in your TIM6 ISR.
Set a breakpoint in the ISR and check if the hard fault happens there.

Best regards
Johannes

Eqqman · Nov 5th 2016, 6:37am

Thanks to everyone that helped.

As guessed by Johannes, the problem is Timer6. On the STM32F429 board this timer keeps running once enabled, even when the processor is halted in the debugger. So if a breakpoint is hit too early in the code, the Timer6 ISR fires as soon as execution resumes, before key system variables have been properly set up. When I used this same code on the TI Launchpad with the TM4C123G chip, the timer I was using would halt when the debugger halted, so these problems went undetected for 2 years.
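
For anyone hitting the same behaviour: the STM32F4 debug block (DBGMCU) has per-timer freeze bits that stop a timer while the core is halted. The sketch below is not the fix used in this thread, only one possible approach. It assumes the DBGMCU_APB1_FZ register sits at 0xE0042008 with DBG_TIM6_STOP as bit 4, as in the STM32F4 reference manual, so check both against your device header. The function name is made up, and the register is passed in as a pointer so the helper can also be exercised off-target.

```c
#include <stdint.h>

/* Assumed values from the STM32F4 reference manual; verify for your part. */
#define DBGMCU_APB1_FZ_ADDR  0xE0042008u
#define DBG_TIM6_STOP        (1u << 4)

/* Set DBG_TIM6_STOP so that TIM6 stops counting while the core is halted.
   Other freeze bits in the register are left unchanged. */
static void FreezeTim6OnDebugHalt(volatile uint32_t *apb1_fz) {
    *apb1_fz |= DBG_TIM6_STOP;
}
```

On target this would be called once, before the timer is started, as `FreezeTim6OnDebugHalt((volatile uint32_t *)DBGMCU_APB1_FZ_ADDR);`. With ST's device header the equivalent should be `DBGMCU->APB1FZ |= DBGMCU_APB1_FZ_DBG_TIM6_STOP;`.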
