[SOLVED] Ozone with JTrace shows ETM trace passing "shutdown" function that should never return, yet code flow continues through

zamniah · Mar 13th 2020, 9:00pm

I have a bit of a puzzle. My code has a stack guard that should call "error_shutdown" function if stack is corrupted (-fstack-protector gcc option), but somehow it seems it's called when it should not have been according to ETM trace.

The error_shutdown is declared as no-return function with the "noreturn" attribute (source link if you want to look at full code):

C Source Code

void __attribute__((noreturn))
error_shutdown(const char *line1, const char *line2, const char *line3,
const char *line4) {
display_orientation(0);
// error messages to user here
shutdown(); // memory cleanup
for (;;); // infinite loop
}

Display All

The issue is that that ETM trace shows instructions from the "error_shutdown" are executed, which should make the program halt. See piece of pasted trace below and also the attached screenshot. For some reason "error_shutdown" shows up in the ETM trace a lot.
Even if I put breakpoints at the instructions of "error_shutdown" shown in screenshot, the breakpoint never occurs, but it's still shown in ETM trace.

Part of CSV exported from the ETM trace (read from bottom to up, this is the ordering in the export) - you can see branch to the error_shutdown function:

Source Code

953122,08097200,F7FFFB6A,4,$Thumb,BL,"SDMMC_GetCmdResp2 ; 0x080968D8",,
953121,080971FE,4628,2,$Thumb,MOV,"R0, R5",,
953120,0809835C,10004006,4,$Data,DC32,"0x40061000",,"}"
953119,0806DB52,D102,2,$Thumb,BNE,"0x0806DB5A",,"return DISPLAY_ORIENTATION;"
953118,0806DB4E,681B,2,$Thumb,LDR,"R3, [R3]",,
953117,0806DB4A,9A01,2,$Thumb,LDR,"R2, [SP, #4]",,
953116,0806DB48,6818,2,$Thumb,LDR,"R0, [R3]",,
953115,0806DB46,4B07,2,$Thumb,LDR,"R3, =DISPLAY_ORIENTATION ; [0x0806DB64] =0x20000010",,
953114,0806DB44,FD514B07,2,$Thumb,,"",,"display_set_orientation(degrees);"
953113,0806DB42,F7FEFD51,4,$Thumb,BL,"display_set_orientation ; 0x0806C5E8",,
953112,0806DB40,6018,2,$Thumb,STR,"R0, [R3]",,"DISPLAY_ORIENTATION = degrees;"
953111,0806DB3E,4B09,2,$Thumb,LDR,"R3, =DISPLAY_ORIENTATION ; [0x0806DB64] =0x20000010",,
953110,0806DB24,9301,2,$Thumb,STR,"R3, [SP, #4]",,"int display_orientation(int degrees) {"
953109,0806DB22,681B,2,$Thumb,LDR,"R3, [R3]",,
953108,0806DB20,4B0F,2,$Thumb,LDR,"R3, =__stack_chk_guard ; [0x0806DB60] =0x20002AA4",,
953107,0806DB1E,B083,2,$Thumb,SUB,"SP, SP, #12",,
953106,0806DB1C,B500,2,$Thumb,PUSH,"{LR}",display_orientation,
953105,080981F4,F7D5FC92,4,$Thumb,BL,"display_orientation ; 0x0806DB1C",,"display_orientation(0);"
953104,080981F2,2000,2,$Thumb,MOVS,"R0, #0",,
953103,080981F0,9305,2,$Thumb,STR,"R3, [SP, #20]",,"const char *line4) {"
953102,080981EE,681B,2,$Thumb,LDR,"R3, [R3]",,
953101,080981EC,4B34,2,$Thumb,LDR,"R3, =__stack_chk_guard ; [0x080982C0] =0x20002AA4",,
953100,080981EA,461D,2,$Thumb,MOV,"R5, R3",,
953099,080981E8,4616,2,$Thumb,MOV,"R6, R2",,
953098,080981E6,460F,2,$Thumb,MOV,"R7, R1",,
953097,080981E4,4604,2,$Thumb,MOV,"R4, R0",,
953096,080981E2,B086,2,$Thumb,SUB,"SP, SP, #24",,
953095,080981E0,B580,2,$Thumb,PUSH,"{R7, LR}",error_shutdown,
953094,08098376,F7FFFF33,4,$Thumb,BL,"error_shutdown ; 0x080981E0",,"error_shutdown("Internal error", "(SS)", NULL, NULL);"

Display All

The question is: how is this possible? May it be some optimizations? I use -Og for the debugging which should be suitable for debugging. There is obvious branching to the error_shutdown function, yet no breakpoint catches it and somehow the program recoveres from it. Is there some speculative execution that this could be? I am at loss and I spent days on trying to find out what is going on.

SEGGER - Nino · Mar 19th 2020, 4:28pm

Hello,

Thank you for your inquiry.
In Ozone you seem to get a LTRACE unknown package error.
This error indicates that at least a part of the trace stream is corrupted. Most likely due to an unstable trace setup.
We recommend to get a stable trace setup first as such trace errors can lead to all kinds of follow up errors e.g. phantom execution.
Troubleshooting steps can be found here:

segger.com/products/debug-prob…hnology/setting-up-trace/

Do you still get the same behaviour after you made sure you have a stable trace connection?

Best regards,
Nino

zamniah · Mar 20th 2020, 10:18pm

I get the LTRACE errors even with the reference STM32F407 board that came with JTrace.

The most basic tutorial on the wiki works without LTRACE error - wiki.segger.com/Tracing_on_ST_…x-M_Trace_Reference_Board) . Though even it sometimes gives error like "Invalid trace timestamp detected at cycle 568542086, buffer index 747292".

If I use different binary, just this simple example - github.com/k-code/stm32f4-exam…e/master/Task-6-Interrupt - LTRACE errors appear again. (Though Task 2 or 4 from this repo doesn't throw LTRACE errors. Clocks are the same - 168 MHz).

Tried working with micropython as the firmware for reference STM32F407 board today, even tried changing the WFI instructions to NOP to prevent sleep (which as someone told me WFI breaks ETM trace), and it gets LTRACE errors all the time.

Even when I tried to adjust timings in Trace settings it didn't get much better.

I also watched the trace status in the JTrace's webserver and it seems it makes no difference whether it shows "OK" or "not OK" on getting LTRACE errors.

It seems that stepping makes the clock OK, but hitting run makes the clock desync and show 0 kHz. Even if I step, wait until it turns "OK", then after run it desyncs. Sometimes small portions marked "trace start-trace stop" look OK.

Could this be a bug?

EDIT: just a while I was able to record 1M instructions in ETM trace. I can't anymore right now in the exactly same code, in the same functions. Firmware: J-Trace PRO V2 Cortex-M compiled Jan 7 2020 16:54:03

The half-sync error is in 1 bit always:

Source Code

Trace clock: 126000 kHz
Half-sync detection: 0xFFF7FFF7 (O.K. half-sync pattern detected)
Last incorrect half-sync sampled: 0xFFF7FFFF

If the half-sync is incorrent, it says in the last line 0xFFF7FFFF

EDIT2: the OK/Not OK in JTrace webserver's view seems totally deterministic, status shown after step over to marked line:

C Source Code: mphalport.c

MP_WEAK int mp_hal_stdin_rx_chr(void) {
/** OK **/ if (MP_STATE_PORT(pyb_stdio_uart) != NULL && uart_rx_any(MP_STATE_PORT(pyb_stdio_uart))) {
return uart_rx_char(MP_STATE_PORT(pyb_stdio_uart));
}
/** OK **/ int dupterm_c = mp_uos_dupterm_rx_chr();
/** Not OK **/ if (dupterm_c >= 0) {
return dupterm_c;
}
/** OK* */ MICROPY_EVENT_POLL_HOOK
/** Not OK **/ }
}

Display All

EDIT3: It apparently seems that whether ETM trace is working solely depends on what code is executing. A simple code like blinking LED in for loop, like in the __fatal_error below always has the "half sync OK" and produces millions of recorded instructions (note in the code below that it contains WFI, yet the trace works). Any ETM trace of more complex code simply does not work correctly even on the STM32F407 reference board.

C Source Code: fatal_error LED blink loop

/** This endless cycle always shows up correctly in ETM trace */
for (uint i = 0;;) {
led_toggle(((i++) & 3) + 1);
for (volatile uint delay = 0; delay < 10000000; delay++) {
}
if (i >= 16) {
// to conserve power
__WFI();
}
}

SEGGER - Nino · Mar 25th 2020, 11:01am

Hello,

zamniah wrote:

I get the LTRACE errors even with the reference STM32F407 board that came with JTrace.

Ok that is odd. With what application?

zamniah wrote:

The most basic tutorial on the wiki works without LTRACE error - wiki.segger.com/Tracing_on_ST_…x-M_Trace_Reference_Board) . Though even it sometimes gives error like "Invalid trace timestamp detected at cycle 568542086, buffer index 747292".

The timestamp error can be ignored as it can sometimes happen that the target sends invalid timestamp packages. But this will not impact the trace stream itself in Ozone.

zamniah wrote:

If I use different binary, just this simple example - github.com/k-code/stm32f4-exam…e/master/Task-6-Interrupt - LTRACE errors appear again. (Though Task 2 or 4 from this repo doesn't throw LTRACE errors. Clocks are the same - 168 MHz).

The sample seems to utilize WFI instructions to put the target device in sleep mode. Arm ETM trace was not designed to work with sleep modes so we recommend disabling this.
Also you write that trace does not work if you run some UART application. Could you check what pins the UART uses? Make sure it is not the same as the pins used for tracing. Otherwise it will obviously not work and our software has to reinit the trace pins again on the next go.
Generally make sure that the peripherals that you are using are not interfering with the trace pins (Motors or other HF signals close to the trace lines). For this we recommend measuring with a scope and an active probe.
If you see the trace signals change drastically after stepping over a certain code part. That code part is causing this either by software or hardware.

Best regards,
Nino

zamniah · Mar 25th 2020, 3:15pm

I think I solved it, partially at least for the reference STM32F407 board. (I will have to see later for my target board when it gets proper connector).

Even WFI does not seem matter.

There were 3 programs that caused LTRACE problems I did testing on:

- micropython with WFI
- micropython with all WFI replaced for NOP
- the task 6 above

The solution that works for the reference STM32F407 board: using "power on" in Jlink command so that target board is powered through 5V supply pin. Or have both on power bank battery (with "power off" and thus two USB cables - one for JTrace, one for target board ).

If connected in either of those ways, the LTRACE errors disappeared in all three programs above.

It seems like sort of ground loop, though weirder. Even when the board is powered from power bank and JTrace from computer (which makes galvanical isolation), it still made LTRACE errors.

SEGGER - Nino · Mar 25th 2020, 3:29pm

Hello,

This sounds very odd. But good to hear that you found a solution.

zamniah wrote:

The solution that works for the reference STM32F407 board: using "power on" in Jlink command so that target board is powered through 5V supply pin. Or have both on power bank battery (with "power off" and thus two USB cables - one for JTrace, one for target board ).

I am a bit confused here. How was your setup before that? Did you not use separate USB cables before to power board and trace probe?
Could you provide a photo of your setup before the "power on" part?

If you have a groundloop with such impact in such a simple setup as J-Trace PRO->19 pin cable->Trace Ref Board->Trace Ref Board power then there is something very wrong with the power supplies used here.
Are you using a USB hub? If yes could you try using the USB ports on your host PC for power. Does it work then?

Best regards,
Nino

zamniah · Mar 25th 2020, 4:17pm

SEGGER - Nino wrote:

I am a bit confused here. How was your setup before that? Did you not use separate USB cables before to power board and trace probe?
Could you provide a photo of your setup before the "power on" part?

See attached photo. This is a powered USB3 hub. Yes, two separate USB cables were used (blue JTrace cable that came with it and a the grey one for power of STM32F407 board).

SEGGER - Nino wrote:

Are you using a USB hub? If yes could you try using the USB ports on your host PC for power. Does it work then?

Yes, but I tried it just now without the hub, connecting directly to two USB ports on computer and the issue is the same. I even tried different USB cables. But the same cables work if I connect them to a power bank and connect to JTrace via ethernet.

EDIT: I can even reproduce the same behavior on another computer with different OS (Windows).

zamniah · Mar 26th 2020, 1:09am

I will get USB galvanic isolators tomorrow to do more testing.

For now, here is for reference how JTrace also works if JTrace and reference board are powered via power bank (JTrace connected via ethernet to Ozone).

Attached photo of HW setup, screenshot of Ozone when it gets to hard fault.

A feature request (maybe this should made be separately?) - you could show the stack trace on vector catch like Black Magic Probe in 3rd image (so that you don't need to go over SCB registers and stack manually). The "BLX R5" with the register content shows why the hard fault happened, but Black Magic Probe's way is more straightforward (especially if you don't have ETM trace). Black Magic's Probe vector catch shows where the exception originated, unlike stack trace in Ozone. I know there are things like imprecise bus error, but this case shouldn't be so hard to implement.

EDIT: ad ground loops - I've tried this at 3 different locations, 2 different computers, one USB3 powered hub (with and without it), switched USB cables, always with the same result. Maybe there is something just very slightly off with the JTrace device, as the experiments show the bugs are reproducible deterministically based on code executed.

SEGGER - Nino · Mar 26th 2020, 3:26pm

Hi,

Could you attach an example project that runs out of the box in Ozone, with which the behaviour is reproducible with?

Regarding the hardfault display in Ozone. Could you give the latest Ozone beta a try? Hardfault information display should be improved.
See here for more information:
segger.com/products/developmen…echnology/fault-analysis/
wiki.segger.com/Analyzing_Cortex-M_Faults_with_Ozone

wiki.segger.com/Cortex-M_Fault

Best regards,

Nino

zamniah · Mar 26th 2020, 5:37pm

SEGGER - Nino wrote:

Could you attach an example project that runs out of the box in Ozone, with which the behaviour is reproducible with?

Here is link to the built project with sources: qabs.cz/o/ARM/micropython_wfi-1.12-out.tar.bz2
Inside the micropython_wfi-1.12-out directory, there is micropython-wfi.jdebug

The ETM trace in this works for me without LTRACE errors only in the two cases ("power on" or or both cables on power bank). I also tried the USB galvanic isolators, and those don't help either (LTRACE errors show up). Had one for JTrace and one for the board.

SEGGER - Nino wrote:

Regarding the hardfault display in Ozone. Could you give the latest Ozone beta a try? Hardfault information display should be improved.

Yes, that's much nicer. Though it could decode the stack further one step, since at SP+0x14 is the stored LR register. For example this would help if there was branch to some garbage range, to see where the branch came from.

SEGGER - Nino · May 5th 2020, 3:58pm

Hello,

I revisited your project and we noticed two things.
The first LTRACE error happens in the PLL init section. This can happen if the CPU clock changes because the trace clock is directly dependent on that clock. In most cases the J-Trace PRO will be able to trace over such clock change but in some cases, depending on timing, it can happen that a couple of trace packets are sampled wrong due to the clock change. This seems to be such a setup. In such case we recommend halting the debug session after the CPU clock is set to the wanted value and continue tracing from there so Ozone and J-Trace PRO can sync and provide a clean tracestream afterwards.
For a reference project where such halt is not needed we recommend using our trace tutorial project as reference.

The other thing is that there are numerous places where WFI instructions are called from that application.
Especially MICROPY_EVENT_POLL_HOOK seems to fire very often. As written before tracing WFI is simply not possible as Arm trace was not designed for this.

Please understand that we can't put more time into this investigation as debugging customer projects leaves our regular support scope.

Best regards,
Nino

[SOLVED] Ozone with JTrace shows ETM trace passing "shutdown" function that should never return, yet code flow continues through

[SOLVED] Ozone with JTrace shows ETM trace passing "shutdown" function that should never return, yet code flow continues through

C Source Code

Source Code

Source Code

C Source Code: mphalport.c

C Source Code: fatal_error LED blink loop

zamniah wrote:

zamniah wrote:

zamniah wrote:

zamniah wrote:

SEGGER - Nino wrote:

SEGGER - Nino wrote:

SEGGER - Nino wrote:

SEGGER - Nino wrote:

Share

Tags