[SOLVED] Locating a memory corruption in my application


  • [SOLVED] Locating a memory corruption in my application

    Dear Ozone expert,

    I am debugging an application running on an nRF52840 Nordic chip with FreeRTOS embedded OS. I would like to locate where some memory corruption is occurring.
    I have roughly bracketed the corruption in time: I know that it occurs after some function before_corruption() has been called and has returned, and before I hit a SW-triggered breakpoint on which the Ozone trace stops.

    So, I have all the instruction trace when the corruption occurs, and I also know at which memory address the corruption is.

    However, I don't know how I can use Ozone to find where that address was accessed causing the corruption.

    Is there some way to play back the instruction trace so as to find where the corruption occurred?

    Also, I am pretty sure that the aggressor is not a FreeRTOS task, so it may be something like an ISR, a DMA write, or the Nordic SoftDevice (not meaning that the Nordic SoftDevice is buggy, but that my application has at some point configured it wrongly).

    Any kind of help is most welcome.

    The post was edited 1 time, last by vincentb1: I am trying to use data sampling, but that is not really easy: I know in advance that the corruption will happen at one of a couple of possible addresses. However, I don't know in advance which value is going to be overwritten, and on top of that the corruption condition does not last very long, so I need a rather high sampling frequency (I could catch it at 10 kHz, when the corruption lasted only 5 ms, but in one trial at 500 Hz I did not catch it). If I use too high a data sampling frequency, the instruction trace seems to be incomplete.

  • I am trying to use data sampling in order to locate more precisely where in the instruction trace the corruption occurs.
    However, that is not easy: I know in advance that the corruption will occur at one of a couple of possible addresses, but I don't know in advance what value is going to be overwritten. I know it only afterwards, from the terminal printf trace.
    Also, the corruption condition does not last very long. I could catch it with a data sampling frequency of 10 kHz, when the condition lasted 5 ms. I made a trial at 500 Hz, but I did not catch it.
    Maybe there is a better way…
  • Hello,

    Thank you for your inquiry.
    When you reach the fault handler, the application will usually halt and Ozone will display the trace data in the instruction trace window and timeline window. If I understand you correctly, this is also the case for you. Correct?
    You can try to increase the amount of trace data displayed by increasing the Maximum instruction count value in the trace settings.
    The maximum usable value will vary depending on how much trace data your application generates. You can try it out with e.g. 100M items.

    Now by clicking into the timeline or instruction trace window you can follow each executed instruction of your application "back in time".
    So that would be one variant of finding the error.


    vincentb1 wrote:

    Also the corruption condition does not last very long, I could catch it with data sampling frequency of 10kHz, and the condition lasted 5ms. I made a trial at 500Hz, but I did not catch it.
    Maybe there is a better way…
    Instead of this, we recommend simply using data breakpoints. With these you can specify that the target halts if a certain condition is met at a specific address, e.g. break if address 0xabcdef reaches value 1234, or if that address is accessed, etc.

    For more information see the Ozone manual.

    Best regards,
    Nino
    Please read the forum rules before posting.

    Keep in mind, this is *not* a support forum.
    Our engineers will try to answer your questions between their projects if possible but this can be delayed by longer periods of time.
    Should you be entitled to support you can contact us via our support system: segger.com/ticket/

    Or you can contact us via e-mail.
  • Dear Nino,

    Thank you for your kind reply.

    I actually stopped using data sampling together with instruction trace. I had the impression that, although I had configured the sampling at 500 Hz, it was taking too much throughput and I was getting incomplete traces. Please note that the nRF52840 has its ETM trace port clocked at 1/4 of the CPU speed rather than 1/2 as usual.
    I got this impression because moving up in the instruction trace caused a leap of some 8 seconds in the timeline, which seems implausible as I had moved up by just one assembly instruction. I made a video; if you are interested, please let me know and I can mail it to you. In the same vein, I was also puzzled by the timestamps: instead of being just a floating-point number in seconds, each has an « nn: » prefix, where nn is some integer (possibly negative) that takes only a few values and almost always keeps the same value across consecutive instructions. So it looks as if there were chunks of instruction trace, each consistently time-stamped and numbered by this prefix, but with holes in between.

    Maybe this was because I was tracing the whole execution. I now have Trace Start and Stop points before and after the place where the problem occurs, so the amount of trace data is much smaller, and maybe I could re-introduce data sampling.

    Concerning watchpoints, the nRF52840 has a DWT (Data Watchpoint and Trace unit) with 4 comparators, but I don't know how to use it. My initial message turned out to be inaccurate: the corruption actually occurs earlier than I initially thought, and instead of being detected at a couple of possible addresses, it is detected inside a circular buffer. So I can know only at runtime where the corruption is going to occur. What would be great is if I could, from the embedded SW:
    * control the watched address
    * enable / disable the watchpoint
    * control the watched condition (for instance « break if a value different than 1 is written at this address ») (OK, the watched condition could be a fixed one for my case).
    That would allow me to catch it.
    Hopefully the DWT is just spying on the bus, so it will also catch DMA accesses.

    VBR,
    Vincent.
  • Dear Nino,

    After digging into the Arm Cortex-M4 documentation and the Armv7-M manual, I found that the DWT is actually programmable through registers visible in the firmware's address space. So I intend to use that by hacking my FW to program a watchpoint dynamically.

    I am wondering, however, whether that may cause any conflict with Ozone / J-Trace PRO debugging.

    * If the FW configures a watchpoint dynamically and the watchpoint is hit, will this be visible in Ozone?
    * If, on the other hand, I configure no watchpoint from the Ozone GUI, is that sufficient to be sure that nothing interferes with Ozone and J-Trace PRO interfacing with the core?
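
    The register-level approach described in this post can be sketched in C as follows. This is only a minimal sketch, assuming the Armv7-M DWT layout (COMPn, MASKn, FUNCTIONn per comparator, TRCENA in DEMCR); the names dwt_comparator_t, dwt_watch_set, dwt_watch_clear and the macros are hypothetical, and the registers are passed in as pointers so the logic can be tried on a host before touching the target. Two caveats to keep in mind: a match only halts the core when halting debug is enabled by an attached debugger, and, to my understanding, the comparators observe accesses made by the CPU, so a DMA write may not trigger them.

```c
#include <stdbool.h>
#include <stdint.h>

/* One DWT comparator block as laid out on Armv7-M: COMPn, MASKn, FUNCTIONn,
 * followed by one reserved word. dwt_comparator_t is a hypothetical name. */
typedef struct {
    volatile uint32_t comp;     /* address to compare against */
    volatile uint32_t mask;     /* number of low address bits ignored */
    volatile uint32_t function; /* bits [3:0]: action, bit 24: MATCHED */
    volatile uint32_t reserved;
} dwt_comparator_t;

#define DEMCR_TRCENA         (1u << 24) /* enables the DWT unit */
#define DWT_FUNC_DISABLED    0x0u
#define DWT_FUNC_WATCH_READ  0x5u
#define DWT_FUNC_WATCH_WRITE 0x6u
#define DWT_FUNC_WATCH_RW    0x7u

/* Hypothetical helper: watch 2^size_log2 bytes starting at addr.
 * Returns false if the size is too large or addr is not aligned to it. */
static bool dwt_watch_set(volatile uint32_t *demcr, dwt_comparator_t *cmp,
                          uint32_t addr, uint32_t size_log2, uint32_t func)
{
    /* Assumption: 15 mask bits is a conservative cap; the real maximum is
     * implementation defined. */
    if (size_log2 > 15u || (addr & ((1u << size_log2) - 1u)) != 0u) {
        return false;
    }
    *demcr |= DEMCR_TRCENA;            /* DWT registers need TRCENA set */
    cmp->function = DWT_FUNC_DISABLED; /* disable while reprogramming */
    cmp->comp = addr;
    cmp->mask = size_log2;
    cmp->function = func & 0xFu;       /* plain address match only */
    return true;
}

/* Hypothetical helper: disable the comparator again. */
static void dwt_watch_clear(dwt_comparator_t *cmp)
{
    cmp->function = DWT_FUNC_DISABLED;
}
```

    On the target, the pointers would be the real registers: DEMCR is at 0xE000EDFC and the comparator 0 block starts at 0xE0001020, with the next comparators 16 bytes apart. The sketch deliberately leaves out data-value matching, so a condition such as « break if a value different than 1 is written » is not covered by it.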
  • Hello,

    vincentb1 wrote:

    * If the FW configures some watchpoint dynamically and the watchpoint is hit, will this be visible in Ozone ?
    How should the debugger know that you changed something behind its back? This will not work. We do not recommend changing the debug settings of your target device via your target application, as this may lead to follow-up issues with the J-Link software, for which we will not provide support.

    We recommend using the data breakpoints feature through Ozone's GUI.

    Please note that trace start and stop points use the DWT comparators as well. You only have 4 of them, and they are shared with the data breakpoints.

    Best regards,
    Nino
  • [SOLVED] Locating a memory corruption in my application

    I understood what was happening by dynamically setting a watchpoint from the target app and finding that it was not hit: this was not a memory corruption but a real-time problem (reading some data that was not yet ready).

    This thread was helpful to me : devzone.nordicsemi.com/f/nordi…-a-watchpoint-dynamically

    I found that my watchpoint was not hit by polling the match bit in DWT->FUNCTION and logging it over RTT. Good to know that the trace start/stop points use the DWT; by chance I had not configured any of them, nor any data breakpoint or regular breakpoint, so I could run the debugger until my app raised an exception at the point where the not-really-a-corruption is detected.
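
    The polling described above can be sketched in C as follows. This is a minimal sketch, assuming the Armv7-M definition of the MATCHED flag (bit 24 of the comparator's FUNCTION register, cleared by reading the register); the helper names dwt_watch_was_hit and dwt_watch_status_text are hypothetical, and the register is passed as a pointer so the logic can be checked on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define DWT_FUNCTION_MATCHED (1u << 24) /* set when the comparator matched */

/* Hypothetical helper: true if the comparator matched since the last read.
 * On hardware, reading FUNCTION clears MATCHED, so the register is read
 * exactly once and the result is taken from that single value. */
static bool dwt_watch_was_hit(const volatile uint32_t *function_reg)
{
    uint32_t value = *function_reg;
    return (value & DWT_FUNCTION_MATCHED) != 0u;
}

/* Hypothetical helper: text to send to whatever log channel is in use
 * (the post above used RTT). */
static const char *dwt_watch_status_text(bool hit)
{
    return hit ? "watchpoint hit\n" : "watchpoint not hit\n";
}
```

    On the target, the pointer would be the FUNCTION register of the comparator in use (0xE0001028 for comparator 0), and the returned text could be written out with the logging call of your choice.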

    Nino: I did not need it, but I think it would have been possible to play tricks by setting the data breakpoint from the GUI and then, in the target application, changing the comparator value and enabling/disabling the breakpoint. That was what I intended to do when I realized that I had misunderstood the root cause of my bug.

    The post was edited 1 time, last by vincentb1: I had missed Nino's previous answer.

  • Hello,

    Great to hear that you are up and running again.
    We will consider this thread as solved now.


    vincentb1 wrote:

    Nino : I did not need it, but I think it would have been possible to make tricks by setting the data break point from the GUI and then in the target application changing the comparator value and enabling/disabling the breakpoint, that was my intention to do this when I realized that I had misunderstood my bug root cause.
    Well, as said, the debug probe must be aware of all changes happening on the target device with regard to debugging. This is always the case if the IDE/debugger sends the commands to J-Link and J-Link executes them on the target.
    If the target application now changes the debug settings without the debug probe being aware of this, all kinds of issues can appear. So even if this could theoretically work, we neither recommend nor support it.

    Best regards,
    Nino