JLINK_IsHalted() returns ERROR

  • JLINK_IsHalted() returns ERROR

    Setup:
    IDE = IAR 6.30.6.3387
    JLink = JLink ARM v8 (SWD with no SWO)
    Device = STM32F051x8
    JLink DLL Version = 44201
    According to my project settings the SWO clock is set to 2000 kHz; however, in the cspy log I see the following:
    T143C 004:705 JLINK_GetSpeed() returns 0x20 (0000ms, 4656ms total)
    T143C 004:705 JLINK_SetMaxSpeed() (0001ms, 4656ms total)

    Problem:
    I have two development boards (Rev1 and Rev2).
    Rev 2 has some layout changes but no functional or part value changes (especially in the SWD area).
    I am using the exact same code/project on both boards.
    On Rev 1, I have no problems whatsoever with debugging.
    On Rev 2, I get 'Failed to get CPU status after 4 retries' very often when debugging. Not every time, but at least 7 of 10 times once I hit the download and debug button.


    Investigation:
    When I look at the cspy outputs, I notice that the return code of the JLINK_IsHalted() command is ERROR on the Rev 2 board whenever I get the error dialog from IAR. On the Rev 1 board I only ever get FALSE/TRUE return codes.
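    The comparison described above can be automated. The following is a minimal sketch, assuming the cspy log keeps the line format visible in the excerpt in this post (thread ID, timestamp, call, result). The function name and the regular expression are illustrative, not part of any SEGGER or IAR tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   T1C68 023:691 JLINK_IsHalted() returns ERROR (0000ms, 6180ms total)
IS_HALTED_RE = re.compile(
    r"^\s*(?P<thread>T[0-9A-Fa-f]+)\s+(?P<stamp>\d+:\d+)\s+"
    r"JLINK_IsHalted\(\)\s+returns\s+(?P<result>\w+)"
)


def summarize_is_halted(lines):
    """Count JLINK_IsHalted() results and note the timestamp of the first ERROR."""
    counts = Counter()
    first_error = None
    for line in lines:
        match = IS_HALTED_RE.match(line)
        if match is None:
            continue  # not an IsHalted line
        result = match["result"]
        counts[result] += 1
        if result == "ERROR" and first_error is None:
            first_error = match["stamp"]
    return counts, first_error


# Lines copied from the excerpt in this post.
sample_lines = [
    "    T1C68 023:682 JLINK_Go() -- CPU_WriteMem(4 bytes @ 0xE0002008 ) (0006ms, 6174ms total)",
    "    T1C68 023:689 JLINK_IsHalted() returns FALSE (0000ms, 6180ms total)",
    "    T1C68 023:691 JLINK_IsHalted() returns ERROR (0000ms, 6180ms total)",
    "    T1C68 023:691 JLINK_IsHalted() returns ERROR (0001ms, 6180ms total)",
]
sample_counts, sample_first_error = summarize_is_halted(sample_lines)
print(dict(sample_counts), sample_first_error)
```

    To run it on a whole log, pass an open file instead of the sample list, for example `summarize_is_halted(open("cspy_bad.txt"))`. Comparing the output for the Rev 1 and Rev 2 logs shows how soon after JLINK_Go() the first ERROR appears.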


    T1C68 023:679 JLINK_WriteMem(0xE000EDFC, 0x0004 Bytes, ...) -- Data: 00 00 00 01 -- CPU_WriteMem(4 bytes @ 0xE000EDFC ) returns 0x04 (0001ms, 6171ms total)
    T1C68 023:680 JLINK_WriteMem(0xE0001028, 0x0004 Bytes, ...) -- Data: 00 00 00 00 -- CPU_WriteMem(4 bytes @ 0xE0001028 ) returns 0x04 (0001ms, 6172ms total)
    T1C68 023:681 JLINK_WriteMem(0xE0001038, 0x0004 Bytes, ...) -- Data: 00 00 00 00 -- CPU_WriteMem(4 bytes @ 0xE0001038 ) returns 0x04 (0001ms, 6173ms total)
    T1C68 023:682 JLINK_Go() -- CPU_WriteMem(4 bytes @ 0xE0002008 ) -- CPU_WriteMem(4 bytes @ 0xE000200C) -- CPU_WriteMem(4 bytes @ 0xE0002010) -- CPU_WriteMem(4 bytes @ 0xE0002014 ) (0006ms, 6174ms total)
    T1C68 023:689 JLINK_IsHalted() returns FALSE (0000ms, 6180ms total)
    T1C68 023:691 JLINK_IsHalted() returns ERROR (0000ms, 6180ms total)
    T1C68 023:691 JLINK_IsHalted() returns ERROR (0001ms, 6180ms total)
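    For readers following the excerpt, the addresses it writes can be mapped to debug register names. This is a sketch under stated assumptions: the names are taken from the ARMv6-M debug architecture (the STM32F051 is a Cortex-M0 part), and the "Data:" bytes are assumed to be listed in memory order, i.e. little-endian. The helper name is hypothetical.

```python
# Debug registers touched in the excerpt, named per the ARMv6-M architecture.
KNOWN_REGISTERS = {
    0xE000EDF0: "DHCSR",          # halting control and status
    0xE000EDFC: "DEMCR",          # debug exception and monitor control
    0xE0001028: "DWT_FUNCTION0",  # watchpoint 0 function
    0xE0001038: "DWT_FUNCTION1",  # watchpoint 1 function
    0xE0002000: "BP_CTRL",        # breakpoint unit control
    0xE0002008: "BP_COMP0",       # breakpoint comparators 0..3
    0xE000200C: "BP_COMP1",
    0xE0002010: "BP_COMP2",
    0xE0002014: "BP_COMP3",
}

DEMCR_DWTENA = 1 << 24  # global enable for the DWT unit on ARMv6-M


def decode_write(address, data_hex):
    """Return (register name, 32-bit value) for one logged write.

    data_hex is the byte string from the log, e.g. "00 00 00 01",
    assumed to be in little-endian memory order.
    """
    value = int.from_bytes(bytes.fromhex(data_hex), "little")
    return KNOWN_REGISTERS.get(address, "unknown"), value


name, value = decode_write(0xE000EDFC, "00 00 00 01")
print(name, hex(value), "DWTENA set" if value & DEMCR_DWTENA else "DWTENA clear")
```

    Read this way, the sequence before JLINK_Go() enables the DWT unit, clears two watchpoint functions, and programs the breakpoint comparators, so the first ERROR appears only after the core has been released to run.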



    Questions:
    1) What exactly is the JLINK_IsHalted command considering an "ERROR"?
    2) Is there a way to script out the IAR debug sequence exactly, so that I can attempt to replicate the failure outside of IAR? That way I would have a mechanism to reproduce the problem for the hardware guy (and to confirm the failure is fixed once a "fix" is found). I have tried using the JLink command line tool, but I don't see an explicit IsHalted command. Note: I tried running the 1000 HaltGo test and did not have any issues on either board.
    3) Any suggestions about signal integrity on the SWD signal lines? What irregularities should I be looking for if I probe the lines?
    4) I don't see how the actual software running on the micro can be affecting the debugger. Is this a good/accurate assumption?
    5) Any other suggestions on how to reproduce the problem more efficiently and/or how to fix the issue?
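    Regarding question 2, one way to approximate the IDE's behaviour outside of IAR is a small polling harness. The sketch below is hardware-agnostic: `is_halted` is a hypothetical callable that the caller must supply (for instance a wrapper around the DLL call that the log shows as JLINK_IsHalted()), and it is assumed to raise an exception when the probe reports an error. Nothing here is an actual SEGGER API.

```python
import time


def poll_until_error(is_halted, max_polls=1000, interval_s=0.0):
    """Call is_halted() repeatedly; return the index of the first failing poll.

    is_halted is a caller-supplied (hypothetical) callable that returns
    True/False and raises an exception when the probe reports an error.
    Returns None if all max_polls calls succeed.
    """
    for index in range(max_polls):
        try:
            is_halted()
        except Exception:
            return index
        if interval_s:
            time.sleep(interval_s)
    return None


class FakeProbe:
    """Stand-in for real hardware: fails on the Nth call (1-based)."""

    def __init__(self, fail_on_call):
        self.calls = 0
        self.fail_on_call = fail_on_call

    def is_halted(self):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise RuntimeError("probe reported ERROR")
        return False


probe = FakeProbe(fail_on_call=3)
first_failure = poll_until_error(probe.is_halted, max_polls=10)
print("first failing poll index:", first_failure)
```

    Running the same harness against both board revisions, starting right after a go/resume, would give a repeatable pass/fail count to hand to the hardware side.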
    Images
    • HardwareDebugIntermittentFailure.png (12.24 kB, 332×237)
    Files
    • cspy_bad.txt (746.1 kB)
    • cspy_good.txt (471.18 kB)
  • Hello,

    1) What exactly is the JLINK_IsHalted command considering an "ERROR"?

    As the description indicates, this is returned if an error occurred. Usually this means that the J-Link <-> target communication did not work properly and J-Link got an illegal/invalid response from the target.

    4) I don't see how the actual software running on the micro can be affecting the debugger, is this a good/accurate assumption?

    I would not say that. The software can ALWAYS affect the debugging functionality.
    Some examples:
    1. The target application may re-configure the SWD pins as GPIO making it impossible to connect via SWD
    2. The application may enter a low power mode in which the clocks for the debug interface are also disabled, making communication impossible
    3. The target application may perform a bad PLL initialization which brings the CPU into a "locked" state where it does not respond properly anymore
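    The first example can be checked from the debugger side by reading the GPIO mode register and decoding it. This is a sketch assuming the STM32F0 pin assignment (SWDIO on PA13, SWCLK on PA14, both in alternate-function mode after reset) and the usual two-bits-per-pin MODER layout; the function names are illustrative.

```python
MODE_NAMES = {
    0b00: "input",
    0b01: "output",
    0b10: "alternate function",
    0b11: "analog",
}

SWDIO_PIN = 13  # PA13 (assumed STM32F0 pin assignment)
SWCLK_PIN = 14  # PA14 (assumed STM32F0 pin assignment)
MODE_ALTERNATE = 0b10


def pin_mode(moder, pin):
    """Extract the 2-bit mode field for one pin from a GPIOx_MODER value."""
    return (moder >> (2 * pin)) & 0b11


def swd_pins_intact(gpioa_moder):
    """True if both SWD pins are still in alternate-function mode."""
    return (
        pin_mode(gpioa_moder, SWDIO_PIN) == MODE_ALTERNATE
        and pin_mode(gpioa_moder, SWCLK_PIN) == MODE_ALTERNATE
    )


reset_value = 0x28000000  # GPIOA_MODER reset value on STM32F0 parts
print(MODE_NAMES[pin_mode(reset_value, SWDIO_PIN)], swd_pins_intact(reset_value))
```

    If application code ever writes GPIOA_MODER with a value for which `swd_pins_intact` is false, the debug connection would be expected to drop at that point.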

    5) Any other suggestions on how to reproduce the problem more efficiently and/or how to fix the issue?

    Did you check the SWD signal quality on both boards?
    Maybe the signal quality on Rev 2 is not as good as on Rev 1 or there are some other lines that create spikes on the SWCLK / SWDIO lines which mess up communication.

    Best regards
    Alex
    Please read the forum rules before posting.

    Keep in mind, this is *not* a support forum.
    Our engineers will try to answer your questions between their projects if possible but this can be delayed by longer periods of time.
    Should you be entitled to support you can contact us via our support system: segger.com/ticket/

    Or you can contact us via e-mail.
  • Following up...

    I am working with the original poster to investigate this issue and can fill in some details and try to address the questions Alex asked in his first reply. Input from Alex or others is welcome!



    Per Alex's questions:

    >1. The target application may re-configure the SWD pins as GPIO making it impossible to connect via SWD



    The error message in the dialog box, stating that the XPSR cannot be read, can and often does occur before any of our code runs on the target. It appears just after the code is loaded into flash at the start of a debugging session. The SWD pins are not re-purposed at this time, or at any other time for that matter, as they are dedicated to debugging.


    >2. The application may enter a low power mode in which also the clocks for the debugging interface get disabled, making a communication impossible



    We do not enter any low power modes in our code at present.


    >3. The target application may perform a bad PLL initialization which brings the CPU into a "locked" state where it does not respond properly anymore



    This would likely cause the error much more often than we see it, would it not? We encounter the error only rarely, and apparently only from the IDE's GUI, as I have tried to reproduce it with the JLink.exe CLI and have had little luck.



    The question from Alex on the signal integrity of the SWD lines is a good one. I have checked this and found that the signals are quite clean and meet the setup and hold times for the SWD interface with lots of margin. I have also run the SWD interface at different clock rates, which had no effect on the error we're seeing.



    What I have seen is some evidence that the pod or its software may not be producing clean, consistent reset pulses in all cases.

    Could that cause this kind of error on debug startup?

    If more than one source of reset arrives at startup, can the processor get confused, or does it condition the reset signal internally to prevent this?



    Thanks,



    jtarantino
  • Hi jtarantino,

    >The error message in dialog box that comes up stating that the XPSR

    According to the screenshot, it is R15 (PC) that could not be read, not the XPSR...

    >It appears just after the code is loaded into flash at the start of a debugging session.

    The logfile the original poster provided says something different...
    It seems "run to main" was activated when this log was created, so some initialization can of course be performed (in __low_level_init() etc.) before main() is reached.
    Moreover, I noticed that in both cases main() seems to be reached, and in both cases the user hit "Go" (F5) to fully start the application. In "cspy_bad.txt" it seems something bad happened shortly after the application started running, and the CPU was no longer accessible (IsHalted() reported errors).
    Could you please try modifying your project so that it does NOTHING in main() and just ends up in a while(1) loop?
    Does the problem still appear?
    If not, please take in the original application part by part to see when the problem comes up again.
    This might help to locate where the problem comes from.


    Best regards
    Alex
  • Alex,



    Yes, the original post's attached output showed a case where the code had begun executing when the error occurred, but it also occurs before execution has started. Since this was a simpler case, I figured I would investigate from there first. The message I see most often is in the first of the two originally attached images and does occur at debug startup.



    The reason I had concluded that our code was not contributing is that I can:

    - get the error described

    - attach a host side program to the target and interact with it normally via USB

    - say YES to retry getting the CPU status

    - see on the 'scope that there is communication between the debug pod and target over the SWD interface as it retries the status read



    So, our code is running and the debug interface is communicating, but the debugger is still confused.



    It would seem that the "disconnect" occurs where the debug interface in the target and the rest of the target CPU interact, and I don't think we have much control over what happens there. Is there something in the target whose state can be changed to prevent the debugger from interacting with it?



    Any further insight you can provide would be appreciated. In the meantime, we'll look at slicing up the startup code and putting a dummy main() in place.



    -jtarantino