[SOLVED] J-Link freeze in multiply parallel environment

  • [SOLVED] J-Link freeze in multiply parallel environment

    Hi there. We are using J-Link [V6.84a] to download code to our test boards and then do RTT logging in a multiply parallel environment, i.e. we have many boards (Nordic NRF52/53) connected to the [Windows 10] PC in question. Everything is controlled through Python [3.8.5] scripts. Each board is addressed by its serial number with the -USB parameter in the call to J-Link. We have a lock in place so that J-Link is never called to download to two boards at the same time.

    Our problem is that about 1 in 10 times J-Link freezes: does nothing whatsoever, no prints, nothing, just sits there. This occurs when a board has been running J-Link for RTT logging and that J-Link executable is terminated to do the next thing, which is a J-Link download of new SW to the same board: on launching J-Link to do that download it freezes. It doesn't happen all the time but it does happen frequently enough to flash red lights on our CI display much of the time, so we need to fix it, but we're all out of ideas.

    Every J-Link session is a Python subprocess. The RTT logging session is terminated using Popen.terminate(). We retry after 60 seconds of "frozen" but that doesn't help. We've tried resetting the USB port in question, at Windows device level, after a failure, but that doesn't help either. The only thing that fixes it is physically powering down the board and powering it back on again.

    Has anyone any suggestion for any workarounds we might apply?

    The debugger FW versions on the boards in question are:
    • J-Link OB-SAM3U128-V2-NordicSemi compiled Mar 17 2020 14:43:00 V1.0
    • SEGGER J-Link ARM V10.1
    • J-Link OB-K22-NordicSemi compil V1.00

    The post was edited 4 times, last by RobMeades ().

  • Hi,
    Thank you for your inquiry.

    I am not entirely sure what you do exactly to provoke this issue.
    Could you provide us with a step-by-step explanation of how to reproduce it?
    E.g:
    1. Start J-Link Commander (JLink.exe) with the following command line: xxx via subprocess.call()
    ...

    Which tool do you use for RTT?

    If you are using a J-Link Command Script, could you send it to us?
    Could you please send us a J-Link log file? How to enable:
    wiki.segger.com/J-Link_DLL#Enable_J-Link_Log_File

    Best regards,
    Fabian
    Please read the forum rules before posting.

    Keep in mind, this is *not* a support forum.
    Our engineers will try to answer your questions between their projects if possible but this can be delayed by longer periods of time.
    Should you be entitled to support you can contact us via our support system: segger.com/ticket/

    Or you can contact us via e-mail.
  • Thanks for your remarkably swift attention. Just getting logs for you now. On the other stuff:

    We use JLink for two things: RTT logging and downloading to a board. To do the RTT logging part we open a Python sub-process as follows:

    Source Code

        import subprocess

        # Start J-Link Commander for RTT logging on the board with this serial number
        process = subprocess.Popen(
            ["jlink.exe", "-Device", "NRF52840_XXAA", "-If", "SWD",
             "-Speed", "4000", "-Autoconnect", "1", "-ExitOnError", "1",
             "-RTTTelnetPort", "19021", "-USB", "683253856"],
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            shell=False)

    We connect to the Telnet port and let it run, monitoring the trace output. When the trace output tells us that the target has finished doing what it should, we terminate J-Link as follows:

    Source Code

        return_value = process.poll()
        # Only terminate if J-Link is still running
        if return_value is None:
            process.terminate()
            # Busy-wait until the process has gone
            while process.poll() is None:
                pass

    After this has completed we wait 5 seconds, just for good luck, and then load up the next thing onto the board by calling:

    Source Code

        process = subprocess.Popen(
            ["jlink", "-nogui", "1", "-commandfile", "jlink.txt",
             "-USB", "683253856"],
            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
            shell=False)

    ...where jlink.txt contains:

    Source Code

        si swd
        speed 4000
        device NRF52840_XXAA
        connect
        r
        h
        erase
        loadfile nrf52840_xxaa.hex
        r
        exit

    It is when doing this latter download of the next board firmware that the hang often occurs.
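
A minimal sketch of the "connect to the Telnet port and monitor the trace output" step described above. It assumes the RTT Telnet port serves plain text and that the target prints an agreed completion string. The function name `wait_for_marker` and the marker text are hypothetical; only port 19021 comes from the command line above.

```python
import socket


def wait_for_marker(host, port, marker, timeout_s=30.0):
    """Read from a TCP socket until `marker` appears in the received text.

    `timeout_s` applies to the connect and to each individual read.
    Returns everything received so far, decoded as text.
    """
    received = b""
    wanted = marker.encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.settimeout(timeout_s)
        while wanted not in received:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                raise TimeoutError(f"marker {marker!r} not seen in time")
            if not chunk:
                # The peer closed the connection before the marker arrived.
                raise ConnectionError("socket closed before marker was seen")
            received += chunk
    return received.decode("utf-8", errors="replace")
```

Under these assumptions a call might look like `wait_for_marker("127.0.0.1", 19021, "TEST DONE")`, after which the J-Link session can be shut down.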
  • I can see that in both fail cases the log says:

    T25F8 000:009.788 JLINK_OpenEx(...)
    T25F8 016:870.921 Out of sync , resynchronizing...

    ...where it should be connecting to the probe. Why did the previous J-Link session leave the probe in a state that causes this failure? What can I do about it?
  • Hi,
    Thank you for the detailed report.

    We will look into this as soon as possible.

    Best regards,
    Fabian
  • Hi,
    Sorry for the delay in response.

    It seems like one of the processes is not terminating correctly.
    This might leave the probe and the software in an undefined state.
    It is the user's responsibility to make sure that a running session is closed properly.
    Such undefined states are not supported by us.

    We recommend using the J-Link SDK for this instead, as it comes with:
    - Thorough documentation of the J-Link API
    - The source code of the J-Link Commander
    - A Python package (Python SDK) with a couple of samples (including one for RTT) that you could easily extend and adjust to fit your needs
    - Support via our support ticket system (less wait time, higher priority, ...)

    With the SDK you could easily manage everything from one python script without the need of calling any programs besides python.

    For further information, please refer to:
    segger.com/products/debug-prob…nk/technology/j-link-sdk/

    Best regards,
    Fabian
  • For instance, could you explain what "Out of sync, resynchronizing..." means, aside from "it's broken"? What is out of sync with what: is it the target debug chip, a DLL, something else, and what is it that Jlink is doing to regain sync? That would help me try to work out what is going wrong here.
    The out-of-sync message is related to USB: the USB protocol is out of sync.
    That is, the PC may have sent only half of a command to J-Link, so J-Link is still waiting for the rest of it.
    Or the PC did not read the whole answer to a command out of the host buffer, so the new session receives the remainder and does not know what to do with it.

    USB is a connectionless protocol, meaning that J-Link has no way of detecting that a USB handle on the host has been closed because a process died or was terminated abnormally.

    The new session tries to get things into sync again but there are cases where it is just too messed up...

    Reg. your statement:
    What you need to understand is that forcibly terminating a process / thread is NEVER a good idea. It causes exactly this kind of issue.
    What you are doing is asking for trouble.
    It's a bit like pulling the plug of your PC while it is working hard on the disk. This also increases the risk of data loss or file system damage.
    It's garbage design to kill the process when the target RTT data tells you that your test is "done".

    The correct way of doing it is:
    Start the RTT process in Python, but also provide a stdin pipe to it. As soon as you have received what you are looking for via RTT, send "exit\n" over stdin (as you would have typed it in manual mode) and the Commander process will shut down cleanly.
    Getting the correct Python syntax for stdin etc. is your job now...

    After sending the exit command, call wait(), NOT poll(), because poll() returns immediately without waiting for the process to exit:
    stackoverflow.com/questions/29…ess-wait-and-poll/2996026

    A hint reg. stdin:
    Use communicate()
    stackoverflow.com/questions/84…a-python-subprocess-stdin
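
The advice above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested J-Link recipe: the helper names `start_rtt_session` and `stop_cleanly` are hypothetical, the command-line arguments are copied from the earlier post, and the 10-second timeout is an arbitrary choice.

```python
import subprocess


def start_rtt_session(usb_serial):
    """Launch J-Link Commander for RTT with a stdin pipe (arguments as posted earlier)."""
    return subprocess.Popen(
        ["jlink.exe", "-Device", "NRF52840_XXAA", "-If", "SWD",
         "-Speed", "4000", "-Autoconnect", "1", "-ExitOnError", "1",
         "-RTTTelnetPort", "19021", "-USB", usb_serial],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT, text=True)


def stop_cleanly(process, timeout_s=10.0):
    """Ask an interactive child to exit via stdin, then wait for it.

    `process` must have been started with stdin=subprocess.PIPE and text=True.
    Returns (exit_code, captured_output).
    """
    try:
        # communicate() writes the input, closes stdin, reads stdout to the
        # end and waits for the child to exit.
        output, _ = process.communicate(input="exit\n", timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Last resort only: forced termination is what this thread warns against.
        process.kill()
        output, _ = process.communicate()
    return process.returncode, output
```

Using `communicate()` rather than a bare `stdin.write()` followed by `wait()` also drains the output pipe while waiting, which avoids a stall if the child writes a lot of output on its way out.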
  • That's *better*! Some real information, thank you :-).

    FYI, I have since modified my Python script to send a Windows termination, CTRL-C, to the application, so it is not like the power is being pulled, it is being told to exit (5 times at 1 second intervals) before the task-manager-style termination of terminate() is invoked. I hoped that would be a sufficiently clear "tidy up now" signal but obviously not. I will try typing "exit\n" at it over stdin next.

    Thanks, again, for your support.
  • Confirmed: for the benefit of the forum, in case anyone else has this problem, if you are using J-Link for RTT logging (reading from its RTT-socket), driven from an [e.g. Python] script, there is only ONE way to reliably shut J-Link down again and that is to keep a stdin pipe open to it and send it the string "exit\n" before terminating it.

    If you send J-Link SIGTERM/CTRL-C/CTRL-BREAK signals, or if you just terminate it, some percentage of the time (with us it was anything from 10% to 40% of the time) J-Link will hang the next time you try to start it on that board and only power-cycling the board will fix the problem; we tried everything, including programmatically resetting the PC's USB port, and nothing helped. When this happens, if you take a J-Link log (command-line parameter -Log <Path>) the log will end with "Out of sync, resynchronizing..." and then nothing more.
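
For anyone wanting to detect this condition automatically, here is a small sketch that checks whether a J-Link log ends with the message described above. The function name `ended_out_of_sync` is hypothetical, and the check assumes only that the final non-blank log line contains the text "Out of sync", as in the log excerpt posted earlier in this thread.

```python
def ended_out_of_sync(log_text):
    """Return True if the last non-blank log line reports a USB resync attempt."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and "Out of sync" in lines[-1]
```

The log text itself would be read from whatever path was passed to -Log; a CI script could use the result to flag the board as needing a power cycle rather than retrying.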

    Having implemented this send-exit-over-stdin-and-then-terminate scheme we can now use J-Link for RTT logging and then for something else afterwards reliably.

    Thanks for the support @SEGGER - Alex.
  • Thanks for confirming that it was an issue in your setup.
    Pulling the plug in the middle of the work is like taking your hands off the steering wheel of your car while driving on a crowded highway at high speed.
    There is a pretty high chance that you will be in an accident. (You don't have to but there is a pretty high chance)

    Script mode of J-Link Commander is not different from interactive mode where you type in commands.
    Issue Ctrl + C in interactive mode and J-Link Commander will also not close itself.
    Ctrl + C is used to leave blocking operations. (There are very few of them in J-Link Commander)


    BR
    Alex