Performance degradation when debugging bare metal BCM2837 (aarch64, Cortex-A53)

    • Performance degradation when debugging bare metal BCM2837 (aarch64, Cortex-A53)

      Hello SEGGER Team,

      I am really enjoying using JLink PLUS and Ozone for debugging embedded targets, thank you.

      Recently I've started to implement a bare metal application for Raspberry Pi 3B+ (BCM2837, Cortex-A53, aarch64, at the moment single core).

      Yes, I know SEGGER doesn't officially support this platform, but I've managed to establish a full debug session using JLink and Ozone, and I am very happy about that. The only thing I had to write was a small stub in the JLinkScript file for the RPi 3B+:

      C Source Code: JLinkScript

      void InitTarget(void) {
        // CoreSight debug register base address of the core
        CORESIGHT_CoreBaseAddr = 0x80010000;
        // Base address of the core's CTI (cross-trigger interface)
        JLINK_ExecCommand("CORESIGHT_SetCTICoreBaseAddr=0x80018000");
      }

      The reason I am writing here is that with recent Ozone (3.38c), debugging this target is very, very slow. Stepping to the next line takes 1-3 seconds, and resetting the target takes even longer. It's simply unusable. Registers take a long time to update, and setting breakpoints and refreshing memory are slow. I am working on Xubuntu 24.04 LTS, but on Windows 10 it is even slower.

      [Target interface speed is constantly 15MHz, I am using short JTAG cables, the ELF file is very small (100K). Based on experience with other processors (iMX6, STM32, others) it should work MUCH faster]

      I've tried going back to older versions of Ozone and, to my surprise, I finally found that version 3.34 (pure) is fast again! Everything works instantly. Stepping is instant, resetting is instant. No lagging. Everything refreshes amazingly fast. Love it.

      So my conclusion is that there is definitely a performance degradation between Ozone 3.34 and newer versions (3.34a is already slower).

      I've attached JLinkLog files from versions 3.34 and 3.38c. I have separated the crucial moments (stepping to the next line) with blank lines.
      As you can see, for 3.38c the first step may still be fast, but all subsequent ones get slower and slower.

      What catches my attention is that the biggest slowdown happens right after the `JLINK_ClrBPEx` function call:

      ...
      TF18006C0 046:381.787 JLINK_ReadRegs_64(NumRegs = 1)
      TF18006C0 046:381.792 SP (32) = 0x7FFE0
      TF18006C0 046:381.797 - 0.011ms returns 0
      TF18006C0 046:381.808 JLINK_FindBP(Addr = 0x0008068C)
      TF18006C0 046:381.814 - 0.007ms returns 6
      TF18006C0 046:381.820 JLINK_ClrBPEx(BPHandle = 0x00000006)
      TF18006C0 046:381.827 - 0.008ms returns 0x00
      TF5BF97C0 046:825.348 JLINK_IsHalted() <---------------- 500ms difference, SLOW!
      TF5BF97C0 046:825.391 - 0.043ms returns TRUE
      TF5BF97C0 046:825.395 JLINK_ReadRegs_64(NumRegs = 1)
      TF5BF97C0 046:825.413 CPSR (41) = 0x200003C9
      ...


      Whereas with Ozone 3.34 there is no noticeable lag at all:

      ...
      TC30006C0 027:397.701 JLINK_ReadRegs_64(NumRegs = 1, Indexes:
      TC30006C0 027:397.705 194)
      TC30006C0 027:397.711 -- AARCH32_R13=0x3F215040
      TC30006C0 027:397.718 - 0.017ms returns 0
      TC30006C0 027:397.727 JLINK_FindBP(Addr = 0x0008067C)
      TC30006C0 027:397.733 - 0.005ms returns 4
      TC30006C0 027:397.737 JLINK_ClrBPEx(BPHandle = 0x00000004)
      TC30006C0 027:397.743 - 0.006ms returns 0x00
      TC71537C0 027:402.091 JLINK_IsHalted() <--------------- 5ms difference, fast!
      TC71537C0 027:402.124 - 0.033ms returns TRUE
      TC71537C0 027:402.127 JLINK_ReadRegs_64(NumRegs = 1, Indexes:
      TC71537C0 027:402.130 41)
      TC71537C0 027:402.158 -- CPSR=0x200003C9
      ...


      Maybe something has changed recently in how Ozone sets and clears breakpoints?
      Since stepping through code involves setting and clearing a temporary breakpoint, maybe that's the issue?

      I have even tried swapping the libjlinkarm.so library shipped with Ozone for other versions:
      - For Ozone 3.38c I replaced it with libjlinkarm.so.7.96.8 (older), and it is still slow.
      - For Ozone 3.34, however, I replaced it with libjlinkarm.so.8.10.12 (newer), and it is still fast!
      So to me it looks more like a bug in recent Ozone versions.

      Please take a look at this and fix it if possible.

      I know that for many developers a lag of 1-3 seconds or more is not a problem, but the whole reason I keep buying JLink Plus for companies is precisely to avoid waiting those 1-3 seconds every time I step through the code. :)

      Kind regards
      Adam

      PS: I would simply keep using version 3.34 and forget about 3.38c, but 3.34 has problems updating 64-bit registers (for example, it can't set the 64-bit PC correctly and leaves garbage in the upper 32 bits). For that I also have a small workaround in the JLinkScript file:

      C Source Code: JLinkScript

      void AfterResetTarget(void) {
        // Workaround: set the PC (register index 33 in this setup) to 0x80000 after reset
        JLINK_CPU_WriteReg(33, 0x80000);
      }

      (Although this only solves the problem of setting the PC after reset. Updating other registers from the window is still buggy and leaves garbage in the upper halves, which is why I need to update Ozone to a newer version.)

    • Hi ram.techen,
      from the logs I cannot see what you are describing. In particular, I looked at the `JLINK_Step` commands and found that they take about the same time in both the slow and the fast case. The command execution times are:
      4.836ms vs 4.718ms
      4.501ms vs 4.588ms
      4.676ms vs 4.796ms
      4.573ms vs 4.423ms
      4.929ms vs 4.819ms
      5.093ms vs 4.610ms
      4.761ms vs 4.816ms
      4.724ms vs 4.524ms
      4.752ms vs 4.574ms
      4.596ms vs 4.581ms
      So in both cases you should experience a delay of ~5 ms when stepping.

      Best regards
      -- AlexD
      Please read the forum rules before posting.

      Keep in mind, this is *not* a support forum.
      Our engineers will try to answer your questions between their projects if possible but this can be delayed by longer periods of time.
      Should you be entitled to support you can contact us via our support system: segger.com/ticket/

      Or you can contact us via e-mail.
    • Hi AlexD

      Thank you for your response.

      Okay, so I have run another test and I think I have isolated the cause of the slowdown in 3.38c!

      I wrote a very simple bare metal program like this:

      Source Code

      1. int main(void)
      2. {
      3. asm("nop");
      4. asm("nop");
      5. asm("nop");
      6. asm("nop");
      7. asm("nop");
      8. while(1)
      9. {
      10. asm("nop");
      11. asm("nop");
      12. asm("nop");
      13. asm("nop");
      14. asm("nop");
      15. asm("nop");
      16. char Character;
      17. Character = 'a';
      18. }
      19. }
      On version 3.34 everything is fast all the time. But on version 3.38c Ozone is fast only until it reaches the while loop. After that, all subsequent operations slow down dramatically. I kept pressing F10 very quickly, and this is what Ozone printed in the Console view:

      Source Code

      5.345 164 Debug.StepOver();
      5.541 217 Debug.StepOver();
      5.552 066 Debug.StepOver();
      5.561 909 Debug.StepOver();
      5.571 632 Debug.StepOver();
      5.581 793 Debug.StepOver();
      5.591 562 Debug.StepOver();
      11.351 925 Debug.StepOver(); <--- SLOWDOWN
      15.151 939 Debug.StepOver();
      18.975 404 Debug.StepOver();
      22.783 446 Debug.StepOver();
      26.588 185 Debug.StepOver();
      So, as you can see, before the while loop each StepOver is almost instant (less than 50ms), but right when entering the loop Ozone gets very slow (2-5 seconds delay).

      What is completely surprising to me is that if I delete lines 16 and 17 (leaving only the NOPs and the while loop) and fully restart Ozone:

      Source Code

      int main(void)
      {
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        while(1)
        {
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
        }
      }
      then it never slows down!

      Source Code

      10.431 091 Debug.StepOver();
      10.626 890 Debug.StepOver();
      10.636 666 Debug.StepOver();
      10.647 651 Debug.StepOver();
      10.658 929 Debug.StepOver();
      10.669 901 Debug.StepOver();
      10.680 924 Debug.StepOver();
      10.691 081 Debug.StepOver();
      10.700 954 Debug.StepOver();
      10.711 097 Debug.StepOver();
      10.721 538 Debug.StepOver();
      10.733 175 Debug.StepOver();
      10.741 856 Debug.StepOver();
      10.750 350 Debug.StepOver();
      10.760 149 Debug.StepOver();
      ... instant forever!

      So, what do you think: why does adding simple statements like `char Character; Character = 'a';` cause this huge slowdown?
      Maybe register allocation issues? Or maybe some ELF parsing issues? Or maybe even some stack calculation issues?

      Kind regards
      Adam


      PS: One more thing, which may be related!
      If I write code like this:

      Source Code

      1. void g(void) {
      2. return;
      3. }
      4. int main(void)
      5. {
      6. asm("nop");
      7. asm("nop");
      8. asm("nop");
      9. asm("nop");
      10. while(1)
      11. {
      12. asm("nop");
      13. asm("nop");
      14. asm("nop");
      15. asm("nop");
      16. g();
      17. }
      18. }
      Then I put a breakpoint on line 3 (the end of function g).
      When the program reaches that breakpoint and I press F10, Ozone just stays on that line, no matter how many times I press F10!
      If I delete the breakpoint after reaching it, pressing F10 on line 3 makes Ozone run forever, and execution never halts back in main.
      But when I press F11 instead, Ozone properly goes back to main.
      So... maybe there is some problem related to stack reading or return address calculation in Ozone?


    • Hi ram.techen,
      from the given information hardly anything can be said. You changed the source code, and a smart compiler may generate completely different assembly code from it, so completely different runtime behavior can be expected.
      Since you are able to reproduce the issue in your 1st bare metal program, could you please provide a J-Link log and an Ozone log recorded during the same debug session where you observe the slowdown? The logs should show both the fast execution and also the slow execution.
      Best regards
      -- AlexD
    • Hi AlexD

      Thank you for your support.

      As you suggested, I am posting JLinkLog files made with Ozone 3.38c for both scenarios (with the `char Character; Character = 'x';` statements [slow] and without them [fast]).
      I am also attaching screen recordings and the whole bare metal project source code.
      To compile the project I am using the arm-gnu-toolchain-14.2.rel1-x86_64-aarch64-none-elf toolchain from the ARM website: developer.arm.com/downloads/-/arm-gnu-toolchain-downloads .
      In case the project sources can't be opened, I have also mirrored them on GitHub: github.com/akowalew/piapp/tree/ozone-338c-test

      As you can see, in the FAST scenario stepping over is so fast that I can even hold the F10 key down continuously and Ozone keeps up.
      In the SLOW scenario, on the other hand, Ozone is fast only in the beginning. After entering the while loop it becomes so slow that I need to wait up to 5 seconds before I can press F10 again, just because a simple variable and an assignment were added to the loop. When I press F10, the whole Ozone UI freezes completely; during that time I can't click, view, close or move anything.

      PS: I am also attaching the Ozone debug log. The crucial part is the stepping:

      BCK 0231:505 -------------------- Step Over ------------------------- <--- F10 PRESS
      BCK 0231:505 PC is on a new source line
      BCK 0231:505 Stepped a total of 1 machine instruction(s) <--- AFTER THAT UI FREEZES
      GUI 0235:198 Executing script function: AfterTargetHalt <--- UI GETS BACK (AFTER 4 SECONDS)
      ...
      BCK 0245:298 -------------------- Step Over ------------------------- <--- F10 PRESS
      BCK 0245:298 PC is on a new source line
      BCK 0245:298 Stepped a total of 1 machine instruction(s) <--- AFTER THAT UI FREEZES
      GUI 0249:016 Executing script function: AfterTargetHalt <--- UI GETS BACK (AFTER 4 SECONDS)


      So to me it looks like JLink probably stepped to the next instruction quickly, but Ozone, perhaps because of some bug, needs a long time to update its UI?

      Kind regards
      Adam Kowalewski

    • Hi ram.techen,
      Sorry, but this is not helpful here. You are providing logs from 2 different firmware versions. As stated before, completely different runtime behavior may be expected from different firmware.
      If I understand you correctly, you have a firmware which shows fast stepping at the beginning but switches to slow stepping after some time.
      Please do a single debug session in which both scenarios, slow stepping and fast stepping, are reproduced, and record the J-Link log and the Ozone log during that very debug session. Please also provide the ELF file you were using in that very debug session.
      Can you do that?
      Best regards
      -- AlexD
    • Hi AlexD

      Thank you for your reply. I did as you asked and I think I have isolated the problem even further!

      The main function has this body:

      C Source Code: main.c

      int main(void)
      {
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        asm("nop");
        while(1)
        {
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          asm("nop");
          char Character;
          Character = 'x';
        }
      }


      Ozone 3.38c seems to be fast as long as I don't open the Local Data window.
      But as soon as I open Local Data, all further stepping becomes very slow, the UI takes a long time to refresh, and so on.
      The same happens with the Watch Data window after I add the 'Character' variable to it, although it's not as slow as with the Local Data window open.
      And the same happens with the Global Data window after I declare 'Character' as global.

      So, to me, it looks like there may be a problem with the ELF parser or with reading variables from memory/registers.

      I have attached the JLinkLog, the OzoneDebugLog, the ELF file and a video recording.
      As you can see, in the beginning the Local Data window is closed and Ozone runs smoothly and steps instantly. As soon as I open the Local Data window, it gets slow. You can see me moving the mouse around to show that the UI is unresponsive during stepping.

      Kind regards
      Adam

    • Hi ram.techen,
      thank you for the additional information. We will see if we can reproduce that locally. Since all our engineers are busy these days this might take some time. We will keep you posted here.
      Best regards
      -- AlexD
    • Hi ram.techen,

      Debugging bare-metal on this device is unfortunately not as straight-forward as with other hardware.

      Could you please additionally provide any other required files for this debug setup? At least the boot-image seems to be missing.
      Could you furthermore describe all required steps to get up and running?

      Thanks and best regards,
      SebastianB
    • Hello SebastianB,

      Sorry for the late reply; unfortunately I didn't get any e-mail notification from the forum!

      Here is the whole repository of code, with all build, deploy, and connection instructions: github.com/akowalew/piapp/tree/ozone-338c-test (on branch `ozone-338c-test`).

      I posted it on GitHub because I can't upload files larger than 1MB here.

      Simply download the whole repository and follow the instructions.

      Hopefully it will help you to run the debug session on the target. If not, feel free to ask more questions.

      Kind regards
      Adam
    • Hi Adam,

      Thank you for providing the files and instructions!

      We were able to reproduce the issue on our end. We will of course take a closer look here and provide a fix in a future Ozone release.

      Best regards,
      SebastianB