incorrect decoding of RISC-V instructions


    • incorrect decoding of RISC-V instructions

      I'm encountering a problem where SEGGER tools are not correctly decoding RISC-V instructions.

      My target is a RISC-V core (Ibex RV32IMC) with RISC-V debug (pulp riscv-dbg v0.13) on an Arty A7-A35T FPGA board (segger.com/evaluate-our-software/risc-v/digilent-arty/).

      It works fine with OpenOCD, but SEGGER has problems with it. J-Link Commander doesn't correctly decode and single-step instructions (and Embedded Studio also trips over itself at the same point).

      Testing was done with the latest J-Link release (v6.94). I have demonstration code for both scenarios: a purely open-source Makefile build and a SEGGER Embedded Studio project.

      Problem 1: Incorrect decode of unconditional jump

      Here is a simple unconditional jump in the code image:

      80: 0100006f j 90 <reset_handler>

      However, J-Link Commander decodes it like this when single-stepping:

      J-Link>r
      Reset delay: 0 ms
      Reset type Normal: Resets core & peripherals using <ndmreset> bit in <dmcontrol> debug register.
      RISC-V: Performing reset via <ndmreset>
      J-Link>h
      pc = 00000080 sp = 00010000 ra = 000000F6
      gp = 00000000 tp = 00000000 fp = 00000000
      t0 = 00000000 t1 = 00000000 t2 = 00000000
      t3 = 00000000 t4 = 00000000 t5 = 00000000 t6 = 00000000
      a0 = 00000000 a1 = 00000000 a2 = 00000000 a3 = 0000C000
      a4 = 005F5E10 a5 = 0021A326 a6 = 00000000 a7 = 00000000
      s1 = 00000000 s2 = 00000000 s3 = 00000000 s4 = 00000000
      s5 = 00000000 s6 = 00000000 s7 = 00000000 s8 = 00000000
      s9 = 00000000 s10 = 00000138 s11 = 00000138
      J-Link>s
      00000080: 6F 00 00 01 J 0x000F0086
      Changed regs: pc = 000F0086
      J-Link>s
      000F0086: 00 00 ILLEGAL
      Changed regs: pc = 00000000
      Unlike J-Link Commander, Embedded Studio correctly decodes the jump instruction in its disassembly window. However, it still causes the PC to branch to the erroneous address (0x000F0086) that J-Link Commander invents.


      Problem 2: Inability to decode compressed instructions?

      Here is the code:

      mv x1, x0
      90: 00000093 li ra,0
      mv x2, x1
      94: 8106 mv sp,ra
      mv x3, x1
      96: 8186 mv gp,ra
      mv x4, x1
      98: 8206 mv tp,ra
      mv x5, x1
      9a: 8286 mv t0,ra
      mv x6, x1
      9c: 8306 mv t1,ra


      and here is what J-Link Commander does:

      Note that:
      a) the "mv x1, x0" at 0x90 becomes "ADDI ra, t1, 9"
      b) for the compressed instructions beginning at 0x94, the PC advances by two bytes per step, but the display keeps showing the instruction from the preceding 32-bit-aligned address

      J-Link>r
      Reset delay: 0 ms
      Reset type Normal: Resets core & peripherals using <ndmreset> bit in <dmcontrol> debug register.
      RISC-V: Performing reset via <ndmreset>
      J-Link>h
      pc = 00000080 sp = 00010000 ra = 000000F6
      gp = 00000000 tp = 00000000 fp = 1A110000
      t0 = 00000000 t1 = 00000000 t2 = 00000000
      t3 = 00000000 t4 = 00000000 t5 = 00000000 t6 = 00000000
      a0 = 1A110000 a1 = 00000000 a2 = 00000000 a3 = 0000C000
      a4 = 005F5E10 a5 = 0021A326 a6 = 00000000 a7 = 00000000
      s1 = 00000000 s2 = 00000000 s3 = 00000000 s4 = 00000000
      s5 = 00000000 s6 = 00000000 s7 = 00000000 s8 = 00000000
      s9 = 00000000 s10 = 00000138 s11 = 00000138
      J-Link>SetPC 0x90
      J-Link>h
      pc = 00000090 sp = 00010000 ra = 000000F6
      gp = 00000000 tp = 00000000 fp = 1A110000
      t0 = 00000000 t1 = 00000000 t2 = 00000000
      t3 = 00000000 t4 = 00000000 t5 = 00000000 t6 = 00000000
      a0 = 1A110000 a1 = 00000000 a2 = 00000000 a3 = 0000C000
      a4 = 005F5E10 a5 = 0021A326 a6 = 00000000 a7 = 00000000
      s1 = 00000000 s2 = 00000000 s3 = 00000000 s4 = 00000000
      s5 = 00000000 s6 = 00000000 s7 = 00000000 s8 = 00000000
      s9 = 00000000 s10 = 00000138 s11 = 00000138
      J-Link>s
      00000090: 93 00 00 00 ADDI ra, t1, 9
      Changed regs: pc = 00000094 ra = 00000009
      J-Link>s
      00000094: 06 81 C.MV sp, ra
      Changed regs: pc = 00000096 sp = 00000009
      J-Link>s
      00000096: 06 81 C.MV sp, ra
      Changed regs: pc = 00000098
      J-Link>s
      00000098: 06 82 C.MV tp, ra
      Changed regs: pc = 0000009A tp = 00000009
      J-Link>s
      0000009A: 06 82 C.MV tp, ra
      Changed regs: pc = 0000009C
      J-Link>s
      0000009C: 06 83 C.MV t1, ra
      Changed regs: pc = 0000009E t1 = 00000009


      On both problems, I would be happy to provide SEGGER with example code and Arty A7-A35T FPGA bitstream images for them to reproduce the problem.
    • Hi Peter,
      I tried to reproduce the described behavior, but everything is working fine here. We used V6.94 as well.
      Here's what we did:

      #1:

      J-Link>mem 0x80000000, 0x40
      80000000 = 6F 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 o...............
      80000010 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      80000020 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      80000030 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      J-Link>setpc 0x80000000
      J-Link>disassemble
      80000000: 6F 00 00 01 J 0x80000010
      80000004: 00 00 ILLEGAL
      80000006: 00 00 ILLEGAL
      80000008: 00 00 ILLEGAL
      8000000A: 00 00 ILLEGAL
      8000000C: 00 00 ILLEGAL
      8000000E: 00 00 ILLEGAL
      80000010: 00 00 ILLEGAL
      80000012: 00 00 ILLEGAL
      80000014: 00 00 ILLEGAL
      J-Link>step
      80000000: 6F 00 00 01 J 0x80000010
      Changed regs: pc = 80000010
      J-Link>
      Everything looks fine here on our side.

      #2

      mem 0x80000000, 0x40
      80000000 = 93 00 00 00 06 81 86 81 06 82 86 82 06 83 00 00 ................
      80000010 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      80000020 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      80000030 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      J-Link>setpc 0x80000000
      J-Link>disassemble
      80000000: 93 00 00 00 MV ra, zero
      80000004: 06 81 C.MV sp, ra
      80000006: 86 81 C.MV gp, ra
      80000008: 06 82 C.MV tp, ra
      8000000A: 86 82 C.MV t0, ra
      8000000C: 06 83 C.MV t1, ra
      8000000E: 00 00 ILLEGAL
      80000010: 00 00 ILLEGAL
      80000012: 00 00 ILLEGAL
      80000014: 00 00 ILLEGAL
      J-Link>step
      80000000: 93 00 00 00 MV ra, zero
      Changed regs: pc = 80000004
      J-Link>step
      80000004: 06 81 C.MV sp, ra
      Changed regs: pc = 80000006
      J-Link>step
      80000006: 86 81 C.MV gp, ra
      Changed regs: pc = 80000008
      J-Link>step
      80000008: 06 82 C.MV tp, ra
      Changed regs: pc = 8000000A
      J-Link>step
      8000000A: 86 82 C.MV t0, ra
      Changed regs: pc = 8000000C
      J-Link>step
      8000000C: 06 83 C.MV t1, ra
      Changed regs: pc = 8000000E
      J-Link>
      Everything looks fine here, too.
    • In your test, you are using a different memory offset. The position of the instruction in memory seems to impact J-Link's ability to decode.

      I've attached two sample projects. ibex-ledelf.zip contains the original code; ibex-jumpdup.zip was a quick hack-up of the sample SES project to replicate the same jump instruction at the same memory location.

      For completeness, I've also attached a two-breakpoint version of the Ibex RISC-V bitstream that can be loaded onto the Arty A7-A35T FPGA board that SEGGER promotes on its web site.
      Files
      • ibex-ledelf.zip (19.95 kB)
      • ibex-jumpdup.zip (4.95 kB)
      • ibex2bkpt.zip (122.15 kB)
    • We used a different memory offset because we did not have your target in house.
      We also currently do not see why the address of the instruction should have any effect.
      However, we will check.

      Can you please let us know which connectors and pinouts to use to connect J-Link to the Arty board when using your bitstream(s) (including the ones from the other thread)?
    • The connector pinouts follow the SiFive de facto convention ([SiFive Freedom E310 Arty FPGA Dev Kit Getting Started Guide](sifive.com/documentation/freed…it-getting-started-guide/)) (see Section 2.2):

      | PMOD header JD | |
      | ------------ | ------------ |
      | 1 : TDO | 7 : TDI |
      | 2 : TRST_N | 8 : TMS |
      | 3 : TCK | 9 : RESET_N |
      | 4 | 10 : |
      | 5 : GND | 11 : GND |
      | 6 : VREF | 12 : VREF |

      It should be a direct substitution for any existing SiFive Arty test setup that SEGGER has.

      Per the SES project configuration, there is a shared code/data 64kByte SRAM at 0x0. That isn't a property of the CPU; it just happens to be how the existing example (without debug) was configured. Execution after reset starts at 0x80, and that is a property of the CPU.

      For what it is worth (and/or if you would like details on the internals), I'm trying to promulgate this as a known-good example of a debug-enabled Ibex, but my pull request to the Ibex project hasn't yet been considered:

      add RISC-V debug module to FPGA example
      github.com/lowRISC/ibex/pull/1218

      The same Ibex (RV32IMC) with the exact same debug interface (but with an immense design that requires a massive FPGA to prototype) is employed with OpenTitan:


      opentitan.org/
      github.com/lowRISC/opentitan

      Documentation on the Ibex core is available here:


      ibex-core.readthedocs.io/en/latest/
    • With your bit-stream I was able to reproduce the issue:

      halt
      mem 0x00000200, 0x40
      w4 0x00000200 00000093
      w2 0x00000204 8106
      w2 0x00000206 8186
      w2 0x00000208 8206
      w2 0x0000020A 8286
      W2 0x0000020C 8306
      mem 0x00000200, 0x40
      setpc 0x00000200
      disassemble
      00000200: 93 00 00 00 ADDI ra, t1, 9
      00000204: 86 81 C.MV gp, ra
      00000206: 86 81 C.MV gp, ra
      00000208: 86 82 C.MV t0, ra
      0000020A: 86 82 C.MV t0, ra
      0000020C: 06 83 C.MV t1, ra
      0000020E: 06 83 C.MV t1, ra
      00000210: 00 00 ILLEGAL
      00000212: 00 00 ILLEGAL
      00000214: 00 00 ILLEGAL
      J-Link>


      I assume that your device does not support 16-bit and 8-bit read accesses; 32-bit accesses work fine, though. The disassembler reads a 32-bit instruction in two phases. In phase 1, the lower 16 bits are read (from address 0x0200) and evaluated. Only if the evaluation indicates a 32-bit instruction are the upper 16 bits read (from address 0x0202). The two 16-bit halfwords are then assembled into a 32-bit word, and finally the instruction is disassembled. I see that both read accesses deliver the value 0x0093, so the instruction word is assembled as 0x00930093, which is obviously incorrect; the expected value is 0x00000093. Based on this wrong instruction word, the disassembler delivers a wrong mnemonic.
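The arithmetic in this explanation can be checked with a short Python sketch. It is only a model under stated assumptions: `fetch_instruction`, `broken_read16`, and the tiny `MEM` dictionary are hypothetical names, and `broken_read16` simply assumes that every 16-bit read returns the low halfword of the containing 32-bit word. The field extraction follows the standard RV32I I-type and J-type encodings.

```python
# Hypothetical memory image: word address -> 32-bit value (from the listings above).
MEM = {0x80: 0x0100006F, 0x90: 0x00000093}

def good_read16(addr):
    """16-bit read that honors address bit 1 (expected behavior)."""
    return (MEM[addr & ~0x3] >> (8 * (addr & 0x2))) & 0xFFFF

def broken_read16(addr):
    """Assumed faulty behavior: always returns the low halfword of the word."""
    return MEM[addr & ~0x3] & 0xFFFF

def fetch_instruction(read16, addr):
    """Two-phase fetch: low halfword first, high halfword only if needed."""
    lo = read16(addr)
    if lo & 0x3 != 0x3:            # low two bits != 11 means a compressed instruction
        return lo, 2
    hi = read16(addr + 2)
    return (hi << 16) | lo, 4

def decode_addi(word):
    """Return (rd, rs1, imm) for an ADDI instruction word."""
    assert word & 0x7F == 0x13 and (word >> 12) & 0x7 == 0
    imm = word >> 20
    if imm & 0x800:                # sign-extend the 12-bit immediate
        imm -= 0x1000
    return (word >> 7) & 0x1F, (word >> 15) & 0x1F, imm

def decode_jal_target(word, pc):
    """Return the absolute target of a JAL instruction word located at pc."""
    assert word & 0x7F == 0x6F
    imm = (((word >> 31) & 0x1) << 20) | (((word >> 12) & 0xFF) << 12) \
        | (((word >> 20) & 0x1) << 11) | (((word >> 21) & 0x3FF) << 1)
    if imm & (1 << 20):            # sign-extend the 21-bit offset
        imm -= 1 << 21
    return (pc + imm) & 0xFFFFFFFF

bad_addi, _ = fetch_instruction(broken_read16, 0x90)
bad_jal, _ = fetch_instruction(broken_read16, 0x80)
print(hex(bad_addi), decode_addi(bad_addi))             # rd=1 (ra), rs1=6 (t1), imm=9
print(hex(bad_jal), hex(decode_jal_target(bad_jal, 0x80)))
```

Under that assumption, the word at 0x90 assembles to 0x00930093, which decodes as ADDI ra, t1, 9, and the jump at 0x80 assembles to 0x006F006F, whose target works out to 0x000F0086. Both values match what J-Link Commander printed in the first post, so the two reported problems are consistent with a single cause.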


      Having seen that, I forced non-32-bit reads using the 'mem8' and 'mem16' commands. Those force 8-bit and 16-bit accesses when displaying the memory content. 'mem', if executed on a 32-bit aligned address, does the same as 'mem32', i.e. 32-bit reads are performed. The output is as follows:

      J-Link>mem8 200 , 40
      00000200 = 93 86 86 06 00 00 00 00 00 00 00 00 00 00 00 00
      00000210 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000220 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      00000230 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      J-Link>mem16 200 , 20
      00000200 = 0093 8186 8286 8306 0000 0000 0000 0000
      00000210 = 0000 0000 0000 0000 0000 0000 0000 0000
      00000220 = 0000 0000 0000 0000 0000 0000 0000 0000
      00000230 = 0000 0000 0000 0000 0000 0000 0000 0000
      J-Link>mem32 200 , 10
      00000200 = 00000093 81868186 82868286 83068306
      00000210 = 00000000 00000000 00000000 00000000
      00000220 = 00000000 00000000 00000000 00000000
      00000230 = 00000000 00000000 00000000 00000000
      J-Link>mem 200 , 40
      00000200 = 93 00 00 00 86 81 86 81 86 82 86 82 06 83 06 83 ................
      00000210 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00000220 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00000230 = 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      As you can see, the output for 'mem8' and 'mem16' is garbled whereas the output for 'mem32' and 'mem' shows the expected contents.


      I'd like to suggest that you check your bit-stream for proper support of 8-bit and 16-bit accesses.


      I'd like to pause the other thread you opened in the Wiki (the one concerning breakpoints) until this issue is settled. Memory accesses that do not yield the expected values may well explain the behavior you described there, so please check whether that issue persists once this one is resolved.
    • Well, this was an interesting problem. I believe the issue you experienced is due to SEGGER performing writes in a non-compliant manner.

      To explain this, it is necessary to refer to the RISC-V External Debug Support specification, which is available here:

      riscv.org/wp-content/uploads/2019/03/riscv-debug-release.pdf

      I wrote some code to sniff the JTAG traffic generated by the w4 and w2 commands. (Thank you so much for specifying the exact test conditions you used; that made my life a lot easier.)

      The first few SEGGER JTAG RISC-V "DTM" transactions of W4 0x200 look like this:

      W: 0x38 0x00050000 2=write
      R: 0x38 0x00407000 0=success
      W: 0x39 0x00000200 2=write
      R: 0x38 0x00050000 0=success
      W: 0x3c 0x00000093 2=write


      The first few SEGGER JTAG RISC-V "DTM" transactions of W2 0x204 look like this:

      W: 0x38 0x00030000 2=write
      R: 0x38 0x00407000 0=success
      W: 0x39 0x00000204 2=write
      R: 0x38 0x00030000 0=success
      W: 0x3c 0x81068106 2=write


      The problem is the write to 0x38. Section 3.12.18 of the specification covers the sbcs (System Bus Access Control and Status) register.


      For W4, the above SEGGER operations specify an sbaccess of 32-bit; for W2, they specify an sbaccess of 16-bit.

      The problem for SEGGER is that the specification allows the debug module to declare which access sizes it supports. It does so through the sbcs bits sbaccess8, sbaccess16, sbaccess32, sbaccess64, and sbaccess128.

      In the PULP riscv-dbg original RTL source code, one can see (beginning at line 532) that only the native bus width is supported for accesses with this particular debug interface implementation:

      github.com/pulp-platform/riscv…lob/master/src/dm_csrs.sv

      sbcs_d.sbaccess128 = 1'b0;
      sbcs_d.sbaccess64 = logic'(BusWidth == 32'd64);
      sbcs_d.sbaccess32 = logic'(BusWidth == 32'd32);
      sbcs_d.sbaccess16 = 1'b0;
      sbcs_d.sbaccess8 = 1'b0;
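As a plausibility check, the following Python sketch models what would happen if the debug module carried out every system-bus write as a full 32-bit write to the word-aligned address, regardless of the requested sbaccess. That behavior is an assumption on my part, not something read out of the RTL, and the helper names are hypothetical. The replication of the halfword into both lanes is taken from the W2 trace above (sbdata0 = 0x81068106).

```python
def replicate16(half):
    """Halfword copied into both 16-bit lanes, as seen in the W2 trace."""
    half &= 0xFFFF
    return (half << 16) | half

def bus_write_32_only(mem, addr, data):
    """Assumed target behavior: ignore address bits 1:0 and write all 32 bits."""
    mem[addr & ~0x3] = data & 0xFFFFFFFF

mem = {}
bus_write_32_only(mem, 0x200, 0x00000093)                 # w4 0x200 00000093
for addr, half in ((0x204, 0x8106), (0x206, 0x8186),      # the five w2 commands
                   (0x208, 0x8206), (0x20A, 0x8286),
                   (0x20C, 0x8306)):
    bus_write_32_only(mem, addr, replicate16(half))

words = [mem[a] for a in (0x200, 0x204, 0x208, 0x20C)]
print(" ".join(f"{w:08X}" for w in words))
```

Under this model the four words at 0x200 come out as 00000093 81868186 82868286 83068306, the same values as the first row of the mem32 dump SEGGER posted earlier in the thread. Each w2 to the upper half of a word overwrites the whole word, which is why the 0x8106, 0x8206 halfwords disappear.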

      I think the J-Link is detecting this correctly initially, as it is visible in the initial blurb when it connects to the target:

      Debug architecture:
      RISC-V debug: 0.13
      AddrBits: 7
      DataBits: 32
      IdleClks: 1
      Memory access:
      Via system bus: Yes (32-bit accesses are supported)
      Via ProgBuf: Yes (8 ProgBuf entries)
      DataBuf: 2 entries
      autoexec[0] implemented: Yes
      Detected: RV32 core
      CSR access via abs. commands: No
      Temp. halted CPU for NumHWBP detection
      HW instruction/data BPs: 2
      Support set/clr BPs while running: No
      HW data BPs trigger before execution of inst
      RISC-V identified.


      The reads do not appear to be at issue. If you connect to the provided image after FPGA initialization, you should see the following:

      J-Link>mem32 0x80, 20
      00000080 = 0100006F 0080006F 0040006F 0000006F
      00000090 = 00000093 81868106 82868206 83868306
      000000A0 = 84868406 85868506 86868606 87868706
      000000B0 = 88868806 89868906 8A868A06 8B868B06
      000000C0 = 8C868C06 8D868D06 8E868E06 8F868F06
      000000D0 = 00010117 F3010113 13800D13 13800D93
      000000E0 = 01BD5763 000D2023 DDE30D11 4501FFAD
      000000F0 = 00EF4581 47290040 882367B1 673700E7
      J-Link>mem16 0x80, 40
      00000080 = 006F 0100 006F 0080 006F 0040 006F 0000
      00000090 = 0093 0000 8106 8186 8206 8286 8306 8386
      000000A0 = 8406 8486 8506 8586 8606 8686 8706 8786
      000000B0 = 8806 8886 8906 8986 8A06 8A86 8B06 8B86
      000000C0 = 8C06 8C86 8D06 8D86 8E06 8E86 8F06 8F86
      000000D0 = 0117 0001 0113 F301 0D13 1380 0D93 1380
      000000E0 = 5763 01BD 2023 000D 0D11 DDE3 FFAD 4501
      000000F0 = 4581 00EF 0040 4729 67B1 8823 00E7 6737
      J-Link>mem8 0x80, 80
      00000080 = 6F 00 00 01 6F 00 80 00 6F 00 40 00 6F 00 00 00
      00000090 = 93 00 00 00 06 81 86 81 06 82 86 82 06 83 86 83
      000000A0 = 06 84 86 84 06 85 86 85 06 86 86 86 06 87 86 87
      000000B0 = 06 88 86 88 06 89 86 89 06 8A 86 8A 06 8B 86 8B
      000000C0 = 06 8C 86 8C 06 8D 86 8D 06 8E 86 8E 06 8F 86 8F
      000000D0 = 17 01 01 00 13 01 01 F3 13 0D 80 13 93 0D 80 13
      000000E0 = 63 57 BD 01 23 20 0D 00 11 0D E3 DD AD FF 01 45
      000000F0 = 81 45 EF 00 40 00 29 47 B1 67 23 88 E7 00 37 67
    • Just because a standard allows it, it does not have to make sense to do it.
      It makes system bus access only half-way useful and not really a replacement for access via the CPU. *All* other implementations we have come across so far have implemented system bus access as a full replacement for access via the CPU.

      However, in the end you are right, J-Link seems to ignore the initially detected system bus limitation, so it is a mistake on the J-Link side.
      It will be fixed in one of the next versions, but not with the highest priority, as this is a very special case on limited and very rare CPU designs.
      ETA for the fix is 4-6 weeks.
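For illustration only, one way a debugger could honor a 32-bit-only system bus is to turn a 16-bit write into a read-modify-write of the containing word. The sketch below is hypothetical and says nothing about how the actual J-Link fix is implemented; `write16_via_32`, `read32`, and `write32` are made-up names standing in for whatever 32-bit access primitives are available.

```python
def write16_via_32(read32, write32, addr, half):
    """Emulate a 16-bit write using only aligned 32-bit reads and writes."""
    if addr & 0x1:
        raise ValueError("halfword address must be 2-byte aligned")
    base = addr & ~0x3                   # containing 32-bit word
    shift = 8 * (addr & 0x2)             # 0 for the low lane, 16 for the high lane
    word = read32(base)
    word &= ~(0xFFFF << shift) & 0xFFFFFFFF
    word |= (half & 0xFFFF) << shift
    write32(base, word)

# Usage against a dictionary standing in for target memory.
mem = {0x204: 0x00000000}
write16_via_32(mem.__getitem__, mem.__setitem__, 0x204, 0x8106)
write16_via_32(mem.__getitem__, mem.__setitem__, 0x206, 0x8186)
print(f"{mem[0x204]:08X}")
```

After the two halfword writes the word holds 0x81868106, which is the value the original image contains for this instruction pair (see the word at 0x94 in the mem32 dump at 0x80). Since the connection log also lists ProgBuf access, routing sub-word accesses through the CPU would be another option.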
    • Thank you. I appreciate that you took the time to look seriously at this.

      For what it is worth, supposedly, the PULP riscv-dbg IP has been in several ASIC tape outs already, but perhaps this is the first (so far) instance of a SEGGER (rather than OpenOCD) user attempting it. OpenTitan (backed by Google, Seagate, Nuvoton, Western Digital, and lowRISC) uses the same implementation, so I believe SEGGER's added effort will prove to be worthwhile in the future.

      Just as an aside, although you didn't make this particular assertion, you would get no argument from me if someone were to say that the RISC-V debug standard seems much more convoluted (and less capable for a given chip area size) than the ARM equivalent.