[SOLVED] RTT usage of SEGGER_RTT_ASM_ARMv7M.S

  • Hi,
    can you please tell me why and how to use SEGGER_RTT_ASM_ARMv7M.S, there is not much info on that.

    1. What is the advantage, is it faster?
    2. How do I use it? Just setting #define USE_RTT_ASM (1) in SEGGER_RTT_Conf.h doesn't seem to do much, as far as I can tell.

    I am working with the DAVE IDE (GCC) on an XMC4400 (Cortex-M4).

  • Usually, there is nothing to do.
    The ASM variant is active by default for GCC + Cortex-M4.

    Yes, the ASM variant is faster, and it is consistently fast, no matter how high or low the compiler optimization level for the surrounding code is.

    Please read the forum rules before posting.

    Keep in mind, this is *not* a support forum.
    Our engineers will try to answer your questions between their projects if possible but this can be delayed by longer periods of time.
    Should you be entitled to support you can contact us via our support system: https://www.segger.com/ticket/

    Or you can contact us via e-mail.

  • I tested two versions, 6.40 and 7.80a:


    V6.40 has #define USE_RTT_ASM 1 in SEGGER_RTT_Conf.h, but this has no effect.

    In both versions the presence of SEGGER_RTT_ASM_ARMv7M.S does not change the output.
    Not using SEGGER_RTT.c results in errors.

    Since the times for writing 82 chars are far away from what is claimed on the website (<= 1us @ 168MHz Cortex M4), I did a bit of testing.
    I'm only running at 120MHz, so my times should be slower by a factor of 1.4. But the best I could do was 3.5 us.
    The only way to achieve <1us for just copying bytes is by using an assembler version of memcpy I found on the internet, or by using DMA.

    Best-case RTT_Write I could do: 3.5 us
    82-byte memcpy: 5 us
    82-byte memcpy32: 0.8 us
    82-byte copy loop: 9.1 us
    84-byte copy loop, 32 bits at a time: 2.5 us
    84-byte DMA copy, 32 bits at a time: 0.8 us

    So using an assembler version of memcpy is fast, but I cannot get there. Please tell me what I'm doing wrong.
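    As a rough illustration of the gap between the byte-wise and the 32-bit copy loops in the list above, here is a minimal sketch. The helper names are hypothetical and this is not the test code referred to in this post. It assumes 4-byte-aligned buffers and a length that is a multiple of 4, which is presumably why the 82 characters were padded to 84 bytes. For the same number of bytes, the word loop runs a quarter as many iterations as the byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies one byte per loop iteration. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n_bytes) {
    for (size_t i = 0; i < n_bytes; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical helper: copies one 32-bit word per loop iteration.
 * Assumption: dst and src are 4-byte aligned and n_bytes is a multiple of 4. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t n_bytes) {
    for (size_t i = 0; i < n_bytes / 4u; i++) {
        dst[i] = src[i];
    }
}
```

    How much faster the word loop actually is depends on the compiler, the optimization level, and the memory system, so the timings have to be measured on the target.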

    My test code below:

  • Hi,
    it seems there are some misunderstandings here.
    I hope to clear them up with this post.

    1) When is the ASM sub module used
    The ASM sub module contains an ASM variant of SEGGER_RTT_WriteSkipNoLock().
    This function is used by default for most cores (e.g. Cortex-M4).
    However, you are using SEGGER_RTT_Write(), not SEGGER_RTT_WriteSkipNoLock().
    So in your example code the ASM routine is not actually used.
    You can check this by looking into the SEGGER_RTT.c and SEGGER_RTT.h files, which contain the compiler switches (asm available or not) and code used by RTT (asm version of SEGGER_RTT_WriteSkipNoLock() used or not).
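    To illustrate what a "write, skip, no lock" routine does, as the function name suggests, here is a minimal plain-C sketch. It is not SEGGER's implementation and not the ASM routine; the ring_t type, its fields, and ring_write_skip_no_lock() are hypothetical names chosen for this example. The data is copied only if it fits completely, otherwise nothing is written, and the caller is responsible for any locking.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ring buffer; field names are illustrative only. */
typedef struct {
    char   *buf;   /* storage */
    size_t  size;  /* capacity in bytes */
    size_t  wr;    /* next write index, advanced by the writer */
    size_t  rd;    /* next read index, advanced by the reader */
} ring_t;

/* Stores all n bytes if they fit, otherwise stores nothing.
 * No locking is done here; the caller must prevent concurrent writers.
 * Returns 1 if the data was stored, 0 if it was skipped. */
static int ring_write_skip_no_lock(ring_t *r, const char *data, size_t n) {
    /* One byte always stays unused so that wr == rd means "empty". */
    size_t used = (r->wr >= r->rd) ? (r->wr - r->rd)
                                   : (r->size - r->rd + r->wr);
    size_t free_bytes = r->size - 1u - used;
    if (n > free_bytes) {
        return 0;                       /* does not fit: skip everything */
    }
    size_t tail = r->size - r->wr;      /* contiguous room up to buffer end */
    if (n <= tail) {
        memcpy(r->buf + r->wr, data, n);
        r->wr = (r->wr + n) % r->size;
    } else {
        memcpy(r->buf + r->wr, data, tail);     /* first part up to the end */
        memcpy(r->buf, data + tail, n - tail);  /* rest wraps to the start */
        r->wr = n - tail;
    }
    return 1;
}
```

    With an 8-byte buffer, a 5-byte write succeeds, a following 3-byte write is skipped because only 2 bytes are free, and once the reader has caught up another 5-byte write succeeds and wraps around the end of the buffer.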

    2) What does output time actually mean
    The output time is the time it takes the RTT module to output data.
    It is the time from the call of the RTT function until the data is available to be read by J-Link, i.e. until it is in the buffer, without overhead(!).

    The output time can be measured with a scope and an application that toggles a pin:
    a) Set the pin (e.g. LED pin).

    b) Measure the time (clear to set) of the following calls to get overhead time:
       BSP_ClrLED(0);
       SEGGER_RTT_LOCK();
       Status = SEGGER_RTT_WriteNoLock(0, 0, 0);
       SEGGER_RTT_UNLOCK();
       BSP_SetLED(0);

    c) Measure the time (clear to set) of the following calls to get actual output time:
       BSP_ClrLED(0);
       SEGGER_RTT_LOCK();
       Status = SEGGER_RTT_WriteNoLock(0, "01234567890123456789012345678901234567890123456789012345678901234567890123456789\r\n", 82);
       SEGGER_RTT_UNLOCK();
       BSP_SetLED(0);

    I repeated the measurement on the SEGGER Cortex-M Trace Reference Board (168MHz) and the result was as follows:
    Measurement 1) (overhead): 2.04us
    Measurement 2) (82 chars): 2.70us
    => Time without overhead: 2.70us - 2.04us = 0.66us
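
    The subtraction above, and a rough way to relate it to a different core clock, can be written as a small helper. This is only a sketch: the function names are hypothetical, and the clock scaling assumes the time is purely cycle-bound, which ignores flash wait states, caches and bus contention.

```c
/* Net output time: measurement with payload minus measurement with an
 * empty write (the overhead of pin toggling, locking and the call). */
static double net_time_us(double with_payload_us, double overhead_us) {
    return with_payload_us - overhead_us;
}

/* Assumption: execution time scales inversely with the core clock.
 * Real targets deviate from this because of flash wait states and caches. */
static double scale_to_clock_us(double t_us, double measured_mhz,
                                double target_mhz) {
    return t_us * measured_mhz / target_mhz;
}
```

    For the numbers above, net_time_us(2.70, 2.04) gives about 0.66 us, and scale_to_clock_us(0.66, 168.0, 120.0) gives about 0.92 us as a first-order estimate for a 120 MHz core.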

    3) What factors can impact the output time
    The test I did was executed from flash.
    Since the string is copied from flash to RAM, factors such as the RAM write speed and cache handling (if the core has a cache) affect the measured time.

    BR
    Fabian


  • The main problem is that I cannot get to the "time to do a single memcopy" speed. Or worded differently: my memcpy is slow.

    here are some results:

    Are you using a special memcpy function? (I'm not a software engineer so I can't confidently tell from the code)

    • If so, why is it not as fast as in your test? The memcpy32 function shows that it can be fast.
    • If not, why is RTT still faster than calling memcpy manually?
  • Hi,
    Your question is not specific to RTT; it concerns
    general C and assembly programming.

    Please understand that this far exceeds the scope of the help we can
    provide in this Forum or in our support ticket system.

    I suggest that you take the time to check the RTT target side implementation, which you have all sources for.
    You could either
    a) follow the function calls or
    b) step through the code
    until you find where the memory is copied in the source.


    As a hint:
    As long as RTT_USE_ASM is set, you can find the memory copy implementation in SEGGER_RTT_ASM_ARMv7M.S => SEGGER_RTT_ASM_WriteSkipNoLock().

    I dare say that it is well enough documented to find the code where the memory is copied.

    Regarding the question of why some memcpy implementations are faster than others:
    I suggest searching the internet for this information. I am sure you will find well-explained answers to this question.

    If you can come up with a faster/more optimized routine to copy the memory,
    you are free to adjust the code you are using for RTT.
    You have all the sources and are free to change them.
    Please understand, however, that we cannot provide any support for code
    that has been modified by users.

    Please understand that for these reasons we cannot provide any more answers regarding this topic.
    We will close this thread now.

    BR
    Fabian
