embOS RX IAR V3.90a with IAR Embedded Workbench for Renesas RX 2.60.5
We have set up several queues in our system in exactly the same way. One of them gets corrupted somehow by the OS, depending upon task timing (specifically, when we insert an OS_Delay() in one of the tasks). The corruption appears to happen when the queue reaches its message capacity.
We have many producers and one consumer. All the messages from all the producers are exactly the same length (220 bytes). We have created a queue buffer to hold 4 of these messages, but there could be more than 4 waiting to be put into the queue for retrieval. We simply wrap each OS_Q_Put() call in a loop that checks the return value; if the queue is full, the task delays and tries again until the queue has room to accept the new message (sketched below).
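For illustration, each producer loop looks roughly like this (simplified; _Queue, _Msg, and the 1-tick delay are placeholders for our actual names and values):

    /* Retry until the queue has room. OS_Q_Put() returns nonzero
       when the message could not be stored (queue full). */
    while (OS_Q_Put(&_Queue, &_Msg, sizeof(_Msg)) != 0) {
      OS_Delay(1);  /* back off briefly, then try again */
    }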
There are no OS_Q_Put(), OS_Q_GetPtr() or OS_Q_Purge() calls in any interrupts.
It appears that under some circumstances, while the queue is full, OS_Q_Purge() writes an incorrect value into the queue.pData.offFirst element of the queue's structure.
Consider the following with 100-byte messages and a 400-byte buffer to hold 4 of them:
The offFirst element should be one of the following 4 values: 0, 100, 200, or 300, i.e. the starting offset of each of the 4 messages.
Under Normal Conditions (no delay) we observe:
The first time the OS_Q_GetPtr() runs, the offFirst remains at 0.
The first time the OS_Q_Purge() runs, the offFirst changes to 100 (the offset of the next message).
Under Abnormal Conditions (we insert a delay to change overall task timing) we observe:
The first time the OS_Q_GetPtr() runs, the offFirst remains at 0.
The first time the OS_Q_Purge() runs, the offFirst changes to 4, which lands right in the middle of the first message's data and is therefore invalid.
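For reference, the consumer side is essentially the usual GetPtr/Purge pattern (a simplified sketch; ProcessMessage() is a stand-in for our actual handler):

    char* pData;
    int   Len;

    for (;;) {
      /* OS_Q_GetPtr() blocks until a message is available and
         returns its length; OS_Q_Purge() then removes it. */
      Len = OS_Q_GetPtr(&_Queue, (void**)&pData);
      ProcessMessage(pData, Len);  /* hypothetical handler */
      OS_Q_Purge(&_Queue);
    }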
A few questions:
1) Do you know if this is a known bug in this revision? If so, is there a workaround or a patch? In the release notes, I can see that previous versions of embOS had queue issues and corruption, which leads me to suspect that perhaps the bug wasn't completely resolved.
2) I read in the Segger embOS manual that "The queue data buffer contains the messages as also some additional management information bytes. Each message has a message header containing the message size. Additionally the queue buffer will be aligned for those CPUs which need data alignment. Therefore the queue data buffer size has to be larger than the sum of all messages."
This mentions making the buffer size larger than the sum of all messages, but it doesn't state HOW MUCH LARGER. Any clue? (My sizing guesswork is sketched after these questions.)
3) If the answer is "upgrade to the latest revision", we cannot do that. We are very close to our release date and cannot afford to make a major change. Plus, it would affect several different product lines that all use the same version. Is the embOS source code available anywhere?
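To make question 2 concrete: if the per-message overhead were documented, we would size the buffer along these lines. This is only a sketch; MSG_OVERHEAD = 2 is my guess at the header size, and finding the real value is exactly what I'm asking about:

    #define MSG_SIZE      220u  /* our fixed message length */
    #define MSG_COUNT     4u    /* messages the queue should hold */
    #define MSG_OVERHEAD  2u    /* ASSUMED per-message header size -- unknown */

    static OS_Q _Queue;
    static char _QBuffer[MSG_COUNT * (MSG_SIZE + MSG_OVERHEAD)];

    /* At init: */
    OS_Q_Create(&_Queue, _QBuffer, sizeof(_QBuffer));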
We did 2 things in the interim as quick patches to the problem, but they aren't acceptable long-term solutions, since we know this issue is just waiting to happen again:
1) We increased the number of messages the buffer holds.
2) We then increased the buffer by an arbitrarily chosen 50 bytes to try to satisfy the requirement the manual states (roughly as sketched below).
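In code, the interim patch amounts to something like this (the increased count is illustrative; the 50 bytes is the arbitrary headroom mentioned above):

    /* Interim patch: more message slots plus arbitrary padding. */
    static char _QBuffer[6u * 220u + 50u];  /* was 4u * 220u */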
I should also mention that we are working with a licensed version.
Any help would be appreciated!
Thank you,
Bryan Lendroth
Honeywell, Inc.