* [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
@ 2007-08-13  9:45 Roland Tollenaar
  2007-08-13 11:41 ` Wolfgang Grandegger
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13  9:45 UTC (permalink / raw)
  To: EML users, Xenomai-help

Hi,

in a 1 ms periodic task of my Xenomai application, everything works
perfectly with my Beckhoff devices, which I address via EtherCAT with EML,
until I activate RTCAN. Then sometimes (not always) I get incessant
warnings from EML which read:

EC_Telegram:: check_index(): Index field does not correspond with 
received data.
low_level_input(): framebuilding failed.

The cycle time of the task is not being violated, as far as I can see.

In dmesg the following can be found when the conflict occurs:

RTnet:rtskb allocation from real-time cache failed.
Assertion failed! drivers/xenomai/can/rtcan_raw.c: rtcan_tx_push:168 
dev->tx_socket=0 (3) TX skb still in use.


Can anyone suggest what the problem might be here, or what I could look
at to establish this?

Regards.

Roland.






* Re: [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13  9:45 [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed Roland Tollenaar
@ 2007-08-13 11:41 ` Wolfgang Grandegger
  2007-08-13 12:41   ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Wolfgang Grandegger @ 2007-08-13 11:41 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users

Roland Tollenaar wrote:
> Hi,
> 
> in a 1 ms period task of my xenomai application things work perfectly 
> with my BeckHoff devices addressing them via Ethercat with EML until I 
> activate rtcan. Then sometimes (not always) I get incessant warnings 
> from EML which read:
> 
> EC_Telegram:: check_index(): Index field does not correspond with 
> received data.
> low_level_input(): framebuilding failed.
> 
> The cycle time of the task is not being violated AFAI can see.
> 
> In dmesg the following can be found when the conflict occurs
> 
> RTnet:rtskb allocation from real-time cache failed.
> Assertion failed! drivers/xenomai/can/rtcan_raw.c: rtcan_tx_push:168 
> dev->tx_socket=0 (3) TX skb still in use.

Hm, this is not supposed to happen.

> Can anyone make any suggestions as to what might be the problem here or 
> what I could try to look at to establish this?

Can you show the output of /proc/rtcan/devices and /proc/rtcan/sockets 
before and after the problem shows up?

Wolfgang.




* Re: [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 11:41 ` Wolfgang Grandegger
@ 2007-08-13 12:41   ` Roland Tollenaar
  2007-08-13 13:03     ` Wolfgang Grandegger
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 12:41 UTC (permalink / raw)
  To: Wolfgang Grandegger, Xenomai-help, EML users

Hi

>> RTnet:rtskb allocation from real-time cache failed.
>> Assertion failed! drivers/xenomai/can/rtcan_raw.c: rtcan_tx_push:168 
>> dev->tx_socket=0 (3) TX skb still in use.
> 
> Hm, this is not supposed to happen.
Which of the two?


> Can you show the output of /proc/rtcan/devices and /proc/rtcan/sockets 
> before and after the problem showed up.

Below is a collection of what I think you are asking for. I am not 
convinced that the rtskb allocation failed message is serious: as you 
will see from the syslog and my comment above it, it only appears when I 
close my application. Although I try to close all connections neatly, 
certain threads still seem to be busy. See the errors I get on closing 
the application.

App running with no problem:

root@domain.hid:~# cat /proc/rtcan/sockets
fd Name___________ Filter ErrMask RX_Timeout_ns TX_Timeout_ns RX_BufFull TX_Lo
 2 rtcan2               1 0x00000      infinite      infinite          0     1
 0 rtcan2              -1 0x00000      infinite      infinite          0     1

root@domain.hid:~# cat /proc/rtcan/devices
Name___________ _Baudrate State___ TX_Counter RX_Counter ____Errors
rtcan0          undefined stopped           0          0          0
rtcan1          undefined stopped           0          0          0
rtcan2            1000000 active     16321347   27633347    2367116


App running with messages failing

root@domain.hid# cat /proc/rtcan/sockets
fd Name___________ Filter ErrMask RX_Timeout_ns TX_Timeout_ns RX_BufFull TX_Lo
 2 rtcan2               1 0x00000      infinite      infinite          0     1
 0 rtcan2              -1 0x00000      infinite      infinite          0     1


root@domain.hid# cat /proc/rtcan/devices
Name___________ _Baudrate State___ TX_Counter RX_Counter ____Errors
rtcan0          undefined stopped           0          0          0
rtcan1          undefined stopped           0          0          0
rtcan2            1000000 active     16850473   28691571    2367116



cat /var/syslog shows that the error only seems to come up when the 
application closes.

Only occurs on closing the application
Aug 13 13:01:28 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 13:02:14 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:02:34 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:03:36 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:18:39 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:19:33 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:19:58 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:21:27 (none) kernel: RTnet: rtskb allocation from real-time cache failed
Aug 13 14:22:10 (none) kernel: RTnet: rtskb allocation from real-time cache failed


When I close the application I get these errors

rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_recv: aborted because socket was closed
rt_dev_ioctl: Bad file descriptor
Waiting for tasks to stop....low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_output(): Cannot Send
low_level_txandrx: failed: MAX_TRIES_TX: Giving up
DLL::txandrx() Error
PD_Buffer: Error sending PD
txandrx failed:


Does this shed any light on the matter?


Roland





* Re: [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 12:41   ` Roland Tollenaar
@ 2007-08-13 13:03     ` Wolfgang Grandegger
  2007-08-13 13:11       ` Roland Tollenaar
  2007-08-13 14:00       ` Roland Tollenaar
  0 siblings, 2 replies; 23+ messages in thread
From: Wolfgang Grandegger @ 2007-08-13 13:03 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users

Roland Tollenaar wrote:
> Hi
> 
>>> RTnet:rtskb allocation from real-time cache failed.
>>> Assertion failed! drivers/xenomai/can/rtcan_raw.c: rtcan_tx_push:168 
>>> dev->tx_socket=0 (3) TX skb still in use.
>>
>> Hm, this is not supposed to happen.
> Which of the two?

The RTCAN assertion. Well, in fact, it can happen when the device goes 
bus-off or is stopped while a TX message is pending. The next message 
after (re-)start will then trigger this message. This is a bug but it 
should _not_ do any harm (either I remove the assertion or I properly 
reset the value of dev->tx_socket).
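
To illustrate the sequence described above, here is a minimal toy model in Python. It is not the rtcan driver code: the class, its methods and the `reset_on_stop` flag are hypothetical names chosen for illustration, and the only thing taken from the thread is the idea that a stale `tx_socket` value survives a stop and trips the check on the next push.

```python
class ToyCanDevice:
    """Toy model only; not the rtcan driver code."""

    def __init__(self, reset_on_stop):
        self.reset_on_stop = reset_on_stop
        self.tx_socket = None  # socket owning the frame currently in the TX slot
        self.warnings = 0

    def tx_push(self, sock):
        # Corresponds to the "TX skb still in use" check in the log above.
        if self.tx_socket is not None:
            self.warnings += 1
        self.tx_socket = sock

    def tx_done(self):
        # Normal completion frees the TX slot.
        self.tx_socket = None

    def stop(self):
        # Bus-off or explicit stop: the pending frame never completes.
        if self.reset_on_stop:
            self.tx_socket = None


def warnings_after_restart(reset_on_stop):
    dev = ToyCanDevice(reset_on_stop)
    dev.tx_push("sock0")  # frame pending, no tx_done() yet
    dev.stop()            # device stopped while TX is pending
    dev.tx_push("sock0")  # first message after (re-)start
    return dev.warnings


print(warnings_after_restart(False), warnings_after_restart(True))
```

In this model the warning fires once when the slot is not reset on stop and never when it is, which matches the second of the two fixes mentioned above.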

The first one should be pretty clear. The rtskb pool seems to be exhausted.

> 
>> Can you show the output of /proc/rtcan/devices and /proc/rtcan/sockets 
>> before and after the problem showed up.
> 
> Below is an accumulation of what I think you are asking for. I am not 
> convinced that the rtskb allocation failed message is serious, as you 
> will see from the syslog and my comment above it only takes place when i 
> close my application. Although I try to close all connections neatly 
> certain threads still seem to be busy. See the errors I get on closing 
> the application.
> 
> App running with no problem:
> 
> root@domain.hid:~# cat /proc/rtcan/sockets
> fd Name___________ Filter ErrMask RX_Timeout_ns TX_Timeout_ns RX_BufFull 
> TX_Lo
>  2 rtcan2               1 0x00000      infinite      infinite 0     1
>  0 rtcan2              -1 0x00000      infinite      infinite 0     1
> 
> root@domain.hid:~# cat /proc/rtcan/devices
> Name___________ _Baudrate State___ TX_Counter RX_Counter ____Errors
> rtcan0          undefined stopped           0          0          0
> rtcan1          undefined stopped           0          0          0
> rtcan2            1000000 active     16321347   27633347    2367116
> 
> 
> App running with messages failing
> 
> root@domain.hid# cat /proc/rtcan/sockets
> fd Name___________ Filter ErrMask RX_Timeout_ns TX_Timeout_ns RX_BufFull 
> TX_Lo
>  2 rtcan2               1 0x00000      infinite      infinite 0     1
>  0 rtcan2              -1 0x00000      infinite      infinite 0     1
> 
> 
> root@domain.hid# cat /proc/rtcan/devices
> Name___________ _Baudrate State___ TX_Counter RX_Counter ____Errors
> rtcan0          undefined stopped           0          0          0
> rtcan1          undefined stopped           0          0          0
> rtcan2            1000000 active     16850473   28691571    2367116

Oops, that many errors?

> cat /var/syslog shows that the error only seems to come up when the 
> application closes.
> 
> Only occurs on closing the application
> Aug 13 13:01:28 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 13:02:14 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:02:34 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:03:36 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:18:39 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:19:33 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:19:58 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:21:27 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> Aug 13 14:22:10 (none) kernel: RTnet: rtskb allocation from real-time 
> cache failed
> 
> 
> When I close the application I get these errors
> 
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed
> rt_dev_recv: aborted because socket was closed

You should handle this error properly.

> rt_dev_ioctl: Bad file descriptor
> Waiting for tasks to stop....low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_output(): Cannot Send
> low_level_txandrx: failed: MAX_TRIES_TX: Giving up
> DLL::txandrx() Error
> PD_Buffer: Error sending PD
> txandrx failed:
> 
> 
> Does this shed any light on the matter?

Hm, seems that your shutdown is not implemented properly.
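
A minimal sketch of one possible shutdown order, assuming the receive loop runs in its own thread: tell the thread to stop, wait for it to finish, and only then close the socket. It uses plain Python sockets and threads rather than the RTDM calls (rt_dev_recv and friends), so all names here are illustrative only.

```python
import socket
import threading


def receiver(sock, stop, errors):
    sock.settimeout(0.05)  # wake up regularly to re-check the stop flag
    while not stop.is_set():
        try:
            sock.recv(64)
        except socket.timeout:
            continue  # nothing received; loop and check the flag again
        except OSError as exc:
            errors.append(exc)  # socket was closed underneath the receiver
            return


def run_and_shutdown():
    rx, tx = socket.socketpair()
    stop = threading.Event()
    errors = []
    worker = threading.Thread(target=receiver, args=(rx, stop, errors))
    worker.start()
    tx.send(b"frame")

    stop.set()      # 1. ask the receiver to stop
    worker.join()   # 2. wait until it has really left recv()
    rx.close()      # 3. only now close the sockets
    tx.close()
    return errors


print(run_and_shutdown())
```

Closing the socket before the join would be the analogue of the "aborted because socket was closed" messages quoted above.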

Wolfgang.



* Re: [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 13:03     ` Wolfgang Grandegger
@ 2007-08-13 13:11       ` Roland Tollenaar
  2007-08-13 14:00       ` Roland Tollenaar
  1 sibling, 0 replies; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 13:11 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Xenomai-help, EML users

Hi Wolfgang,

> The RTCAN assertion. Well, in fact, it can happen when the device goes 
> bus-off or is stopped while a TX message is pending. The next message 
> after (re-)start will the trigger this message. This is a bug but it 
> should _not_ harm (either I remove the assertion or I reset properly the 
> value of dev->tx_socket).
Clear. Thanks.


> The first one should be pretty clear. The rtskb pool seems to be exhausted.
Sorry, but this is not clear to me. What is the rtskb pool and what are 
the implications of it being exhausted?

>>
>> root@domain.hid# cat /proc/rtcan/devices
>> Name___________ _Baudrate State___ TX_Counter RX_Counter ____Errors
>> rtcan0          undefined stopped           0          0          0
>> rtcan1          undefined stopped           0          0          0
>> rtcan2            1000000 active     16850473   28691571    2367116
> 
> Oops, that much errors?
Er, yes. I started up CAN after having had it disabled for a very long 
time while I was working on the EtherCAT side. I seem to have forgotten 
that CAN is not wireless and forgot to plug in the bus. So I think those 
errors were picked up then; they did not seem to increase later on.


>> rt_dev_recv: aborted because socket was closed
>> rt_dev_recv: aborted because socket was closed
>> rt_dev_recv: aborted because socket was closed
>> rt_dev_recv: aborted because socket was closed
> 
> You should handle this error properly.
You are right. I think I am not closing the threads in the correct 
sequence, and I am not sure I know how to yet. But can this be the cause 
of my problem? Where does the conflict/complication between rtcan and 
eml arise? I do understand that this is an almost impossible question to 
answer over two separate lists. :(


> 
> Hm, seems that your shutdown is not implemented properly.
I'd say this assessment is rather accurate. Will look into it:)

Roland




* Re: [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 13:03     ` Wolfgang Grandegger
  2007-08-13 13:11       ` Roland Tollenaar
@ 2007-08-13 14:00       ` Roland Tollenaar
  2007-08-13 14:51         ` [Xenomai-help] [Ethercatmaster-users] " Jan Kiszka
  1 sibling, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 14:00 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Xenomai-help, EML users

Hi,

All closing and shutting down has now been fixed. There are no more 
errors on closing my application.

Yet the problem persists very clearly. RTCAN and EML can each run on 
their own and never throw up any errors. As soon as they are used in 
combination, the framebuilding in EML gets messed up in 50% of the 
cases (as per the error message).

There is definitely something between the two that is not right.


>>>> RTnet:rtskb allocation from real-time cache failed.

Could I get some tips as to what I can do about this? I seem to get it 
even when I do not have rtcan activity running in my application and 
(because I am clueless) I would like to prevent this message which may 
signify the root of the problem.



Regards,

Roland.









* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 14:00       ` Roland Tollenaar
@ 2007-08-13 14:51         ` Jan Kiszka
  2007-08-13 15:55           ` Roland Tollenaar
  2007-08-14 13:56           ` Roland Tollenaar
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Kiszka @ 2007-08-13 14:51 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users


Roland Tollenaar wrote:
> Hi,
> 
> All closing & shutting down has been perfected. There are no more errors 
> on closing my application.
> 
> Yet the problem persists very explicitly. Rtcan and EML can run 
> separately and never throw up any errors. As soon as they are used in 
> combination then in 50% of the cases the framebuilding in EML gets 
> messed up (as per the error message)
> 
> There is definitely something between the two that is not right.
> 

In 9 out of 10 cases (if not more): timing. Running each of them alone
doesn't expose a timing issue (race) or transient overload. I can't help
with EML complaints, maybe the FMTC guys have an idea what can trigger
this and how to debug it.

> 
>>>>> RTnet:rtskb allocation from real-time cache failed.
> 
> Could I get some tips as to what I can do about this? I seem to get it 
> even when I do not have rtcan activity running in my application and 
> (because I am clueless) I would like to prevent this message which may 
> signify the root of the problem.

You have created the socket for some/all EML activity from primary mode
of some Xenomai thread, thus network buffer allocation has to run
against the real-time rtskb pool - which is by default empty :p. See
README.pools from the RTnet documentation on this.

I don't have the EML design at hand, but you might be able to avoid this
by initialising before creating the shadow task or by explicitly
switching to secondary mode before initialising. [Sorry for this issue,
it's at least partly due to some outdated RTnet design.]

Jan



* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 14:51         ` [Xenomai-help] [Ethercatmaster-users] " Jan Kiszka
@ 2007-08-13 15:55           ` Roland Tollenaar
  2007-08-13 16:57             ` Jan Kiszka
  2007-08-14 13:56           ` Roland Tollenaar
  1 sibling, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 15:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai-help, EML users

Hi

>> There is definitely something between the two that is not right.
>>
> 
> In 9 of 10 cases (if not more): timing. Running both alone doesn't
> expose some timing issue (race) or transient overload. I can't help with
> EML complaints, maybe the FMTC guys have an idea what can trigger this
> and how to debug it.
Out of interest, what timing exactly? These two systems (rtcan and eml) 
run separately; they don't need to access the same address space or 
otherwise share resources that would require timing. What am I not 
understanding?



>>>>>> RTnet:rtskb allocation from real-time cache failed.
>> Could I get some tips as to what I can do about this? I seem to get it 
>> even when I do not have rtcan activity running in my application and 
>> (because I am clueless) I would like to prevent this message which may 
>> signify the root of the problem.
> 
> You have created the socket for some/all EML activity from primary mode
> of some Xenomai thread, 
100% correct.


> thus network buffer allocation is ought to run
> against the real-time rtskb pool - which is by default empty :p. See
> README.pools from the RTnet documentation on this.


> I don't have the EML design at hand, but you might be able to avoid this
> by initialising before creating the shadow task or by explicitly
In fact this is what I tried initially. It does not work at all, so I 
ended up initializing in the thread. Is that a problem?

Is this allocation possibly the cause of the problem, or is the rtnet 
warning harmless? At least it is not related to rtcan in any way, 
because it appears even if rtcan is not activated in the application.

Roland



> switching to secondary mode before initialising. [Sorry for this issue,
> it's at least partly due to some outdated RTnet design.]
> 
> Jan
> 



* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 15:55           ` Roland Tollenaar
@ 2007-08-13 16:57             ` Jan Kiszka
  2007-08-13 17:40               ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2007-08-13 16:57 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users


Roland Tollenaar wrote:
> Hi
> 
>>> There is definitely something between the two that is not right.
>>>
>>
>> In 9 of 10 cases (if not more): timing. Running both alone doesn't
>> expose some timing issue (race) or transient overload. I can't help with
>> EML complaints, maybe the FMTC guys have an idea what can trigger this
>> and how to debug it.
> Out of interest, what timing exactly? These two systems (rtcan and eml)
> run separately, they don;t need to access the same address space or
> otherwise share resources that would require timing? What am I not
> understanding?

They share the same CPU? Varying the load can change the execution
order in otherwise independent components.
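
As a rough illustration of that point, here is a toy fixed-priority, preemptive scheduler in Python running on a single CPU. The task names, priorities and tick counts are invented for the example and do not describe the real EML or RTCAN threads.

```python
def completion_order(tasks):
    """tasks: list of (name, priority, release_tick, work_ticks).

    A higher priority number wins; work_ticks must be at least 1.
    Returns the task names in the order in which they finish.
    """
    remaining = {name: work for name, _, _, work in tasks}
    order = []
    tick = 0
    while len(order) < len(tasks):
        ready = [t for t in tasks if t[2] <= tick and remaining[t[0]] > 0]
        if ready:
            name = max(ready, key=lambda t: t[1])[0]  # run highest priority
            remaining[name] -= 1
            if remaining[name] == 0:
                order.append(name)
        tick += 1
    return order


# Hypothetical load: an EML cycle and a NIC receive handler.
base = [("eml_cycle", 2, 0, 2), ("nic_rx", 3, 3, 1)]
# The same load plus a higher-priority CAN handler on the same CPU.
with_can = base + [("can_rx", 4, 0, 3)]

print(completion_order(base))
print(completion_order(with_can))
```

In the first run the EML cycle finishes before the receive handler starts. With the extra CAN load the EML cycle is pushed back far enough that the receive handler completes first, although neither of the two tasks was changed.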

> 
> 
>>>>>>> RTnet:rtskb allocation from real-time cache failed.
>>> Could I get some tips as to what I can do about this? I seem to get
>>> it even when I do not have rtcan activity running in my application
>>> and (because I am clueless) I would like to prevent this message
>>> which may signify the root of the problem.
>>
>> You have created the socket for some/all EML activity from primary mode
>> of some Xenomai thread, 
> 100% correct.
> 
> 
> thus network buffer allocation is ought to run
>> against the real-time rtskb pool - which is by default empty :p. See
>> README.pools from the RTnet documentation on this.
> 
> 
>> I don't have the EML design at hand, but you might be able to avoid this
>> by initialising before creating the shadow task or by explicitly
> In fact this is what I tried initially. IT does not work at all. so I
> ended up initializing in the thread. Problem?

Not necessarily. But it would have been nice to report the other issue
as well, because maybe there is something to be fixed (either in the
code or in the docs). Initialisation almost always happens in non-RT
context, and you shouldn't be forced to do this under RT constraints. If
this is an RTnet and/or EML problem, please report it on the related lists!

> 
> Is this allocation possibly the cause of the problem or is the rtnet
> warning harmless? At least it is not related to rtcan in any manner
> because it appears even if rtcan is not activated in the application.

Did you set the rtskb_cache_size module parameter for rtnet.ko? Did
you choose it large enough that the buffer pool is not exhausted if
RTnet is blocked by other system activity? Again, check the documentation.

Jan



* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 16:57             ` Jan Kiszka
@ 2007-08-13 17:40               ` Roland Tollenaar
  2007-08-13 17:57                 ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 17:40 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai-help, rtnet-users, EML users

Hi Jan,


>> thus network buffer allocation is ought to run
>>> against the real-time rtskb pool - which is by default empty :p. See
>>> README.pools from the RTnet documentation on this.
I read this documentation. Together with an archived email from this 
list, I understand that if I load rtnet.ko like

insmod rtnet.ko rtskb_cache_size=64

(for the benefit of other poor souls in the future :))

it should help. And it does make a huge difference. Now, instead of 
working without problems only 1 time out of 5, it shows the problem only 
about 1 time in 10.

The 64 is a value I got from the mailing list. How large can I make this 
and what am I compromising?


>>> I don't have the EML design at hand, but you might be able to avoid this
>>> by initialising before creating the shadow task or by explicitly
>> In fact this is what I tried initially. IT does not work at all. so I
>> ended up initializing in the thread. Problem?
> 
> Not necessarily. But it would have been nice to report the other issue
> as well, because maybe there is something to be fixed (either in the
> code or in the docs). Initialisation almost always happens in non-RT
> context, and you shouldn't be force to do this under RT constraints. If
> this is an RTnet and/or EML problem, please report it on the related lists!
Will do so with your compliments and regards. :) I tried to initialize 
like I initialize rtcan in non-rt but it really does not work.


> Did you set the rtskb_cache_size module parameter for the rtnet.ko? Did
> you choose it appropriately large so that buffer pool do not exhaust if
> RTnet is blocked by other system activity? Again, check the documentation.
As stated, this seems to mitigate the problem. What is not clear to me 
is why the default of the rtskb pool is zero?

Roland.




* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 17:40               ` Roland Tollenaar
@ 2007-08-13 17:57                 ` Jan Kiszka
  2007-08-13 18:17                   ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2007-08-13 17:57 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users, rtnet-users


Roland Tollenaar wrote:
> Hi Jan,
> 
> 
>>> thus network buffer allocation is ought to run
>>>> against the real-time rtskb pool - which is by default empty :p. See
>>>> README.pools from the RTnet documentation on this.
> I read this documentation. Together with an archive email of this list I
>  understand that if I load rtnet.ko like
> 
> insmod rtnet.ko rtskb_cache_size=64
> 
> (for the benfit of other poor souls in the future :))
> 
> it should help. And it does make a huge difference. Now instead of not
> giving a problem 1 out of 5 times its more like giving a problem 1 every
> 10 times.
> 
> The 64 is a value I got from the mailing list. How large can I make this
> and what am I compromising?

Each buffer is slightly more than 1.5 KB heavy. Do your maths :). How
many buffers you need depend on how many incoming and outgoing frames
might be queued into they are processed. And that depends on the frame
rate and the time your EML stack has to handle it in the worst case. I
can't give you numbers on this, that depends on _your_ setup.
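
The sizing argument above can be turned into a rough back-of-the-envelope
sketch. Everything numeric here is an assumption for illustration: 1600 bytes
stands in for "slightly more than 1.5 KB", and the frames per cycle, cycle
time and worst-case stall are hypothetical inputs that have to come from
measurements on the actual setup. The function names are made up for this
example and are not part of RTnet.

```c
#include <stddef.h>

/* Assumed per-buffer footprint ("slightly more than 1.5 KB"); illustrative. */
#define RTSKB_BYTES_ASSUMED 1600u

/*
 * Estimate how many buffers are needed to ride out a stall:
 * frames queued per cycle (incoming + outgoing) times the number of
 * cycles that fit into the worst-case stall (rounded up), plus one
 * extra cycle of headroom. cycle_us must be non-zero.
 */
static unsigned rtskb_pool_estimate(unsigned frames_per_cycle,
                                    unsigned cycle_us,
                                    unsigned worst_case_stall_us)
{
    unsigned cycles = (worst_case_stall_us + cycle_us - 1u) / cycle_us;
    return frames_per_cycle * (cycles + 1u);
}

/* Memory taken by a pool of the given size under the assumed buffer size. */
static size_t rtskb_pool_bytes(unsigned buffers)
{
    return (size_t)buffers * RTSKB_BYTES_ASSUMED;
}
```

For example, two frames per 1 ms cycle and an assumed 5 ms worst-case stall
give rtskb_pool_estimate(2, 1000, 5000) buffers; the result would then be
passed as rtskb_cache_size when loading rtnet.ko.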

> 
> 
>>>> I don't have the EML design at hand, but you might be able to avoid
>>>> this
>>>> by initialising before creating the shadow task or by explicitly
>>> In fact this is what I tried initially. IT does not work at all. so I
>>> ended up initializing in the thread. Problem?
>>
>> Not necessarily. But it would have been nice to report the other issue
>> as well, because maybe there is something to be fixed (either in the
>> code or in the docs). Initialisation almost always happens in non-RT
>> context, and you shouldn't be force to do this under RT constraints. If
>> this is an RTnet and/or EML problem, please report it on the related
>> lists!
> Will do so with your compliments and regards. :) I tried to initialize
> like I initialize rtcan in non-rt but it really does not work.

That sounds like a bug, in whichever component.

> 
> 
>> Did you set the rtskb_cache_size module parameter for the rtnet.ko? Did
>> you choose it appropriately large so that buffer pool do not exhaust if
>> RTnet is blocked by other system activity? Again, check the
>> documentation.
> As stated, this seems to mitigate the problem. What is not clear to me
> is why the default of the rtskb pool is zero?

Because you _normally_ don't need it and would thus waste the allocated
memory.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 17:57                 ` Jan Kiszka
@ 2007-08-13 18:17                   ` Roland Tollenaar
  2007-08-13 18:30                     ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-13 18:17 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai-help, EML users, rtnet-users

Hi,

>> The 64 is a value I got from the mailing list. How large can I make this
>> and what am I compromising?
> 
> Each buffer is slightly more than 1.5 KB heavy. Do your maths :). How
> many buffers you need depend on how many incoming and outgoing frames
> might be queued into they are processed. And that depends on the frame
> rate and the time your EML stack has to handle it in the worst case. I
> can't give you numbers on this, that depends on _your_ setup.

That much is clear. Will make it big out of sheer inability to 
calculate. The lost memory is of almost no concern.

Thanks.

Roland


> 
> Jan
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 18:17                   ` Roland Tollenaar
@ 2007-08-13 18:30                     ` Jan Kiszka
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Kiszka @ 2007-08-13 18:30 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users, rtnet-users

[-- Attachment #1: Type: text/plain, Size: 1077 bytes --]

Roland Tollenaar wrote:
> Hi,
> 
>>> The 64 is a value I got from the mailing list. How large can I make this
>>> and what am I compromising?
>>
>> Each buffer is slightly more than 1.5 KB heavy. Do your maths :). How
>> many buffers you need depend on how many incoming and outgoing frames
>> might be queued into they are processed. And that depends on the frame

s/into/until/ (my brain-based dictionary must be broken)

>> rate and the time your EML stack has to handle it in the worst case. I
>> can't give you numbers on this, that depends on _your_ setup.
> 
> That much is clear. Will make it big out of shear inability to
> calculate. The lost memory is of almost no concern.

Just make sure that picking an arbitrarily large pool size doesn't paper
over some real system design issue that may manifest itself in huge latencies.
Again, I don't know your numbers, so I cannot tell what is reasonable
and what is an indication of a problem. A system-level analysis of the
event flows would be a good job for LTTng now - if it only worked already...

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-13 14:51         ` [Xenomai-help] [Ethercatmaster-users] " Jan Kiszka
  2007-08-13 15:55           ` Roland Tollenaar
@ 2007-08-14 13:56           ` Roland Tollenaar
  2007-08-14 14:47             ` Klaas Gadeyne
  1 sibling, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-14 13:56 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Xenomai-help, EML users, rtnet-users

Hi,

Some new insight:

I have included the rtnet list because I think this is of interest and I 
don't know exactly where the problem arises. Most likely it is EML, which 
goes unnecessarily haywire when Xenomai has a latency glitch. That glitch 
seems to be caused by rtcan but may also be rtnet (so the rest of you can 
stop reading here :) )

To recap: I am running EML and rtcan. Separately they 
appear to function perfectly, but when combined, EML sometimes starts 
emitting the error:

EC_Telegram::index_check() : index not the same,
low_level_input() :  framebuilding failed.

Or something pretty close to that effect.

Despite EML warning that the frame could not be built, the inputs (which 
rely on the framebuilding being successful pretty stringently) seem to 
function perfectly. It seems as though EML is emitting a false warning.

I have now killed the warning from EML with the following hack:

In EC_Telegram::check_index() I have effectively killed the check by 
silencing the warning and always returning true. Now the application no 
longer emits warnings and everything functions well (or so it seems). I 
have a sawtooth analog output on a scope which triggers well. It 
fluttered sometimes when EML emitted the warning. Presumably the 
fluttering is caused by some latency, which I thought might be the result 
of EML emitting the warnings. However, even without the warnings from EML 
this fluttering occasionally takes place. Generally I can increase the 
chance of this flutter (I will confirm this with a latency test later 
on) by clicking about in the user interface of my application (Qt).


So EML seems to go haywire when there is latency, and there seems to be 
latency when I am using rtcan. (A bit of latency can also be caused by 
clicking about, but I think it is mainly rtcan causing the latency spikes 
and the clicking about just knocks it over the edge more often.)

I have also made the index that is supposedly not the same visible. For 
the EML chaps: these are index and m_idx, and the output looks something 
like this:

index	m_idx
0	1
0	0
1	2
2	3
2	3
3	4
4	5
:	:
:	:
253	254
0	2

or something to that effect. Can anyone comment on this?


P.S.

Everything else is sorted out now: the application closes all sockets 
neatly, there are no buffer overruns, no errors in syslog etc.

I have also managed to move the initialization of the socket into the 
non-real-time context, but the problem persists in exactly the same manner.

I have increased the rtskb_cache_size. The problem occurs less 
frequently but certainly does not subside completely, irrespective of 
how big I make it after that. There is no longer any mention of a problem 
in this regard in any of the logs (dmesg, syslog etc.).

Roland.









These are the comments that have been made which may be relevant:



Jan Kiszka wrote:
> In 9 of 10 cases (if not more): timing. Running both alone doesn't
> expose some timing issue (race) or transient overload. I can't help with
> EML complaints, maybe the FMTC guys have an idea what can trigger this
> and how to debug it.

>>>>>> RTnet:rtskb allocation from real-time cache failed.

> You have created the socket for some/all EML activity from primary mode
> of some Xenomai thread, thus network buffer allocation is ought to run
> against the real-time rtskb pool - which is by default empty :p. See
> README.pools from the RTnet documentation on this.

Although this was a problem


> 
> I don't have the EML design at hand, but you might be able to avoid this
> by initialising before creating the shadow task or by explicitly
> switching to secondary mode before initialising. [Sorry for this issue,
> it's at least partly due to some outdated RTnet design.]
> 
> Jan
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-14 13:56           ` Roland Tollenaar
@ 2007-08-14 14:47             ` Klaas Gadeyne
  2007-08-14 18:03               ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Klaas Gadeyne @ 2007-08-14 14:47 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users, rtnet-users

On Tue, 14 Aug 2007, Roland Tollenaar wrote:
> To refresh, I am running EML and rtcan together and separately they
> appear to function perfectly but when combined, EML sometimes starts
> ejecting the error:

How many threads do you have sending process data, and what are their
priorities? (/proc/xenomai/sched IIRC)

> EC_Telegram::index_check() : index not the same,
> low_level_input() :  framebuilding failed.
>
> Or something pretty closely to that effect.
>
> Despite EML warning that the frame could not be built, the inputs (which
> rely on the framebuilding being succesful pretty stringently) seem to
> function perfectly. It seems as though EML is emitting a false warning.

What do you mean by "inputs functioning perfectly"?

> I have now killed the warning from EML with the following hack:
>
> In EC_Telegram::check_index() I have effectively killed the check by
> snubbing the holler and always returning true. Now the application no
> longer emits warnings and everything functions well (or so it seems). I
> have a sawtooth analog output on a scope which triggers well.

Should I have read s/input/output/g in the above?

> It
> fluttered sometimes when EML emitted the warning. Presumably the
> fluttering is caused by some latency which I thought might be the result
> of EML emitting the warnings. However even without the warnings from EML
> this fluttering incidentally takes place. Generally I can increase the
> chance of this flutter (I will confirm this with a latency test later
> on) by clicking about in the user interface of my application (QT).
>
>
> So EML seems to go haywire when there is latency and there seems to be
> latency when I am using rtcan. (Bit of latency can also be caused by
> clicking about but I think it is mainly rtcan causing the latency spikes
> and the clicking about just knocks it over the edge more often.)
>
> I have also made the index that is supposedly not the same visible. For
> the EML chaps: index and m_idx and the output seems to be something like
> this
>
> index	m_idx
> 0	1
> 0	0
> 1	2
> 2	3
> 2	3
> 3	4
> 4	5
> :	:
> :	:
> 253	254
> 0	2
>
> or something to that effect. Can anyone comment on this?

Is the above the index in the output captured with wireshark or
something else?

AFAIS from the code, this shouldn't be affected by latency of the PD
thread.  You might uncomment the following log statements to get more
info too (I wonder why they are commented out anyway, Tom?)

static bool ec_rtdm_txandrx(struct EtherCAT_Frame * frame, struct netif * netif) {
         int tries = 0;
         while (tries < MAX_TRIES_TX) {
                 pthread_mutex_lock (&txandrx_mut);
                 if (low_level_output(frame,netif)){
                         if (low_level_input(frame,netif)){
                                 pthread_mutex_unlock(&txandrx_mut);
                                 return true;
                         }
                         else{
                                 //log(EC_LOG_ERROR, "low_level_txandrx: receiving failed\n");
                                 pthread_mutex_unlock(&txandrx_mut);
                         }
                 }
                 else{
                         //log(EC_LOG_ERROR, "low_level_txandrx: sending failed\n");
                         pthread_mutex_unlock(&txandrx_mut);
                 }
                 tries++;
         }
         log(EC_LOG_FATAL, "low_level_txandrx: failed: MAX_TRIES_TX: Giving up\n");
         return false;
}

Klaas



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-14 14:47             ` Klaas Gadeyne
@ 2007-08-14 18:03               ` Roland Tollenaar
  2007-08-14 19:17                 ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-14 18:03 UTC (permalink / raw)
  To: Klaas Gadeyne; +Cc: Xenomai-help, EML users, rtnet-users

Hi,

> How many threads do you have sending process data, and what are there
> priorities? (/proc/xenomai/sched IIRC)
I have 3 rt tasks running. Only one sends and receives process data. The 
priorities are:
rt_task1 99
rt_task2 75
rt_task3 1

Period times are
1ms
3ms
indefinite(holds a blocking rt_can recv call to catch any incoming CAN 
messages)


> 
>> EC_Telegram::index_check() : index not the same,
>> low_level_input() :  framebuilding failed.
>>
>> Or something pretty closely to that effect.
>>
>> Despite EML warning that the frame could not be built, the inputs (which
>> rely on the framebuilding being succesful pretty stringently) seem to
>> function perfectly. It seems as though EML is emitting a false warning.
> 
> What do you mean with "inputs functioning perfectly"?

The digital inputs are packed into the frame, as are the digital outputs 
and analog output process data. The outputs function as they should, but 
the warning complains mainly about the retrieving part of the EtherCAT 
cycle. Hence my comment that the digital inputs also function as they 
should, that is to say the data arrives correctly and uncorrupted. AFAI 
understand from ETG, the index is not changed by the ESCs, so I would 
expect the check to always return true. But even if it does not, what does 
that mean? Can it mean that EML is losing some frames that have been 
transmitted? I.e. the index is incremented with every transmit and the 
message with the same index is expected on the next read, but instead it 
is only getting one later? If so, what could cause this?

>> I have now killed the warning from EML with the following hack:
>>
>> In EC_Telegram::check_index() I have effectively killed the check by
>> snubbing the holler and always returning true. Now the application no
>> longer emits warnings and everything functions well (or so it seems). I
>> have a sawtooth analog output on a scope which triggers well.
> 
> Should I have read s/input/output/g in the above?
No, output. The output is incremented in task 1 and reset to -10V 
when it reaches 10V. The steps take place at exactly 1ms with impressive 
accuracy and consistency. This sawtooth wave is so stable that my scope 
has no problem locking onto it and it remains stationary (allowing me to 
admire the accurate 1ms steps :) ). Now and then, if I try hard, the wave 
flickers (i.e. the triggering is lost), which I think indicates that the 
1ms task did not increment the analog output on time. Whatever happens 
to cause this (rtcan? We know that it displays some latency-breaking 
behaviour when the buffer is full, but there is no evidence of any 
buffer overflow at present) might make an Ethernet frame 
get lost or, ah!, cause a delayed read ****, which causes the index shift 
and consequently the irritating messages.

**** It occurs to me to ask whether there is an incoming buffer for 
EtherCAT frames that is read out in such a manner (e.g. FIFO) that 
a message can get "buried", so that once a slip has occurred the index 
shift stays resident. If anyone would care to enlighten me on how 
this part of EML works and whether this hypothesis is a possibility or 
not, I would be much obliged.
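
To make the hypothesis concrete, here is a toy model of it. It is not EML
code: the queue, its length and all function names are hypothetical and only
serve to show the mechanism. Each cycle the "sender" tags a frame with an
8-bit index and the reply is pushed into a FIFO; the "receiver" pops the
oldest entry and compares it with the index it just sent. If one read is
missed, the stale reply stays queued and every later comparison is off by one.

```c
#include <stdint.h>

#define QUEUE_LEN 8u   /* arbitrary depth for the toy model */

/* Hypothetical receive queue holding only the frame indices. */
struct rx_fifo {
    uint8_t slot[QUEUE_LEN];
    unsigned head;
    unsigned count;
};

static void fifo_push(struct rx_fifo *q, uint8_t idx)
{
    if (q->count == QUEUE_LEN)
        return;                               /* queue full: frame dropped */
    q->slot[(q->head + q->count) % QUEUE_LEN] = idx;
    q->count++;
}

static int fifo_pop(struct rx_fifo *q, uint8_t *idx)
{
    if (q->count == 0)
        return 0;                             /* nothing queued */
    *idx = q->slot[q->head];
    q->head = (q->head + 1u) % QUEUE_LEN;
    q->count--;
    return 1;
}

/*
 * Run `cycles` send/receive cycles. In cycle `slip_cycle` the receiver
 * misses its read, so that reply stays in the queue. Returns the number
 * of cycles in which the popped index differs from the one just sent.
 */
static unsigned count_mismatches(unsigned cycles, unsigned slip_cycle)
{
    struct rx_fifo q = { {0}, 0, 0 };
    unsigned mismatches = 0;
    uint8_t sent = 0;

    for (unsigned c = 0; c < cycles; c++) {
        uint8_t got;

        sent = (uint8_t)(sent + 1u);          /* index wraps at 255 */
        fifo_push(&q, sent);                  /* reply for this cycle arrives */
        if (c == slip_cycle)
            continue;                         /* the read is missed once */
        if (fifo_pop(&q, &got) && got != sent)
            mismatches++;
    }
    return mismatches;
}
```

In this model a single slip makes every subsequent cycle report a mismatch,
which would match a shift that "stays resident". Whether EML actually queues
frames this way is exactly the open question.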


>> I have also made the index that is supposedly not the same visible. For
>> the EML chaps: index and m_idx and the output seems to be something like
>> this
>>
>> index    m_idx
>> 0    1
>> 0    0
>> 1    2
>> 2    3
>> 2    3
>> 3    4
>> 4    5
>> :    :
>> :    :
>> 253    254
>> 0    2
>>
>> or something to that effect. Can anyone comment on this?
> 
> Is the above the index in the output captured with wireshark or
> something else?
No. EML and switches are no friends of each other when the frame gets 
long, I suspect. EML bombs out when I introduce a switch. I would be much 
obliged to anyone who could tell me how to set up a timeout delay 
measurement in EML.
What I did was simply put a printf line into EML to output index and 
m_idx to screen. So unfortunately this does not tell us where the shift 
is acquired.


Thanks. I'll give the suggestion below a try.

Roland.

> 
> AFAIS from the code shouldn't be affected by latency of the PD
> thread.  You might uncomment the following log statements to get more
> info too (I wonder why they are commented out anyway, Tom?)
> 
> static bool ec_rtdm_txandrx(struct EtherCAT_Frame * frame, struct
> netif * netif) {
>         int tries = 0;
>         while (tries < MAX_TRIES_TX) {
>                 pthread_mutex_lock (&txandrx_mut);
>                 if (low_level_output(frame,netif)){
>                         if (low_level_input(frame,netif)){
>                                 pthread_mutex_unlock(&txandrx_mut);
>                                 return true;
>                         }
>                         else{
>                                 //log(EC_LOG_ERROR,
>                 "low_level_txandrx: receiving
>                 failed\n");
>                                 pthread_mutex_unlock(&txandrx_mut);
>                         }
>                 }
>                 else{
>                         //log(EC_LOG_ERROR, "low_level_txandrx:
>             sending failed\n");
>                         pthread_mutex_unlock(&txandrx_mut);
>                 }
>                 tries++;
>         }
>         log(EC_LOG_FATAL, "low_level_txandrx: failed: MAX_TRIES_TX:
>     Giving up\n");
>         return false;
> 
> Klaas
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-14 18:03               ` Roland Tollenaar
@ 2007-08-14 19:17                 ` Jan Kiszka
  2007-08-15  6:11                   ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Kiszka @ 2007-08-14 19:17 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: EML users, rtnet-users, Xenomai-help

[-- Attachment #1: Type: text/plain, Size: 2555 bytes --]

Roland Tollenaar wrote:
> Hi,
> 
>> How many threads do you have sending process data, and what are there
>> priorities? (/proc/xenomai/sched IIRC)
> I have 3 rt tasks running. Only one sends and receives process data. The 
> priorities are:
> rt_task1 99

Check the /proc output again, there should be also RTnet's stack manager
at prio 98. Maybe that is too low for your scenario and causes prio
inversions (note: every incoming Ethernet frame goes through its hands).
Try lowering the prio of your rt_task1 beneath 98.
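
As a sketch of what "beneath 98" could look like in code, assuming the cyclic
task is created through the POSIX thread API with SCHED_FIFO on Linux (the
application in this thread may well use a different task API, in which case
only the priority value carries over). The constant and the helper name are
made up for this example.

```c
#include <pthread.h>
#include <sched.h>

/* Priority of RTnet's stack manager as stated above. */
#define RTNET_STACK_MGR_PRIO 98

/*
 * Prepare thread attributes for a SCHED_FIFO task that runs one level
 * below the stack manager. Returns 0 on success or an error number;
 * on success the chosen priority is stored in *prio_out.
 */
static int init_cyclic_task_attr(pthread_attr_t *attr, int *prio_out)
{
    struct sched_param sp = { .sched_priority = RTNET_STACK_MGR_PRIO - 1 };
    int err = pthread_attr_init(attr);

    if (err)
        return err;

    /* Use the attributes below instead of inheriting the creator's policy. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &sp);
    if (err) {
        pthread_attr_destroy(attr);
        return err;
    }

    *prio_out = sp.sched_priority;
    return 0;
}
```

The filled-in attribute object would then be handed to pthread_create() for
the 1 ms task, and destroyed with pthread_attr_destroy() afterwards.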

> rt_task2 75
> rt_task3 1
> 
> Period times are
> 1ms
> 3ms
> indefinite(holds a blocking rt_can recv call to catch any incoming CAN 
> messages)
> 
> 
>>> EC_Telegram::index_check() : index not the same,
>>> low_level_input() :  framebuilding failed.
>>>
>>> Or something pretty closely to that effect.
>>>
>>> Despite EML warning that the frame could not be built, the inputs (which
>>> rely on the framebuilding being succesful pretty stringently) seem to
>>> function perfectly. It seems as though EML is emitting a false warning.
>> What do you mean with "inputs functioning perfectly"?
> 
> The digital inputs are packed into the frame as are the digital outputs 
> and analog output process data. The outputs function as they should but 
> the warning complains mainly about the retrieving part of the ethercat 
> cycle. Hence my comment that the digital inputs also function as they 
> should that is to say the data arrives correctly and uncorrupted. AFAI 
> understand from ETG the index is not changed by the ESC's so I would 
> expect the check always return true. But even if it does not what does 
> that mean? Can it mean that EML is losing some frames that have been 
> transmitted? I.e. the index is incremented with every transmit and the 
> message with the same index is expected on the next read but instead it 
> is only getting one later? If so, what could cause this?

If the problem persists (or you _really_ want to understand what
happens), you could try to put an xntrace_user_freeze(0, 1) before the
line which emits that EML warning, turn on the I-pipe tracer, set a
large back_trace_points value (a few thousand), enable verbose mode, and
grab what /proc/ipipe/trace/frozen reports after the hiccup. See [1]
for more howtos.

If you post the dump, we may be able to analyse what the system is doing
before the problem report, e.g. whether there are long delays due to
high-prio tasks.

Jan

[1] http://www.xenomai.org/index.php/I-pipe:Tracer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-14 19:17                 ` Jan Kiszka
@ 2007-08-15  6:11                   ` Roland Tollenaar
  2007-08-15  8:24                     ` Jan Kiszka
  0 siblings, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-15  6:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: EML users, rtnet-users, Xenomai-help

Hi,

> Check the /proc output again, there should be also RTnet's stack manager
> at prio 98. Maybe that is too low for your scenario and causes prio
> inversions (note: every incoming Ethernet frame goes through its hands).
> Try lowering the prio of your rt_task1 beneath 98.

Thanks. This seems to have made a big improvement. So far I have not 
once seen the scope lose lock on the sawtooth while the 
index_check in EML is still disabled. Before lowering the priority of my 
task (to 97) I could still invoke what I suspect to be a latency spike.

If the index_check is enabled I now mostly have fewer problems too. There 
is a chance during start-up of the application that there is a latency 
spike and then the warning kicks in. Because the shift is 
permanent, the error is persistent and this then destabilizes the 
sawtooth a bit.

I will keep the check disabled, but for the EML chaps I do think this is 
a point of interest. I would be very interested to know how this index 
shift occurs and why it is persistent after occurring once.

Sorry for the pragmatic qualifications here, but in the end it's the 
stability of the outputs that will determine the behaviour of the 
machine, so it's not a bad way to assess the software. :)

> If the problem persists (or your _really_ want to understand what
> happens), you could try to put an xntrace_user_freeze(0, 1) before the
> line which emits that EML warning, turn on the I-pipe tracer, set a
> large back_trace_points value (a few thousand), enable verbose mode, and
> grab what /proc/ipipe/trace/frozen reports after the hick-up. See [1]
> for more howtos.

Done this before so it should not be a problem. Don't think it is 
necessary quite yet as the behaviour at the moment looks good.


Regards,

Roland.

> 
> If you post the dump, we may be able to analyse what the system is doing
> before the problem report, if there are long delays due to high-prio
> tasks e.g.
> 
> Jan
> 
> [1] http://www.xenomai.org/index.php/I-pipe:Tracer
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-15  6:11                   ` Roland Tollenaar
@ 2007-08-15  8:24                     ` Jan Kiszka
  2007-08-15  8:37                       ` Roland Tollenaar
  2007-08-15  9:50                       ` Roland Tollenaar
  0 siblings, 2 replies; 23+ messages in thread
From: Jan Kiszka @ 2007-08-15  8:24 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: EML users, rtnet-users, Xenomai-help

[-- Attachment #1: Type: text/plain, Size: 2292 bytes --]

Roland Tollenaar wrote:
> Hi,
> 
>> Check the /proc output again, there should be also RTnet's stack manager
>> at prio 98. Maybe that is too low for your scenario and causes prio
>> inversions (note: every incoming Ethernet frame goes through its hands).
>> Try lowering the prio of your rt_task1 beneath 98.
> 
> Thanks. This seems to have made a big improvement. I have so far not
> once detected the scope to loose lock on the sawtooth when the
> index_check in eml is still disabled. Before lowering the priority of my
> task (to 97) I could still invoke what I suspect to be a latency spike.
> 
> If the index_check is enabled I now mostly have less problems too. There
> is a chance in start-up of the application that there is a latency spike
> and then the warning kicks in. Due to the fact that the shift is
> permanent, the error is persistent and this then destabilizes the
> sawtooth a bit.

Hmm, this doesn't convince me yet. Such skews during startup may as well
be triggered by unusual load during runtime (non-RT activity or new RT
components). Did you put your system under adequate non-RT load as well
while measuring the outputs?

> 
> I will keep the check disabled but for the EML chaps I do think this is
> a point of interest. I would be very interested how this index shift
> occurs and why it is persistent after occurring once.
> 
> Sorry for the pragmatic qualifications here but in the end its the
> stability of the outputs that will determine the behaviour of the
> machine so its not a bad way to assess the software. :)

A problem isn't solved until it is also understood.

> 
>> If the problem persists (or your _really_ want to understand what
>> happens), you could try to put an xntrace_user_freeze(0, 1) before the
>> line which emits that EML warning, turn on the I-pipe tracer, set a
>> large back_trace_points value (a few thousand), enable verbose mode, and
>> grab what /proc/ipipe/trace/frozen reports after the hick-up. See [1]
>> for more howtos.
> 
> Done this before so it should not be a problem. Don't think it is

In that case, I would suggest all the more that you collect the data, maybe
now for the fragile startup case.

> necessary quite yet as the behaviour at the moment looks good.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-15  8:24                     ` Jan Kiszka
@ 2007-08-15  8:37                       ` Roland Tollenaar
  2007-08-15  9:50                       ` Roland Tollenaar
  1 sibling, 0 replies; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-15  8:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: EML users, rtnet-users, Xenomai-help

> Hmm, this doesn't convince me yet. Such skews during startup may as well
> be triggered by unusual load during runtime (non-RT activity or new RT
> components). Did you put your system under adequate non-RT load as well
> while measuring the outputs?
Could you please just remind me how to do this again? Or can I just run 
the latency test? It has dummy loading in it, does it not?

>> Sorry for the pragmatic qualifications here but in the end its the
>> stability of the outputs that will determine the behaviour of the
>> machine so its not a bad way to assess the software. :)
> 
> A problem isn't solved until it is also understood.
You are so right. :(


> 
>>> If the problem persists (or your _really_ want to understand what
>>> happens), you could try to put an xntrace_user_freeze(0, 1) before the
>>> line which emits that EML warning, turn on the I-pipe tracer, set a
>>> large back_trace_points value (a few thousand), enable verbose mode, and
>>> grab what /proc/ipipe/trace/frozen reports after the hick-up. See [1]
>>> for more howtos.
>> Done this before so it should not be a problem. Don't think it is
> 
> In that case, I would even more suggest to collect the data, maybe now
> about the fragile startup case.

Have got it on my todo list. :)

Roland.


> 
>> necessary quite yet as the behaviour at the moment looks good.
> 
> Jan
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-15  8:24                     ` Jan Kiszka
  2007-08-15  8:37                       ` Roland Tollenaar
@ 2007-08-15  9:50                       ` Roland Tollenaar
  2007-08-15 10:30                         ` Wolfgang Grandegger
  1 sibling, 1 reply; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-15  9:50 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: EML users, rtnet-users, Xenomai-help

Hi,

Some more interesting findings (no i-pipe trace yet though).

> Hmm, this doesn't convince me yet. Such skews during startup may as well
> be triggered by unusual load during runtime (non-RT activity or new RT
> components). Did you put your system under adequate non-RT load as well
> while measuring the outputs?
Running latencytest with my application shows an average latency of 
about 40 and a max of 200ns. This was rather shocking so I turned off 
rtcan in my application. Now the max latency is 60ns. Turn off EML and 
turn on rtcan, and the max latency is 230ns. How is that for strange? But 
since I can see the scope output bobbing with 200ns during the latency 
test, I can also see that if I run my application without the latency test 
the huge max latency disappears entirely. Maybe it is time for the trace, 
but then again I am still using CAN over the parallel port, so I will see 
what it does on a machine with a PCI CAN adaptor first. Because I think I 
know what happens: due to the external loading the CAN recv interrupt 
triggers the Rx ISR briefly before the 1ms task period ends. Due to the 
priority of the ISR (huge debate over this) and its atomicity (if I 
remember correctly) the reading out of the slow hardware delays the 
start of the new task period.

Just thought it was interesting to mention. Btw when the latency appears 
there are no overflow messages or anything like that which support the 
theory I have about the cause.

Btw2 the 200ns latency spikes do not cause the scope to lose lock on 
the sawtooth, so whatever causes that problem is of a different nature 
still.

Regards,

Roland.



> 
>> I will keep the check disabled but for the EML chaps I do think this is
>> a point of interest. I would be very interested how this index shift
>> occurs and why it is persistent after occurring once.
>>
>> Sorry for the pragmatic qualifications here but in the end it's the
>> stability of the outputs that will determine the behaviour of the
>> machine so it's not a bad way to assess the software. :)
> 
> A problem isn't solved until it is also understood.
> 
>>> If the problem persists (or you _really_ want to understand what
>>> happens), you could try to put an xntrace_user_freeze(0, 1) before the
>>> line which emits that EML warning, turn on the I-pipe tracer, set a
>>> large back_trace_points value (a few thousand), enable verbose mode, and
>>> grab what /proc/ipipe/trace/frozen reports after the hiccup. See [1]
>>> for more howtos.
>> Done this before so it should not be a problem. Don't think it is
> 
> In that case, I would even more suggest to collect the data, maybe now
> about the fragile startup case.
> 
>> necessary quite yet as the behaviour at the moment looks good.
> 
> Jan
> 



* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-15  9:50                       ` Roland Tollenaar
@ 2007-08-15 10:30                         ` Wolfgang Grandegger
  2007-08-15 10:30                           ` Roland Tollenaar
  0 siblings, 1 reply; 23+ messages in thread
From: Wolfgang Grandegger @ 2007-08-15 10:30 UTC (permalink / raw)
  To: rolandtollenaar; +Cc: Xenomai-help, EML users, Jan Kiszka, rtnet-users

Roland Tollenaar wrote:
> Hi,
> 
> Some more interesting findings (no i-pipe trace yet though).
> 
>> Hmm, this doesn't convince me yet. Such skews during startup may as well
>> be triggered by unusual load during runtime (non-RT activity or new RT
>> components). Did you put your system under adequate non-RT load as well
>> while measuring the outputs?
> Running latencytest with my application shows an average latency of 
> about 40 and a max of 200ns. This was rather shocking so I turned off 
> rtcan in my application. Now the max latency is 60ns. Turn off EML and 
> turn on rtcan, max latency is 230ns. How is that for strange? But since I 
> can see the scope output bobbing with 200ns during the latency test, I 
> can also see that if I run my application without the latency test the 
> huge max latency disappears entirely. Maybe it is time for the trace but 
> then again I am still using CAN over the parallel port so will see what 
> it does on a machine with a PCI CAN adaptor first. Because I think I 
> know what happens: Due to the external loading the CAN recv interrupt 
> triggers the Rx ISR briefly before the 1ms task period ends. Due to the 
> priority of the ISR (huge debate over this) and its atomicness (if I 
> remember correctly) the reading out of the slow hardware delays the 
> start of the new task period.
> 
> Just thought it was interesting to mention. Btw when the latency appears 
> there are no overflow messages or anything like that which support the 
> theory I have about the cause.
> 
> Btw2 the 200ns latency spikes do not cause the scope to lose lock on 
> the saw-tooth so whatever causes that problem is of a different nature 
> still.

s/ns/us/ ?

Wolfgang.



* Re: [Xenomai-help] [Ethercatmaster-users] EML conflict with RTCAN? low_level_input framebuilding failed.
  2007-08-15 10:30                         ` Wolfgang Grandegger
@ 2007-08-15 10:30                           ` Roland Tollenaar
  0 siblings, 0 replies; 23+ messages in thread
From: Roland Tollenaar @ 2007-08-15 10:30 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Xenomai-help, EML users, Jan Kiszka, rtnet-users

>>
>> Btw2 the 200ns latency spikes do not cause the scope to lose lock on 
>> the saw-tooth so whatever causes that problem is of a different nature 
>> still.
> 
> s/ns/us/ ?

Indeed. Sorry.

Roland


> 
> Wolfgang.
> 



end of thread, other threads:[~2007-08-15 10:30 UTC | newest]

Thread overview: 23+ messages
2007-08-13  9:45 [Xenomai-help] EML conflict with RTCAN? low_level_input framebuilding failed Roland Tollenaar
2007-08-13 11:41 ` Wolfgang Grandegger
2007-08-13 12:41   ` Roland Tollenaar
2007-08-13 13:03     ` Wolfgang Grandegger
2007-08-13 13:11       ` Roland Tollenaar
2007-08-13 14:00       ` Roland Tollenaar
2007-08-13 14:51         ` [Xenomai-help] [Ethercatmaster-users] " Jan Kiszka
2007-08-13 15:55           ` Roland Tollenaar
2007-08-13 16:57             ` Jan Kiszka
2007-08-13 17:40               ` Roland Tollenaar
2007-08-13 17:57                 ` Jan Kiszka
2007-08-13 18:17                   ` Roland Tollenaar
2007-08-13 18:30                     ` Jan Kiszka
2007-08-14 13:56           ` Roland Tollenaar
2007-08-14 14:47             ` Klaas Gadeyne
2007-08-14 18:03               ` Roland Tollenaar
2007-08-14 19:17                 ` Jan Kiszka
2007-08-15  6:11                   ` Roland Tollenaar
2007-08-15  8:24                     ` Jan Kiszka
2007-08-15  8:37                       ` Roland Tollenaar
2007-08-15  9:50                       ` Roland Tollenaar
2007-08-15 10:30                         ` Wolfgang Grandegger
2007-08-15 10:30                           ` Roland Tollenaar
