* flexcan napi poll and error frames
@ 2014-10-24 10:26 Andri Yngvason
2014-10-24 10:43 ` Wolfgang Grandegger
0 siblings, 1 reply; 12+ messages in thread
From: Andri Yngvason @ 2014-10-24 10:26 UTC (permalink / raw)
To: linux-can, Marc Kleine-Budde
Hi,
I was running some tests on my patches when I noticed the following:
If I have 2 flexcan devices on the bus, each sending to the bus using
cangen, and then I disconnect the cable to one of them, that device
will enter "error-warning" state, but it will not continue on to
"error-passive" as it should.
However, when I reconnect the cable, I get the "error-passive" message
followed by an "error-warning" and eventually "back-to-error-active".
Notice the time differences:
root@(none):~# candump -td -e can0,0~0,#FFFFFFFFFF
(000.000000) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
controller-problem{tx-error-warning}
(006.493209) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
controller-problem{back-to-error-active}
(002.701331) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
controller-problem{tx-error-warning}
(006.498567) can0 20000004 [8] 00 20 00 00 00 00 00 00 ERRORFRAME
controller-problem{tx-error-passive}
(000.013915) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
controller-problem{tx-error-warning}
(001.990695) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
controller-problem{back-to-error-active}
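For reference, the error frames above can be decoded by hand. The sketch
below is a minimal Python illustration, not part of can-utils: the CAN_ERR_*
values are the ones defined in linux/can/error.h, while the function names
decode_controller_problem and parse_candump_line are made up for this
example. It assumes the "candump -td" line layout shown above.

```python
# Values as defined in linux/can/error.h.
CAN_ERR_FLAG = 0x20000000  # the frame is an error message frame
CAN_ERR_CRTL = 0x00000004  # controller problem, details in data[1]

# Bits of data[1] for CAN_ERR_CRTL, with the names candump prints.
CRTL_BITS = {
    0x01: "rx-overflow",
    0x02: "tx-overflow",
    0x04: "rx-error-warning",
    0x08: "tx-error-warning",
    0x10: "rx-error-passive",
    0x20: "tx-error-passive",
    0x40: "back-to-error-active",
}


def decode_controller_problem(can_id, data):
    """Return the controller-problem names carried by one error frame."""
    if not (can_id & CAN_ERR_FLAG and can_id & CAN_ERR_CRTL):
        return []
    return [name for bit, name in CRTL_BITS.items() if data[1] & bit]


def parse_candump_line(line):
    """Parse '(delta) iface id [len] b0 b1 ...' (hypothetical helper)."""
    fields = line.split()
    delta = float(fields[0].strip("()"))
    can_id = int(fields[2], 16)
    length = int(fields[3].strip("[]"))
    data = bytes(int(b, 16) for b in fields[4:4 + length])
    return delta, can_id, data
```

Applied to the fourth frame of the trace, this gives tx-error-passive with a
delta of about 6.5 s to the previous frame, while the tx-error-warning that
follows it arrives only about 14 ms later.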
I suspect that the problem is that the driver doesn't receive any
interrupts other than the one for "error-passive" and so things
won't "weigh" enough for napi. There seems to be some truth in this
conjecture, because when I tried setting the napi weight to 1, the
message got through.
Another thing that I found peculiar was that I had to be sending on
both devices for the error states to change to anything other than
"error-warning".
Best regards,
Andri Yngvason
* Re: flexcan napi poll and error frames
2014-10-24 10:26 flexcan napi poll and error frames Andri Yngvason
@ 2014-10-24 10:43 ` Wolfgang Grandegger
2014-10-24 10:55 ` Andri Yngvason
0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Grandegger @ 2014-10-24 10:43 UTC (permalink / raw)
To: Andri Yngvason; +Cc: linux-can, Marc Kleine-Budde
On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
<andri.yngvason@marel.com> wrote:
> Hi,
>
> I was running some tests on my patches when I noticed the following:
> If I have 2 flexcan devices on the bus, each sending to the bus using
> cangen, and then I disconnect the cable to one of them, that device
> will enter "error-warning" state, but it will not continue on to
> "error-passive" as it should.
>
> However, when I reconnect the cable, I get the "error-passive" message
> followed by an "error-warning" and eventually "back-to-error-active".
Yes, I think I observed that behaviour as well as you can see here:
https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
> Notice the time differences:
> root@(none):~# candump -td -e can0,0~0,#FFFFFFFFFF
> (000.000000) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> controller-problem{tx-error-warning}
> (006.493209) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
> controller-problem{back-to-error-active}
> (002.701331) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> controller-problem{tx-error-warning}
> (006.498567) can0 20000004 [8] 00 20 00 00 00 00 00 00 ERRORFRAME
> controller-problem{tx-error-passive}
> (000.013915) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> controller-problem{tx-error-warning}
> (001.990695) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
> controller-problem{back-to-error-active}
>
>
> I suspect that the problem is that the driver doesn't receive any
> interrupts other than the one for "error-passive" and so things
> won't "weigh" enough for napi. There seems to be some truth in this
> conjecture, because when I tried setting the napi weight to 1, the
> message got through.
Hm, why should it depend on NAPI? It does not delay messages for
a long time. I think the problem is that the state change is not
signalled by an interrupt but some time later when another event
(message) occurs.
> Another thing that I found peculiar was that I had to be sending on
> both devices for the error states to change to anything other than
> "error-warning".
Well, the error reporting on the SJA1000 is perfect... on all other
CAN controllers it's more or less worse.
Wolfgang.
* Re: flexcan napi poll and error frames
2014-10-24 10:43 ` Wolfgang Grandegger
@ 2014-10-24 10:55 ` Andri Yngvason
2014-10-24 12:33 ` Wolfgang Grandegger
0 siblings, 1 reply; 12+ messages in thread
From: Andri Yngvason @ 2014-10-24 10:55 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linux-can, Marc Kleine-Budde
On Fri 24 Oct 2014 10:43, Wolfgang Grandegger wrote:
> On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
> <andri.yngvason@marel.com> wrote:
>> Hi,
>>
>> I was running some tests on my patches when I noticed the following:
>> If I have 2 flexcan devices on the bus, each sending to the bus using
>> cangen, and then I disconnect the cable to one of them, that device
>> will enter "error-warning" state, but it will not continue on to
>> "error-passive" as it should.
>>
>> However, when I reconnect the cable, I get the "error-passive" message
>> followed by an "error-warning" and eventually "back-to-error-active".
> Yes, I think I observed that behaviour as well as you can see here:
> https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
Good to know.
>> Notice the time differences:
>> root@(none):~# candump -td -e can0,0~0,#FFFFFFFFFF
>> (000.000000) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{tx-error-warning}
>> (006.493209) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{back-to-error-active}
>> (002.701331) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{tx-error-warning}
>> (006.498567) can0 20000004 [8] 00 20 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{tx-error-passive}
>> (000.013915) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{tx-error-warning}
>> (001.990695) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>> controller-problem{back-to-error-active}
>>
>>
>> I suspect that the problem is that the driver doesn't receive any
>> interrupts other than the one for "error-passive" and so things
>> won't "weigh" enough for napi. There seems to be some truth in this
>> conjecture, because when I tried setting the napi weight to 1, the
>> message got through.
> Hm, why should it depend on NAPI? It does not delay messages for
> a long time. I think the problem is that the state change is not
> signalled by an interrupt but some time later when another event
> (message) occurs.
>
Perhaps, but how do you explain that the message got through when I
set the weight to 1?
>> Another thing that I found peculiar was that I had to be sending on
>> both devices for the error states to change to anything other than
>> "error-warning".
> Well, the error reporting on the SJA1000 is perfect... on all other
> CAN controllers it's more or less worse.
>
Should we just ignore this problem then? I'd rather like to figure
out if this is a problem with the controller or not. Do you remember
if you've had this problem with flexcan?
Andri.
* Re: flexcan napi poll and error frames
2014-10-24 10:55 ` Andri Yngvason
@ 2014-10-24 12:33 ` Wolfgang Grandegger
2014-10-24 14:39 ` Andri Yngvason
0 siblings, 1 reply; 12+ messages in thread
From: Wolfgang Grandegger @ 2014-10-24 12:33 UTC (permalink / raw)
To: Andri Yngvason; +Cc: linux-can, Marc Kleine-Budde
On Fri, 24 Oct 2014 10:55:48 +0000, Andri Yngvason
<andri.yngvason@marel.com> wrote:
> On Fri 24 Oct 2014 10:43, Wolfgang Grandegger wrote:
>> On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
>> <andri.yngvason@marel.com> wrote:
>>> Hi,
>>>
>>> I was running some tests on my patches when I noticed the following:
>>> If I have 2 flexcan devices on the bus, each sending to the bus using
>>> cangen, and then I disconnect the cable to one of them, that device
>>> will enter "error-warning" state, but it will not continue on to
>>> "error-passive" as it should.
>>>
>>> However, when I reconnect the cable, I get the "error-passive" message
>>> followed by an "error-warning" and eventually "back-to-error-active".
>> Yes, I think I observed that behaviour as well as you can see here:
>> https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
> Good to know.
>>> Notice the time differences:
>>> root@(none):~# candump -td -e can0,0~0,#FFFFFFFFFF
>>> (000.000000) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{tx-error-warning}
>>> (006.493209) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{back-to-error-active}
>>> (002.701331) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{tx-error-warning}
>>> (006.498567) can0 20000004 [8] 00 20 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{tx-error-passive}
>>> (000.013915) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{tx-error-warning}
>>> (001.990695) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>>> controller-problem{back-to-error-active}
>>>
>>>
>>> I suspect that the problem is that the driver doesn't receive any
>>> interrupts other than the one for "error-passive" and so things
>>> won't "weigh" enough for napi. There seems to be some truth in this
>>> conjecture, because when I tried setting the napi weight to 1, the
>>> message got through.
>> Hm, why should it depend on NAPI? It does not delay messages for
>> a long time. I think the problem is that the state change is not
>> signalled by an interrupt but some time later when another event
>> (message) occurs.
>>
> Perhaps, but how do you explain that the message got through when I
> set the weight to 1?
If it's really true, it would be a bug in the NAPI handling. Could you
please investigate a bit more by adding some printouts in the interrupt
handler? I will have a closer look tomorrow.
>>> Another thing that I found peculiar was that I had to be sending on
>>> both devices for the error states to change to anything other than
>>> "error-warning".
>> Well, the error reporting on the SJA1000 is perfect... on all other
>> CAN controllers it's more or less worse.
>>
> Should we just ignore this problem then? I'd rather like to figure
> out if this is a problem with the controller or not. Do you remember
> if you've had this problem with flexcan?
We can do little if the CAN controller does not notify the Software
via interrupt.
Wolfgang.
* Re: flexcan napi poll and error frames
2014-10-24 12:33 ` Wolfgang Grandegger
@ 2014-10-24 14:39 ` Andri Yngvason
2014-10-24 16:04 ` Andri Yngvason
0 siblings, 1 reply; 12+ messages in thread
From: Andri Yngvason @ 2014-10-24 14:39 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linux-can, Marc Kleine-Budde
On Fri 24 Oct 2014 12:33, Wolfgang Grandegger wrote:
> On Fri, 24 Oct 2014 10:55:48 +0000, Andri Yngvason
> <andri.yngvason@marel.com> wrote:
>> On Fri 24 Oct 2014 10:43, Wolfgang Grandegger wrote:
>>> On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
>>> <andri.yngvason@marel.com> wrote:
>>>> Hi,
>>>>
>>>> I was running some tests on my patches when I noticed the following:
>>>> If I have 2 flexcan devices on the bus, each sending to the bus using
>>>> cangen, and then I disconnect the cable to one of them, that device
>>>> will enter "error-warning" state, but it will not continue on to
>>>> "error-passive" as it should.
>>>>
>>>> However, when I reconnect the cable, I get the "error-passive" message
>>>> followed by an "error-warning" and eventually "back-to-error-active".
>>> Yes, I think I observed that behaviour as well as you can see here:
>>> https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
>> Good to know.
>>>> Notice the time differences:
>>>> root@(none):~# candump -td -e can0,0~0,#FFFFFFFFFF
>>>> (000.000000) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{tx-error-warning}
>>>> (006.493209) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{back-to-error-active}
>>>> (002.701331) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{tx-error-warning}
>>>> (006.498567) can0 20000004 [8] 00 20 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{tx-error-passive}
>>>> (000.013915) can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{tx-error-warning}
>>>> (001.990695) can0 20000004 [8] 00 40 00 00 00 00 00 00 ERRORFRAME
>>>> controller-problem{back-to-error-active}
>>>>
>>>>
>>>> I suspect that the problem is that the driver doesn't receive any
>>>> interrupts other than the one for "error-passive" and so things
>>>> won't "weigh" enough for napi. There seems to be some truth in this
>>>> conjecture, because when I tried setting the napi weight to 1, the
>>>> message got through.
>>> Hm, why should it depend on NAPI? It does not delay messages for
>>> a long time. I think the problem is that the state change is not
>>> signalled by an interrupt but some time later when another event
>>> (message) occurs.
>>>
>> Perhaps, but how do you explain that the message got through when I
>> set the weight to 1?
> If it's really true it would be a bug in the NAPI handling. Could you
> please elaborate a bit more by adding some printouts in the interrupt
> handler. I will have a closer look tomorrow.
I wasn't lying about it. Perhaps by changing the weight it got through with
something else. I don't know; I'm not an expert on the inner workings of napi.
But let's just forget about the weight thing. I found out by looking in the
i.mx6 reference manual that there is no interrupt for this transition. I
found that quite incredible so I searched through it a few times. Anyway,
there are only interrupts for active->tx-warning, active->rx-warning and
active->bus-off.
>
>>>> Another thing that I found peculiar was that I had to be sending on
>>>> both devices for the error states to change to anything other than
>>>> "error-warning".
>>> Well, the error reporting on the SJA1000 is perfect... on all other
>>> CAN controllers it's more or less worse.
>>>
>> Should we just ignore this problem then? I'd rather like to figure
>> out if this is a problem with the controller or not. Do you remember
>> if you've had this problem with flexcan?
> We can do little if the CAN controller does not notify the Software
> via interrupt.
Yes, that's why I wanted to figure out if it's a controller problem or not.
Turns out it's a controller problem, but perhaps we can work around it?
E.g. if we check esr for state changes every time someone transmits a
frame, both of these problems would go away. Would it be unacceptable
overhead to do so?
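
The idea of checking the state on every transmission can be sketched as
follows. This is a Python model, not driver code: it derives the state from
the error counters using the thresholds of the CAN specification (warning at
96, passive at 128, bus-off once the transmit counter exceeds 255) instead of
reading the flexcan ESR register, and the names state_from_counters and
StatePoller are hypothetical.

```python
from enum import IntEnum


class CanState(IntEnum):
    ERROR_ACTIVE = 0
    ERROR_WARNING = 1
    ERROR_PASSIVE = 2
    BUS_OFF = 3


def state_from_counters(txerr, rxerr):
    """Map the TX/RX error counters to a CAN error state."""
    if txerr > 255:
        return CanState.BUS_OFF
    worst = max(txerr, rxerr)
    if worst >= 128:
        return CanState.ERROR_PASSIVE
    if worst >= 96:
        return CanState.ERROR_WARNING
    return CanState.ERROR_ACTIVE


class StatePoller:
    """Remembers the last reported state and reports changes when polled."""

    def __init__(self):
        self.state = CanState.ERROR_ACTIVE

    def poll(self, txerr, rxerr):
        """Call on every transmit; returns (old, new) on a change, else None."""
        new = state_from_counters(txerr, rxerr)
        if new == self.state:
            return None
        old, self.state = self.state, new
        return old, new
```

The cost per transmitted frame in this model is one counter read and one
comparison; whether the equivalent register read is acceptable in the real
transmit path is exactly the open question.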
Cheers,
Andri
* Re: flexcan napi poll and error frames
2014-10-24 14:39 ` Andri Yngvason
@ 2014-10-24 16:04 ` Andri Yngvason
2014-10-24 16:36 ` Steffen Rose
2014-10-24 19:08 ` Wolfgang Grandegger
0 siblings, 2 replies; 12+ messages in thread
From: Andri Yngvason @ 2014-10-24 16:04 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linux-can, Marc Kleine-Budde
On Fri 24 Oct 2014 14:39, Andri Yngvason wrote:
> On Fri 24 Oct 2014 12:33, Wolfgang Grandegger wrote:
>> On Fri, 24 Oct 2014 10:55:48 +0000, Andri Yngvason
>> <andri.yngvason@marel.com> wrote:
>>> On Fri 24 Oct 2014 10:43, Wolfgang Grandegger wrote:
>>>> On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
>>>> <andri.yngvason@marel.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I was running some tests on my patches when I noticed the following:
>>>>> If I have 2 flexcan devices on the bus, each sending to the bus using
>>>>> cangen, and then I disconnect the cable to one of them, that device
>>>>> will enter "error-warning" state, but it will not continue on to
>>>>> "error-passive" as it should.
>>>>>
>>>>> However, when I reconnect the cable, I get the "error-passive" message
>>>>> followed by an "error-warning" and eventually "back-to-error-active".
>>>> Yes, I think I observed that behaviour as well as you can see here:
>>>> https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
>>> Good to know.
>>>>
...
>>>>
>>>> I suspect that the problem is that the driver doesn't receive any
>>>> interrupts other than the one for "error-passive" and so things
>>>> won't "weigh" enough for napi. There seems to be some truth in this
>>>> conjecture, because when I tried setting the napi weight to 1, the
>>>> message got through.
>>>> Hm, why should it depend on NAPI? It does not delay messages for
>>>> a long time. I think the problem is that the state change is not
>>>> signalled by an interrupt but some time later when another event
>>>> (message) occurs.
>>>>
>>> Perhaps, but how do you explain that the message got through when I
>>> set the weight to 1?
>> If it's really true it would be a bug in the NAPI handling. Could you
>> please elaborate a bit more by adding some printouts in the interrupt
>> handler. I will have a closer look tomorrow.
> I wasn't lying about it. Perhaps by changing the weight it got through with
> something else. I don't know; I'm not an expert on the inner workings of napi.
>
> But let's just forget about the weight thing. I found out by looking in the
> i.mx6 reference manual that there is no interrupt for this transition. I
> found that quite incredible so I searched through it a few times. Anyway,
> there are only interrupts for active->tx-warning, active->rx-warning and
> active->bus-off.
>
>>>>> Another thing that I found peculiar was that I had to be sending on
>>>>> both devices for the error states to change to anything other than
>>>>> "error-warning".
>>>> Well, the error reporting on the SJA1000 is perfect... on all other
>>>> CAN controllers it's more or less worse.
>>>>
>>> Should we just ignore this problem then? I'd rather like to figure
>>> out if this is a problem with the controller or not. Do you remember
>>> if you've had this problem with flexcan?
>> We can do little if the CAN controller does not notify the Software
>> via interrupt.
> Yes, that's why I wanted to figure out if it's a controller problem or not.
> Turns out it's a controller problem, but perhaps we can work around it?
> E.g. if we check esr for state changes every time someone transmits a
> frame, both of these problems would go away. Would it be unacceptable
> overhead to do so?
>
I've just confirmed that this "fix" works, but only if berr-reporting is
enabled.
Andri.
* Re: flexcan napi poll and error frames
2014-10-24 16:04 ` Andri Yngvason
@ 2014-10-24 16:36 ` Steffen Rose
2014-10-24 17:40 ` Andri Yngvason
2014-10-24 19:08 ` Wolfgang Grandegger
1 sibling, 1 reply; 12+ messages in thread
From: Steffen Rose @ 2014-10-24 16:36 UTC (permalink / raw)
To: linux-can
Hello,
On Friday, 24 October 2014, 16:04:56 you wrote:
> > We can do little if the CAN controller does not notify the Software
> >> via interrupt.
> > Yes, that's why I wanted to figure out if it's a controller problem or
> > not.
> > Turns out it's a controller problem, but perhaps we can work around it?
> > E.g. if we check esr for state changes every time someone transmits a
> > frame, both of these problems would go away. Would it be unacceptable
> > overhead to do so?
>
> I've just confirmed that this "fix" works, but only if berr-reporting is
> enabled.
Does this workaround also work with an open CAN bus (the ACK error
situation)?
The flexcan can generate an error interrupt after every CAN bus error, but
in an error situation (e.g. a short circuit on the CAN bus) the interrupt
load would be very high.
--
Best regards
Steffen Rose
www.emtas.de
* Re: flexcan napi poll and error frames
2014-10-24 16:36 ` Steffen Rose
@ 2014-10-24 17:40 ` Andri Yngvason
2014-10-27 7:29 ` David Jander
0 siblings, 1 reply; 12+ messages in thread
From: Andri Yngvason @ 2014-10-24 17:40 UTC (permalink / raw)
To: Steffen Rose, linux-can
On Fri 24 Oct 2014 16:36, Steffen Rose wrote:
> Hello,
>
> On Friday, 24 October 2014, 16:04:56 you wrote:
>>> We can do little if the CAN controller does not notify the Software
>>>> via interrupt.
>>> Yes, that's why I wanted to figure out if it's a controller problem or
>>> not.
>>> Turns out it's a controller problem, but perhaps we can work around it?
>>> E.g. if we check esr for state changes every time someone transmits a
>>> frame, both of these problems would go away. Would it be unacceptable
>>> overhead to do so?
>> I've just confirmed that this "fix" works, but only if berr-reporting is
>> enabled.
> Is this workaround working in case of an open CAN Bus?
> (Ack error situation)
Yes
> The flexcan can generate an error interrupt after every CAN bus error. But in
> case of an error situation the interrupt load would be very high (e.g. short
> circuit of the CAN).
>
That's what berr-reporting does, right? Considering that when the bus is
flooded with errors, things are in a pretty bad shape anyway, I think it's
not really going to contribute much to the over-all mess to just leave the
error interrupts on all the time. It's far worse if the user doesn't get
the correct error state.
Anyway, defensive measures can be taken. When the bus has reached
error-passive, the driver is not going to need those interrupts any more
so it could turn off the interrupts until the bus goes to bus-off or back
down to warning or active; except when berr-reporting is enabled.
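
As a rough illustration of that defensive measure, here is a small Python
sketch of the decision only. It is one possible reading of the paragraph
above, not driver code; the function name error_irq_wanted and the state
strings are made up for the example.

```python
def error_irq_wanted(state, berr_reporting):
    """Decide whether the per-bus-error interrupt should stay enabled.

    state is one of "error-active", "error-warning", "error-passive"
    or "bus-off" (labels chosen for this example).
    """
    if berr_reporting:
        # The user asked for bus-error frames, so never mask the interrupt.
        return True
    # Once error-passive is reached, no further state information is
    # expected from the interrupt until the state changes again.
    return state != "error-passive"
```

With berr-reporting off, the interrupt would be masked only while the
controller sits in error-passive, which bounds the interrupt load in the
short-circuit case.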
--
Andri
* Re: flexcan napi poll and error frames
2014-10-24 16:04 ` Andri Yngvason
2014-10-24 16:36 ` Steffen Rose
@ 2014-10-24 19:08 ` Wolfgang Grandegger
1 sibling, 0 replies; 12+ messages in thread
From: Wolfgang Grandegger @ 2014-10-24 19:08 UTC (permalink / raw)
To: Andri Yngvason; +Cc: linux-can, Marc Kleine-Budde
On 10/24/2014 06:04 PM, Andri Yngvason wrote:
>
> On Fri 24 Oct 2014 14:39, Andri Yngvason wrote:
>> On Fri 24 Oct 2014 12:33, Wolfgang Grandegger wrote:
>>> On Fri, 24 Oct 2014 10:55:48 +0000, Andri Yngvason
>>> <andri.yngvason@marel.com> wrote:
>>>> On Fri 24 Oct 2014 10:43, Wolfgang Grandegger wrote:
>>>>> On Fri, 24 Oct 2014 10:26:11 +0000, Andri Yngvason
>>>>> <andri.yngvason@marel.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I was running some tests on my patches when I noticed the following:
>>>>>> If I have 2 flexcan devices on the bus, each sending to the bus using
>>>>>> cangen, and then I disconnect the cable to one of them, that device
>>>>>> will enter "error-warning" state, but it will not continue on to
>>>>>> "error-passive" as it should.
>>>>>>
>>>>>> However, when I reconnect the cable, I get the "error-passive" message
>>>>>> followed by an "error-warning" and eventually "back-to-error-active".
>>>>> Yes, I think I observed that behaviour as well as you can see here:
>>>>> https://gitorious.org/linux-can/wg-linux-can-next/commit/bd3acb12dbb9551541d28ae8766c154d3cf6ed57.patch
>>>> Good to know.
>>>>>
> ...
>>>>>
>>>>> I suspect that the problem is that the driver doesn't receive any
>>>>> interrupts other than the one for "error-passive" and so things
>>>>> won't "weigh" enough for napi. There seems to be some truth in this
>>>>> conjecture, because when I tried setting the napi weight to 1, the
>>>>> message got through.
>>>>> Hm, why should it depend on NAPI? It does not delay messages for
>>>>> a long time. I think the problem is that the state change is not
>>>>> signalled by an interrupt but some time later when another event
>>>>> (message) occurs.
>>>>>
>>>> Perhaps, but how do you explain that the message got through when I
>>>> set the weight to 1?
>>> If it's really true it would be a bug in the NAPI handling. Could you
>>> please elaborate a bit more by adding some printouts in the interrupt
>>> handler. I will have a closer look tomorrow.
>> I wasn't lying about it. Perhaps by changing the weight it got through with
>> something else. I don't know; I'm not an expert on the inner workings of napi.
>>
>> But let's just forget about the weight thing. I found out by looking in the
>> i.mx6 reference manual that there is no interrupt for this transition. I
>> found that quite incredible so I searched through it a few times. Anyway,
>> there are only interrupts for active->tx-warning, active->rx-warning and
>> active->bus-off.
>>
>>>>>> Another thing that I found peculiar was that I had to be sending on
>>>>>> both devices for the error states to change to anything other than
>>>>>> "error-warning".
>>>>> Well, the error reporting on the SJA1000 is perfect... on all other
>>>>> CAN controllers it's more or less worse.
>>>>>
>>>> Should we just ignore this problem then? I'd rather like to figure
>>>> out if this is a problem with the controller or not. Do you remember
>>>> if you've had this problem with flexcan?
>>> We can do little if the CAN controller does not notify the Software
>>> via interrupt.
>> Yes, that's why I wanted to figure out if it's a controller problem or not.
>> Turns out it's a controller problem, but perhaps we can work around it?
>> E.g. if we check esr for state changes every time someone transmits a
>> frame, both of these problems would go away. Would it be unacceptable
>> overhead to do so?
>>
> I've just confirmed that this "fix" works, but only if berr-reporting is
> enabled.
Ah, oh, this reminds me that there is a related bug in some versions of
the FLEXCAN core. If you look at "flexcan.c" you will find:
#define FLEXCAN_HAS_BROKEN_ERR_STATE BIT(2) /* [TR]WRN INT not connected */
On these cores, bus-error reporting is required to detect these state
changes.
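
A sketch of what this implies for the driver's setup, written as a small
Python model rather than the actual flexcan.c code: the bus-error interrupt
has to be enabled either when the user requests berr-reporting or when the
core has the quirk above. The BIT(2) value comes from the #define quoted
above and CAN_CTRLMODE_BERR_REPORTING from linux/can/netlink.h; the function
name bus_error_irq_needed is hypothetical.

```python
def BIT(n):
    """Same meaning as the kernel's BIT() macro."""
    return 1 << n


FLEXCAN_HAS_BROKEN_ERR_STATE = BIT(2)  # [TR]WRN INT not connected
CAN_CTRLMODE_BERR_REPORTING = 0x10     # from linux/can/netlink.h


def bus_error_irq_needed(features, ctrlmode):
    """True if state changes can only be seen via the bus-error interrupt,
    or if the user explicitly enabled berr-reporting."""
    return bool(features & FLEXCAN_HAS_BROKEN_ERR_STATE
                or ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
```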
Wolfgang.
* Re: flexcan napi poll and error frames
2014-10-24 17:40 ` Andri Yngvason
@ 2014-10-27 7:29 ` David Jander
[not found] ` <544E2C19.1050608@marel.com>
0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2014-10-27 7:29 UTC (permalink / raw)
To: Andri Yngvason; +Cc: Steffen Rose, linux-can, Marc Kleine-Budde
Hi Andri,
On Fri, 24 Oct 2014 17:40:19 +0000
Andri Yngvason <andri.yngvason@marel.com> wrote:
>
> On Fri 24 Oct 2014 16:36, Steffen Rose wrote:
> > Hello,
> >
> > On Friday, 24 October 2014, 16:04:56 you wrote:
> >>> We can do little if the CAN controller does not notify the Software
> >>>> via interrupt.
> >>> Yes, that's why I wanted to figure out if it's a controller problem or
> >>> not.
> >>> Turns out it's a controller problem, but perhaps we can work around it?
> >>> E.g. if we check esr for state changes every time someone transmits a
> >>> frame, both of these problems would go away. Would it be unacceptable
> >>> overhead to do so?
> >> I've just confirmed that this "fix" works, but only if berr-reporting is
> >> enabled.
> > Is this workaround working in case of an open CAN Bus?
> > (Ack error situation)
> Yes
> > The flexcan can generate an error interrupt after every CAN bus error. But
> > in case of an error situation the interrupt load would be very high (e.g.
> > short circuit of the CAN).
> >
> That's what berr-reporting does, right? Considering that when the bus is
> flooded with errors, things are in a pretty bad shape anyway, I think it's
> not really going to contribute much to the over-all mess to just leave the
> error interrupts on all the time. It's far worse if the user doesn't get
> the correct error state.
>
> Anyway, defensive measures can be taken. When the bus has reached
> error-passive, the driver is not going to need those interrupts any more
> so it could turn off the interrupts until the bus goes to bus-off or back
> down to warning or active; except when berr-reporting is enabled.
Would you mind trying out the patch series posted a few days ago here:
http://article.gmane.org/gmane.linux.can/6654
It applies to the flexcan-next branch of
git://gitorious.org/linux-can/linux-can-next.git
(base commit should be 907aa2d61697035a921cad6375c0d546b1d18af6 if HEAD fails).
I think it solves your problem.
Best regards,
--
David Jander
Protonic Holland.
* Re: flexcan napi poll and error frames
[not found] ` <544E2C19.1050608@marel.com>
@ 2014-10-27 14:01 ` David Jander
2014-10-27 15:53 ` Andri Yngvason
0 siblings, 1 reply; 12+ messages in thread
From: David Jander @ 2014-10-27 14:01 UTC (permalink / raw)
To: Andri Yngvason; +Cc: Marc Kleine-Budde, linux-can
On Mon, 27 Oct 2014 11:27:21 +0000
Andri Yngvason <andri.yngvason@marel.com> wrote:
> Hi David,
>
> On Mon 27 Oct 2014 07:29, David Jander wrote:
> > Andri Yngvason <andri.yngvason@marel.com> wrote:
> >
> >> On Fri 24 Oct 2014 16:36, Steffen Rose wrote:
> >>> Hello,
> >>>
> >>> On Friday, 24 October 2014, 16:04:56 you wrote:
> >>>>> We can do little if the CAN controller does not notify the Software
> >>>>>> via interrupt.
> >>>>> Yes, that's why I wanted to figure out if it's a controller problem or
> >>>>> not.
> >>>>> Turns out it's a controller problem, but perhaps we can work around it?
> >>>>> E.g. if we check esr for state changes every time someone transmits a
> >>>>> frame, both of these problems would go away. Would it be unacceptable
> >>>>> overhead to do so?
> >>>> I've just confirmed that this "fix" works, but only if berr-reporting is
> >>>> enabled.
> >>> Is this workaround working in case of an open CAN Bus?
> >>> (Ack error situation)
> >> Yes
> >>> The flexcan can generate an error interrupt after every CAN bus error.
> >>> But in case of an error situation the interrupt load would be very high
> >>> (e.g. short circuit of the CAN).
> >>>
> >> That's what berr-reporting does, right? Considering that when the bus is
> >> flooded with errors, things are in a pretty bad shape anyway, I think it's
> >> not really going to contribute much to the over-all mess to just leave the
> >> error interrupts on all the time. It's far worse if the user doesn't get
> >> the correct error state.
> >>
> >> Anyway, defensive measures can be taken. When the bus has reached
> >> error-passive, the driver is not going to need those interrupts any more
> >> so it could turn off the interrupts until the bus goes to bus-off or back
> >> down to warning or active; except when berr-reporting is enabled.
> > Would you mind trying out the patch series posted a few days ago here:
> >
> > http://article.gmane.org/gmane.linux.can/6654
> >
> > It applies to the flexcan-next branch of
> > git://gitorious.org/linux-can/linux-can-next.git
> > (base commit should be 907aa2d61697035a921cad6375c0d546b1d18af6 if HEAD
> > fails).
> >
> > I think it solves your problem.
> >
> Sure, I'll try those patches, but applying them is non-trivial. I.e. they
> don't apply. Patching the base commit fails after "Applying: can: rx-fifo:
> fix long lines". Could you maybe just send me a link to a git repo or branch
> where they're already applied? A single .patch file might also work.
Hmmm. It seems that tree is kind of a moving target sometimes....
Here's a link to the whole series re-based on top of vanilla 3.17:
https://github.com/yope/linux/tree/flexcan-v3.17
I hope this works for you.
Best regards,
--
David Jander
Protonic Holland.
* Re: flexcan napi poll and error frames
2014-10-27 14:01 ` David Jander
@ 2014-10-27 15:53 ` Andri Yngvason
0 siblings, 0 replies; 12+ messages in thread
From: Andri Yngvason @ 2014-10-27 15:53 UTC (permalink / raw)
To: David Jander; +Cc: Marc Kleine-Budde, linux-can
On Mon, Oct 27, 2014 at 03:01:11PM +0100, David Jander wrote:
> On Mon, 27 Oct 2014 11:27:21 +0000
> Andri Yngvason <andri.yngvason@marel.com> wrote:
>
> > Hi David,
> >
> > On Mon 27 Oct 2014 07:29, David Jander wrote:
> > > Andri Yngvason <andri.yngvason@marel.com> wrote:
> > >
> > >> On Fri 24 Oct 2014 16:36, Steffen Rose wrote:
> > >>> Hello,
> > >>>
> > >>> On Friday, 24 October 2014, 16:04:56 you wrote:
> > >>>>> We can do little if the CAN controller does not notify the Software
> > >>>>>> via interrupt.
> > >>>>> Yes, that's why I wanted to figure out if it's a controller problem or
> > >>>>> not.
> > >>>>> Turns out it's a controller problem, but perhaps we can work around it?
> > >>>>> E.g. if we check esr for state changes every time someone transmits a
> > >>>>> frame, both of these problems would go away. Would it be unacceptable
> > >>>>> overhead to do so?
> > >>>> I've just confirmed that this "fix" works, but only if berr-reporting is
> > >>>> enabled.
> > >>> Is this workaround working in case of an open CAN Bus?
> > >>> (Ack error situation)
> > >> Yes
> > >>> The flexcan can generate an error interrupt after every CAN bus error.
> > >>> But in case of an error situation the interrupt load would be very high
> > >>> (e.g. short circuit of the CAN).
> > >>>
> > >> That's what berr-reporting does, right? Considering that when the bus is
> > >> flooded with errors, things are in a pretty bad shape anyway, I think it's
> > >> not really going to contribute much to the over-all mess to just leave the
> > >> error interrupts on all the time. It's far worse if the user doesn't get
> > >> the correct error state.
> > >>
> > >> Anyway, defensive measures can be taken. When the bus has reached
> > >> error-passive, the driver is not going to need those interrupts any more
> > >> so it could turn off the interrupts until the bus goes to bus-off or back
> > >> down to warning or active; except when berr-reporting is enabled.
> > > Would you mind trying out the patch series posted a few days ago here:
> > >
> > > http://article.gmane.org/gmane.linux.can/6654
> > >
> > > It applies to the flexcan-next branch of
> > > git://gitorious.org/linux-can/linux-can-next.git
> > > (base commit should be 907aa2d61697035a921cad6375c0d546b1d18af6 if HEAD
> > > fails).
> > >
> > > I think it solves your problem.
> > >
> > Sure, I'll try those patches, but applying them is non-trivial. I.e. they
> > don't apply. Patching the base commit fails after "Applying: can: rx-fifo:
> > fix long lines". Could you maybe just send me a link to a git repo or branch
> > where they're already applied? A single .patch file might also work.
>
> Hmmm. It seems that tree is kind of a moving target sometimes....
> Here's a link to the whole series re-based on top of vanilla 3.17:
>
> https://github.com/yope/linux/tree/flexcan-v3.17
>
> I hope this works for you.
>
Thanks for rebasing this for me. It turns out that the source of all my problems
was my mail client (Thunderbird). It seems to mangle even maildir.
I switched to offlineimap and mutt, and the patches applied. I've been trying to
make Thunderbird work because co-workers keep sending me HTML mail...
Sorry for making you do that extra work.
--
Andri
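
As a rough illustration of the defensive measure quoted above (leave the
error interrupts on until error-passive is reached, unless berr-reporting
is enabled), the policy can be modelled in a few lines of C. This is only a
sketch under stated assumptions: the names err_state, state_from_counters
and want_error_irq are hypothetical and are not flexcan driver identifiers.
The limits 96, 128 and 255 are the standard CAN fault-confinement
thresholds. Keeping the interrupts masked in bus-off is an assumption,
since the thread does not say what should happen there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CAN error states discussed in the thread. */
enum err_state { ERR_ACTIVE, ERR_WARNING, ERR_PASSIVE, ERR_BUS_OFF };

/*
 * Classify the controller state from the TX/RX error counters using the
 * standard CAN limits: warning at 96, passive at 128, bus-off once the
 * TX counter exceeds 255.
 */
static enum err_state state_from_counters(unsigned int txerr,
					  unsigned int rxerr)
{
	if (txerr > 255)
		return ERR_BUS_OFF;
	if (txerr >= 128 || rxerr >= 128)
		return ERR_PASSIVE;
	if (txerr >= 96 || rxerr >= 96)
		return ERR_WARNING;
	return ERR_ACTIVE;
}

/*
 * Policy sketch: error interrupts stay enabled while the state is below
 * error-passive, and always while berr-reporting is enabled.
 * Assumption: they stay masked in bus-off as well.
 */
static bool want_error_irq(enum err_state state, bool berr_reporting)
{
	assert(state <= ERR_BUS_OFF);

	if (berr_reporting)
		return true;
	return state < ERR_PASSIVE;
}
```

With berr-reporting off, want_error_irq(state_from_counters(130, 0), false)
evaluates to false, so a driver following this policy would need another
trigger, such as the ESR check on transmit suggested earlier in the thread,
to notice the state dropping back to warning or active.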
Thread overview: 12+ messages
2014-10-24 10:26 flexcan napi poll and error frames Andri Yngvason
2014-10-24 10:43 ` Wolfgang Grandegger
2014-10-24 10:55 ` Andri Yngvason
2014-10-24 12:33 ` Wolfgang Grandegger
2014-10-24 14:39 ` Andri Yngvason
2014-10-24 16:04 ` Andri Yngvason
2014-10-24 16:36 ` Steffen Rose
2014-10-24 17:40 ` Andri Yngvason
2014-10-27 7:29 ` David Jander
[not found] ` <544E2C19.1050608@marel.com>
2014-10-27 14:01 ` David Jander
2014-10-27 15:53 ` Andri Yngvason
2014-10-24 19:08 ` Wolfgang Grandegger