From: Wolfgang Grandegger <wg@grandegger.com>
To: Andreas Larsson <andreas@gaisler.com>
Cc: linux-can@vger.kernel.org, Marc Kleine-Budde <mkl@pengutronix.de>,
devicetree-discuss@lists.ozlabs.org, software@gaisler.com
Subject: Re: [PATCH v3] can: grcan: Add device driver for GRCAN and GRHCAN cores
Date: Wed, 31 Oct 2012 21:21:04 +0100 [thread overview]
Message-ID: <50918830.9040601@grandegger.com> (raw)
In-Reply-To: <509152D5.2050701@gaisler.com>
On 10/31/2012 05:33 PM, Andreas Larsson wrote:
> On 2012-10-31 13:51, Wolfgang Grandegger wrote:
>> On 10/30/2012 05:24 PM, Andreas Larsson wrote:
>>> On 10/30/2012 11:07 AM, Wolfgang Grandegger wrote:
>>>>> + /* AHB bus error interrupts (not CAN bus errors) - shut down the
>>>>> + * device.
>>>>> + */
>>>>> + if (sources & (GRCAN_IRQ_TXAHBERR | GRCAN_IRQ_RXAHBERR)) {
>>>>> + if (sources & GRCAN_IRQ_TXAHBERR) {
>>>>> + netdev_err(dev, "got AHB bus error on tx\n");
>>>>> + stats->tx_errors++;
>>>>> + } else {
>>>>> + netdev_err(dev, "got AHB bus error on rx\n");
>>>>> + stats->rx_errors++;
>>>>> + }
>>>>> + netdev_err(dev, "halting device\n");
>>>>> +
>>>>> + /* Prevent anything to be enabled again and halt device */
>>>>> + SPIN_LOCK(&priv->lock);
>>>>> + priv->closing = true;
>>>>> + netif_stop_queue(dev);
>>>>> + grcan_stop(dev);
>>>>> + SPIN_UNLOCK(&priv->lock);
>>>>
>>>> Hm, does that really happen? How can the user/app realized the problem
>>>> and recover?
>>>
>>> My understanding of it is that if you get amba bus errors something is
>>> seriously wrong and nothing can be done at driver level to recover.
>>> Shutting down the device is to prevent the driver from spamming console
>>> messages. I used to have a sysfs indication of such errors. Now dmesg is
>>> the way to find out about the problem. The user can always bring the
>>> interface down and up again and try again after such an error.
>>
>> Well, sounds like a fatal error. Did you ever saw it? If that happens
>> frequently, or even once, the device/system seems not really usable.
>
> Fatal and yes if this is ever seen something is very bad with the
> system. I have never seen it. However the hardware reports if it would
> happen, so why not try to take care of it in some manner. If this is
> seen the can device is probably useless but it is possible that the rest
> of the system might be usable. I don't really know in what circumstances
> this would trigger.
OK, maybe a more serious message would just be fine.
>>>> Furthermore, why is a spin_clock enough here? THe interrupt may run a
>>>> thread.
>>>
>>> It should be enough because the function is only called
>>> directly from the interrupt handler, right? Or did I miss something?
>>
>> The interrupt handler may run as thread, e.g. if the -rt patch has been
>> applied. Therefore it's safer to use the irqsave/restore variants of the
>> spin_lock functions. At least that's my understanding. See also
>> "Documentation/spinlock.txt".
>
> I'll look into this. Seems strange if the same cpu can be interrupted in
> an interrupt handler to run the very same interrupt handler. Sounds like
> a horrible situation for any driver that does not lock down most of its
> interrupt handler. Or maybe there is some situation that I don't foresee.
As I said, if you use CONFIG_PREEMPT_RT all interrupt isr will be
threaded (running by threads).
>>>>> + priv->can.ctrlmode_supported =
>>>>> + CAN_CTRLMODE_LISTENONLY | CAN_CTRLMODE_ONE_SHOT;
>>>>
>>>> What about triple-sampling?
>>>
>>> That is not supported by the hardware.
>
> Well, now it will be in future versions of the controller, so support is
> added to the driver.
>
>
>>>> I curious how the device behaves on bus errors and state changes. Could
>>>> you please show the output of "candump -e any,0:0,#FFFFFFFF" while
>>>> sending a CAN message with no other node on the bus (not connected) and
>>>> with going bus off (by short-circuiting CAN high and low).
>>>
>>>
>>> Here is the output (with long sequences of similar error frames
>>> where only one counter is increasing cut out) from the the upcoming v4.
>>> let me know if you see any problems with this.
>>
>> How did your trigger that sequence? Short-circuit or not cable
>> connected? The latter, I assume, as bus-off is not reached.
>
> Triggered by short-circuiting, just as asked.
I ask for both tests but I missed the bus-off forther down.
>>>
>>> can0 20000006 [8] 00 00 00 00 00 00 10 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{}
>>> error-counter-tx-rx{{16}{0}}
>>
>> This is most probably an ack slot bus error similar to. You seem not to
>> handle bus errors and it's not a controller problem as the state did not
>> change.
>
> No, the hardware has no support for can-bus error reporting.
OK, but why is CAN_ERR_CRTL set?
>> http://lxr.linux.no/#linux+v3.6.4/drivers/net/can/flexcan.c#L353
>>
>>> can0 20000004 [8] 00 20 00 00 00 00 88 00 ERRORFRAME
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{136}{0}}
>>
>> OK, a state change to error passive. The device seems not to report
>> changes to error warning.
>
> No, there is no interrupts or anything that alert about the error
> states. That is why I set the sate based on the error counters in the
> error handler.
Who does trigger the message above?
>>> can0 20000006 [8] 00 20 00 00 00 00 90 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{144}{0}}
>>
>> State changes should be reported only once.
>
> Do you mean that an error message should only be sent on state change?
Yes. On state change or another error, e.g. the lost-arbitration above.
>> But it did not change. This
>> is then a bus error (CAN_ERR_PROT | CAN_ERR_BUSERROR). See also above.
>>
>>> [...]
>>> can0 20000006 [8] 00 20 00 00 00 00 F0 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{240}{0}}
>>> can0 20000006 [8] 00 20 00 00 00 00 F8 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{tx-error-passive}
>>> error-counter-tx-rx{{248}{0}}
>>> can0 20000042 [8] 00 00 00 00 00 00 80 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> bus-off
>>> error-counter-tx-rx{{128}{0}}
>>> can0 20000006 [8] 00 00 00 00 00 00 18 00 ERRORFRAME
>>> lost-arbitration{at bit 0}
>>> controller-problem{}
Obviously the device restarts automatically after bus-off. Can this be
switched off. The normal procedure is to call "ip ... type can restart"
or to set "restart-ms" for automatic restart after the specified delay.
>> Also a sequence to bus-off including the recovery would be nice.
The sequence with no cable connected would be nice as well.
...
> Thanks for the input! I'll send in v4 so you can see what the code that
> generated the above.
Please fix the error handling first.
Thanks,
Wolfgang.
next prev parent reply other threads:[~2012-10-31 20:21 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-02 14:38 [PATCH] can: grcan: Add device driver for GRCAN and GRHCAN cores Andreas Larsson
2012-10-04 9:45 ` Marc Kleine-Budde
2012-10-11 10:04 ` Marc Kleine-Budde
2012-10-11 11:22 ` Andreas Larsson
2012-10-11 11:28 ` Marc Kleine-Budde
2012-10-11 12:08 ` Andreas Larsson
2012-10-23 9:57 ` [PATCH v2] " Andreas Larsson
2012-10-23 16:26 ` Wolfgang Grandegger
2012-10-24 13:31 ` Andreas Larsson
2012-10-30 9:06 ` [PATCH v3] " Andreas Larsson
2012-10-30 10:07 ` Wolfgang Grandegger
2012-10-30 16:24 ` Andreas Larsson
2012-10-31 12:51 ` Wolfgang Grandegger
2012-10-31 16:33 ` Andreas Larsson
2012-10-31 16:39 ` [PATCH v4] " Andreas Larsson
2012-10-31 20:21 ` Wolfgang Grandegger [this message]
2012-11-01 16:08 ` [PATCH v3] " Andreas Larsson
2012-11-02 14:23 ` [PATCH v5] " Andreas Larsson
2012-11-05 9:28 ` [PATCH v3] " Wolfgang Grandegger
2012-11-07 7:32 ` Andreas Larsson
2012-11-07 11:15 ` Wolfgang Grandegger
2012-11-07 12:55 ` Andreas Larsson
2012-11-07 15:20 ` [PATCH v6] " Andreas Larsson
2012-11-08 8:29 ` Wolfgang Grandegger
2012-11-08 9:27 ` Marc Kleine-Budde
2012-11-08 10:37 ` Andreas Larsson
[not found] ` <509B7B1E.5040509-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2012-11-08 13:10 ` [PATCH v7] " Andreas Larsson
2012-11-09 0:01 ` Marc Kleine-Budde
2012-11-12 14:57 ` [PATCH v8] " Andreas Larsson
2012-11-13 21:15 ` Marc Kleine-Budde
2012-11-14 7:50 ` Andreas Larsson
2012-11-14 8:43 ` Marc Kleine-Budde
2012-11-14 11:02 ` Andreas Larsson
2012-11-14 11:22 ` Marc Kleine-Budde
2012-11-14 15:07 ` Andreas Larsson
2012-11-14 15:12 ` Marc Kleine-Budde
2012-11-15 7:47 ` [PATCH v9] " Andreas Larsson
2012-11-15 20:32 ` Marc Kleine-Budde
2012-11-16 6:17 ` Andreas Larsson
2012-11-08 10:33 ` [PATCH v6] " Andreas Larsson
2012-10-30 9:29 ` [PATCH v2] " Wolfgang Grandegger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50918830.9040601@grandegger.com \
--to=wg@grandegger.com \
--cc=andreas@gaisler.com \
--cc=devicetree-discuss@lists.ozlabs.org \
--cc=linux-can@vger.kernel.org \
--cc=mkl@pengutronix.de \
--cc=software@gaisler.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.