Re: RFC: improve and consolidate state change and bus-off handling

linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Oliver Hartkopp <socketcan-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
To: Sebastian Haas <dev-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>
Cc: SocketCAN Core Mailing List
	<socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org>,
	Marc Kleine-Budde <mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>,
	Wolfgang Grandegger <wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>,
	Linux-can Mailing List
	<linux-can-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: RFC: improve and consolidate state change and bus-off handling
Date: Sat, 03 Dec 2011 11:18:56 +0100	[thread overview]
Message-ID: <4ED9F790.2060308@hartkopp.net> (raw)
In-Reply-To: <4ED9EBAB.4060701-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>

Hello Sebastian,

nice to read that you're back on air ;-)

On 03.12.2011 10:28, Sebastian Haas wrote:

> Hello,
> 
> Am 02.12.2011 10:41, schrieb Marc Kleine-Budde:
>> On 12/02/2011 10:04 AM, Wolfgang Grandegger wrote:
>>> - Bus-off recovery:
>>>
>>>    Currently, I think, we do not handle bus-off recovery correctly for
>>>    most controllers. We brute-force stop and restart the controller.
>>>    The controller will do the recovery cycle anyway and we may send
>>>    messages to early. Instead the software should handle the bus-off
>>>    recovery cycle as shown below:
>>>
>>>    * bus-off happens
>>>      - call netif_stop_queue() and maybe disable interrupts
>>>
>>>    * automatic or manual restart is done
> Short note from me about that. We should be carefully with automatic recovery
> at driver-level, since this is basically not covered by the standard (only the
> application is allowed to decide whether or not the controller shall go back
> to error active). A bus off is an unusual state in a CAN network, because if
> this happens the network is physically in a bad shape (cable length, stub
> lines, electrical defects) or has a logical error (identifier conflicts,
> different bit-rates). So in my opinion the driver shall not allow anyway of
> automatic recovery and shall be up to the application to decide what to do.

You are right. For that reason the automatic restart is an option. You're
still able to define one handler on the multi-user system to cope with
bus-off. OTOH you can enable the automatic recovery when you know that you are
working in a proved CAN setup (regarding cable & bittiming issues) to handle
other temporary issues (EMC/hotplug). E.g. in some vehicle setups an automatic
recovery from bus-off after 200ms is specified. If this is done by a 'special'
handler or just by the driver can not be seen from the network view.

> 
>>>      - trigger bus-off recovery sequence by resetting the init bit
>>>        (on SJA1000) and maybe re-enable the interrupts
>>>      - await the controller going back to error-active state
>>>        (signaled via interrupt).
>>
>> I'm not sure if all controllers signal correctly that they are back in
>> error active. My observation is that bus off handling is a bit like
>> climbing the mount Everest, the air is quite thin and things can lock up
>> quite fast.
> In a worst case you need to poll for the error state if the controller doesn't
> support correct signalling. But I expect most controllers to implement such a
> feature since this is quite essential for most application to know the
> controller got bus-on/error active.

Yes. Indeed some CAN controllers recover themselves (IIRC mscan) and others
don't (SJA1000).  If a controller recovers silently (and no CAN frames come
from the outside to detect it) the state would need to be polled to be able to
send frames from the local host ...

> 
>>>      - call netif_wake_queue()
>>>
>>>    Here is a "candump -e" output for the SJA1000 (with delta times)
>>>
>>>      (009.832477)  can0  20000204  [8] 00 30 00 00 00 00 88 00   ERRORFRAME
>>>     controller-problem{tx-error-passive}
>>>     state-change{tx-error-passive}
>>>     error-counter-tx-rx{{136}{0}}
>>>      (000.000804)  can0  20000240  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
>>>     bus-off
>>>     state-change{}
>>>     error-counter-tx-rx{{127}{0}}
>>>      (000.099795)  can0  20000100  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
>>>     restarted-after-bus-off
>>>     error-counter-tx-rx{{127}{0}}
>>>      (000.003061)  can0  20000200  [8] 00 40 00 00 00 00 00 00   ERRORFRA
>>>     state-change{back-to-error-active}
>>>
>>>     Before doing all the necessary code changes, which are not always
>>>     trivial I ask: Would that be the correct bus-off handling???
>>
>> However if hardware permits the described steps sound reasonable (from
>> my non CAN expert point of view).

To me too. Nice rework idea.

> Despite the fact that you misuse a CAN message structure to carry CAN error
> frames, the overall procedure looks quite good.

From a todays view the CAN controller states could also be transferred into
the userspace by using recvmsg() that currently supports to push the timestamp
and the dropcounter within one syscall to the user.

The idea to get the error message through the data path (when it's enabled
with a special filter) is, that the CAN controller states fit into the logfile
and have their own timestamps - even when there's no valid CAN frame
receiption in that time. E.g. if you look into some Vector ASC logfiles the
problems detected by the CAN controller are also put inside the log at the
appropriate place. The reception of CAN data and CAN controller states in-line
with timestamps has real debugging advantages. Although it's easy to implement
in the interrupt handler whether you have a correct RX-interrupt or an error
interrupt -> you create a skb and push it into the system :-)

Regards,
Oliver

next prev parent reply	other threads:[~2011-12-03 10:18 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-02  9:04 RFC: improve and consolidate state change and bus-off handling Wolfgang Grandegger
2011-12-02  9:41 ` Marc Kleine-Budde
2011-12-02 10:04   ` Wolfgang Grandegger
2011-12-03  9:28   ` Sebastian Haas
     [not found]     ` <4ED9EBAB.4060701-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>
2011-12-03 10:18       ` Oliver Hartkopp [this message]
2011-12-03 11:53         ` Sebastian Haas
2011-12-03 16:53           ` Oliver Hartkopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ED9F790.2060308@hartkopp.net \
    --to=socketcan-fj+pqtutwrtk1umjsbkqmq@public.gmane.org \
    --cc=dev-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org \
    --cc=linux-can-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org \
    --cc=socketcan-core-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org \
    --cc=wg-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).