Re: RFC: improve and consolidate state change and bus-off handling

linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Marc Kleine-Budde <mkl@pengutronix.de>
To: Wolfgang Grandegger <wg@grandegger.com>
Cc: Linux-can Mailing List <linux-can@vger.kernel.org>,
	SocketCAN Core Mailing List <socketcan-core@lists.berlios.de>
Subject: Re: RFC: improve and consolidate state change and bus-off handling
Date: Fri, 02 Dec 2011 10:41:36 +0100	[thread overview]
Message-ID: <4ED89D50.9040401@pengutronix.de> (raw)
In-Reply-To: <4ED8948E.5090806@grandegger.com>

[-- Attachment #1: Type: text/plain, Size: 5161 bytes --]

On 12/02/2011 10:04 AM, Wolfgang Grandegger wrote:
> Hello,
> 
> as you might know, our handling of CAN state changes and bus-off is
> not consistent, weak or even incorrect. Therefore I'm making an effort
> to improve, consolidate and *unify* it. Most things are straight-
> forward, but others need more attention and especially for the bus-off
> recovery I would appreciate some CAN expert advice (more below). I have
> already some patches implementing:
> 
> - Add missing do_get_berr_counter() callbacks (for ti_hecc, etc.).

+1

> - Add error counters to the data fields 6..7 of *any* CAN error message
>   automatically in alloc_can_err_skb():
> 
>        if (priv->do_get_berr_counter) {
>                struct can_berr_counter bec;
> 
>                priv->do_get_berr_counter(dev, &bec);
>                (*cf)->data[6] = bec.txerr;
>                (*cf)->data[7] = bec.rxerr;
>        }

What about some not directly connected devices like the mcp251x. At
least the mcp2515 driver (which is not mainline, though) needs a spi
transfer for that.

Do we need a flag in the driver to indicate not to read the berr_counter?

> - Allow state changes going down including "back to error active":
> 
>   Therefore I added:
> 
>     $ cat include/linux/can/error.h
>     ...
>     #define CAN_ERR_STATE_CHANGE 0x00000200U /* CAN error state change / data[1] */
>     ...
>     #define CAN_ERR_CRTL_ACTIVE      0x40 /* recovered to error active state */
>  
>   For any state change the CAN_ERR_STATE_CHANGE will be set in the
>   can_id. If the state gets worse, CAN_ERR_CRTL is set as usual
>   also for backward compatibility. The state change management will
>   be done by a common "can_change_state()" function doing all the bit

yeah! common function! +1

>   settings and counter increments. For the SJA1000 "candump -e" will
>   then report for recovery from the error passive state (no cable):
> 
>     can0  20000204  [8] 00 08 00 00 00 00 60 00   ERRORFRAME
> 	controller-problem{tx-error-warning}
> 	state-change{tx-error-warning}
> 	error-counter-tx-rx{{96}{0}}
>     can0  20000204  [8] 00 30 00 00 00 00 80 00   ERRORFRAME
> 	controller-problem{tx-error-passive}
> 	state-change{tx-error-passive}
> 	error-counter-tx-rx{{128}{0}}
>     can0  124  [3] 12 34 56
>     ...
>     can0  124  [3] 12 34 56
>     can0  20000200  [8] 00 08 00 00 00 00 7F 00   ERRORFRAME
> 	state-change{tx-error-warning}
> 	error-counter-tx-rx{{127}{0}}
>     can0  124  [3] 12 34 56
>     ...
>     can0  124  [3] 12 34 56
>     can0  20000200  [8] 00 40 00 00 00 00 5F 00   ERRORFRAME
> 	state-change{back-to-error-active}
> 	error-counter-tx-rx{{95}{0}}
>  
>    Updating all drivers correctly is a challenge, especially because I
>    do not have all hardware. Help and comments are appreciated.

I can test the at91_can, flexcan and if we're lucky we've a mcp251x in
the office.

> - Bus-off recovery:
> 
>   Currently, I think, we do not handle bus-off recovery correctly for
>   most controllers. We brute-force stop and restart the controller.
>   The controller will do the recovery cycle anyway and we may send
>   messages to early. Instead the software should handle the bus-off
>   recovery cycle as shown below:
> 
>   * bus-off happens
>     - call netif_stop_queue() and maybe disable interrupts
> 
>   * automatic or manual restart is done
>     - trigger bus-off recovery sequence by resetting the init bit
>       (on SJA1000) and maybe re-enable the interrupts
>     - await the controller going back to error-active state
>       (signaled via interrupt).

I'm not sure if all controllers signal correctly that they are back in
error active. My observation is that bus off handling is a bit like
climbing the mount Everest, the air is quite thin and things can lock up
quite fast.

>     - call netif_wake_queue()
> 
>   Here is a "candump -e" output for the SJA1000 (with delta times)
> 
>     (009.832477)  can0  20000204  [8] 00 30 00 00 00 00 88 00   ERRORFRAME
> 	controller-problem{tx-error-passive}
> 	state-change{tx-error-passive}
> 	error-counter-tx-rx{{136}{0}}
>     (000.000804)  can0  20000240  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
> 	bus-off
> 	state-change{}
> 	error-counter-tx-rx{{127}{0}}
>     (000.099795)  can0  20000100  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
> 	restarted-after-bus-off
> 	error-counter-tx-rx{{127}{0}}
>     (000.003061)  can0  20000200  [8] 00 40 00 00 00 00 00 00   ERRORFRAME
> 	state-change{back-to-error-active}
> 
>    Before doing all the necessary code changes, which are not always
>    trivial I ask: Would that be the correct bus-off handling???

However if hardware permits the described steps sound reasonable (from
my non CAN expert point of view).

> Thanks for feedback.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

next prev parent reply	other threads:[~2011-12-02  9:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-02  9:04 RFC: improve and consolidate state change and bus-off handling Wolfgang Grandegger
2011-12-02  9:41 ` Marc Kleine-Budde [this message]
2011-12-02 10:04   ` Wolfgang Grandegger
2011-12-03  9:28   ` Sebastian Haas
     [not found]     ` <4ED9EBAB.4060701-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>
2011-12-03 10:18       ` Oliver Hartkopp
2011-12-03 11:53         ` Sebastian Haas
2011-12-03 16:53           ` Oliver Hartkopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ED89D50.9040401@pengutronix.de \
    --to=mkl@pengutronix.de \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan-core@lists.berlios.de \
    --cc=wg@grandegger.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).