All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Kleine-Budde <mkl@pengutronix.de>
To: Wolfgang Grandegger <wg@grandegger.com>
Cc: Linux-can Mailing List <linux-can@vger.kernel.org>,
	SocketCAN Core Mailing List <socketcan-core@lists.berlios.de>
Subject: Re: RFC: improve and consolidate state change and bus-off handling
Date: Fri, 02 Dec 2011 10:41:36 +0100	[thread overview]
Message-ID: <4ED89D50.9040401@pengutronix.de> (raw)
In-Reply-To: <4ED8948E.5090806@grandegger.com>

[-- Attachment #1: Type: text/plain, Size: 5161 bytes --]

On 12/02/2011 10:04 AM, Wolfgang Grandegger wrote:
> Hello,
> 
> as you might know, our handling of CAN state changes and bus-off is
> not consistent, weak or even incorrect. Therefore I'm making an effort
> to improve, consolidate and *unify* it. Most things are straight-
> forward, but others need more attention and especially for the bus-off
> recovery I would appreciate some CAN expert advice (more below). I have
> already some patches implementing:
> 
> - Add missing do_get_berr_counter() callbacks (for ti_hecc, etc.).

+1

> - Add error counters to the data fields 6..7 of *any* CAN error message
>   automatically in alloc_can_err_skb():
> 
>        if (priv->do_get_berr_counter) {
>                struct can_berr_counter bec;
> 
>                priv->do_get_berr_counter(dev, &bec);
>                (*cf)->data[6] = bec.txerr;
>                (*cf)->data[7] = bec.rxerr;
>        }

What about some not directly connected devices like the mcp251x. At
least the mcp2515 driver (which is not mainline, though) needs a spi
transfer for that.

Do we need a flag in the driver to indicate not to read the berr_counter?

> - Allow state changes going down including "back to error active":
> 
>   Therefore I added:
> 
>     $ cat include/linux/can/error.h
>     ...
>     #define CAN_ERR_STATE_CHANGE 0x00000200U /* CAN error state change / data[1] */
>     ...
>     #define CAN_ERR_CRTL_ACTIVE      0x40 /* recovered to error active state */
>  
>   For any state change the CAN_ERR_STATE_CHANGE will be set in the
>   can_id. If the state gets worse, CAN_ERR_CRTL is set as usual
>   also for backward compatibility. The state change management will
>   be done by a common "can_change_state()" function doing all the bit

yeah! common function! +1

>   settings and counter increments. For the SJA1000 "candump -e" will
>   then report for recovery from the error passive state (no cable):
> 
>     can0  20000204  [8] 00 08 00 00 00 00 60 00   ERRORFRAME
> 	controller-problem{tx-error-warning}
> 	state-change{tx-error-warning}
> 	error-counter-tx-rx{{96}{0}}
>     can0  20000204  [8] 00 30 00 00 00 00 80 00   ERRORFRAME
> 	controller-problem{tx-error-passive}
> 	state-change{tx-error-passive}
> 	error-counter-tx-rx{{128}{0}}
>     can0  124  [3] 12 34 56
>     ...
>     can0  124  [3] 12 34 56
>     can0  20000200  [8] 00 08 00 00 00 00 7F 00   ERRORFRAME
> 	state-change{tx-error-warning}
> 	error-counter-tx-rx{{127}{0}}
>     can0  124  [3] 12 34 56
>     ...
>     can0  124  [3] 12 34 56
>     can0  20000200  [8] 00 40 00 00 00 00 5F 00   ERRORFRAME
> 	state-change{back-to-error-active}
> 	error-counter-tx-rx{{95}{0}}
>  
>    Updating all drivers correctly is a challenge, especially because I
>    do not have all hardware. Help and comments are appreciated.

I can test the at91_can, flexcan and if we're lucky we've a mcp251x in
the office.

> - Bus-off recovery:
> 
>   Currently, I think, we do not handle bus-off recovery correctly for
>   most controllers. We brute-force stop and restart the controller.
>   The controller will do the recovery cycle anyway and we may send
>   messages to early. Instead the software should handle the bus-off
>   recovery cycle as shown below:
> 
>   * bus-off happens
>     - call netif_stop_queue() and maybe disable interrupts
> 
>   * automatic or manual restart is done
>     - trigger bus-off recovery sequence by resetting the init bit
>       (on SJA1000) and maybe re-enable the interrupts
>     - await the controller going back to error-active state
>       (signaled via interrupt).

I'm not sure if all controllers signal correctly that they are back in
error active. My observation is that bus off handling is a bit like
climbing the mount Everest, the air is quite thin and things can lock up
quite fast.

>     - call netif_wake_queue()
> 
>   Here is a "candump -e" output for the SJA1000 (with delta times)
> 
>     (009.832477)  can0  20000204  [8] 00 30 00 00 00 00 88 00   ERRORFRAME
> 	controller-problem{tx-error-passive}
> 	state-change{tx-error-passive}
> 	error-counter-tx-rx{{136}{0}}
>     (000.000804)  can0  20000240  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
> 	bus-off
> 	state-change{}
> 	error-counter-tx-rx{{127}{0}}
>     (000.099795)  can0  20000100  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
> 	restarted-after-bus-off
> 	error-counter-tx-rx{{127}{0}}
>     (000.003061)  can0  20000200  [8] 00 40 00 00 00 00 00 00   ERRORFRAME
> 	state-change{back-to-error-active}
> 
>    Before doing all the necessary code changes, which are not always
>    trivial I ask: Would that be the correct bus-off handling???

However if hardware permits the described steps sound reasonable (from
my non CAN expert point of view).

> Thanks for feedback.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

  reply	other threads:[~2011-12-02  9:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-02  9:04 RFC: improve and consolidate state change and bus-off handling Wolfgang Grandegger
2011-12-02  9:41 ` Marc Kleine-Budde [this message]
2011-12-02 10:04   ` Wolfgang Grandegger
2011-12-03  9:28   ` Sebastian Haas
     [not found]     ` <4ED9EBAB.4060701-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>
2011-12-03 10:18       ` Oliver Hartkopp
2011-12-03 11:53         ` Sebastian Haas
2011-12-03 16:53           ` Oliver Hartkopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ED89D50.9040401@pengutronix.de \
    --to=mkl@pengutronix.de \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan-core@lists.berlios.de \
    --cc=wg@grandegger.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.