RFC: improve and consolidate state change and bus-off handling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wolfgang Grandegger <wg@grandegger.com>
To: Linux-can Mailing List <linux-can@vger.kernel.org>,
	SocketCAN Core Mailing List <socketcan-core@lists.berlios.de>
Subject: RFC: improve and consolidate state change and bus-off handling
Date: Fri, 02 Dec 2011 10:04:14 +0100	[thread overview]
Message-ID: <4ED8948E.5090806@grandegger.com> (raw)

Hello,

as you might know, our handling of CAN state changes and bus-off is
not consistent, weak or even incorrect. Therefore I'm making an effort
to improve, consolidate and *unify* it. Most things are straight-
forward, but others need more attention and especially for the bus-off
recovery I would appreciate some CAN expert advice (more below). I have
already some patches implementing:

- Add missing do_get_berr_counter() callbacks (for ti_hecc, etc.).

- Add error counters to the data fields 6..7 of *any* CAN error message
  automatically in alloc_can_err_skb():

       if (priv->do_get_berr_counter) {
               struct can_berr_counter bec;

               priv->do_get_berr_counter(dev, &bec);
               (*cf)->data[6] = bec.txerr;
               (*cf)->data[7] = bec.rxerr;
       }

- Allow state changes going down including "back to error active":

  Therefore I added:

    $ cat include/linux/can/error.h
    ...
    #define CAN_ERR_STATE_CHANGE 0x00000200U /* CAN error state change / data[1] */
    ...
    #define CAN_ERR_CRTL_ACTIVE      0x40 /* recovered to error active state */

  For any state change the CAN_ERR_STATE_CHANGE will be set in the
  can_id. If the state gets worse, CAN_ERR_CRTL is set as usual
  also for backward compatibility. The state change management will
  be done by a common "can_change_state()" function doing all the bit
  settings and counter increments. For the SJA1000 "candump -e" will
  then report for recovery from the error passive state (no cable):

    can0  20000204  [8] 00 08 00 00 00 00 60 00   ERRORFRAME
	controller-problem{tx-error-warning}
	state-change{tx-error-warning}
	error-counter-tx-rx{{96}{0}}
    can0  20000204  [8] 00 30 00 00 00 00 80 00   ERRORFRAME
	controller-problem{tx-error-passive}
	state-change{tx-error-passive}
	error-counter-tx-rx{{128}{0}}
    can0  124  [3] 12 34 56
    ...
    can0  124  [3] 12 34 56
    can0  20000200  [8] 00 08 00 00 00 00 7F 00   ERRORFRAME
	state-change{tx-error-warning}
	error-counter-tx-rx{{127}{0}}
    can0  124  [3] 12 34 56
    ...
    can0  124  [3] 12 34 56
    can0  20000200  [8] 00 40 00 00 00 00 5F 00   ERRORFRAME
	state-change{back-to-error-active}
	error-counter-tx-rx{{95}{0}}

   Updating all drivers correctly is a challenge, especially because I
   do not have all hardware. Help and comments are appreciated.

- Bus-off recovery:

  Currently, I think, we do not handle bus-off recovery correctly for
  most controllers. We brute-force stop and restart the controller.
  The controller will do the recovery cycle anyway and we may send
  messages to early. Instead the software should handle the bus-off
  recovery cycle as shown below:

  * bus-off happens
    - call netif_stop_queue() and maybe disable interrupts

  * automatic or manual restart is done
    - trigger bus-off recovery sequence by resetting the init bit
      (on SJA1000) and maybe re-enable the interrupts
    - await the controller going back to error-active state
      (signaled via interrupt).
    - call netif_wake_queue()

  Here is a "candump -e" output for the SJA1000 (with delta times)

    (009.832477)  can0  20000204  [8] 00 30 00 00 00 00 88 00   ERRORFRAME
	controller-problem{tx-error-passive}
	state-change{tx-error-passive}
	error-counter-tx-rx{{136}{0}}
    (000.000804)  can0  20000240  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
	bus-off
	state-change{}
	error-counter-tx-rx{{127}{0}}
    (000.099795)  can0  20000100  [8] 00 00 00 00 00 00 7F 00   ERRORFRAME
	restarted-after-bus-off
	error-counter-tx-rx{{127}{0}}
    (000.003061)  can0  20000200  [8] 00 40 00 00 00 00 00 00   ERRORFRAME
	state-change{back-to-error-active}

   Before doing all the necessary code changes, which are not always
   trivial I ask: Would that be the correct bus-off handling???

Thanks for feedback.

Wolfgang.

next             reply	other threads:[~2011-12-02  9:04 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-02  9:04 Wolfgang Grandegger [this message]
2011-12-02  9:41 ` RFC: improve and consolidate state change and bus-off handling Marc Kleine-Budde
2011-12-02 10:04   ` Wolfgang Grandegger
2011-12-03  9:28   ` Sebastian Haas
     [not found]     ` <4ED9EBAB.4060701-xpvPi5bcW5W9w4jpWW8B1qHonnlKzd3f@public.gmane.org>
2011-12-03 10:18       ` Oliver Hartkopp
2011-12-03 11:53         ` Sebastian Haas
2011-12-03 16:53           ` Oliver Hartkopp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ED8948E.5090806@grandegger.com \
    --to=wg@grandegger.com \
    --cc=linux-can@vger.kernel.org \
    --cc=socketcan-core@lists.berlios.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.