linux-can.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wolfgang Grandegger <wg@grandegger.com>
To: Andri Yngvason <andri.yngvason@marel.com>
Cc: linux-can@vger.kernel.org
Subject: Re: [PATCH v2 1/4] can: dev: Consolidate and unify state change handling.
Date: Wed, 22 Oct 2014 20:42:05 +0200	[thread overview]
Message-ID: <5447FA7D.7060806@grandegger.com> (raw)
In-Reply-To: <5447DC34.7070602@marel.com>

On 10/22/2014 06:32 PM, Andri Yngvason wrote:
> 
> On mið 22.okt 2014 12:56, Wolfgang Grandegger wrote:
>>
>>>>>         restarted-after-bus-off
>>>> And the restart does come when the short-circuit is gone?
>>>>
>>> No, in fact the bus keeps restarting until the short-circuit is gone.
>>>
>>> Note that I'm using peak_pci. It's sja1000 but maybe there is something
>>> different?
>> No, because it's a memory mapped SJA1000 chip. What do you see with
>> "dmesg"
>> assuming that CONFIG_CAN_DEV_DEBUG is enabled.
>>
> dmesg wasn't very informative so I added some debug output of my own. This is
> the output after I've fixed everything:
> [ 3903.594640] peak_pci 0000:02:00.0: can1 at reg_base=0xf892a400
> cfg_base=0xf8928000 irq=18
> [ 3903.684937] peak_pci 0000:02:00.0 can1: setting BTR0=0x07 BTR1=0x05
> [ 3922.009853] peak_pci 0000:02:00.0 can1: error warning interrupt; bs=false,
> es=true

According to the manual: error active -> warning

> [ 3922.017428] peak_pci 0000:02:00.0 can1: state=1, txerr=96, rxerr=1
> [ 3922.023615] peak_pci 0000:02:00.0 can1: error passive interrupt; bs=false,
> es=true

warning -> error passive

> [ 3922.031197] peak_pci 0000:02:00.0 can1: state=2, txerr=128, rxerr=1
> [ 3928.401643] peak_pci 0000:02:00.0 can1: error passive interrupt; bs=false,
> es=true

error passive -> warning

> [ 3928.409243] peak_pci 0000:02:00.0 can1: state=1, txerr=127, rxerr=0
> [ 3934.804395] peak_pci 0000:02:00.0 can1: error warning interrupt; bs=false,
> es=false

warning -> error active

> [ 3934.812075] peak_pci 0000:02:00.0 can1: state=0, txerr=95, rxerr=0
> 
> Note that, between the two passive interrupts, es does not change. This is
> correct according to
> http://www.nxp.com/documents/data_sheet/SJA1000.pdf because ES only signifies
> warning state.
> Thus, "if (status & SR_ES)" is redundant.
> 
> @@ -440,11 +446,14 @@ static int sja1000_err(struct net_device *dev, uint8_t
> isrc, uint8_t status)
>      }
>      if (isrc & IRQ_EPI) {
>          /* error passive interrupt */
> -        netdev_dbg(dev, "error passive interrupt\n");
> -        if (status & SR_ES)
> -            state = CAN_STATE_ERROR_PASSIVE;
> +        netdev_dbg(dev, "error passive interrupt; bs=%s, es=%s\n",
> +               status & SR_BS ? "true" : "false",
> +               status & SR_ES ? "true" : "false");
> +
> +        if (state == CAN_STATE_ERROR_PASSIVE)
> +            state = CAN_STATE_ERROR_WARNING;
>          else
> -            state = CAN_STATE_ERROR_ACTIVE;
> +            state = CAN_STATE_ERROR_PASSIVE;
>      }
>      if (isrc & IRQ_ALI) {
>          /* arbitration lost interrupt */
> 
> The data sheet says this about EPI:
>     set; this bit is set whenever the CAN controller has
>     reached the error passive status (at least one
>     error counter exceeds the protocol-defined level of
>     127) or if the CAN controller is in the error passive
>     status and enters the error active status again and
>     the EPIE bit is set within the interrupt enable
>     register
> 
> I'm a little bit concerned that it actually says that EPI is set on "error passive"
> and "error passive to error active". However, the log says that txerr=127 on
> that second interrupt. It makes more sense for it to be "error warning". Might
> this be a error in the datasheet?

IIRC, the CAN standard only knows error active and error passive.
Various vendors added the warning level where the level is sometimes
even programmable.

> Another thing that worries me is that when I enabled debug, txerr would advance
> while the interrupt was being handled. This yielded mismatches between
> the new state and the actual current state of the error counters. I moved the
> reading of the RXERR/TXERR registers to the top of the error handler function
> and that fixed the problem for me. Might this still be an issue on slower
> platforms?

Yes, strictly speaking we cannot be sure that we read the correct error
counter when the state change is triggered via interrupt.

> candump looks like this:
> ...
>  (000.053403)  can0  5B1   [5]  D1 D1 BB 7C DF
>  (000.146664)  can0  506   [8]  2C 6E 9C 77 5E F1 AE 2D
>  (000.053628)  can0  488   [8]  FF C8 AC 26 43 C5 AF 11
>  (000.146428)  can0  5AD   [8]  C1 31 BD 1B 6B 8D 34 13
>  (000.053112)  can0  658   [0]
>  (000.171131)  can0  20000004   [8]  00 08 00 00 00 00 60 00   ERRORFRAME
>     controller-problem{tx-error-warning}
>     error-counter-tx-rx{{96}{0}}
>  (000.013841)  can0  20000004   [8]  00 20 00 00 00 00 80 00   ERRORFRAME
>     controller-problem{tx-error-passive}
>     error-counter-tx-rx{{128}{0}}
>  (004.433642)  can0  20000002   [8]  04 00 00 00 00 00 00 00   ERRORFRAME
>     lost-arbitration{at bit 4}
>  (000.014785)  can0  20000002   [8]  02 00 00 00 00 00 00 00   ERRORFRAME
>     lost-arbitration{at bit 2}
>  (000.012565)  can0  20000002   [8]  02 00 00 00 00 00 00 00   ERRORFRAME
>     lost-arbitration{at bit 2}
>  (000.000006)  can0  152   [8]  48 94 14 45 C3 60 8A 58
>  (000.000015)  can0  6D7   [4]  3A B8 84 26
>  (000.012537)  can0  20000002   [8]  02 00 00 00 00 00 00 00   ERRORFRAME
>     lost-arbitration{at bit 2}

Why do you see these errors? Are there electrical problems on the CAN
bus? And if no cable is connected just txerr should increase (and
decrease if it's reconnected!

>  (000.014083)  can0  2E7   [7]  A2 6F F8 5F DF DD 3B
>  (000.000904)  can0  757   [8]  4F 0E AB 57 C3 58 DB 75
>  (000.000680)  can0  459   [5]  52 52 58 29 8C
> ...
>  (000.000546)  can0  3C7   [2]  18 AC
>  (000.000875)  can0  130   [8]  B3 57 0A 52 CC CA D3 08
>  (000.015962)  can0  5DF   [8]  64 14 6C 61 F9 FD E9 55
>  (000.000023)  can0  5D2   [8]  C5 B2 39 6B 65 45 4C 05
>  (000.014554)  can0  20000002   [8]  03 00 00 00 00 00 00 00   ERRORFRAME
>     lost-arbitration{at bit 3}
>  (000.000008)  can0  2FC   [8]  19 73 77 6C 86 91 D5 58
> ...
>  (000.053183)  can0  7A1   [6]  F8 EE 7B 12 A3 43
>  (000.146734)  can0  089   [5]  36 B8 A0 4F DD
>  (000.014032)  can0  20000004   [8]  00 0C 00 00 00 00 7F 74   ERRORFRAME
>     controller-problem{rx-error-warning,tx-error-warning}
>     error-counter-tx-rx{{127}{116}}
>  (000.039316)  can0  684   [6]  ED D1 1E 32 00 1D
>  (000.146892)  can0  0D1   [8]  CD 99 DE 18 62 B0 45 27
>  (000.053317)  can0  687   [8]  43 37 CF 5E 6B A4 CA 3D
> ...
>  (000.053561)  can0  42A   [4]  65 20 3B 5C
>  (000.146834)  can0  3C2   [8]  32 50 A4 56 31 EC DD 55
>  (000.013948)  can0  20000004   [8]  00 40 00 00 00 00 5F 54   ERRORFRAME
>     controller-problem{back-to-error-active}
>     error-counter-tx-rx{{95}{84}}
>  (000.039372)  can0  455   [5]  58 38 31 62 B6
> ...

Wolfgang.


  reply	other threads:[~2014-10-22 18:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-26 18:25 [PATCH v2 1/4] can: dev: Consolidate and unify state change handling Andri Yngvason
2014-10-20 19:45 ` Wolfgang Grandegger
2014-10-21 10:42   ` Andri Yngvason
2014-10-21 10:52     ` Wolfgang Grandegger
2014-10-21 14:56       ` Andri Yngvason
2014-10-21 15:21         ` Wolfgang Grandegger
2014-10-22 11:48           ` Andri Yngvason
2014-10-22 12:30             ` Wolfgang Grandegger
2014-10-22 12:48               ` Andri Yngvason
2014-10-22 12:56                 ` Wolfgang Grandegger
2014-10-22 16:32                   ` Andri Yngvason
2014-10-22 18:42                     ` Wolfgang Grandegger [this message]
2014-10-23 12:50                       ` Andri Yngvason
2014-10-23 13:10                         ` Wolfgang Grandegger
2014-10-23 15:55                           ` Andri Yngvason
2014-11-22 17:10 ` Marc Kleine-Budde
2014-11-22 18:15   ` Andri Yngvason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5447FA7D.7060806@grandegger.com \
    --to=wg@grandegger.com \
    --cc=andri.yngvason@marel.com \
    --cc=linux-can@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).