From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wolfgang Grandegger Subject: Re: [PATCH v2 1/4] can: dev: Consolidate and unify state change handling. Date: Wed, 22 Oct 2014 20:42:05 +0200 Message-ID: <5447FA7D.7060806@grandegger.com> References: <5425AF94.5000206@marel.com> <5445666A.6090601@grandegger.com> <54463893.3090906@marel.com> <14598c49d530f22df994073aff17d729@grandegger.com> <5446742F.1010709@marel.com> <5447998A.5080601@marel.com> <5447A78B.8090501@marel.com> <2acb44ee148c76e61617e7d9090c7180@grandegger.com> <5447DC34.7070602@marel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from ngcobalt02.manitu.net ([217.11.48.102]:48128 "EHLO ngcobalt02.manitu.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753448AbaJVSmK (ORCPT ); Wed, 22 Oct 2014 14:42:10 -0400 In-Reply-To: <5447DC34.7070602@marel.com> Sender: linux-can-owner@vger.kernel.org List-ID: To: Andri Yngvason Cc: linux-can@vger.kernel.org On 10/22/2014 06:32 PM, Andri Yngvason wrote: >=20 > On mi=C3=B0 22.okt 2014 12:56, Wolfgang Grandegger wrote: >> >>>>> restarted-after-bus-off >>>> And the restart does come when the short-circuit is gone? >>>> >>> No, in fact the bus keeps restarting until the short-circuit is gon= e. >>> >>> Note that I'm using peak_pci. It's sja1000 but maybe there is somet= hing >>> different? >> No, because it's a memory mapped SJA1000 chip. What do you see with >> "dmesg" >> assuming that CONFIG_CAN_DEV_DEBUG is enabled. >> > dmesg wasn't very informative so I added some debug output of my own.= This is > the output after I've fixed everything: > [ 3903.594640] peak_pci 0000:02:00.0: can1 at reg_base=3D0xf892a400 > cfg_base=3D0xf8928000 irq=3D18 > [ 3903.684937] peak_pci 0000:02:00.0 can1: setting BTR0=3D0x07 BTR1=3D= 0x05 > [ 3922.009853] peak_pci 0000:02:00.0 can1: error warning interrupt; b= s=3Dfalse, > es=3Dtrue According to the manual: error active -> warning > [ 3922.017428] peak_pci 0000:02:00.0 can1: state=3D1, txerr=3D96, rxe= rr=3D1 > [ 3922.023615] peak_pci 0000:02:00.0 can1: error passive interrupt; b= s=3Dfalse, > es=3Dtrue warning -> error passive > [ 3922.031197] peak_pci 0000:02:00.0 can1: state=3D2, txerr=3D128, rx= err=3D1 > [ 3928.401643] peak_pci 0000:02:00.0 can1: error passive interrupt; b= s=3Dfalse, > es=3Dtrue error passive -> warning > [ 3928.409243] peak_pci 0000:02:00.0 can1: state=3D1, txerr=3D127, rx= err=3D0 > [ 3934.804395] peak_pci 0000:02:00.0 can1: error warning interrupt; b= s=3Dfalse, > es=3Dfalse warning -> error active > [ 3934.812075] peak_pci 0000:02:00.0 can1: state=3D0, txerr=3D95, rxe= rr=3D0 >=20 > Note that, between the two passive interrupts, es does not change. Th= is is > correct according to > http://www.nxp.com/documents/data_sheet/SJA1000.pdf because ES only s= ignifies > warning state. > Thus, "if (status & SR_ES)" is redundant. >=20 > @@ -440,11 +446,14 @@ static int sja1000_err(struct net_device *dev, = uint8_t > isrc, uint8_t status) > } > if (isrc & IRQ_EPI) { > /* error passive interrupt */ > - netdev_dbg(dev, "error passive interrupt\n"); > - if (status & SR_ES) > - state =3D CAN_STATE_ERROR_PASSIVE; > + netdev_dbg(dev, "error passive interrupt; bs=3D%s, es=3D%s\n= ", > + status & SR_BS ? "true" : "false", > + status & SR_ES ? "true" : "false"); > + > + if (state =3D=3D CAN_STATE_ERROR_PASSIVE) > + state =3D CAN_STATE_ERROR_WARNING; > else > - state =3D CAN_STATE_ERROR_ACTIVE; > + state =3D CAN_STATE_ERROR_PASSIVE; > } > if (isrc & IRQ_ALI) { > /* arbitration lost interrupt */ >=20 > The data sheet says this about EPI: > set; this bit is set whenever the CAN controller has > reached the error passive status (at least one > error counter exceeds the protocol-defined level of > 127) or if the CAN controller is in the error passive > status and enters the error active status again and > the EPIE bit is set within the interrupt enable > register >=20 > I'm a little bit concerned that it actually says that EPI is set on "= error passive" > and "error passive to error active". However, the log says that txerr= =3D127 on > that second interrupt. It makes more sense for it to be "error warnin= g". Might > this be a error in the datasheet? IIRC, the CAN standard only knows error active and error passive. Various vendors added the warning level where the level is sometimes even programmable. > Another thing that worries me is that when I enabled debug, txerr wou= ld advance > while the interrupt was being handled. This yielded mismatches betwee= n > the new state and the actual current state of the error counters. I m= oved the > reading of the RXERR/TXERR registers to the top of the error handler = function > and that fixed the problem for me. Might this still be an issue on sl= ower > platforms? Yes, strictly speaking we cannot be sure that we read the correct error counter when the state change is triggered via interrupt. > candump looks like this: > ... > (000.053403) can0 5B1 [5] D1 D1 BB 7C DF > (000.146664) can0 506 [8] 2C 6E 9C 77 5E F1 AE 2D > (000.053628) can0 488 [8] FF C8 AC 26 43 C5 AF 11 > (000.146428) can0 5AD [8] C1 31 BD 1B 6B 8D 34 13 > (000.053112) can0 658 [0] > (000.171131) can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORF= RAME > controller-problem{tx-error-warning} > error-counter-tx-rx{{96}{0}} > (000.013841) can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORF= RAME > controller-problem{tx-error-passive} > error-counter-tx-rx{{128}{0}} > (004.433642) can0 20000002 [8] 04 00 00 00 00 00 00 00 ERRORF= RAME > lost-arbitration{at bit 4} > (000.014785) can0 20000002 [8] 02 00 00 00 00 00 00 00 ERRORF= RAME > lost-arbitration{at bit 2} > (000.012565) can0 20000002 [8] 02 00 00 00 00 00 00 00 ERRORF= RAME > lost-arbitration{at bit 2} > (000.000006) can0 152 [8] 48 94 14 45 C3 60 8A 58 > (000.000015) can0 6D7 [4] 3A B8 84 26 > (000.012537) can0 20000002 [8] 02 00 00 00 00 00 00 00 ERRORF= RAME > lost-arbitration{at bit 2} Why do you see these errors? Are there electrical problems on the CAN bus? And if no cable is connected just txerr should increase (and decrease if it's reconnected! > (000.014083) can0 2E7 [7] A2 6F F8 5F DF DD 3B > (000.000904) can0 757 [8] 4F 0E AB 57 C3 58 DB 75 > (000.000680) can0 459 [5] 52 52 58 29 8C > ... > (000.000546) can0 3C7 [2] 18 AC > (000.000875) can0 130 [8] B3 57 0A 52 CC CA D3 08 > (000.015962) can0 5DF [8] 64 14 6C 61 F9 FD E9 55 > (000.000023) can0 5D2 [8] C5 B2 39 6B 65 45 4C 05 > (000.014554) can0 20000002 [8] 03 00 00 00 00 00 00 00 ERRORF= RAME > lost-arbitration{at bit 3} > (000.000008) can0 2FC [8] 19 73 77 6C 86 91 D5 58 > ... > (000.053183) can0 7A1 [6] F8 EE 7B 12 A3 43 > (000.146734) can0 089 [5] 36 B8 A0 4F DD > (000.014032) can0 20000004 [8] 00 0C 00 00 00 00 7F 74 ERRORF= RAME > controller-problem{rx-error-warning,tx-error-warning} > error-counter-tx-rx{{127}{116}} > (000.039316) can0 684 [6] ED D1 1E 32 00 1D > (000.146892) can0 0D1 [8] CD 99 DE 18 62 B0 45 27 > (000.053317) can0 687 [8] 43 37 CF 5E 6B A4 CA 3D > ... > (000.053561) can0 42A [4] 65 20 3B 5C > (000.146834) can0 3C2 [8] 32 50 A4 56 31 EC DD 55 > (000.013948) can0 20000004 [8] 00 40 00 00 00 00 5F 54 ERRORF= RAME > controller-problem{back-to-error-active} > error-counter-tx-rx{{95}{84}} > (000.039372) can0 455 [5] 58 38 31 62 B6 > ... Wolfgang.