From: Wolfgang Grandegger
Subject: Re: [PATCH 2/4] Consolidate and unify state change handling
Date: Tue, 23 Sep 2014 22:33:48 +0200
Message-ID: <5421D92C.80906@grandegger.com>
References: <541B0792.5030002@marel.com> <541C9BD8.8070303@grandegger.com> <541EE4E9.9010506@marel.com> <541EEF28.5000103@grandegger.com> <541F0A80.7060800@marel.com>
In-Reply-To: <541F0A80.7060800@marel.com>
To: Andri Yngvason, Marc Kleine-Budde, linux-can@vger.kernel.org

On 09/21/2014 07:27 PM, Andri Yngvason wrote:
>
> On sun 21.sep 2014 15:30, Wolfgang Grandegger wrote:
>> On 09/21/2014 04:47 PM, Andri Yngvason wrote:
>>> On fös 19.sep 2014 21:10, Wolfgang Grandegger wrote:
>>>> On 09/18/2014 06:25 PM, Andri Yngvason wrote:
>>>>> Signed-off-by: Andri Yngvason
>>>>> ---
>>>>> ...
>>>>> -		cf->can_id |= CAN_ERR_CRTL;
>>>>> -		cf->data[1] = (bec.txerr > bec.rxerr) ?
>>>>> -			CAN_ERR_CRTL_TX_WARNING :
>>>>> -			CAN_ERR_CRTL_RX_WARNING;
>>>> Hm, can_change_state() handles the equal case differently. In the
>>>> SJA1000 manual I found:
>>>>
>>>> "Errors detected during reception or transmission will affect the
>>>> error counters according to the CAN 2.0B protocol specification.
>>>> The error status bit is set when at least one of the error counters
>>>> has reached or exceeded the CPU warning limit of 96. An error
>>>> interrupt is generated, if enabled."
>>>>
>>>> If both are equal we do not know if rx or tx has caused the state
>>>> change and therefore setting "CAN_ERR_CRTL_TX_WARNING |
>>>> CAN_ERR_CRTL_RX_WARNING" seems more logical, indeed.
>>>> But maybe it simply does not happen. Any other opinions?
>>> I think that not specifically handling the equal case would be wrong.
>>> Let's consider the following sequence of events:
>>> * txerr reaches warning level
>>> * rxerr reaches warning level
>>> If they are both equal at this point, you will only get a second
>>> CAN_ERR_CRTL_TX_WARNING in the current implementation, whereas in the
>>> proposed implementation, the user would get
>>> CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING, and because the
>>> user can know the prior error state message, he can find out which
>>> state actually changed.
>> The question is which error (rx or tx) triggered the error state
>> change interrupt. I doubt that such an interrupt is triggered when one
>> error counter catches up, e.g. txerr was > 128 and rxerr exceeded 128.
>> It's not even certain that all the controllers act the same way.
>> Therefore keeping the current behaviour would also be fine for me.
> Also, because of the state != priv->state assert, the equal case won't
> happen when the state increases, but it might happen when it goes down.
> Perhaps that should be changed?
>
> But in the case where the state goes down, there will definitely be an
> interrupt generated. E.g. rx_state = warn, tx_state = passive and then
> when tx_state -> warn, we will have the controller's state go to warn
> from passive, and then rx_state == tx_state. So, if we only want to send
> which state changed, we actually have to keep copies of each counter's
> current (last) state, as is done in priv->state, for the whole controller.

Well, that's definitely too sophisticated.

> I think it would be easier, simpler and more useful to just send the
> current state of each counter whenever the state changes.
> Consider this:
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index 02492d2..6199571 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -273,6 +273,118 @@ static int can_get_bittiming(struct net_device *dev, struct can_bittiming *bt,
>  	return err;
>  }
>
> +static void can_update_error_counters(enum can_state new_state)
> +{
> +	if (state < priv->state)
> +		return;
> +
> +	switch (new_state) {
> +	case CAN_STATE_ERROR_ACTIVE:
> +		netdev_warn(dev, "%s: oops, did we come from a state less than error-active?",
> +			    __func__);
> +		break;
> ...
> +}
> +
> +static int can_txstate_to_frame(enum can_state state)
> +{
> +	switch(state)
> +	{
> +	case CAN_STATE_ERROR_ACTIVE:
> +		return CAN_ERR_CRTL_TX_ACTIVE;
> ...
> +}
> +
> +static int can_rxstate_to_frame(enum can_state state)
> +{
> +	switch(state)
> +	{
> +	case CAN_STATE_ERROR_ACTIVE:
> +		return CAN_ERR_CRTL_RX_ACTIVE;
> ...
> +}
> +
> +void can_change_state(struct net_device *dev, struct can_frame *cf,
> +		      enum can_state new_state, enum can_state tx_state,
> +		      enum can_state rx_state)
> +{
> +	struct can_priv *priv = netdev_priv(dev);
> +
> +	if (unlikely(state == priv->state)) {
> +		netdev_warn(dev, "%s: oops, state did not change", __func__);
> +		return;
> +	}
> +
> +	can_update_error_counters(new_state);
> +
> +	if (unlikely(state == CAN_STATE_BUS_OFF)) {
> +		cf->can_id |= CAN_ERR_BUSOFF;
> +	} else {
> +		cf->can_id |= CAN_ERR_CRTL;
> +		/* Absolute: */
> +		cf->data[1] |= can_txstate_to_frame(tx_state)
> +			     | can_rxstate_to_frame(rx_state);
> +		/* Alternatively, the difference:
> +		 * if (tx_state > rx_state)
> +		 *	cf->data[1] |= can_txstate_to_frame(tx_state);
> +		 * if (tx_state < rx_state)
> +		 *	cf->data[1] |= can_rxstate_to_frame(rx_state);
> +		 * else
> +		 *	cf->data[1] |= can_txstate_to_frame(tx_state)
> +		 *		     | can_rxstate_to_frame(rx_state);
> +		 * Or even, disregarding the equal case:
> +		 * cf->data[1] |= (tx_state > rx_state) ?
> +		 *	can_txstate_to_frame(tx_state) :
> +		 *	can_rxstate_to_frame(rx_state);
> +		 */
> +
> +	}
> +
> +	priv->state = state;
> +}
> +EXPORT_SYMBOL_GPL(can_change_state);
> +
>  /*
>   * Local echo of CAN messages
>   *

For simplicity, I vote for setting (CAN_ERR_CRTL_TX_WARNING |
CAN_ERR_CRTL_RX_WARNING) if the tx and rx error counters are equal.

>>> But this is all based on the premise that txerr hasn't progressed
>>> since. In fact, because we cannot assume that txerr stays in place
>>> until rxerr catches up, this is what we should be doing:
>>> enum can_state errcount_to_state(unsigned int count)
>>> {
>>> 	if (unlikely(count > 127))
>>> 		return CAN_STATE_ERROR_PASSIVE;
>>>
>>> 	if (unlikely(count > 96))
>>> 		return CAN_STATE_ERROR_WARNING;
>>>
>>> 	return CAN_STATE_ERROR_ACTIVE;
>>> }
>>>
>>> enum can_err_dir can_get_err_dir(unsigned int txerr, unsigned int rxerr)
>>> {
>>> 	enum can_err_dir dir;
>>>
>>> 	enum can_state tx_state = errcount_to_state(txerr);
>>> 	enum can_state rx_state = errcount_to_state(rxerr);
>>>
>>> 	if (tx_state > rx_state)
>>> 		return CAN_ERR_DIR_TX;
>>>
>>> 	if (tx_state < rx_state)
>>> 		return CAN_ERR_DIR_RX;
>>>
>>> 	return CAN_ERR_DIR_TX | CAN_ERR_DIR_RX;
>>> }
>>>
>>> However, now that we've introduced errcount_to_state(), it seems to me
>>> that it would be simpler to dump the proposed CAN_ERR_DIR enum in
>>> favour of passing the two states directly to can_change_state().
>> Agreed.
>>
>>>
>>>> To validate the correct behaviour could you please send messages while
>>>> the cable is disconnected. Then reconnect the cable and see how the
>>>> error state decreases. You can monitor the behaviour with "candump -td
>>>> -e any,0:0,#FFFFFFFF" in another shell.
>>>>
>>> I'm using PCAN-USB Pro to generate errors on the bus. It works quite
>>> well.
>>> I can generate tx errors by sending from the device and then have the
>>> pcan ruin a few frames.
>>> rx errors can be generated by having another device on the bus
>>> outputting random data and then letting the pcan corrupt the frames.
>> Short-circuiting the CAN low and high lines is a simple method to
> Ahh, yes, I tried that too. That's what triggered bus-off. I got it
> mixed up in my head. :)
>>> Sadly the error generation mechanism only works on Windows. :(
>>>
>>> I've tried the "disconnected cable" method too in the past. It usually
>>> puts mscan into bus-off quite fast.
>> Sending a message without a cable should never trigger a bus-off. The tx
>> error counter never exceeds 128.
>>
>> Here is an example output of "candump -td -e any,0:0,#FFFFFFFF"
>> for a recovery from error passive state due to no ack/cable (reconnect
>> after 5s) for an SJA1000 on an EMS PCI card:
>>
>>  (000.201913)  can0  1C  [0]
>>  (000.212241)  can0  20000204  [8] 00 08 00 00 00 00 60 00   ERRORFRAME
>>     controller-problem{tx-error-warning}
>>     state-change{tx-error-warning}
>>     error-counter-tx-rx{{96}{0}}
>>  (000.003544)  can0  20000204  [8] 00 20 00 00 00 00 80 00   ERRORFRAME
>>     controller-problem{tx-error-passive}
>>     state-change{tx-error-passive}
>>     error-counter-tx-rx{{128}{0}}
>>  (004.901842)  can0  1D  [7] 1D F6 33 52 31 4B DE
>>  (000.000116)  can0  20000200  [8] 00 08 00 00 00 00 7F 00   ERRORFRAME
>>     state-change{tx-error-warning}
>>     error-counter-tx-rx{{127}{0}}
>>  (000.000678)  can0  1E  [6] 42 05 14 82 23 B6
>>  ...
>>  (000.201927)  can0  49  [4] 2F 1A 97 25
>>  (000.000096)  can0  20000200  [8] 00 40 00 00 00 00 5F 00   ERRORFRAME
>>     state-change{back-to-error-active}
>>     error-counter-tx-rx{{95}{0}}
>>  (000.202184)  can0  4A  [8] 7F 87 0E FE 03 BA 78 91
>>
>> This is from my related patch-set.
>>
> Okay, I'll try that but the -e flag won't help much because candump
> expects the PROT abuse.
>
> Andri
>
> PS.: I must admit that I don't actually know why it's useful to know
> which error counter changed; tx or rx.
> I think it would be much simpler to send the max of both and be done
> with it. Can anyone point out a case where this helps?

I agree that it would be much simpler not to distinguish between rx and
tx state changes. This is for historical reasons. Oliver, do you
remember why we adopted that solution?

Wolfgang.
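
For reference, the "set both flags when the two counters are in the same
state" rule voted for above could be sketched in plain C roughly as follows.
This is only an illustration under stated assumptions, not the code from the
patch. The enum and the flag values are local stand-ins meant to mirror
linux/can/error.h (the tx values 0x08, 0x20 and the 0x40 "active" value also
appear in data[1] of the candump trace). state_to_flag() and
can_err_crtl_flags() are hypothetical helper names. The warning threshold is
taken as >= 96, following the SJA1000 manual quote and the trace, which
differs from the "count > 96" in the quoted errcount_to_state().

```c
#include <stdint.h>

/* Local stand-ins so the sketch builds outside the kernel tree. */
enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,
	CAN_STATE_ERROR_WARNING,
	CAN_STATE_ERROR_PASSIVE,
};

/* Assumed to mirror the data[1] flags in linux/can/error.h. */
#define CAN_ERR_CRTL_RX_WARNING 0x04
#define CAN_ERR_CRTL_TX_WARNING 0x08
#define CAN_ERR_CRTL_RX_PASSIVE 0x10
#define CAN_ERR_CRTL_TX_PASSIVE 0x20
#define CAN_ERR_CRTL_ACTIVE     0x40

/* Map one error counter to a state; 96 and 128 are the thresholds
 * seen in the candump trace (96 -> warning, 128 -> passive). */
static enum can_state errcount_to_state(unsigned int count)
{
	if (count >= 128)
		return CAN_STATE_ERROR_PASSIVE;
	if (count >= 96)
		return CAN_STATE_ERROR_WARNING;
	return CAN_STATE_ERROR_ACTIVE;
}

/* Hypothetical helper: flag for one direction in a given state. */
static uint8_t state_to_flag(enum can_state state, int is_tx)
{
	switch (state) {
	case CAN_STATE_ERROR_WARNING:
		return is_tx ? CAN_ERR_CRTL_TX_WARNING : CAN_ERR_CRTL_RX_WARNING;
	case CAN_STATE_ERROR_PASSIVE:
		return is_tx ? CAN_ERR_CRTL_TX_PASSIVE : CAN_ERR_CRTL_RX_PASSIVE;
	default:
		return CAN_ERR_CRTL_ACTIVE;
	}
}

/* Hypothetical helper: report the direction in the worse state,
 * or both directions when the two states are equal. */
static uint8_t can_err_crtl_flags(unsigned int txerr, unsigned int rxerr)
{
	enum can_state tx_state = errcount_to_state(txerr);
	enum can_state rx_state = errcount_to_state(rxerr);

	if (tx_state > rx_state)
		return state_to_flag(tx_state, 1);
	if (tx_state < rx_state)
		return state_to_flag(rx_state, 0);
	if (tx_state == CAN_STATE_ERROR_ACTIVE)
		return CAN_ERR_CRTL_ACTIVE;
	return state_to_flag(tx_state, 1) | state_to_flag(rx_state, 0);
}
```

With these definitions, can_err_crtl_flags(100, 100) yields
CAN_ERR_CRTL_TX_WARNING | CAN_ERR_CRTL_RX_WARNING, while
can_err_crtl_flags(128, 0) yields only CAN_ERR_CRTL_TX_PASSIVE.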