From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: "Wolfgang Grandegger" <wg@grandegger.com>,
"David S. Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Eric Dumazet" <edumazet@google.com>,
netdev@vger.kernel.org, linux-can@vger.kernel.org,
"Jérémie Dautheribes" <jeremie.dautheribes@bootlin.com>,
"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
sylvain.girard@se.com, pascal.eberhard@se.com,
stable@vger.kernel.org
Subject: Re: [PATCH v2] can: sja1000: Always restart the Tx queue after an overrun
Date: Mon, 2 Oct 2023 16:26:01 +0200
Message-ID: <20231002162601.6b71c4d9@xps-13>
In-Reply-To: <20230928-headphone-premiere-d92deb9c29e5-mkl@pengutronix.de>
Hi Marc,
mkl@pengutronix.de wrote on Thu, 28 Sep 2023 09:53:17 +0200:
> On 27.09.2023 18:44:42, Miquel Raynal wrote:
> > Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
> > a soft reset on Renesas SoCs") fixes an issue with Renesas' own SJA1000
> > CAN controller reception: the Rx buffer is only 5 messages long, so when
> > the bus is loaded (e.g. a message every 50us), an overrun may easily
> > happen. Upon an overrun situation, due to a possible internal crosstalk
> > situation, the controller enters a frozen state which can only be
> > unlocked with a soft reset (experimentally). The solution was to offload
> > a call to sja1000_start() in a threaded handler. This needs to happen in
> > process context as this operation requires sleeping. sja1000_start()
> > basically enters "reset mode", performs a proper software reset and
> > returns back into "normal mode".
> >
> > Since this fix was introduced, we no longer observe any stalls in
> > reception. However it was sporadically observed that the transmit path
> > would now freeze. Further investigation blamed the fix mentioned above,
> > and especially the reset operation. Reproducing the reset in a loop
> > helped identify what could possibly go wrong. The sja1000 is a single
> > Tx queue device, which leverages the netdev helpers to process one Tx
> > message at a time. The logic is: the queue is stopped, the message sent
> > to the transceiver, once properly transmitted the controller sets a
> > status bit which triggers an interrupt, in the interrupt handler the
> > transmission status is checked and the queue woken up. Unfortunately, if
> > an overrun happens, we might perform the soft reset precisely between
> > the transmission of the buffer to the transceiver and the advent of the
> > transmission status bit. We would then stop the transmission operation
> > without re-enabling the queue, leading to all further transmissions
> > being ignored.
> >
> > The reset interrupt can only happen while the device is "open", and
> > after a reset we anyway want to resume normal operations, no matter if a
> > packet to transmit got dropped in the process, so we shall wake up the
> > queue. Restarting the device and waking-up the queue is exactly what
> > sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
> > the queue state, we must acquire a lock both in the reset handler and in
> > the transmit path to ensure serialization of both operations. As the
> > reset handler might still be called after the transmission of a frame to
> > the transceiver but before it actually gets transmitted, we must ensure
> > we don't leak the skb, so we free it (the behavior is consistent, no
> > matter if there was an skb on the stack or not).
> >
> > Fixes: 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > ---
> >
> > Changes in v2:
> > * As Marc suggested, use netif_tx_{,un}lock() instead of our own
> > spin_lock.
> >
> > drivers/net/can/sja1000/sja1000.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
> > index ae47fc72aa96..91e3fb3eed20 100644
> > --- a/drivers/net/can/sja1000/sja1000.c
> > +++ b/drivers/net/can/sja1000/sja1000.c
> > @@ -297,6 +297,7 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
> > if (can_dropped_invalid_skb(dev, skb))
> > return NETDEV_TX_OK;
> >
> > + netif_tx_lock(dev);
> > netif_stop_queue(dev);
> >
> > fi = dlc = cf->can_dlc;
> > @@ -335,6 +336,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
> >
> > sja1000_write_cmdreg(priv, cmd_reg_val);
> >
> > + netif_tx_unlock(dev);
> > +
>
> I think netif_tx_lock() should be used in a different way. As far as I
> understand it, you should call it only in the sja1000_reset_interrupt(),
> where you want the tx path to interfere.
I believe you meant "don't want"? And yes, you're right, the current use
can't properly handle my problem.
> Please test the new code with lockdep enabled.
I will fix the current implementation and test again by manually
producing overruns.
Thanks,
Miquèl