From: "Daniel Glöckner" <dg@emlix.com>
To: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: netdev@vger.kernel.org, linux-can@vger.kernel.org
Subject: Re: Softirq error with mcp251xfd driver
Date: Wed, 10 Mar 2021 22:56:21 +0100
Message-ID: <20210310215621.GA5538@homes.emlix.com>
In-Reply-To: <20210310212254.GA2050@homes.emlix.com>

On Wed, Mar 10, 2021 at 10:22:54PM +0100, Daniel Glöckner wrote:
> On Wed, Mar 10, 2021 at 08:13:51AM +0100, Marc Kleine-Budde wrote:
> > On 10.03.2021 07:46:26, Daniel Glöckner wrote:
> > > the mcp251xfd driver uses a threaded irq handler to queue skbs with the
> > > can_rx_offload_* helpers. I get the following error on every packet until
> > > the rate limit kicks in:
> > > 
> > > NOHZ tick-stop error: Non-RCU local softirq work is pending, handler
> > > #08!!!
> > 
> > That's a known problem. But I had no time to investigate it.
> > 
> > > Adding local_bh_disable/local_bh_enable around the can_rx_offload_* calls
> > > gets rid of the error, but is that the correct way to fix this?
> > > Internally the can_rx_offload code uses spin_lock_irqsave to safely
> > > manipulate its queue.
> > 
> > The problem is not the queue handling inside of rx_offload, but the call
> > to napi_schedule(). This boils down to raising a soft IRQ (the NAPI)
> > from the threaded IRQ handler of the mcp251xfd driver.
> > 
> > The local_bh_enable() "fixes" the problem by running the softirq if needed.
> > 
> > https://elixir.bootlin.com/linux/v5.11/source/kernel/softirq.c#L1913
> > 
> > I'm not sure how to properly fix the problem, yet.
> 
> If I understand correctly, the point of using can_rx_offload_* in the
> mcp251xfd driver is that it sorts the rx, tx, and error frames according
> to their timestamp. In that case calling local_bh_enable after each packet
> is not correct because there will never be more than one packet in the
> queue. We want to call local_bh_disable + can_rx_offload_schedule +
> local_bh_enable only at the end of mcp251xfd_irq after intf_pending
> indicated that there are no more packets inside the chip. How about adding
> a flag to struct can_rx_offload that suppresses the automatic calls to
> can_rx_offload_schedule?
> 
> If there is the risk that under high load we will never exit the loop in
> mcp251xfd_irq or if can_rx_offload_napi_poll might run again while we add
> more packets to the queue, a more complex scheme is needed. We could
> extend can_rx_offload_napi_poll to process only packets with a timestamp
> below a certain value. That value has to be read from the TBC register
> before we read the INT register. Then the three functions can be run after
> each iteration to empty the queue. We need to update that timestamp limit
> one more time when we finally exit the loop to process those packets that
> have arrived after the reading of the TBC register when the INT register
> still had bits set. Using the timestamp of the tail of the queue is
> probably the easiest way to set the final limit.

Or we leave can_rx_offload unchanged and keep two additional lists of skbs
inside the mcp251xfd driver: One for the packets that arrived before the
timestamp read from TBC and one for the packets that arrived later. At the
end of an iteration we call local_bh_disable, enqueue all packets from the
first list with can_rx_offload_queue_sorted, and then ask the softirq to
process them by calling local_bh_enable. Afterwards we move everything
from the second list to the first list and do the next iteration.

The drawback is that we can't use can_rx_offload_get_echo_skb, because it
queues the echo skb into rx_offload directly and would bypass the two lists.
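
To make the ordering of the two-list idea concrete, here is a rough userspace
sketch. It is not driver code and everything in it is hypothetical: struct
frame stands in for an skb, the deliver callback stands in for
can_rx_offload_queue_sorted, and the local_bh_disable/local_bh_enable calls
appear only as comments. The wraparound-safe comparison assumes TBC is a
free-running 32-bit counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb that carries its hardware timestamp. */
struct frame {
	uint32_t ts;
	struct frame *next;
};

/* Minimal singly linked FIFO; tail points at the last "next" pointer. */
struct frame_list {
	struct frame *head;
	struct frame **tail;
};

/* Hypothetical per-device state: the two additional lists. */
struct two_list_state {
	struct frame_list early; /* stamped before the TBC snapshot */
	struct frame_list late;  /* stamped at or after the snapshot */
};

static void list_init(struct frame_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void list_add_tail(struct frame_list *l, struct frame *f)
{
	f->next = NULL;
	*l->tail = f;
	l->tail = &f->next;
}

/* Append all of src to dst and leave src empty. */
static void list_splice_tail(struct frame_list *dst, struct frame_list *src)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	list_init(src);
}

static void state_init(struct two_list_state *st)
{
	list_init(&st->early);
	list_init(&st->late);
}

/* "a is before b" that survives counter wraparound (assumed 32 bit). */
static int ts_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Called for every frame read from the chip during one loop iteration. */
static void sort_frame(struct two_list_state *st, struct frame *f,
		       uint32_t tbc_limit)
{
	if (ts_before(f->ts, tbc_limit))
		list_add_tail(&st->early, f);
	else
		list_add_tail(&st->late, f);
}

/* End of one iteration: hand over the early list, then promote the late one. */
static size_t flush_early(struct two_list_state *st,
			  void (*deliver)(struct frame *))
{
	struct frame *f = st->early.head;
	size_t n = 0;

	/* driver: local_bh_disable() would go here */
	while (f) {
		struct frame *next = f->next;

		deliver(f); /* stand-in for can_rx_offload_queue_sorted() */
		f = next;
		n++;
	}
	/* driver: local_bh_enable() would go here and run the softirq */

	list_init(&st->early);
	list_splice_tail(&st->early, &st->late);
	return n;
}

/* Recorder used only to observe the hand-over order in this sketch. */
static uint32_t delivered[16];
static size_t delivered_count;

static void record(struct frame *f)
{
	if (delivered_count < 16)
		delivered[delivered_count++] = f->ts;
}
```

With a snapshot value of 25 and frames stamped 10, 30 and 20, the first
flush_early hands over 10 and 20, and the second one hands over 30, which by
then has been moved to the first list.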

Best regards,

  Daniel

-- 
Dipl.-Math. Daniel Glöckner, emlix GmbH, http://www.emlix.com
Phone +49 551 30664-0, Fax +49 551 30664-11,
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, Amtsgericht Göttingen HR B 3160
Managing directors: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - your embedded linux partner

Thread overview: 7+ messages
2021-03-10  6:46 Softirq error with mcp251xfd driver Daniel Glöckner
2021-03-10  7:13 ` Marc Kleine-Budde
2021-03-10 21:22   ` Daniel Glöckner
2021-03-10 21:56     ` Daniel Glöckner [this message]
2021-03-11 12:20       ` Marc Kleine-Budde
2021-03-11 11:55     ` Marc Kleine-Budde
2021-04-22  8:25 ` Marc Kleine-Budde
