From: Jakub Kicinski <kuba@kernel.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, corbet@lwn.net, linux-doc@vger.kernel.org
Subject: Re: [PATCH net] docs: net: clarify the NAPI rules around XDP Tx
Date: Tue, 25 Jul 2023 13:41:22 -0700
Message-ID: <20230725134122.1684a2f1@kernel.org>
In-Reply-To: <CAKgT0UdKWmogiFD_Gip3TCi8-ydy+CVjwca1hPTYBRQQZ8_mGQ@mail.gmail.com>

On Tue, 25 Jul 2023 13:10:18 -0700 Alexander Duyck wrote:
> On Tue, Jul 25, 2023 at 11:55 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > This isn't accurate, and I would say it is somewhat dangerous advice.
> > > The Tx still needs to be processed regardless of whether it is
> > > processing page_pool pages or XDP pages. I agree the Rx should
> > > not be processed, but the Tx must be processed using mechanisms
> > > that do NOT make use of NAPI optimizations when budget is 0.
> > >
> > > So specifically, xdp_return_frame is safe in non-NAPI Tx cleanup. The
> > > xdp_return_frame_rx_napi is not.
> > >
> > > Likewise there is napi_consume_skb which will use either a NAPI
> > > or non-NAPI version of things depending on whether budget is 0.
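
[Illustrative sketch, not from the original mail: a mixed Tx-ring
cleanup following these rules. The driver bits (foo_clean_tx(),
struct foo_tx_buf, foo_next_completed()) are hypothetical;
napi_consume_skb(), xdp_return_frame() and xdp_return_frame_rx_napi()
are the kernel APIs named above.]

static void foo_clean_tx(struct foo_ring *ring, int budget)
{
        struct foo_tx_buf *buf;

        while ((buf = foo_next_completed(ring))) {
                if (buf->type == FOO_TX_SKB) {
                        /* Uses the per-NAPI skb cache only when
                         * budget != 0; falls back to a context-safe
                         * free when budget == 0 (e.g. netpoll).
                         */
                        napi_consume_skb(buf->skb, budget);
                } else if (budget) {
                        /* NAPI context: direct recycling is fine. */
                        xdp_return_frame_rx_napi(buf->xdpf);
                } else {
                        /* budget == 0: no NAPI optimizations. */
                        xdp_return_frame(buf->xdpf);
                }
        }
}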
> > >
> > > For the page_pool calls there is the "allow_direct" argument
> > > that is meant to decide whether to recycle directly into the
> > > page_pool cache. It should only be set in the Rx handler itself
> > > when budget is non-zero.
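
[Illustrative sketch: the allow_direct rule in code. pool and page
are placeholders; page_pool_put_full_page() is the real helper.]

/* In the Rx NAPI handler, with budget != 0: recycling directly
 * into the lockless per-NAPI cache is allowed.
 */
page_pool_put_full_page(pool, page, true);

/* In any other context (Tx cleanup, error paths, netpoll):
 * allow_direct must be false; the page goes through the ptr_ring.
 */
page_pool_put_full_page(pool, page, false);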
> > >
> > > I realise this was written up in response to a patch on the Mellanox
> > > driver. Based on the patch in question it looks like they were calling
> > > page_pool_recycle_direct outside of NAPI context. There is an explicit
> > > warning above that function about NOT calling it outside of NAPI
> > > context.
> >
> > Unless I'm missing something, the poll handler can be called with
> > budget=0 from hard IRQ context. And page_pool takes _bh() locks.
> > So unless we "teach it" not to recycle _anything_ in hard IRQ
> > context, it is not safe to call.
>
> That is the thing. We have to be able to free the pages regardless of
> context. Otherwise we make a huge mess of things. Also there isn't
> much way to differentiate between page_pool and non-page_pool pages
> because an skb can be composed of page pool pages just as easily as
> an XDP frame can be. You would just have to enable routing or
> bridging for Rx frames to end up with page pool pages in the Tx path.
>
> As far as netpoll itself we are safe because it has BH disabled and
> so as a result page_pool doesn't use the _bh locks.

We do? Can you point me to where netpoll disables BH?

> There is code in
> place to account for that in the producer locking code, and if it were
> an issue we would have likely blown up long before now. The fact is
> that page_pool has proliferated into skbs, so you are still freeing
> page_pool pages indirectly anyway.
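
[For reference: the producer locking code mentioned here is
page_pool_ring_lock() in net/core/page_pool.c, which at the time of
this thread looked roughly like this. It distinguishes softirq from
process context, but does not account for hard IRQ context, which is
Jakub's concern above.]

static void page_pool_ring_lock(struct page_pool *pool)
        __acquires(&pool->ring.producer_lock)
{
        if (in_softirq())
                /* BH protection not needed if current is softirq. */
                spin_lock(&pool->ring.producer_lock);
        else
                spin_lock_bh(&pool->ring.producer_lock);
}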
>
> That said, there are calls that are not supposed to be used outside of
> NAPI context, such as page_pool_recycle_direct(). Those have mostly
> been called out in the page_pool.h header itself, so if someone
> decides to shoot themselves in the foot with one of those, that is on
> them. What we need to watch out for is people abusing the "direct"
> calls, or just passing "true" for allow_direct in the page_pool
> calls, without taking proper steps to guarantee the context.
>
> > > We cannot make this distinction if both XDP and skb traffic are
> > > processed in the same Tx queue; otherwise you will cause Tx to
> > > stall and break netpoll. If the ring is XDP-only then yes, it
> > > can be skipped like they did in the Mellanox driver, but if it
> > > is mixed then the XDP side of things needs to use the "safe"
> > > versions of the calls.
> >
> > IDK, a rare delay in sending a netpoll message is not a major
> > concern.
>
> The whole point of netpoll is to get data out after something like a
> crash. Otherwise we could have just been using regular NAPI. If the Tx
> ring is hung it might not be a delay but rather a complete stall
> that prevents data on the Tx queue from being transmitted, since the
> system will likely not be recovering. Worse yet, if it is a scenario
> where the Tx queue can recover, it might trigger the Tx watchdog,
> since I could see cases where the ring fills but interrupts were
> dropped because of the netpoll.

I'm not disagreeing with you. I just don't have time to take a deeper
look and add the IRQ checks myself and I'm 90% sure the current code
can't work with netpoll. So I thought I'd at least document that :(
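
[Illustrative sketch: one possible shape of the IRQ check being
discussed, hypothetical and not a patch from this thread. The idea
is to refuse direct recycling unless we are actually in softirq
(NAPI) context.]

/* Somewhere in the page_pool put path (hypothetical): */
if (allow_direct && !in_softirq())
        allow_direct = false;   /* hard IRQ or process context */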