netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	brouer@redhat.com,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	netdev@vger.kernel.org, "Eric Dumazet" <eric.dumazet@gmail.com>,
	linux-mm@kvack.org, "Mel Gorman" <mgorman@techsingularity.net>,
	lorenzo@kernel.org, linyunsheng@huawei.com, bpf@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	willy@infradead.org
Subject: Re: [PATCH RFC net-next/mm V3 1/2] page_pool: Remove workqueue in new shutdown scheme
Date: Wed, 3 May 2023 18:47:02 -0700	[thread overview]
Message-ID: <20230503184702.65dceb90@kernel.org> (raw)
In-Reply-To: <3a5a28c4-01a3-793c-6969-475aba3ff3b5@redhat.com>

On Wed, 3 May 2023 17:49:34 +0200 Jesper Dangaard Brouer wrote:
> On 03/05/2023 13.18, Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@kernel.org> writes:
> >> We can remove the warning without removing the entire delayed freeing
> >> scheme. I definitely like the SHUTDOWN flag and patch 2 but I'm a bit
> >> less clear on why the complexity of datapath freeing is justified.
> >> Can you explain?  
> > 
> > You mean just let the workqueue keep rescheduling itself every minute
> > for the (potentially) hours that skbs will stick around? Seems a bit
> > wasteful, doesn't it? :)  
> 
> I agree that this workqueue that keeps rescheduling is wasteful.
> It actually reschedules every second, even more wasteful.
> NIC drivers will have many HW RX-queues, with separate PP instances, 
> that each can start a workqueue that resched every sec.

There's a lot of work items flying around on a working system.
I don't think the rare (and very cheap) PP check work is going 
to move the needle. You should see how many work items DIM schedules :(

I'd think that potentially having the extra memory pinned is much more
of an issue than a periodic check, and that does not go away by
changing the checking mechanism.

> Eric have convinced me that SKBs can "stick around" for longer than the
> assumptions in PP.  The old PP assumptions came from XDP-return path.
> It is time to cleanup.

I see. Regardless - should we have some count/statistic for the number
of "zombie" page pools sitting around in the system? Could be useful
for debug.

> > We did see an issue where creating and tearing down lots of page pools
> > in a short period of time caused significant slowdowns due to the
> > workqueue mechanism. Lots being "thousands per second". This is possible
> > using the live packet mode of bpf_prog_run() for XDP, which will setup
> > and destroy a page pool for each syscall...  
> 
> Yes, the XDP live packet mode of bpf_prog_run is IMHO abusing the
> page_pool API.  We should fix that somehow, at least the case where live
> packet mode is only injecting a single packet, but still creates a PP
> instance. The PP in live packet mode IMHO only makes sense when
> repeatedly sending packets that gets recycles and are pre-inited by PP.

+1, FWIW, I was surprised that we have a init_callback which sits in 
the fastpath exists purely for testing.

> This use of PP does exemplify why is it problematic to keep the workqueue.
> 
> I have considered (and could be convinced) delaying the free via
> call_rcu, but it also create an unfortunate backlog of work in the case
> of live packet mode of bpf_prog_run.

Maybe let the pp used by BPF testing be reusable? No real driver will
create thousands of PPs a seconds, that's not sane.

Anyway, you'll choose what you'll choose. I just wanted to cast my vote
for the work queue rather than the tricky lockless release code.

  reply	other threads:[~2023-05-04  1:47 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-28 16:16 [PATCH RFC net-next/mm V3 0/2] page_pool: new approach for leak detection and shutdown phase Jesper Dangaard Brouer
2023-04-28 16:16 ` [PATCH RFC net-next/mm V3 1/2] page_pool: Remove workqueue in new shutdown scheme Jesper Dangaard Brouer
2023-04-28 21:38   ` Toke Høiland-Jørgensen
2023-05-03 15:21     ` Jesper Dangaard Brouer
2023-05-03  2:33   ` Jakub Kicinski
2023-05-03 11:18     ` Toke Høiland-Jørgensen
2023-05-03 15:49       ` Jesper Dangaard Brouer
2023-05-04  1:47         ` Jakub Kicinski [this message]
2023-05-04  2:42   ` Yunsheng Lin
2023-05-04 13:48     ` Jesper Dangaard Brouer
2023-05-05  0:54       ` Yunsheng Lin
2023-05-06 13:11         ` Yunsheng Lin
2023-04-28 16:16 ` [PATCH RFC net-next/mm V3 2/2] mm/page_pool: catch page_pool memory leaks Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230503184702.65dceb90@kernel.org \
    --to=kuba@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jbrouer@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=lorenzo@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).