From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "songliubraving@fb.com" <songliubraving@fb.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"Torvalds, Linus" <torvalds@linux-foundation.org>,
	"Kernel-team@fb.com" <Kernel-team@fb.com>,
	"song@kernel.org" <song@kernel.org>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack
Date: Wed, 18 May 2022 16:49:34 +0000
Message-ID: <3ab4c6cf8158891167c145015bdf5754f972223a.camel@intel.com>
In-Reply-To: <42042EE3-EDDF-4DBF-AFD5-89A5CCC59AA3@fb.com>

On Wed, 2022-05-18 at 06:34 +0000, Song Liu wrote:
> > > I am not quite sure of the exact work needed here. Rick, would you
> > > have time to enable VM_FLUSH_RESET_PERMS for huge pages? Given the
> > > merge window is coming soon, I guess we need the current workaround
> > > in 5.19.
> > 
> > I would have a hard time squeezing that in now. The vmalloc part is
> > easy; I think I already posted a diff. But first hibernate needs to
> > be changed to not care about direct map page sizes.
> 
> I guess I missed the diff, could you please send a link to it?


https://lore.kernel.org/lkml/5bd16e2c06a2df357400556c6ae01bb5d3c5c32a.camel@intel.com/

The remaining problem is that hibernate may encounter NP (not-present)
pages when saving memory to disk. It resets them with CPA calls, 4k at
a time. So if a page is NP, hibernate needs it to already be mapped at
4k, or it might need to split a huge mapping. I think hibernate should
just use a different mapping to get at the page when it encounters this
rare scenario. In that diff I added some locking so that hibernate
couldn't race with a huge NP page, but then I thought we should just
change hibernate instead.
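
For reference, the change I have in mind looks roughly like this. It's
only a sketch, not a real patch: error handling for a failed vmap() is
elided, and I haven't checked how it interacts with the rest of the
snapshot code:

```c
/*
 * kernel/power/snapshot.c (sketch): instead of making an NP page
 * present again with a CPA call (which needs the direct map entry to
 * already be 4k, or forces a split), read the page through a
 * temporary vmap() alias and leave the direct map alone.
 */
static void safe_copy_page(void *dst, struct page *s_page)
{
	if (kernel_page_present(s_page)) {
		copy_page(dst, page_address(s_page));
	} else {
		/* Rare case: the page is NP in the direct map */
		void *src = vmap(&s_page, 1, VM_MAP, PAGE_KERNEL);

		copy_page(dst, src);
		vunmap(src);
	}
}
```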

> 
> > 
> > > 
> > > > 
> > > > > Signed-off-by: Song Liu <song@kernel.org>
> > > > > ---
> > > > > kernel/bpf/core.c | 12 +++++++-----
> > > > > 1 file changed, 7 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > > > index cacd8684c3c4..b64d91fcb0ba 100644
> > > > > --- a/kernel/bpf/core.c
> > > > > +++ b/kernel/bpf/core.c
> > > > > @@ -857,7 +857,7 @@ static size_t
> > > > > select_bpf_prog_pack_size(void)
> > > > >       void *ptr;
> > > > > 
> > > > >       size = BPF_HPAGE_SIZE * num_online_nodes();
> > > > > -    ptr = module_alloc(size);
> > > > > +    ptr = module_alloc_huge(size);
> > > > 
> > > > This select_bpf_prog_pack_size() function always seemed weird -
> > > > doing a
> > > > big allocation and then immediately freeing. Can't it check a
> > > > config
> > > > for vmalloc huge page support?
> > > 
> > > Yes, it is weird. Checking a config is not enough here. We also
> > > need
> > > to 
> > > check vmap_allow_huge, which is controlled by boot parameter
> > > nohugeiomap. 
> > > I haven’t got a better solution for this. 
> > 
> > It's too weird. We should expose what's needed in vmalloc.
> > huge_vmalloc_supported() or something.
> 
> Yeah, this should work. I will get something like this in the next 
> version.
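
Something like the below is what I had in mind. Just a sketch — the
helper name and exporting it are my assumptions, and vmap_allow_huge
is currently a static in mm/vmalloc.c:

```c
/* mm/vmalloc.c (sketch): let callers ask whether vmalloc can actually
 * use huge mappings, instead of probing with a throwaway allocation.
 */
#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
bool huge_vmalloc_supported(void)
{
	/* cleared by the nohugevmalloc boot parameter */
	return vmap_allow_huge;
}
#else
bool huge_vmalloc_supported(void)
{
	return false;
}
#endif
EXPORT_SYMBOL_GPL(huge_vmalloc_supported);
```

Then select_bpf_prog_pack_size() could just branch on it directly.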
> 
> > 
> > I'm also not clear why we wouldn't want to use the prog pack
> > allocator
> > even if vmalloc huge pages was disabled. Doesn't it improve
> > performance
> > even with small page sizes, per your benchmarks? What is the
> > downside
> > to just always using it?
> 
> With the current version, when huge pages are disabled, the prog pack
> allocator will use 4kB pages for each pack. We still get about 0.5%
> performance improvement with 4kB prog packs.

Oh, I thought you were comparing a 2MB sized, small page mapped
allocation to a 2MB sized, huge page mapped allocation.

It looks like the logic is to free a pack once it is empty, so with
smaller packs you are more likely to let the pages go back to the page
allocator. Future allocations would then break more direct map pages.

So I think that is not a fully apples-to-apples test of the huge
mapping benefits. I'd be surprised if there really was no huge mapping
benefit, since it's been seen with core kernel text. Did you notice
whether the direct map breakage was different between the tests?

