BPF List
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"ast@kernel.org" <ast@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"Torvalds, Linus" <torvalds@linux-foundation.org>,
	Kernel Team <Kernel-team@fb.com>,
	"song@kernel.org" <song@kernel.org>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack
Date: Wed, 18 May 2022 06:34:29 +0000	[thread overview]
Message-ID: <42042EE3-EDDF-4DBF-AFD5-89A5CCC59AA3@fb.com> (raw)
In-Reply-To: <dc23afb892846ef41d73a41d58c07f6620cb6312.camel@intel.com>



> On May 17, 2022, at 4:58 PM, Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote:
> 
> On Tue, 2022-05-17 at 21:08 +0000, Song Liu wrote:
>>> On May 17, 2022, at 12:15 PM, Edgecombe, Rick P <
>>> rick.p.edgecombe@intel.com> wrote:
>>> 
>>> On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
>>>> Use module_alloc_huge for bpf_prog_pack so that BPF programs sit
>>>> on
>>>> PMD_SIZE pages. This benefits system performance by reducing iTLB
>>>> miss
>>>> rate. Benchmark of a real web service workload shows this change
>>>> gives
>>>> another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
>>>> (which improve system throughput by ~0.5%).
>>> 
>>> 0.7% sounds good as a whole. How sure are you of that +0.2%? Was
>>> this a
>>> big averaged test?
>> 
>> Yes, this was a test between two tiers with 10+ servers on each
>> tier.  
>> We took the average performance over a few hours of shadow workload. 
> 
> Awesome. Sounds great.
> 
>> 
>>> 
>>>> 
>>>> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and
>>>> use
>>>> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
>>>> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
>>>> 
>>>> [1] 
>>>> 
> https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
>>>> Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
>>> 
>>> As I said before, I think this will work functionally. But I meant
>>> it
>>> as a quick fix when we were talking about patching this up to keep
>>> it
>>> enabled upstream.
>>> 
>>> So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
>>> pages? The main benefit would be to keep the tear down of these
>>> types
>>> of allocations consistent for correctness reasons. The TLB flush
>>> minimizing differences are probably less impactful given the
>>> caching
>>> introduced here. At the very least though, we should have (or have
>>> already had) some WARN if people try to use it with huge pages.
>> 
>> I am not quite sure the exact work needed here. Rick, would you have
>> time to enable VM_FLUSH_RESET_PERMS for huge pages? Given the merge 
>> window is coming soon, I guess we need current work around in 5.19. 
> 
> I would have hard time squeezing that in now. The vmalloc part is easy,
> I think I already posted a diff. But first hibernate needs to be
> changed to not care about direct map page sizes.

I guess I missed the diff, could you please send a link to it?

> 
>> 
>>> 
>>>> Signed-off-by: Song Liu <song@kernel.org>
>>>> ---
>>>> kernel/bpf/core.c | 12 +++++++-----
>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>>> index cacd8684c3c4..b64d91fcb0ba 100644
>>>> --- a/kernel/bpf/core.c
>>>> +++ b/kernel/bpf/core.c
>>>> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>>>>      void *ptr;
>>>> 
>>>>      size = BPF_HPAGE_SIZE * num_online_nodes();
>>>> -    ptr = module_alloc(size);
>>>> +    ptr = module_alloc_huge(size);
>>> 
>>> This select_bpf_prog_pack_size() function always seemed weird -
>>> doing a
>>> big allocation and then immediately freeing. Can't it check a
>>> config
>>> for vmalloc huge page support?
>> 
>> Yes, it is weird. Checking a config is not enough here. We also need
>> to 
>> check vmap_allow_huge, which is controlled by boot parameter
>> nohugeiomap. 
>> I haven’t got a better solution for this. 
> 
> It's too weird. We should expose whats needed in vmalloc.
> huge_vmalloc_supported() or something.

Yeah, this should work. I will get something like this in the next 
version.

> 
> I'm also not clear why we wouldn't want to use the prog pack allocator
> even if vmalloc huge pages was disabled. Doesn't it improve performance
> even with small page sizes, per your benchmarks? What is the downside
> to just always using it?

With current version, when huge page is disabled, the prog pack allocator
will use 4kB pages for each pack. We still get about 0.5% performance
improvement with 4kB prog packs. 

Thanks,
Song



  parent reply	other threads:[~2022-05-18  6:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-16  5:40 [PATCH bpf-next 0/5] bpf_prog_pack followup Song Liu
2022-05-16  5:40 ` [PATCH bpf-next 1/5] bpf: fill new bpf_prog_pack with illegal instructions Song Liu
2022-05-16  5:40 ` [PATCH bpf-next 2/5] x86/alternative: introduce text_poke_set Song Liu
2022-05-17 19:16   ` Edgecombe, Rick P
2022-05-17 21:09     ` Song Liu
2022-05-18  6:58   ` Song Liu
2022-05-18  7:45     ` Peter Zijlstra
2022-05-18 15:32       ` Song Liu
2022-05-18 17:09   ` Peter Zijlstra
2022-05-18 18:34     ` Song Liu
2022-05-19  7:38       ` Peter Zijlstra
2022-05-16  5:40 ` [PATCH bpf-next 3/5] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack Song Liu
2022-05-16  5:40 ` [PATCH bpf-next 4/5] module: introduce module_alloc_huge Song Liu
2022-05-18  6:28   ` Song Liu
2022-05-16  5:40 ` [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack Song Liu
2022-05-17 19:15   ` Edgecombe, Rick P
2022-05-17 21:08     ` Song Liu
2022-05-17 23:58       ` Edgecombe, Rick P
2022-05-18  6:25         ` Song Liu
2022-05-18  6:34         ` Song Liu [this message]
2022-05-18 16:49           ` Edgecombe, Rick P
2022-05-18 18:31             ` Song Liu
2022-05-19  6:42         ` Song Liu
2022-05-19 16:56           ` Edgecombe, Rick P
2022-05-19 18:25             ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42042EE3-EDDF-4DBF-AFD5-89A5CCC59AA3@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=song@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox