netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"andrii@kernel.org" <andrii@kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 2/7] set_memory: introduce set_memory_[ro|x]_noalias
Date: Fri, 19 Nov 2021 11:06:07 -0500	[thread overview]
Message-ID: <YZfLb/AEoA5UBAnY@cmpxchg.org> (raw)
In-Reply-To: <YZdv/NLUU9qLHP2g@hirez.programming.kicks-ass.net>

On Fri, Nov 19, 2021 at 10:35:56AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 19, 2021 at 04:14:46AM +0000, Song Liu wrote:
> > 
> > 
> > > On Nov 18, 2021, at 10:58 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > On Thu, Nov 18, 2021 at 06:39:49PM +0000, Song Liu wrote:
> > > 
> > >>> You're going to have to do that anyway if you're going to write to the
> > >>> directmap while executing from the alias.
> > >> 
> > >> Not really. If you look at current version 7/7, the logic is mostly 
> > >> straightforward. We just make all the writes to the directmap, while 
> > >> calculate offset from the alias. 
> > > 
> > > Then you can do the exact same thing but do the writes to a temp buffer,
> > > no different.
> > 
> > There will be some extra work, but I guess I will give it a try. 
> > 
> > > 
> > >>>> The BPF program could have up to 1000000 (BPF_COMPLEXITY_LIMIT_INSNS)
> > >>>> instructions (BPF instructions). So it could easily go beyond a few 
> > >>>> pages. Mapping the 2MB page all together should make the logic simpler. 
> > >>> 
> > >>> Then copy it in smaller chunks I suppose.
> > >> 
> > >> How fast/slow is the __text_poke routine? I guess we cannot do it thousands
> > >> of times per BPF program (in chunks of a few bytes)? 
> > > 
> > > You can copy in at least 4k chunks since any 4k will at most use 2
> > > pages, which is what it does. If that's not fast enough we can look at
> > > doing bigger chunks.
> > 
> > If we do JIT in a buffer first, 4kB chunks should be fast enough. 
> > 
> > Another side of this issue is the split of linear mapping (1GB => 
> > many 4kB). If we only split to PMD, but not PTE, we can probably 
> > recover most of the regression. I will check this with Johannes. 
> 
> __text_poke() shouldn't affect the fragmentation of the kernel
> mapping, it's a user-space alias into the same physical memory. For all
> it cares we're poking into GB pages.

Right, __text_poke won't, it's the initial set_memory_ro/x against the
vmap space that does it, since the linear mapping is updated as an
alias with the same granularity.

However, my guess would also be that once we stop doing that with 4k
pages for every single bpf program, and batch into shared 2M pages
instead, the problem will be much smaller. Maybe negligible.*

So ro linear mapping + __text_poke() sounds like a good idea to me.

[ If not, we could consider putting the linear mapping updates behind
  a hardening option similar to RETPOLINE, PAGE_TABLE_ISOLATION,
  STRICT_MODULE_RWX. The __text_poke() method would mean independence
  from the linear mapping and we can do with that what we want
  then. Many machines aren't exposed enough to necessarily care about
  W^X if the price is too high - and the impact of losing the GB
  mappings is actually significant on central workloads right now.

  But yeah, hopefully we won't have to go there. ]

      reply	other threads:[~2021-11-19 16:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20211116071347.520327-1-songliubraving@fb.com>
     [not found] ` <20211116071347.520327-3-songliubraving@fb.com>
2021-11-16  8:00   ` [PATCH bpf-next 2/7] set_memory: introduce set_memory_[ro|x]_noalias Peter Zijlstra
2021-11-17 21:36     ` Song Liu
2021-11-17 22:01       ` Peter Zijlstra
2021-11-17 23:57         ` Song Liu
2021-11-18  0:11           ` Song Liu
2021-11-18  7:54           ` Peter Zijlstra
2021-11-18 17:16             ` Song Liu
2021-11-18 18:28               ` Peter Zijlstra
2021-11-18 18:39                 ` Song Liu
2021-11-18 18:58                   ` Peter Zijlstra
2021-11-19  4:14                     ` Song Liu
2021-11-19  9:35                       ` Peter Zijlstra
2021-11-19 16:06                         ` Johannes Weiner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YZfLb/AEoA5UBAnY@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=Kernel-team@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bp@alien8.de \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=songliubraving@fb.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).