linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	Carlos O'Donell <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Tue, 3 Jan 2023 19:15:50 +0100	[thread overview]
Message-ID: <Y7Rw1plb/pqPiWgg@gmail.com> (raw)
In-Reply-To: <Y7RDQLEvlLM0o4cp@zx2c4.com>


* Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> On Tue, Jan 03, 2023 at 11:50:43AM +0100, Ingo Molnar wrote:
> > 
> > * Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > 
> > > The vDSO getrandom() implementation works with a buffer allocated with a
> > > new system call that has certain requirements:
> > > 
> > > - It shouldn't be written to core dumps.
> > >   * Easy: VM_DONTDUMP.
> > > - It should be zeroed on fork.
> > >   * Easy: VM_WIPEONFORK.
> > > 
> > > - It shouldn't be written to swap.
> > >   * Uh-oh: mlock is rlimited.
> > >   * Uh-oh: mlock isn't inherited by forks.
> > > 
> > > - It shouldn't reserve actual memory, but it also shouldn't crash when
> > >   page faulting in memory if none is available
> > >   * Uh-oh: MAP_NORESERVE respects vm.overcommit_memory=2.
> > >   * Uh-oh: VM_NORESERVE means segfaults.
> > > 
> > > It turns out that the vDSO getrandom() function has three really nice
> > > characteristics that we can exploit to solve this problem:
> > > 
> > > 1) Due to being wiped during fork(), the vDSO code is already robust to
> > >    having the contents of the pages it reads zeroed out midway through
> > >    the function's execution.
> > > 
> > > 2) In the absolute worst case of whatever contingency we're coding for,
> > >    we have the option to fallback to the getrandom() syscall, and
> > >    everything is fine.
> > > 
> > > 3) The buffers the function uses are only ever useful for a maximum of
> > >    60 seconds -- a sort of cache, rather than a long term allocation.
> > > 
> > > These characteristics mean that we can introduce VM_DROPPABLE, which
> > > has the following semantics:
> > > 
> > > a) It never is written out to swap.
> > > b) Under memory pressure, mm can just drop the pages (so that they're
> > >    zero when read back again).
> > > c) If there's not enough memory to service a page fault, it's not fatal,
> > >    and no signal is sent. Instead, writes are simply lost.
> > > d) It is inherited by fork.
> > > e) It doesn't count against the mlock budget, since nothing is locked.
> > > 
> > > This is fairly simple to implement, with the one snag that we have to
> > > use 64-bit VM_* flags, but this shouldn't be a problem, since the only
> > > consumers will probably be 64-bit anyway.
> > > 
> > > This way, allocations used by vDSO getrandom() can use:
> > > 
> > >     VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
> > > 
> > > And there will be no problem with OOMing, crashing on overcommitment,
> > > using memory when not in use, not wiping on fork(), coredumps, or
> > > writing out to swap.
> > > 
> > > At the moment, rather than skipping writes on OOM, the fault handler
> > > just returns to userspace, and the instruction is retried. This isn't
> > > terrible, but it's not quite what is intended. The actual instruction
> > > skipping has to be implemented arch-by-arch, but so does this whole
> > > vDSO series, so that's fine. The following commit addresses it for x86.
> > 
> > Yeah, so VM_DROPPABLE adds a whole lot of complexity, corner cases, per 
> > arch low level work and rarely tested functionality (seriously, whose 
> > desktop system touches swap these days?), just so we can add a few pages of 
> > per thread vDSO data of a quirky type that in 99.999% of cases won't ever 
> > be 'dropped' from under the functionality that is using it and will thus 
> > bitrot fast?
> 
> It sounds like you've misunderstood the issue.
> 
> Firstly, the arch work is +19 lines (in the vdso branch of random.git).

For a single architecture: x86.

And it's only 19 lines because x86 already happens to have a bunch of 
complexity implemented, such as a safe instruction decoder that allows the 
skipping of an instruction - which relies on thousands of lines of 
complexity.

On an architecture where this isn't present, it would have to be 
implemented to support the instruction-skipping aspect of VM_DROPPABLE.

Even on x86, it's not common today for the software-decoder to be used in 
unprivileged code - primary use was debugging & instrumentation code. So 
your patches bring this piece of complexity to a much larger scope of 
untrusted user-space functionality.

> That's very small and basic. Don't misrepresent it just to make a point.

I'm not misrepresenting anything.

> Secondly, and more importantly, swapping this data is *not* permissible.

I did not suggest to swap it: my suggestion is to just pin these vDSO data 
pages. The per thread memory overhead is infinitesimal on the vast majority 
of the target systems, and the complexity trade-off you are proposing is 
poorly reasoned IMO.

Anyway:

  > Don't misrepresent it just to make a point.
  ...
  > That seems like a ridiculous rhetorical leap.
  ...
  > Did you actually read the commit message?

Frankly, I don't appreciate your condescending discussion style that 
borders on the toxic, and to save time I'm nacking this technical approach 
until both the patch-set and your reaction to constructive review feedback 
improves:

    NAcked-by: Ingo Molnar <mingo@kernel.org>

I think my core point that it would be much simpler to simply pin those 
pages and not introduce rarely-excercised 'discardable memory' semantics in 
Linux is a fair one - so it's straightforward to lift this NAK.

I'll re-evaluate the NACK on every new iteration of this patchset I see.

Thanks,

	Ingo

  reply	other threads:[~2023-01-03 18:15 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-01 16:29 [PATCH v14 0/7] implement getrandom() in vDSO Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace Jason A. Donenfeld
2023-01-03 10:32   ` Ingo Molnar
2023-01-03 14:51     ` Jason A. Donenfeld
2023-01-03 17:00       ` Ingo Molnar
2023-01-03 17:29         ` Borislav Petkov
2023-01-03 17:30           ` Jason A. Donenfeld
2023-01-03 17:47             ` Ingo Molnar
2023-01-03 17:48               ` Jason A. Donenfeld
2023-01-04 20:25               ` Ingo Molnar
2023-01-04 20:29                 ` Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2023-01-03 10:50   ` Ingo Molnar
2023-01-03 15:01     ` Jason A. Donenfeld
2023-01-03 18:15       ` Ingo Molnar [this message]
2023-01-03 18:51         ` Jason A. Donenfeld
2023-01-03 18:36     ` Andy Lutomirski
2023-01-03 19:05       ` Jason A. Donenfeld
2023-01-03 20:52         ` Andy Lutomirski
2023-01-03 19:19       ` Linus Torvalds
2023-01-03 19:35         ` Jason A. Donenfeld
2023-01-03 19:54           ` Linus Torvalds
2023-01-03 20:03             ` Jason A. Donenfeld
2023-01-03 20:15               ` Linus Torvalds
2023-01-03 20:25                 ` Linus Torvalds
2023-01-03 20:44                 ` Jason A. Donenfeld
2023-01-05 21:57                   ` Yann Droneaud
2023-01-05 22:57                     ` Jason A. Donenfeld
2023-01-06  1:02                       ` Linus Torvalds
2023-01-06  2:08                         ` Linus Torvalds
2023-01-06  2:42                           ` Jason A. Donenfeld
2023-01-06 20:53                           ` Andy Lutomirski
2023-01-06 21:10                             ` Linus Torvalds
2023-01-10 11:01                               ` Dr. Greg
2023-01-06 21:36                             ` Jason A. Donenfeld
2023-01-06 21:42                           ` Matthew Wilcox
2023-01-06 22:06                             ` Linus Torvalds
2023-01-06  2:14                         ` Jason A. Donenfeld
2023-01-09 10:34             ` Florian Weimer
2023-01-09 14:28               ` Linus Torvalds
2023-01-11  7:27                 ` Eric Biggers
2023-01-11 12:07                   ` Linus Torvalds
2023-01-01 16:29 ` [PATCH v14 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 4/7] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 5/7] arch: allocate vgetrandom_alloc() syscall number Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 6/7] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2023-01-12 17:27   ` Christophe Leroy
2023-01-12 17:49     ` Jason A. Donenfeld
2023-01-11 22:23 ` [PATCH v14 0/7] implement getrandom() in vDSO Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7Rw1plb/pqPiWgg@gmail.com \
    --to=mingo@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=patches@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).