Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings

linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Andy Lutomirski <luto@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	tglx@linutronix.de, linux-crypto@vger.kernel.org,
	linux-api@vger.kernel.org, x86@kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	"Carlos O'Donell" <carlos@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Tue, 3 Jan 2023 10:36:01 -0800	[thread overview]
Message-ID: <CALCETrWdw5kxrtr4M7AkKYDOJEE1cU1wENWgmgOxn0rEJz4y3w@mail.gmail.com> (raw)
In-Reply-To: <Y7QIg/hAIk7eZE42@gmail.com>

On Tue, Jan 3, 2023 at 2:50 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> > The vDSO getrandom() implementation works with a buffer allocated with a
> > new system call that has certain requirements:
> >
> > - It shouldn't be written to core dumps.
> >   * Easy: VM_DONTDUMP.
> > - It should be zeroed on fork.
> >   * Easy: VM_WIPEONFORK.

I have a rather different suggestion: make a special mapping.  Jason,
you're trying to shoehorn all kinds of bizarre behavior into the core
mm, and none of that seems to me to belong to the core mm.  Instead,
have an actual special mapping with callbacks that does the right
thing.  No fancy VM flags.

Memory pressure: have it free and unmap it self.  Gets accessed again?
 ->fault can handle it.

Want to mlock it?  No, don't do that -- that's absurd.  Just arrange
so that, if it gets evicted, it's not written out anywhere.  And when
it gets faulted back in it does the right thing -- see above.

Zero on fork?  I'm sure that's manageable with a special mapping.  If
not, you can add a new vm operation or similar to make it work.  (Kind
of like how we extended special mappings to get mremap right a couple
years go.)  But maybe you don't want to *zero* it on fork and you want
to do something more intelligent.  Fine -- you control ->fault!

> >
> > - It shouldn't be written to swap.
> >   * Uh-oh: mlock is rlimited.
> >   * Uh-oh: mlock isn't inherited by forks.

No mlock, no problems.

> >
> > - It shouldn't reserve actual memory, but it also shouldn't crash when
> >   page faulting in memory if none is available
> >   * Uh-oh: MAP_NORESERVE respects vm.overcommit_memory=2.
> >   * Uh-oh: VM_NORESERVE means segfaults.

->fault can do whatever you want.

And there is no shortage of user memory that *must* be made available
on fault in order to resume the faulting process.  ->fault can handle
this.

> >
> > It turns out that the vDSO getrandom() function has three really nice
> > characteristics that we can exploit to solve this problem:
> >
> > 1) Due to being wiped during fork(), the vDSO code is already robust to
> >    having the contents of the pages it reads zeroed out midway through
> >    the function's execution.
> >
> > 2) In the absolute worst case of whatever contingency we're coding for,
> >    we have the option to fallback to the getrandom() syscall, and
> >    everything is fine.
> >
> > 3) The buffers the function uses are only ever useful for a maximum of
> >    60 seconds -- a sort of cache, rather than a long term allocation.
> >
> > These characteristics mean that we can introduce VM_DROPPABLE, which
> > has the following semantics:

No need for another vm flag.

> >
> > a) It never is written out to swap.

No need to complicate the swap logic for this.

> > b) Under memory pressure, mm can just drop the pages (so that they're
> >    zero when read back again).

Or ->fault could even repopulate it without needing to ever read zeros.

> > c) If there's not enough memory to service a page fault, it's not fatal,
> >    and no signal is sent. Instead, writes are simply lost.

This just seems massively overcomplicated to me.  If there isn't
enough memory to fault in a page of code, we don't have some magic
instruction emulator in the kernel.  We either OOM or we wait for
memory to show up.

> > d) It is inherited by fork.

If you have a special mapping and you fork, it doesn't magically turn
into normal memory.

> > e) It doesn't count against the mlock budget, since nothing is locked.

Special mapping -> no mlock.

> >
> > This is fairly simple to implement, with the one snag that we have to
> > use 64-bit VM_* flags, but this shouldn't be a problem, since the only
> > consumers will probably be 64-bit anyway.
> >
> > This way, allocations used by vDSO getrandom() can use:
> >
> >     VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
> >
> > And there will be no problem with OOMing, crashing on overcommitment,
> > using memory when not in use, not wiping on fork(), coredumps, or
> > writing out to swap.
> >
> > At the moment, rather than skipping writes on OOM, the fault handler
> > just returns to userspace, and the instruction is retried. This isn't
> > terrible, but it's not quite what is intended. The actual instruction
> > skipping has to be implemented arch-by-arch, but so does this whole
> > vDSO series, so that's fine. The following commit addresses it for x86.

I really dislike this.  I'm with Ingo.

next prev parent reply	other threads:[~2023-01-03 18:36 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-01 16:29 [PATCH v14 0/7] implement getrandom() in vDSO Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 1/7] x86: lib: Separate instruction decoder MMIO type from MMIO trace Jason A. Donenfeld
2023-01-03 10:32   ` Ingo Molnar
2023-01-03 14:51     ` Jason A. Donenfeld
2023-01-03 17:00       ` Ingo Molnar
2023-01-03 17:29         ` Borislav Petkov
2023-01-03 17:30           ` Jason A. Donenfeld
2023-01-03 17:47             ` Ingo Molnar
2023-01-03 17:48               ` Jason A. Donenfeld
2023-01-04 20:25               ` Ingo Molnar
2023-01-04 20:29                 ` Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 2/7] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2023-01-03 10:50   ` Ingo Molnar
2023-01-03 15:01     ` Jason A. Donenfeld
2023-01-03 18:15       ` Ingo Molnar
2023-01-03 18:51         ` Jason A. Donenfeld
2023-01-03 18:36     ` Andy Lutomirski [this message]
2023-01-03 19:05       ` Jason A. Donenfeld
2023-01-03 20:52         ` Andy Lutomirski
2023-01-03 19:19       ` Linus Torvalds
2023-01-03 19:35         ` Jason A. Donenfeld
2023-01-03 19:54           ` Linus Torvalds
2023-01-03 20:03             ` Jason A. Donenfeld
2023-01-03 20:15               ` Linus Torvalds
2023-01-03 20:25                 ` Linus Torvalds
2023-01-03 20:44                 ` Jason A. Donenfeld
2023-01-05 21:57                   ` Yann Droneaud
2023-01-05 22:57                     ` Jason A. Donenfeld
2023-01-06  1:02                       ` Linus Torvalds
2023-01-06  2:08                         ` Linus Torvalds
2023-01-06  2:42                           ` Jason A. Donenfeld
2023-01-06 20:53                           ` Andy Lutomirski
2023-01-06 21:10                             ` Linus Torvalds
2023-01-10 11:01                               ` Dr. Greg
2023-01-06 21:36                             ` Jason A. Donenfeld
2023-01-06 21:42                           ` Matthew Wilcox
2023-01-06 22:06                             ` Linus Torvalds
2023-01-06  2:14                         ` Jason A. Donenfeld
2023-01-09 10:34             ` Florian Weimer
2023-01-09 14:28               ` Linus Torvalds
2023-01-11  7:27                 ` Eric Biggers
2023-01-11 12:07                   ` Linus Torvalds
2023-01-01 16:29 ` [PATCH v14 3/7] x86: mm: Skip faulting instruction for VM_DROPPABLE faults Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 4/7] random: add vgetrandom_alloc() syscall Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 5/7] arch: allocate vgetrandom_alloc() syscall number Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 6/7] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2023-01-01 16:29 ` [PATCH v14 7/7] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2023-01-12 17:27   ` Christophe Leroy
2023-01-12 17:49     ` Jason A. Donenfeld
2023-01-11 22:23 ` [PATCH v14 0/7] implement getrandom() in vDSO Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrWdw5kxrtr4M7AkKYDOJEE1cU1wENWgmgOxn0rEJz4y3w@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=Jason@zx2c4.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).