From: David Hildenbrand <david@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
linux-kernel@vger.kernel.org, patches@lists.linux.dev,
tglx@linutronix.de, linux-crypto@vger.kernel.org,
linux-api@vger.kernel.org, x86@kernel.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
Carlos O'Donell <carlos@redhat.com>,
Florian Weimer <fweimer@redhat.com>,
Arnd Bergmann <arnd@arndb.de>, Jann Horn <jannh@google.com>,
Christian Brauner <brauner@kernel.org>,
David Hildenbrand <dhildenb@redhat.com>,
linux-mm@kvack.org
Subject: Re: [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always lazily freeable mappings
Date: Sun, 7 Jul 2024 20:52:51 +0200 [thread overview]
Message-ID: <e2f104ac-b6d9-4583-b999-8f975c60d469@redhat.com> (raw)
In-Reply-To: <CAHk-=wjs-9DVeoc430BDOv+dkpDkdVvkEsSJxNVZ+sO51H1dJA@mail.gmail.com>
On 07.07.24 20:19, Linus Torvalds wrote:
> On Sun, 7 Jul 2024 at 00:42, David Hildenbrand <david@redhat.com> wrote:
>>
>> But I don't immediately see why MAP_WIPEONFORK and MAP_DONTDUMP have to
>> be mmap() flags.
Hi Linus,
>
> I don't think they have to be mmap() flags, but that said, I think
> it's technically the better alternative than saying "you have to
> madvise things later".
Having the option to madvise usually means that you can toggle it on/off
(e.g., MADV_DONTFORK vs. MADV_DOFORK). Not sure if having that option
could be valuable here (droppable) as well; maybe not.
>
> I very much understand the "we don't have a lot of MAP_xyz flags and
> we don't want to waste them" argument, but at the same time
>
> (a) we _do_ have those flags
>
> (b) picking a worse interface seems bad
>
> (c) we could actually use the PROT_xyz bits, which we have a ton of
>
I recall that introducing things like MAP_SHARED_VALIDATE received a lot
of pushback in the past. But that was before my MM days, and I only had
people tell me stories about it.
(and at LSF/MM it's been a recurring theme that if you want to propose
new MMAP flag, you're going to have a hard time)
> And yes, (c) is ugly, but is it uglier than "use two system calls to
> do one thing"? I mean, "flags" and "prot" are just two sides of the
> same coin in the end, the split is kind of arbitrary, and "prot" only
> has four bits right now, and one of them is historical and useless,
> and actually happens to be *exactly* this kind of MAP_xyz bit.
Yeah, I always had the same feeling about prot vs. flags.
My understanding so far was that we should have madvise() ways to toggle
stuff and add mmap bits if not avoidable; at least that's what I learned
from the community.
Good to hear that this is changing. (or it's just been an urban myth)
I'll use your mail as reference in the future when that topic pops up ;)
Maybe, historically we used madvise options so it's easier to sense
which options the current kernel actually supports. (e.g., let mmap()
succeed but let a separate madvise(MADV_HUGEPAGE) etc. fail if not
supported by the kernel; no need to fail the whole operation).
>
> (In case it's not clear, I'm talking about PROT_SEM, which is very
> much a behavioral bit for broken architectures that we've actually
> never implemented).
Yeah.
>
> We also have PROT_GROSDOWN and PROT_GROWSUP , which is basically a
> "match MAP_GROWSxyz and change the mprotect() limits appropriately"
It's the first time I hear about these two mprotect() options, thanks
for mentioning that :)
>
> So I actually think we could use the PROT_xyz bits, and anybody who
> says "those are for PROT_READ and PROT_WRITE is already very very
> wrong.
>
> Again - not pretty, but mappens to match reality.
>
>> Interestingly, when looking into something comparable in the past I
>> stumbled over "vrange" [1], which would have had a slightly different
>> semantic (signal on reaccess).
>
> We literally talked about exactly this with Jason, except unlike you I
> couldn't find the historical archive (I tried in vain to find
> something from lore).
Good that you discussed it, I primarily scanned this patch set here only.
I took notes back when I was looking for something like VM_DROPPABLE
(also, being more interested in the non-signal version for a VM cache
use case).
>
> https://lore.kernel.org/lkml/CAHk-=whRpLyY+U9mkKo8O=2_BXNk=7sjYeObzFr3fGi0KLjLJw@mail.gmail.com/
>
> I do think that a "explicit populate and get a signal on access" is a
> very valid model, but I think the "zero on access" is a more
> immediately real model.
>
> And we actually have had the "get signal on access" before: that's
> what VM_DONTCOPY is.
>
> And it was the *less* useful model, which is why we added
> VM_WIPEONCOPY, because that's the semantics people actually wanted.
>
> So I think the "signal on thrown out data access" is interesting, but
> not necessarily the *more* interesting case.
Absolutely agreed.
>
> And I think if we do want that case, I think having MAP_DROPPABLE have
> those semantics for MAP_SHARED would be the way to go. IOW, starting
> off with the "zero on next access after drop" case doesn't make it any
> harder to then later add a "fault on next access after drop" version.
>
>> There needs to be better reasoning why we have to consume three mmap
>> bits for something that can likely be achieved without any.
>
> I think it goes the other way: why are MAP_xyz bits so precious to
> make this harder to actually use?
If things changed and we can have as many as we want, good!
Things like MADV_HUGEPAGE/MADV_MERGEABLE might benefit from a mmap flag
as well.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-07-07 18:53 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-07 0:26 [PATCH v21 0/4] implement getrandom() in vDSO Jason A. Donenfeld
2024-07-07 0:26 ` [PATCH v21 1/4] mm: add VM_DROPPABLE for designating always lazily freeable mappings Jason A. Donenfeld
2024-07-07 7:42 ` David Hildenbrand
2024-07-07 18:19 ` Linus Torvalds
2024-07-07 18:52 ` David Hildenbrand [this message]
2024-07-07 19:22 ` Linus Torvalds
2024-07-07 21:01 ` David Hildenbrand
2024-07-08 0:08 ` Linus Torvalds
2024-07-08 8:11 ` David Hildenbrand
2024-07-08 8:23 ` David Hildenbrand
2024-07-08 13:57 ` Jason A. Donenfeld
2024-07-08 20:05 ` David Hildenbrand
2024-07-08 13:55 ` Jason A. Donenfeld
2024-07-08 14:40 ` Jason A. Donenfeld
2024-07-08 20:21 ` David Hildenbrand
2024-07-08 20:26 ` David Hildenbrand
2024-07-09 2:17 ` Jason A. Donenfeld
2024-07-10 3:05 ` David Hildenbrand
2024-07-10 3:34 ` Jason A. Donenfeld
2024-07-10 3:53 ` David Hildenbrand
2024-07-08 20:06 ` David Hildenbrand
2024-07-08 13:50 ` Jason A. Donenfeld
2024-07-08 1:59 ` Jason A. Donenfeld
2024-07-08 1:46 ` Jason A. Donenfeld
2024-07-08 20:24 ` David Hildenbrand
2024-07-07 0:26 ` [PATCH v21 2/4] random: introduce generic vDSO getrandom() implementation Jason A. Donenfeld
2024-07-07 0:26 ` [PATCH v21 3/4] x86: vdso: Wire up getrandom() vDSO implementation Jason A. Donenfeld
2024-07-07 0:26 ` [PATCH v21 4/4] selftests/vDSO: add tests for vgetrandom Jason A. Donenfeld
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e2f104ac-b6d9-4583-b999-8f975c60d469@redhat.com \
--to=david@redhat.com \
--cc=Jason@zx2c4.com \
--cc=adhemerval.zanella@linaro.org \
--cc=arnd@arndb.de \
--cc=brauner@kernel.org \
--cc=carlos@redhat.com \
--cc=dhildenb@redhat.com \
--cc=fweimer@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=jannh@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-crypto@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=patches@lists.linux.dev \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).