From: "Jason A. Donenfeld" <Jason@zx2c4.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: jolsa@kernel.org, mhiramat@kernel.org, cgzones@googlemail.com,
brauner@kernel.org, linux-kernel@vger.kernel.org, arnd@arndb.de
Subject: Re: deconflicting new syscall numbers for 6.11
Date: Fri, 5 Jul 2024 19:53:40 +0200
Message-ID: <ZogzJCb66vwxwSLN@zx2c4.com>
In-Reply-To: <CAHk-=whRpLyY+U9mkKo8O=2_BXNk=7sjYeObzFr3fGi0KLjLJw@mail.gmail.com>
Hi Linus,
On Fri, Jul 05, 2024 at 10:39:48AM -0700, Linus Torvalds wrote:
> Yes. And it should be pretty trivial.
>
> We just at least initially have to be very careful to limit it to
> MAP_ANONYMOUS and MAP_PRIVATE. Because dropping dirty bits on shared
> mappings sounds insane and like a possible source of confusion (and
> thus bugs and maybe even security issues).
>
> It's possible that we might even use a MAP_TYPE flag for this. Or make
> it a PROT_xyz bit rather than a MAP_xyz.
>
> So there's some trivial sanity checks and some UI issues to just pick,
> but apart from "just pick something sane", exposing this for mmap() is
> _not_ hard, and I do think it needs to be done first.
I can take a stab at it.
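
A minimal sketch of the sanity check described in the quoted text, written as a userspace model rather than kernel code. The flag name SKETCH_MAP_DROPPABLE, its value, and the helper name are hypothetical and exist only for illustration; whether this ends up as a MAP_xyz bit, a MAP_TYPE value, or a PROT_xyz bit is still open.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical flag bit for illustration only; not a real ABI value. */
#define SKETCH_MAP_DROPPABLE 0x40000000

/*
 * Model of the check: a droppable request is accepted only for
 * private anonymous memory. Requests without the droppable bit are
 * not affected by this check.
 */
static bool sketch_droppable_flags_ok(int flags)
{
	if (!(flags & SKETCH_MAP_DROPPABLE))
		return true;	/* not a droppable request */
	if (flags & MAP_SHARED)
		return false;	/* no dropping dirty bits on shared mappings */
	if (!(flags & MAP_PRIVATE))
		return false;	/* must be explicitly private */
	return (flags & MAP_ANONYMOUS) != 0;	/* and not file-backed */
}
```

Under these assumptions, MAP_PRIVATE | MAP_ANONYMOUS plus the droppable bit passes, while the same request with MAP_SHARED, or without MAP_ANONYMOUS, is rejected.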
> > - The "mechanism" needs to return allocated memory to userspace that can
> > be chunked up on a per-thread basis, with no state straddling pages,
> > which means it also needs to return the size of each state, and the
> > number of states that were allocated.
> >
> > - The size of each state might change kernel version to kernel version.
>
> Just pick a size large enough.
>
> And why would that size not be one page?
>
> Considering that you really don't want to rely on page-crossing state
> *ANYWAY* because of the whole "one page can go away while another one
> sticks around" issue, I would expect that states over one page per
> thread would be a *very* questionable idea to begin with.
>
> I don't think we'll ever see systems with page sizes smaller than 4k.
> They have existed in the past, but they're not making a comeback.
> People want larger pages, not smaller ones.
That sounds not so good: the current state is 144 bytes, and it's
expected that there'll be one of these per thread. Mapping 16k or 4k per
thread seems pretty bad. At least it certainly seems that way? Wasting
16240 bytes per thread + a new vmap I can't imagine is okay.
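
The arithmetic behind those numbers can be sketched as follows. The 144-byte state size is the figure given above; the helper names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_STATE_SIZE 144	/* current state size, per the text above */

/* Bytes left unused when a whole page is mapped for a single state. */
static size_t sketch_waste_one_state_per_page(size_t page_size)
{
	return page_size - SKETCH_STATE_SIZE;
}

/* States that fit in one page when no state may straddle a page. */
static size_t sketch_states_per_page(size_t page_size)
{
	return page_size / SKETCH_STATE_SIZE;
}
```

With one state per page, a 4k page leaves 3952 bytes unused and a 16k page leaves 16240. Packed instead, a 4k page holds 28 states and a 16k page holds 113.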
Also, these points still stand:
| - In an effort to match the behaviors of syscall getrandom() as much as
| possible, it needs to be mapped with various flags (the ones in the
| current vgetrandom_alloc() implementation).
|
| - Which flags are needed might change kernel version to kernel version.
|
| - Future memory tagging CPU extensions might allow us to prevent the
| memory from being accessed unless the accesses are coming from vDSO
| code, which would avoid heartbleed-like bugs. This is very appealing.
It seems like leaving it just up to mmap() will not only result in users
doing it wrong, but kind of limits our options moving forward. And
there's this whole issue of communicating sizes so as not to be
wasteful.
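
To illustrate why the sizes have to be communicated, here is a rough sketch of how userspace might carve a page-aligned allocation into per-thread states with none straddling a page. It assumes the caller already knows the page size and the per-state size; the helper name and parameters are hypothetical and not part of any proposed interface.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the address of state number `index` inside a page-aligned
 * block starting at `base`. States are packed within each page and
 * the tail of a page that cannot hold a full state is skipped, so no
 * state ever crosses a page boundary.
 */
static void *sketch_state_at(void *base, size_t page_size,
			     size_t state_size, size_t index)
{
	size_t per_page;

	assert(state_size > 0 && state_size <= page_size);
	per_page = page_size / state_size;
	return (char *)base + (index / per_page) * page_size +
	       (index % per_page) * state_size;
}
```

With 4k pages and 144-byte states, indices 0 through 27 land in the first page and index 28 starts at the beginning of the second page. Without knowing the state size, userspace cannot do this packing at all.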
Another idea I had, if you hate the syscall, is I could just add this as
(another) private ioctl() on the /dev/random node. This sounds worse
than a syscall because it means that node has to exist and the fd
has to be opened -- and concerns about this were what led to the
getrandom() syscall being introduced in the first place -- but it would
at least avoid the syscall. I'm not crazy about that though.
Maybe the winning solution is MAP_DROPPABLE (or PROT_DROPPABLE) in
mmap(), and then in the following commit, add the vgetrandom_alloc()
syscall, and then we'll avoid vgetrandom_alloc() getting abused, but
still have a nice interface that isn't too constraining.
Jason