RE: [PATCH v2] x86: Allow user accesses to the base of the guard page

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Laight <David.Laight@ACULAB.COM>
To: 'Linus Torvalds' <torvalds@linux-foundation.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	"bp@alien8.de" <bp@alien8.de>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Arnd Bergmann <arnd@kernel.org>,
	Mikel Rychliski <mikel@mikelr.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: RE: [PATCH v2] x86: Allow user accesses to the base of the guard page
Date: Mon, 25 Nov 2024 16:48:59 +0000	[thread overview]
Message-ID: <4e2ed7a9cbf54eeabe9be7764141f0d2@AcuMS.aculab.com> (raw)
In-Reply-To: <CAHk-=wh0oKkRHHqnft8mHaz5nuZNEspGQ5HW4oPJmGGwmccF1w@mail.gmail.com>

From: Linus Torvalds
> Sent: 24 November 2024 22:39
> 
> On Sun, 24 Nov 2024 at 14:03, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sun, 24 Nov 2024 at 12:49, David Laight <David.Laight@aculab.com> wrote:
> > >
> > > Doesn't that just need a <= changed to < ?
> > > (And possibly of name)
> >
> > Well, more importantly, it means that we can't use the same helper
> > function at all.
> 
> Oh, the 'sbb' trick also ends up being one byte off if we increase
> USER_PTR_MAX from being the "maximum valid address" to being "one past
> the max valid address".
> 
> And yeah, none of this *matters*, since "one byte off" is then covered
> by the fact that we have that guard page, but this just ends up
> annoying me sense of how it all *should* work.

Maybe we need to look at what this code is trying to achieve.
It seems to be trying to do two different things:
1) Stop real user accesses to kernel memory.
2) Stop speculative accesses to user-defined kernel addresses. 

But they get rather mixed together in this pattern:
+	if (can_do_masked_user_access())
+		from = masked_user_access_begin(from);
+	else if (!user_read_access_begin(from, sizeof(*from)))
+		return -EFAULT;
+	unsafe_get_user(val, from, Efault);
+	user_access_end();
+	*dest = val;
+	return 0;
+Efault:
+	user_access_end();
+	return -EFAULT;

where the masked address designed to stop speculative accesses is
used for a real access.

I was looking at the epoll code.
It does an early access_ok() and then two __put_user() calls to write a 32bit
and 64bit value (with an architecture dependant stride or 12 or 16 bytes).

With access_ok() (possibly just checking the base address if there is a
guard page) it doesn't matter which order the accesses are done in.
But masked_user_access_begin() is going to convert a kernel address to ~0,
this is fine for speculative accesses but for real accesses it becomes
imperative that the first address is to the base address.
(Otherwise the accesses will go to page 0 at the bottom of userspace
which (IIRC) it has to possible to map - so won't always fault.)

Relying on that seems dangerous.

So perhaps rename things a bit so the above starts:
	if (have_guard_page())
		from = bound_to_guard_page_user_access_begin(from);
The default bound_to_guard_page() would be min(addr, guard_page).
Architectures that have 'speculative read issues' would need
that min() to be done without a branch (eg with cmov) but
others may not care.

ppc, of course, needs the length (I don't know whether a guard page
would help or is needed).

After that the order of the accesses doesn't matter - provided they
don't have page sized jumps.

Then rename USER_PTR_MAX to USER_GUARD_PAGE_ADDRESS.
After all it doesn't matter at all where the unwanted speculative
reads end up - provided it isn't a user-defined kernel address.
The base of the guard page is as good as anywhere else.

> I'm starting to think that instead of changing the USER_PTR_MAX value
> (that was selected to be the right constant for things that don't care
> about the size), we just say "the slow case of access_ok() takes a
> tiny hit".
> 
> Kind of like Mikel Rychliski's patch, but we can certainly do better,
> ie something like
> 
>   --- a/arch/x86/include/asm/uaccess_64.h
>   +++ b/arch/x86/include/asm/uaccess_64.h
>   @@ -101,8 +101,9 @@ static inline bool __access_ok(..
>                 return valid_user_address(ptr);
>         } else {
>                 unsigned long sum = size + (__force unsigned long)ptr;
>   +             unsigned long max = runtime_const_ptr(USER_PTR_MAX)+1;
> 
>   -             return valid_user_address(sum) && sum >= (__force
> unsigned long)ptr;
>   +             return sum <= max && sum >= (__force unsigned long)ptr;
>         }
>    }
>    #define __access_ok __access_ok
> 
> would seem like it should work, and now doesn't make the fast-cases be
> off by one.

I don't think it really matters whether access_ok() returns 0 or
a later access faults.
Particularly in the corner case of an access to the base of the guard page.

> 
> Yes, it adds a "add 1" to the runtime path (instead of the init-time
> constant fixup), which is admittedly annoying. But this really
> *should* be the slow path.

There is no real reason why you can't have two constants that are
one apart.

> We do have a few annoying non-constant access_ok() calls in core code.
> The iter code uses access_ok + raw_copy_to_user, because it's evil and
> bad. I'm really not sure why it does that. I think it's *purely* bad
> history, ie we used to do access_ok() far away from the iov_iter copy,
> and then did __copy_from_user(), and then at some point it got changed
> to have the access_ok() closer, rather than just use
> "copy_from_user()".

Is there still an access_ok() in iovec_import (I've not got a tree handy).
All seemed too far away from any actual copy to me.
(IIRC there is a pretty pointless 'direction' flag as well?)

> So I get the feeling that those access_ok() cases could be removed
> entirely in favor of just using the right user copy function. Getting
> rid of the whole silly size checking thing.

That would be a plan.
Would be interesting to unravel what io_uring (or was it bpf) was
doing where it casts a 'long' to a user pointer just to validate it.

> David, does that patch above work for you?

You are the boss :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

next prev parent reply	other threads:[~2024-11-25 16:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-24 15:39 [PATCH v2] x86: Allow user accesses to the base of the guard page David Laight
2024-11-24 18:52 ` Linus Torvalds
2024-11-24 20:49   ` David Laight
2024-11-24 22:03     ` Linus Torvalds
2024-11-24 22:39       ` Linus Torvalds
2024-11-25 16:48         ` David Laight [this message]
2024-11-25 20:21           ` Linus Torvalds
2024-12-01 11:25 ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e2ed7a9cbf54eeabe9be7764141f0d2@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=arnd@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikel@mikelr.com \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.