All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	brauner@kernel.org,  viro@zeniv.linux.org.uk, jack@suse.cz,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	tglx@linutronix.de, pfalcato@suse.de
Subject: Re: [PATCH 1/3] x86: fix access_ok() and valid_user_address() using wrong USER_PTR_MAX in modules
Date: Tue, 4 Nov 2025 13:53:24 -0800	[thread overview]
Message-ID: <aQp11AiTJpg_m_MG@google.com> (raw)
In-Reply-To: <CAHk-=wgYPbj1yQu3=wvMvfX2knKEmaeCoaG9wkPXmM1LoVxRuQ@mail.gmail.com>

On Wed, Nov 05, 2025, Linus Torvalds wrote:
> On Wed, 5 Nov 2025 at 04:07, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Sadly, no. We've wanted to do that many times for various other
> > reasons, and we really should, but because of historical semantics,
> > some horrendous users still use "__get_user()" for addresses that
> > might be user space or might be kernel space depending on use-case.

Eww.

> > Maybe we should bite the bullet and just break any remaining cases of
> > that horrendous historical pattern. [...]
> 
> What I think is probably the right approach is to just take the normal
> __get_user() calls - the ones that are obviously to user space, and
> have an access_ok() - and just replace them with get_user().
> 
> That should all be very simple and straightforward for any half-way
> normal code, and you won't see any downsides.
> 
> And in the unlikely case that you can measure any performance impact
> because you had one single access_ok() and many __get_user() calls,
> and *if* you really really care, that kind of code should be using
> "user_read_access_begin()" and friends anyway, because unlike the
> range checking, the *real* performance issue is almost certainly going
> to be the cost of the CLAC/STAC instructions.
> 
> Put another way: __get_user() is simply always wrong these days.
> Either it's wrong because it's a bad historical optimization that
> isn't an optimization any more, or it's wrong because it's mis-using
> the old semantics to play tricks with kernel-vs-user memory.
> 
> So we shouldn't try to "fix" __get_user(). We should aim to get rid of it.

Curiosity got the better of me :-)

TL;DR: I agree, we should kill __get_user().


KVM x86's use case is a bit of a snowflake.  KVM does the access_ok() check when
host userspace configures memory regions for the guest, and then does __get_user()
when reading guest PTEs (i.e. when walking the guest's page tables for shadow
paging).

For each access_ok(), there are potentially billions (with a 'b') of __get_user()
calls throughout the lifetime of the guest when KVM is using shadow paging.  E.g.
just booting a Linux guest hits the __get_user() in arch/x86/kvm/mmu/paging_tmpl.h
a few million times.  So if there's any chance that split access_ok() + __get_user()
provides a performance advantage, then it should show up in KVM's shadow paging
use case.

Unless I botched the measurements, get_user() is straight up faster on both Intel
(EMR) and AMD (Turin).  Over tens of millions of calls, get_user() is 12%+ faster
on Intel and 25%+ faster on AMD, relative to __get_user().  The extra overhead is
pretty much entirely due to the LFENCE, as open coding the equivalent via
__uaccess_begin_nospec()+unsafe_get_user()+__uaccess_end(), to avoid the CALL+RET,
yields identical numbers to __get_user().  Dropping the LFENCE, by using
__uaccess_begin(), manages to eke out a victory over get_user() by ~2 cycles, but
that's not remotely worth having to think about whether or not the LFENCE is necessary.

The only setup I can think of that _might_ benefit from __get_user() would be
ancient CPUs without EPT/NPT (i.e. CPUs on which KVM _must_ use shadow paging)
and without SMAP, but those CPUs are so old that IMO they simply aren't relevant
when it comes to performance.  Or I suppose the horrors where RET is actually
something else entirely, but that's also a "don't care", at least as far as KVM
is concerned.

Cycles per guest PTE read:

                __get_user()    get_user()      open-coded      open-coded, no LFENCE
Intel (EMR)		75.1          67.6            75.3                       65.5
AMD (Turin)             68.1          51.1            67.5                       49.3

  reply	other threads:[~2025-11-04 21:53 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30 10:52 [PATCH v4] fs: hide names_cachep behind runtime access machinery Mateusz Guzik
2025-10-30 13:13 ` kernel test robot
2025-10-30 13:19   ` Mateusz Guzik
2025-10-30 16:15 ` Linus Torvalds
2025-10-30 16:35   ` Mateusz Guzik
2025-10-30 18:07     ` Linus Torvalds
2025-10-30 18:25       ` Linus Torvalds
2025-10-30 21:39       ` Mateusz Guzik
2025-10-30 22:06         ` Mateusz Guzik
2025-10-31 12:08         ` Christian Brauner
2025-10-31 15:13           ` Mateusz Guzik
2025-10-31 16:04             ` Linus Torvalds
2025-10-31 16:25               ` Mateusz Guzik
2025-10-31 16:31                 ` Linus Torvalds
2025-10-31 17:42                   ` [WIP RFC PATCH 0/3] runtime-const header split and whatnot Mateusz Guzik
2025-10-31 17:42                     ` [PATCH 1/3] x86: fix access_ok() and valid_user_address() using wrong USER_PTR_MAX in modules Mateusz Guzik
2025-10-31 21:46                       ` Linus Torvalds
2025-10-31 22:01                         ` Mateusz Guzik
2025-11-01 11:26                       ` David Laight
2025-11-04  6:25                       ` Linus Torvalds
2025-11-04  8:56                         ` Mateusz Guzik
2025-11-04  9:37                           ` Linus Torvalds
2025-11-04 10:25                         ` Borislav Petkov
2025-11-04 16:13                           ` Borislav Petkov
2025-11-05  1:50                             ` Linus Torvalds
2025-11-05 11:37                               ` Borislav Petkov
2025-11-05 20:50                             ` Mateusz Guzik
2025-11-06 11:14                               ` Borislav Petkov
2025-11-06 12:06                                 ` Mateusz Guzik
2025-11-06 13:10                                   ` Borislav Petkov
2025-11-06 13:19                                     ` Mateusz Guzik
2025-11-06 13:36                                       ` Borislav Petkov
2025-11-06 14:49                                         ` Mateusz Guzik
2025-11-06 19:26                                       ` David Laight
2025-11-06 19:49                                         ` Linus Torvalds
2025-11-04 17:09                         ` Sean Christopherson
2025-11-04 19:07                           ` Linus Torvalds
2025-11-04 19:34                             ` Linus Torvalds
2025-11-04 21:53                               ` Sean Christopherson [this message]
2025-11-04 20:17                             ` Borislav Petkov
2025-11-04 22:06                               ` Linus Torvalds
2025-11-05 11:49                                 ` Borislav Petkov
2025-10-31 17:42                     ` [PATCH 2/3] runtime-const: split headers between accessors and fixup; disable for modules Mateusz Guzik
2025-10-31 17:42                     ` [PATCH 3/3] fs: hide names_cachep behind runtime access machinery Mateusz Guzik
2025-10-31 23:30                       ` kernel test robot
2025-10-31 23:30                       ` kernel test robot
2025-10-31 23:41                       ` kernel test robot
2025-11-01 17:49                       ` kernel test robot
2025-10-31 13:30 ` [PATCH v4] " kernel test robot
2025-10-31 22:43 ` kernel test robot
2025-11-01 23:06 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQp11AiTJpg_m_MG@google.com \
    --to=seanjc@google.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=pfalcato@suse.de \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.