linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	brauner@kernel.org,  viro@zeniv.linux.org.uk, jack@suse.cz,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	tglx@linutronix.de, pfalcato@suse.de
Subject: Re: [PATCH 1/3] x86: fix access_ok() and valid_user_address() using wrong USER_PTR_MAX in modules
Date: Tue, 4 Nov 2025 13:53:24 -0800	[thread overview]
Message-ID: <aQp11AiTJpg_m_MG@google.com> (raw)
In-Reply-To: <CAHk-=wgYPbj1yQu3=wvMvfX2knKEmaeCoaG9wkPXmM1LoVxRuQ@mail.gmail.com>

On Wed, Nov 05, 2025, Linus Torvalds wrote:
> On Wed, 5 Nov 2025 at 04:07, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Sadly, no. We've wanted to do that many times for various other
> > reasons, and we really should, but because of historical semantics,
> > some horrendous users still use "__get_user()" for addresses that
> > might be user space or might be kernel space depending on use-case.

Eww.

> > Maybe we should bite the bullet and just break any remaining cases of
> > that horrendous historical pattern. [...]
> 
> What I think is probably the right approach is to just take the normal
> __get_user() calls - the ones that are obviously to user space, and
> have an access_ok() - and just replace them with get_user().
> 
> That should all be very simple and straightforward for any half-way
> normal code, and you won't see any downsides.
> 
> And in the unlikely case that you can measure any performance impact
> because you had one single access_ok() and many __get_user() calls,
> and *if* you really really care, that kind of code should be using
> "user_read_access_begin()" and friends anyway, because unlike the
> range checking, the *real* performance issue is almost certainly going
> to be the cost of the CLAC/STAC instructions.
> 
> Put another way: __get_user() is simply always wrong these days.
> Either it's wrong because it's a bad historical optimization that
> isn't an optimization any more, or it's wrong because it's mis-using
> the old semantics to play tricks with kernel-vs-user memory.
> 
> So we shouldn't try to "fix" __get_user(). We should aim to get rid of it.

Curiosity got the better of me :-)

TL;DR: I agree, we should kill __get_user().


KVM x86's use case is a bit of a snowflake.  KVM does the access_ok() check when
host userspace configures memory regions for the guest, and then does __get_user()
when reading guest PTEs (i.e. when walking the guest's page tables for shadow
paging).

For each access_ok(), there are potentially billions (with a 'b') of __get_user()
calls throughout the lifetime of the guest when KVM is using shadow paging.  E.g.
just booting a Linux guest hits the __get_user() in arch/x86/kvm/mmu/paging_tmpl.h
a few million times.  So if there's any chance that split access_ok() + __get_user()
provides a performance advantage, then it should show up in KVM's shadow paging
use case.

Unless I botched the measurements, get_user() is straight up faster on both Intel
(EMR) and AMD (Turin).  Over tens of millions of calls, get_user() is 12%+ faster
on Intel and 25%+ faster on AMD, relative to __get_user().  The extra overhead is
pretty much entirely due to the LFENCE, as open coding the equivalent via
__uaccess_begin_nospec()+unsafe_get_user()+__uaccess_end(), to avoid the CALL+RET,
yields identical numbers to __get_user().  Dropping the LFENCE, by using
__uaccess_begin(), manages to eke out a victory over get_user() by ~2 cycles, but
that's not remotely worth having to think about whether or not the LFENCE is necessary.

The only setup I can think of that _might_ benefit from __get_user() would be
ancient CPUs without EPT/NPT (i.e. CPUs on which KVM _must_ use shadow paging)
and without SMAP, but those CPUs are so old that IMO they simply aren't relevant
when it comes to performance.  Or I suppose the horrors where RET is actually
something else entirely, but that's also a "don't care", at least as far as KVM
is concerned.

Cycles per guest PTE read:

                __get_user()    get_user()      open-coded      open-coded, no LFENCE
Intel (EMR)		75.1          67.6            75.3                       65.5
AMD (Turin)             68.1          51.1            67.5                       49.3

  reply	other threads:[~2025-11-04 21:53 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30 10:52 [PATCH v4] fs: hide names_cachep behind runtime access machinery Mateusz Guzik
2025-10-30 13:13 ` kernel test robot
2025-10-30 13:19   ` Mateusz Guzik
2025-10-30 16:15 ` Linus Torvalds
2025-10-30 16:35   ` Mateusz Guzik
2025-10-30 18:07     ` Linus Torvalds
2025-10-30 18:25       ` Linus Torvalds
2025-10-30 21:39       ` Mateusz Guzik
2025-10-30 22:06         ` Mateusz Guzik
2025-10-31 12:08         ` Christian Brauner
2025-10-31 15:13           ` Mateusz Guzik
2025-10-31 16:04             ` Linus Torvalds
2025-10-31 16:25               ` Mateusz Guzik
2025-10-31 16:31                 ` Linus Torvalds
2025-10-31 17:42                   ` [WIP RFC PATCH 0/3] runtime-const header split and whatnot Mateusz Guzik
2025-10-31 17:42                     ` [PATCH 1/3] x86: fix access_ok() and valid_user_address() using wrong USER_PTR_MAX in modules Mateusz Guzik
2025-10-31 21:46                       ` Linus Torvalds
2025-10-31 22:01                         ` Mateusz Guzik
2025-11-01 11:26                       ` David Laight
2025-11-04  6:25                       ` Linus Torvalds
2025-11-04  8:56                         ` Mateusz Guzik
2025-11-04  9:37                           ` Linus Torvalds
2025-11-04 10:25                         ` Borislav Petkov
2025-11-04 16:13                           ` Borislav Petkov
2025-11-05  1:50                             ` Linus Torvalds
2025-11-05 11:37                               ` Borislav Petkov
2025-11-05 20:50                             ` Mateusz Guzik
2025-11-06 11:14                               ` Borislav Petkov
2025-11-06 12:06                                 ` Mateusz Guzik
2025-11-06 13:10                                   ` Borislav Petkov
2025-11-06 13:19                                     ` Mateusz Guzik
2025-11-06 13:36                                       ` Borislav Petkov
2025-11-06 14:49                                         ` Mateusz Guzik
2025-11-06 19:26                                       ` David Laight
2025-11-06 19:49                                         ` Linus Torvalds
2025-11-04 17:09                         ` Sean Christopherson
2025-11-04 19:07                           ` Linus Torvalds
2025-11-04 19:34                             ` Linus Torvalds
2025-11-04 21:53                               ` Sean Christopherson [this message]
2025-11-04 20:17                             ` Borislav Petkov
2025-11-04 22:06                               ` Linus Torvalds
2025-11-05 11:49                                 ` Borislav Petkov
2025-10-31 17:42                     ` [PATCH 2/3] runtime-const: split headers between accessors and fixup; disable for modules Mateusz Guzik
2025-10-31 17:42                     ` [PATCH 3/3] fs: hide names_cachep behind runtime access machinery Mateusz Guzik
2025-10-31 23:30                       ` kernel test robot
2025-10-31 23:30                       ` kernel test robot
2025-10-31 23:41                       ` kernel test robot
2025-11-01 17:49                       ` kernel test robot
2025-10-31 13:30 ` [PATCH v4] " kernel test robot
2025-10-31 22:43 ` kernel test robot
2025-11-01 23:06 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQp11AiTJpg_m_MG@google.com \
    --to=seanjc@google.com \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=pfalcato@suse.de \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).