public inbox for linux-mm@kvack.org
From: Andrei Vagin <avagin@google.com>
To: Will Deacon <will@kernel.org>, Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <kees@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Marek Szyprowski <m.szyprowski@samsung.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	 Mike Rapoport <rppt@kernel.org>,
	Alexander Mikhalitsyn <alexander@mihalicyn.com>,
	linux-kernel@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, criu@lists.linux.dev,
	 Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	 Chen Ridong <chenridong@huawei.com>,
	Christian Brauner <brauner@kernel.org>,
	 David Hildenbrand <david@kernel.org>,
	Eric Biederman <ebiederm@xmission.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Michal Koutny <mkoutny@suse.com>,
	 Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Subject: Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
Date: Tue, 24 Mar 2026 15:19:49 -0700
Message-ID: <CAEWA0a7iR8YHooqXJfhersV6YhAXGMZDUhib3QQH5XGn=KNowA@mail.gmail.com>
In-Reply-To: <acJnOB-rlyt-3jU4@willie-the-truck>

Hi Mark and Will,

Thanks for the feedback. Please read the inline comments.

On Tue, Mar 24, 2026 at 3:28 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 23, 2026 at 06:21:22PM +0000, Mark Rutland wrote:
> > On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> > > Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> > > AT_HWCAP2, etc.) from a parent process when they have been modified via
> > > prctl.
> > >
> > > To support C/R operations (snapshots, live migration) in heterogeneous
> > > clusters, we must ensure that processes utilize CPU features available
> > > on all potential target nodes. To solve this, we need to advertise a
> > > common feature set across the cluster.
> > >
> > > This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> > > auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
> > > execve() is called, if the current process has MMF_USER_HWCAP set, the
> > > HWCAP values are extracted from the current auxiliary vector and stored
> > > in the linux_binprm structure. These values are then used to populate
> > > the auxiliary vector of the new process, effectively inheriting the
> > > hardware capabilities.
> > >
> > > The inherited HWCAPs are masked with the hardware capabilities supported
> > > by the current kernel to ensure that we don't report more features than
> > > actually supported. This is important to avoid unexpected behavior,
> > > especially for processes with additional privileges.
> >
> > At a high level, I don't think that's going to be sufficient:
> >
> > * On an architecture with other userspace accessible feature
> >   identification mechanism registers (e.g. ID registers), userspace
> >   might read those. So you might need to hide stuff there too, and
> >   that's going to require architecture-specific interfaces to manage.
> >
> >   It's possible that some code checks HWCAPs and others check ID
> >   registers, and mismatch between the two could be problematic.
> >
> > * If the HWCAPs can be inherited by a more privileged task, then a
> >   malicious user could use this to hide security features (e.g. shadow
> >   stack or pointer authentication on arm64), and make it easier to
> >   attack that task. While not a direct attack, it would undermine those
> >   features.

I agree with Mark that only a privileged process should be able to mask
certain hardware features. Currently, PR_SET_MM_AUXV is guarded by
CAP_SYS_RESOURCE, but PR_SET_MM_MAP allows changing the auxiliary vector
without specific capabilities. This is definitely an issue. To address
it, I think we can consider introducing a new prctl command to enable
HWCAP inheritance explicitly.
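
To make the existing interface concrete, here is a minimal userspace
sketch (not code from the patch series). It clears HWCAP bits in a copy
of the auxiliary vector and hands the result to
prctl(PR_SET_MM, PR_SET_MM_AUXV). The struct and both helper names are
made up for illustration. The copy is assumed to come from somewhere
like /proc/self/auxv, and the install step is expected to fail with
EPERM without CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <elf.h>        /* AT_NULL, AT_HWCAP, AT_HWCAP2 */
#include <stddef.h>
#include <sys/prctl.h>  /* prctl(), PR_SET_MM, PR_SET_MM_AUXV */

/* Hypothetical name: one native auxv entry, a (type, value) pair. */
struct auxv_entry {
	unsigned long type;
	unsigned long val;
};

/*
 * Keep only the bits present in the given masks in the AT_HWCAP and
 * AT_HWCAP2 entries of an in-memory auxv copy. Stops at AT_NULL.
 * Returns the number of entries that were rewritten.
 */
int auxv_mask_hwcaps(struct auxv_entry *auxv, size_t nr,
		     unsigned long hwcap_mask, unsigned long hwcap2_mask)
{
	int touched = 0;

	assert(auxv != NULL);
	for (size_t i = 0; i < nr && auxv[i].type != AT_NULL; i++) {
		if (auxv[i].type == AT_HWCAP) {
			auxv[i].val &= hwcap_mask;
			touched++;
		} else if (auxv[i].type == AT_HWCAP2) {
			auxv[i].val &= hwcap2_mask;
			touched++;
		}
	}
	return touched;
}

/*
 * Install the edited copy as the saved auxiliary vector of the calling
 * process. Returns 0 on success, -1 with errno set on failure.
 */
int auxv_install(const struct auxv_entry *auxv, size_t nr)
{
	assert(auxv != NULL);
	return prctl(PR_SET_MM, PR_SET_MM_AUXV, (unsigned long)auxv,
		     (unsigned long)(nr * sizeof(*auxv)), 0UL);
}
```

A caller would run auxv_mask_hwcaps() on its copy with the feature set
common to the cluster and then call auxv_install() once before exec.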

>
> Yeah, this looks like a non-starter to me on arm64. Even if it was
> extended to apply the same treatment to the idregs, many of the hwcap
> features can't actually be disabled by the kernel and so you still run
> the risk of a task that probes for the presence of a feature using
> something like a SIGILL handler or, perhaps more likely, assumes that
> the presence of one hwcap implies the presence of another. And then
> there are the applications that just base everything off the MIDR...

The goal of this mechanism is not to provide strict architectural
enforcement or to trap the use of hardware features; rather, it is to
provide a consistent discovery interface for applications. I chose the
HWCAP vector because it mirrors the existing behavior of running an
older kernel on newer hardware: while ID registers might report a
feature as physically present, the HWCAPs will omit it if the kernel
lacks support. Applications are generally expected to treat HWCAPs as
the source of truth for which features are safe to use, even if the
underlying hardware is technically capable of more.
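
As a rough illustration of the discovery pattern I have in mind, this
is how an application typically consults the HWCAPs through libc. It is
only a sketch: the two function names are hypothetical, and the feature
bit would normally be an architecture-specific HWCAP_* constant from
<asm/hwcap.h>.

```c
#include <stdbool.h>
#include <sys/auxv.h>  /* getauxval(), AT_HWCAP */

/* True when every bit of 'feature' is set in the 'hwcaps' word. */
bool hwcap_advertises(unsigned long hwcaps, unsigned long feature)
{
	return feature != 0 && (hwcaps & feature) == feature;
}

/*
 * Decide whether the calling process may use a feature, based only on
 * what AT_HWCAP advertises and not on ID registers or probing.
 */
bool cpu_feature_usable(unsigned long feature)
{
	return hwcap_advertises(getauxval(AT_HWCAP), feature);
}
```

Code written this way takes the fallback path whenever a bit is absent
from the vector, regardless of what the hardware could do.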

Another significant advantage of using HWCAPs is that many
applications already rely on them for feature detection. This interface
allows these applications to work correctly "out-of-the-box" in a
migrated environment without requiring any userspace modifications. I
understand that some apps may use other detection methods; however, there
is no guarantee that these applications will work correctly after
migration to another machine.

>
> There's also kvm, which provides a roundabout way to query some features
> of the underlying hardware.
>
> You're probably better off using/extending the idreg overrides we have
> in arch/arm64/kernel/pi/idreg-override.c so that you can make your
> cluster of heterogeneous machines look alike.

IIRC, idreg-override/cpuid-masking usually works for an entire machine.
We actually need to have a mechanism that will work on a per-container
basis. Workloads inside one cluster can have different
migration/snapshot requirements. Some are pinned to a specific node,
others are never migrated, while others need to be migratable across a
cluster or even between clusters. We need a mechanism that can be
tunable on a per-container/per-process basis.

>
> On the other hand, if munging the hwcaps happens to be sufficient for
> this particular use-case, can't it be handled entirely in userspace (e.g.
> by hacking libc?)

CRIU often handles workloads with a mix of runtimes: some linked against
glibc, some against musl, and others like Go that bypass libc entirely.
CRIU is mostly used to handle containers that can run multiple processes,
possibly based on different runtimes. This means the available CPU
features cannot be specified for just one runtime; they have to be
passed consistently across all of them. I think a pure userspace solution
is nearly infeasible in this case.
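
To show why a libc-only fix does not cover everything, here is a sketch
of one way to find the HWCAPs without any libc helper: parsing
/proc/self/auxv directly. The function names are hypothetical, and a
real runtime may instead walk the initial stack at startup. The point is
only that the auxiliary vector itself is the interface all of them
share.

```c
#include <assert.h>
#include <elf.h>     /* AT_NULL */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Look up 'type' in a raw auxv image laid out as (type, value) word
 * pairs. Stores the value in *val and returns 0, or returns -1 if the
 * type is not present before AT_NULL.
 */
int auxv_image_lookup(const unsigned long *words, size_t nr_words,
		      unsigned long type, unsigned long *val)
{
	assert(words != NULL && val != NULL);
	for (size_t i = 0; i + 1 < nr_words; i += 2) {
		if (words[i] == AT_NULL)
			break;
		if (words[i] == type) {
			*val = words[i + 1];
			return 0;
		}
	}
	return -1;
}

/*
 * Read /proc/self/auxv into 'words'. A single read() is assumed to be
 * enough for this small file. Returns the number of words read, or -1.
 */
long auxv_image_read(unsigned long *words, size_t max_words)
{
	ssize_t n;
	int fd;

	assert(words != NULL);
	fd = open("/proc/self/auxv", O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, words, max_words * sizeof(*words));
	close(fd);
	return n < 0 ? -1 : (long)((size_t)n / sizeof(*words));
}
```

A caller would fill a buffer with auxv_image_read() and then query
AT_HWCAP or AT_HWCAP2 with auxv_image_lookup().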

Thanks,
Andrei


Thread overview:
2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
2026-03-23 18:21   ` Mark Rutland
2026-03-24 10:28     ` Will Deacon
2026-03-24 22:19       ` Andrei Vagin [this message]
2026-03-23 22:59   ` Marek Szyprowski
2026-03-23 17:53 ` [PATCH 2/4] arm64: elf: clear MMF_USER_HWCAP on architecture switch Andrei Vagin
2026-03-23 17:53 ` [PATCH 3/4] mm: synchronize saved_auxv access with arg_lock Andrei Vagin
2026-03-23 17:53 ` [PATCH 4/4] selftests/exec: add test for HWCAP inheritance Andrei Vagin
