public inbox for linux-fsdevel@vger.kernel.org
* [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process
@ 2026-03-23 17:53 Andrei Vagin
  2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-23 17:53 UTC
  To: Kees Cook, Andrew Morton
  Cc: Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, Will Deacon, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny

This patch series introduces a mechanism to inherit hardware capabilities
(AT_HWCAP, AT_HWCAP2, etc.) from a parent process when they have been
modified via prctl.

To support C/R operations (snapshots, live migration) in heterogeneous
clusters, we must ensure that processes utilize CPU features available
on all potential target nodes. To solve this, we need to advertise a
common feature set across the cluster.
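In this context, "a common feature set" is the intersection of what every node supports: a feature bit can be advertised only if all nodes report it. The sketch below shows that intersection for a single HWCAP word. It is a userspace model only. The helper `common_hwcap()` and the sample values are hypothetical and not part of this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: intersect the AT_HWCAP words collected from
 * every node in a cluster. A bit survives only if all nodes set it.
 */
static unsigned long common_hwcap(const unsigned long *node_hwcaps, size_t n)
{
	unsigned long common = ~0UL;	/* start from "all features" */
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		common &= node_hwcaps[i];
	return common;
}
```

A C/R tool could then install the resulting value with prctl(PR_SET_MM, PR_SET_MM_AUXV) before launching workloads.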

Initially, a cgroup-based approach was considered, but it was decided
that inheriting HWCAPs from a parent process that has set its own
auxiliary vector via prctl is a simpler and more flexible solution.

This implementation adds a new mm flag MMF_USER_HWCAP, which is set when the
auxiliary vector is modified via prctl(PR_SET_MM_AUXV). When execve() is
called, if the current process has MMF_USER_HWCAP set, the HWCAP values are
extracted from the current auxiliary vector and inherited by the new process.
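The lookup done at execve() time amounts to a walk over (type, value) pairs that stops at AT_NULL or once every wanted entry has been found. The following is a simplified userspace model, not the kernel code. It handles only AT_HWCAP and AT_HWCAP2 and assumes the native (non-compat) word size. The names `scan_auxv()` and `struct inherited_hwcap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>	/* AT_NULL, AT_HWCAP, AT_HWCAP2 */

/* Hypothetical container for the values handed to the new process. */
struct inherited_hwcap {
	unsigned long hwcap;
	unsigned long hwcap2;
};

/*
 * Walk an auxiliary vector stored as (type, value) pairs and pick out
 * AT_HWCAP and AT_HWCAP2. Stops at AT_NULL, at the end of the array,
 * or as soon as both entries have been found.
 */
static void scan_auxv(const unsigned long *auxv, size_t nwords,
		      struct inherited_hwcap *out)
{
	int remaining = 2;	/* HWCAP entries still wanted */
	size_t i;

	assert(out != NULL);
	for (i = 0; remaining && i + 1 < nwords; i += 2) {
		switch (auxv[i]) {
		case AT_NULL:
			return;
		case AT_HWCAP:
			out->hwcap = auxv[i + 1];
			break;
		case AT_HWCAP2:
			out->hwcap2 = auxv[i + 1];
			break;
		default:
			continue;	/* unrelated entry, keep looking */
		}
		remaining--;
	}
}
```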

The first patch implements the core inheritance logic in execve().

The second patch clears MMF_USER_HWCAP on arm64 when execve() switches
between AArch64 and AArch32.

The third patch synchronizes access to mm->saved_auxv with arg_lock.

The fourth patch adds a selftest to verify that HWCAPs are correctly
inherited across execve().

v5:
 -  Fix reading of HWCAPs from auxiliary vectors of compat processes.
 -  Defer HWCAP masking until ELF table creation (create_elf_tables)
    to handle compat processes correctly.
 -  arm64: Disable HWCAP inheritance on architecture switch (e.g.,
    AArch64 to AArch32) by clearing MMF_USER_HWCAP, as HWCAP bits have
    completely different meanings across these architectures.
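The compat fix comes down to element size. On a 64-bit kernel, the saved vector of a 32-bit process holds 32-bit fields, so indexing it as unsigned long would pair the wrong words. The helper `read_auxv_pair()` below is a hypothetical userspace illustration, not the kernel code. It assumes that the compat layout is consecutive uint32_t pairs, which matches the compat_uptr_t cast used in patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read the (type, value) pair at index 'pair' from a saved auxv buffer.
 * A native process stores each field as an unsigned long; a compat
 * (32-bit) process stores each field as a uint32_t, so the same bytes
 * must be indexed with a different element size.
 */
static void read_auxv_pair(const void *buf, int compat, size_t pair,
			   unsigned long *type, unsigned long *val)
{
	assert(buf != NULL);
	if (!compat) {
		const unsigned long *w = buf;

		*type = w[2 * pair];
		*val = w[2 * pair + 1];
	} else {
		const char *b = buf;
		uint32_t t, v;

		/* memcpy sidesteps alignment/aliasing issues in userspace */
		memcpy(&t, b + (2 * pair) * sizeof(uint32_t), sizeof(t));
		memcpy(&v, b + (2 * pair + 1) * sizeof(uint32_t), sizeof(v));
		*type = t;
		*val = v;
	}
}
```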

v4: minor fixes based on feedback from the previous version.
v3: synchronize saved_auxv access with arg_lock

v1: https://lkml.org/lkml/2025/12/5/65
v2: https://lkml.org/lkml/2026/1/8/219
v3: https://lkml.org/lkml/2026/2/9/1233
v4: https://lkml.org/lkml/2026/2/17/963

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chen Ridong <chenridong@huawei.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Koutny <mkoutny@suse.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>

Andrei Vagin (4):
  exec: inherit HWCAPs from the parent process
  arm64: elf: clear MMF_USER_HWCAP on architecture switch
  mm: synchronize saved_auxv access with arg_lock
  selftests/exec: add test for HWCAP inheritance

 arch/arm64/include/asm/elf.h                 |  12 ++-
 fs/binfmt_elf.c                              |  13 ++-
 fs/binfmt_elf_fdpic.c                        |  13 ++-
 fs/exec.c                                    |  54 ++++++++++
 fs/proc/base.c                               |  12 ++-
 include/linux/binfmts.h                      |  11 ++
 include/linux/mm_types.h                     |   3 +-
 kernel/fork.c                                |   8 ++
 kernel/sys.c                                 |  30 +++---
 tools/testing/selftests/exec/.gitignore      |   1 +
 tools/testing/selftests/exec/Makefile        |   1 +
 tools/testing/selftests/exec/hwcap_inherit.c | 105 +++++++++++++++++++
 12 files changed, 234 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/exec/hwcap_inherit.c

-- 
2.53.0.959.g497ff81fa9-goog



* [PATCH 1/4] exec: inherit HWCAPs from the parent process
  2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
@ 2026-03-23 17:53 ` Andrei Vagin
  2026-03-23 18:21   ` Mark Rutland
  2026-03-23 22:59   ` Marek Szyprowski
  2026-03-23 17:53 ` [PATCH 2/4] arm64: elf: clear MMF_USER_HWCAP on architecture switch Andrei Vagin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-23 17:53 UTC
  To: Kees Cook, Andrew Morton
  Cc: Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, Will Deacon, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Andrei Vagin,
	Alexander Mikhalitsyn

Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
AT_HWCAP2, etc.) from a parent process when they have been modified via
prctl.

To support C/R operations (snapshots, live migration) in heterogeneous
clusters, we must ensure that processes utilize CPU features available
on all potential target nodes. To solve this, we need to advertise a
common feature set across the cluster.

This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
execve() is called, if the current process has MMF_USER_HWCAP set, the
HWCAP values are extracted from the current auxiliary vector and stored
in the linux_binprm structure. These values are then used to populate
the auxiliary vector of the new process, effectively inheriting the
hardware capabilities.

The inherited HWCAPs are masked with the hardware capabilities supported
by the current kernel to ensure that we don't report more features than
actually supported. This is important to avoid unexpected behavior,
especially for processes with additional privileges.
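Expressed as plain C, the rule applied in create_elf_tables() and create_elf_fdpic_tables() corresponds to the function below. It mirrors the ternary expressions in the diff. The function name `effective_hwcap()` is hypothetical. The assertion makes the intended invariant explicit: an inherited value can hide features but never add ones the kernel lacks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Value placed in AT_HWCAP for the new process. With user_hwcap set,
 * the inherited value is intersected with what the kernel supports.
 */
static unsigned long effective_hwcap(bool user_hwcap, unsigned long inherited,
				     unsigned long kernel)
{
	unsigned long v = user_hwcap ? (inherited & kernel) : kernel;

	assert((v & ~kernel) == 0);	/* never advertise unsupported bits */
	return v;
}
```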

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
 fs/binfmt_elf.c          | 13 ++++++---
 fs/binfmt_elf_fdpic.c    | 13 ++++++---
 fs/exec.c                | 62 ++++++++++++++++++++++++++++++++++++++++
 include/linux/binfmts.h  | 11 +++++++
 include/linux/mm_types.h |  2 ++
 kernel/fork.c            |  3 ++
 kernel/sys.c             |  5 +++-
 7 files changed, 100 insertions(+), 9 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index fb857faaf0d6..d99db73c76f0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -183,6 +183,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	int ei_index;
 	const struct cred *cred = current_cred();
 	struct vm_area_struct *vma;
+	bool user_hwcap = mm_flags_test(MMF_USER_HWCAP, mm);
 
 	/*
 	 * In some cases (e.g. Hyper-Threading), we want to avoid L1
@@ -247,7 +248,8 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	 */
 	ARCH_DLINFO;
 #endif
-	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
+	NEW_AUX_ENT(AT_HWCAP, user_hwcap ?
+			      (bprm->hwcap & ELF_HWCAP) : ELF_HWCAP);
 	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
 	NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
 	NEW_AUX_ENT(AT_PHDR, phdr_addr);
@@ -265,13 +267,16 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	NEW_AUX_ENT(AT_SECURE, bprm->secureexec);
 	NEW_AUX_ENT(AT_RANDOM, (elf_addr_t)(unsigned long)u_rand_bytes);
 #ifdef ELF_HWCAP2
-	NEW_AUX_ENT(AT_HWCAP2, ELF_HWCAP2);
+	NEW_AUX_ENT(AT_HWCAP2, user_hwcap ?
+			       (bprm->hwcap2 & ELF_HWCAP2) : ELF_HWCAP2);
 #endif
 #ifdef ELF_HWCAP3
-	NEW_AUX_ENT(AT_HWCAP3, ELF_HWCAP3);
+	NEW_AUX_ENT(AT_HWCAP3, user_hwcap ?
+			       (bprm->hwcap3 & ELF_HWCAP3) : ELF_HWCAP3);
 #endif
 #ifdef ELF_HWCAP4
-	NEW_AUX_ENT(AT_HWCAP4, ELF_HWCAP4);
+	NEW_AUX_ENT(AT_HWCAP4, user_hwcap ?
+			       (bprm->hwcap4 & ELF_HWCAP4) : ELF_HWCAP4);
 #endif
 	NEW_AUX_ENT(AT_EXECFN, bprm->exec);
 	if (k_platform) {
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 95b65aab7daa..92c88471455a 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -508,6 +508,7 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	unsigned long flags = 0;
 	int ei_index;
 	elf_addr_t *elf_info;
+	bool user_hwcap = mm_flags_test(MMF_USER_HWCAP, mm);
 
 #ifdef CONFIG_MMU
 	/* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions
@@ -629,15 +630,19 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 	 */
 	ARCH_DLINFO;
 #endif
-	NEW_AUX_ENT(AT_HWCAP,	ELF_HWCAP);
+	NEW_AUX_ENT(AT_HWCAP,	user_hwcap ?
+				(bprm->hwcap & ELF_HWCAP) : ELF_HWCAP);
 #ifdef ELF_HWCAP2
-	NEW_AUX_ENT(AT_HWCAP2,	ELF_HWCAP2);
+	NEW_AUX_ENT(AT_HWCAP2,	user_hwcap ?
+				(bprm->hwcap2 & ELF_HWCAP2) : ELF_HWCAP2);
 #endif
 #ifdef ELF_HWCAP3
-	NEW_AUX_ENT(AT_HWCAP3,	ELF_HWCAP3);
+	NEW_AUX_ENT(AT_HWCAP3,	user_hwcap ?
+				(bprm->hwcap3 & ELF_HWCAP3) : ELF_HWCAP3);
 #endif
 #ifdef ELF_HWCAP4
-	NEW_AUX_ENT(AT_HWCAP4,	ELF_HWCAP4);
+	NEW_AUX_ENT(AT_HWCAP4,	user_hwcap ?
+				(bprm->hwcap4 & ELF_HWCAP4) : ELF_HWCAP4);
 #endif
 	NEW_AUX_ENT(AT_PAGESZ,	PAGE_SIZE);
 	NEW_AUX_ENT(AT_CLKTCK,	CLOCKS_PER_SEC);
diff --git a/fs/exec.c b/fs/exec.c
index 9ea3a775d51e..1cd7d87a0e79 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1775,6 +1775,65 @@ static int bprm_execve(struct linux_binprm *bprm)
 	return retval;
 }
 
+static void inherit_hwcap(struct linux_binprm *bprm)
+{
+	struct mm_struct *mm = current->mm;
+	bool compat = in_compat_syscall();
+	int i, n;
+
+#ifdef ELF_HWCAP4
+	n = 4;
+#elif defined(ELF_HWCAP3)
+	n = 3;
+#elif defined(ELF_HWCAP2)
+	n = 2;
+#else
+	n = 1;
+#endif
+
+	for (i = 0; n && i < AT_VECTOR_SIZE; i += 2) {
+		unsigned long type, val;
+
+		if (!compat) {
+			type = mm->saved_auxv[i];
+			val = mm->saved_auxv[i + 1];
+		} else {
+			compat_uptr_t *auxv = (compat_uptr_t *)mm->saved_auxv;
+
+			type = auxv[i];
+			val = auxv[i + 1];
+		}
+
+		switch (type) {
+		case AT_NULL:
+			goto done;
+		case AT_HWCAP:
+			bprm->hwcap = val;
+			break;
+#ifdef ELF_HWCAP2
+		case AT_HWCAP2:
+			bprm->hwcap2 = val;
+			break;
+#endif
+#ifdef ELF_HWCAP3
+		case AT_HWCAP3:
+			bprm->hwcap3 = val;
+			break;
+#endif
+#ifdef ELF_HWCAP4
+		case AT_HWCAP4:
+			bprm->hwcap4 = val;
+			break;
+#endif
+		default:
+			continue;
+		}
+		n--;
+	}
+done:
+	mm_flags_set(MMF_USER_HWCAP, bprm->mm);
+}
+
 static int do_execveat_common(int fd, struct filename *filename,
 			      struct user_arg_ptr argv,
 			      struct user_arg_ptr envp,
@@ -1843,6 +1902,9 @@ static int do_execveat_common(int fd, struct filename *filename,
 			     current->comm, bprm->filename);
 	}
 
+	if (mm_flags_test(MMF_USER_HWCAP, current->mm))
+		inherit_hwcap(bprm);
+
 	return bprm_execve(bprm);
 }
 
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 65abd5ab8836..94a3dcf9b1d2 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_BINFMTS_H
 #define _LINUX_BINFMTS_H
 
+#include <linux/elf.h>
 #include <linux/sched.h>
 #include <linux/unistd.h>
 #include <asm/exec.h>
@@ -67,6 +68,16 @@ struct linux_binprm {
 	unsigned long exec;
 
 	struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. */
+	unsigned long hwcap;
+#ifdef ELF_HWCAP2
+	unsigned long hwcap2;
+#endif
+#ifdef ELF_HWCAP3
+	unsigned long hwcap3;
+#endif
+#ifdef ELF_HWCAP4
+	unsigned long hwcap4;
+#endif
 
 	char buf[BINPRM_BUF_SIZE];
 } __randomize_layout;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc8ae722886..62dde645f469 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1919,6 +1919,8 @@ enum {
 #define MMF_TOPDOWN		31	/* mm searches top down by default */
 #define MMF_TOPDOWN_MASK	BIT(MMF_TOPDOWN)
 
+#define MMF_USER_HWCAP		32	/* user-defined HWCAPs */
+
 #define MMF_INIT_LEGACY_MASK	(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
 				 MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
diff --git a/kernel/fork.c b/kernel/fork.c
index bc2bf58b93b6..2ac277aa078c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1105,6 +1105,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 
 		__mm_flags_overwrite_word(mm, mmf_init_legacy_flags(flags));
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
+
+		if (mm_flags_test(MMF_USER_HWCAP, current->mm))
+			mm_flags_set(MMF_USER_HWCAP, mm);
 	} else {
 		__mm_flags_overwrite_word(mm, default_dump_filter);
 		mm->def_flags = 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index cdbf8513caf6..e4b0fa2f6845 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2157,8 +2157,10 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	 * not introduce additional locks here making the kernel
 	 * more complex.
 	 */
-	if (prctl_map.auxv_size)
+	if (prctl_map.auxv_size) {
 		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
+		mm_flags_set(MMF_USER_HWCAP, mm);
+	}
 
 	mmap_read_unlock(mm);
 	return 0;
@@ -2190,6 +2192,7 @@ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr,
 
 	task_lock(current);
 	memcpy(mm->saved_auxv, user_auxv, len);
+	mm_flags_set(MMF_USER_HWCAP, mm);
 	task_unlock(current);
 
 	return 0;
-- 
2.53.0.983.g0bb29b3bc5-goog



* [PATCH 2/4] arm64: elf: clear MMF_USER_HWCAP on architecture switch
  2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
  2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
@ 2026-03-23 17:53 ` Andrei Vagin
  2026-03-23 17:53 ` [PATCH 3/4] mm: synchronize saved_auxv access with arg_lock Andrei Vagin
  2026-03-23 17:53 ` [PATCH 4/4] selftests/exec: add test for HWCAP inheritance Andrei Vagin
  3 siblings, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-23 17:53 UTC
  To: Kees Cook, Andrew Morton
  Cc: Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, Will Deacon, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Andrei Vagin

The HWCAP bits have different meanings between AArch64 and AArch32,
so HWCAP inheritance is not applicable when switching architectures.
Inherited HWCAP vectors can lead to unpredictable side effects.  For
example, bit 0 in AArch64 signifies FP support, whereas in AArch32 it
signifies SWP instruction support.

Fix this by clearing the MMF_USER_HWCAP flag in SET_PERSONALITY and
COMPAT_SET_PERSONALITY if the architecture is changing. This ensures
that create_elf_tables() will use the default kernel HWCAPs for the new
process.
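The rule introduced by the two macro changes can be summarised with the following userspace model. `struct exec_state` and `set_personality()` are hypothetical stand-ins for TIF_32BIT and MMF_USER_HWCAP. The real macros operate on `current` directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-exec state mirroring the two kernel flags. */
struct exec_state {
	bool is_32bit;		/* models TIF_32BIT */
	bool user_hwcap;	/* models MMF_USER_HWCAP */
};

/*
 * Model of SET_PERSONALITY / COMPAT_SET_PERSONALITY after this patch:
 * if the new binary targets a different ABI than the caller, drop the
 * user-supplied HWCAPs, because their bit meanings differ.
 */
static void set_personality(struct exec_state *s, bool want_32bit)
{
	if (s->is_32bit != want_32bit) {
		s->user_hwcap = false;
		s->is_32bit = want_32bit;
	}
	assert(s->is_32bit == want_32bit);
}
```

An exec that stays within the same ABI keeps the inherited HWCAPs. Only a 64-to-32 or 32-to-64 transition falls back to the kernel defaults.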

Signed-off-by: Andrei Vagin <avagin@google.com>
---
 arch/arm64/include/asm/elf.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index d2779d604c7b..2049d42e2e6a 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -160,7 +160,10 @@ typedef struct user_fpsimd_state elf_fpregset_t;
 
 #define SET_PERSONALITY(ex)						\
 ({									\
-	clear_thread_flag(TIF_32BIT);					\
+	if (test_thread_flag(TIF_32BIT)) {				\
+		mm_flags_clear(MMF_USER_HWCAP, current->mm);		\
+		clear_thread_flag(TIF_32BIT);				\
+	}								\
 	current->personality &= ~READ_IMPLIES_EXEC;			\
 })
 
@@ -223,8 +226,11 @@ int compat_elf_check_arch(const struct elf32_hdr *);
  */
 #define COMPAT_SET_PERSONALITY(ex)					\
 ({									\
-	set_thread_flag(TIF_32BIT);					\
- })
+	if (!test_thread_flag(TIF_32BIT)) {				\
+		mm_flags_clear(MMF_USER_HWCAP, current->mm);		\
+		set_thread_flag(TIF_32BIT);				\
+	}								\
+})
 #ifdef CONFIG_COMPAT_VDSO
 #define COMPAT_ARCH_DLINFO						\
 do {									\
-- 
2.53.0.983.g0bb29b3bc5-goog



* [PATCH 3/4] mm: synchronize saved_auxv access with arg_lock
  2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
  2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
  2026-03-23 17:53 ` [PATCH 2/4] arm64: elf: clear MMF_USER_HWCAP on architecture switch Andrei Vagin
@ 2026-03-23 17:53 ` Andrei Vagin
  2026-03-23 17:53 ` [PATCH 4/4] selftests/exec: add test for HWCAP inheritance Andrei Vagin
  3 siblings, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-23 17:53 UTC
  To: Kees Cook, Andrew Morton
  Cc: Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, Will Deacon, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Andrei Vagin,
	Alexander Mikhalitsyn

The mm->saved_auxv array stores the auxiliary vector, which can be
modified via prctl(PR_SET_MM_AUXV) or prctl(PR_SET_MM_MAP). Previously,
accesses to saved_auxv were not synchronized. This was an intentional
trade-off, as the vector was only used to provide information to
userspace via /proc/PID/auxv or prctl(PR_GET_AUXV), and ensuring
consistency between the auxv values was left to userspace.

With the introduction of hardware capability (HWCAP) inheritance during
execve, the kernel now relies on the contents of saved_auxv to configure
the execution environment of new processes.  An unsynchronized read
during execve could result in a new process inheriting an inconsistent
set of capabilities if the parent process updates its auxiliary vector
concurrently.

While it is still not strictly required to guarantee the consistency of
auxv values on the kernel side, doing so is relatively straightforward.
This change implements synchronization using arg_lock.
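The pattern used by auxv_read() and prctl_get_auxv() in this patch is to copy saved_auxv into a local buffer while holding arg_lock and then use the copy without the lock. It can be modelled in userspace as follows. A C11 atomic_flag spinlock stands in for the kernel spinlock. `struct fake_mm`, `AUXV_WORDS` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define AUXV_WORDS 8	/* hypothetical; the kernel uses AT_VECTOR_SIZE */

/* Hypothetical stand-in for the mm fields touched by this patch. */
struct fake_mm {
	atomic_flag arg_lock;	/* models mm->arg_lock */
	unsigned long saved_auxv[AUXV_WORDS];
};

static void arg_lock_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void arg_lock_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Writer: replace the whole vector while holding the lock. */
static void set_auxv(struct fake_mm *mm, const unsigned long *src)
{
	arg_lock_acquire(&mm->arg_lock);
	memcpy(mm->saved_auxv, src, sizeof(mm->saved_auxv));
	arg_lock_release(&mm->arg_lock);
}

/*
 * Reader: take a consistent snapshot under the lock; dst must hold
 * AUXV_WORDS entries. The caller then parses the copy unlocked.
 */
static void snapshot_auxv(struct fake_mm *mm, unsigned long *dst)
{
	assert(dst != NULL);
	arg_lock_acquire(&mm->arg_lock);
	memcpy(dst, mm->saved_auxv, sizeof(mm->saved_auxv));
	arg_lock_release(&mm->arg_lock);
}
```

Keeping only a memcpy inside the critical section keeps lock hold times short. This matters because the kernel cannot call copy_to_user() while holding a spinlock.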

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
 fs/exec.c                |  2 ++
 fs/proc/base.c           | 12 +++++++++---
 include/linux/mm_types.h |  1 -
 kernel/fork.c            |  7 ++++++-
 kernel/sys.c             | 29 ++++++++++++++---------------
 5 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 1cd7d87a0e79..dea868d058fa 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1791,6 +1791,7 @@ static void inherit_hwcap(struct linux_binprm *bprm)
 	n = 1;
 #endif
 
+	spin_lock(&mm->arg_lock);
 	for (i = 0; n && i < AT_VECTOR_SIZE; i += 2) {
 		unsigned long type, val;
 
@@ -1831,6 +1832,7 @@ static void inherit_hwcap(struct linux_binprm *bprm)
 		n--;
 	}
 done:
+	spin_unlock(&mm->arg_lock);
 	mm_flags_set(MMF_USER_HWCAP, bprm->mm);
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4c863d17dfb4..b5496cec888e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1083,14 +1083,20 @@ static ssize_t auxv_read(struct file *file, char __user *buf,
 {
 	struct mm_struct *mm = file->private_data;
 	unsigned int nwords = 0;
+	unsigned long saved_auxv[AT_VECTOR_SIZE];
 
 	if (!mm)
 		return 0;
+
+	spin_lock(&mm->arg_lock);
+	memcpy(saved_auxv, mm->saved_auxv, sizeof(saved_auxv));
+	spin_unlock(&mm->arg_lock);
+
 	do {
 		nwords += 2;
-	} while (mm->saved_auxv[nwords - 2] != 0); /* AT_NULL */
-	return simple_read_from_buffer(buf, count, ppos, mm->saved_auxv,
-				       nwords * sizeof(mm->saved_auxv[0]));
+	} while (saved_auxv[nwords - 2] != 0); /* AT_NULL */
+	return simple_read_from_buffer(buf, count, ppos, saved_auxv,
+				       nwords * sizeof(saved_auxv[0]));
 }
 
 static const struct file_operations proc_auxv_operations = {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 62dde645f469..10351af5851b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1255,7 +1255,6 @@ struct mm_struct {
 		unsigned long start_code, end_code, start_data, end_data;
 		unsigned long start_brk, brk, start_stack;
 		unsigned long arg_start, arg_end, env_start, env_end;
-
 		unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
 
 #ifdef CONFIG_ARCH_HAS_ELF_CORE_EFLAGS
diff --git a/kernel/fork.c b/kernel/fork.c
index 2ac277aa078c..3880ce0d44f9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1106,8 +1106,13 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 		__mm_flags_overwrite_word(mm, mmf_init_legacy_flags(flags));
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
 
-		if (mm_flags_test(MMF_USER_HWCAP, current->mm))
+		if (mm_flags_test(MMF_USER_HWCAP, current->mm)) {
+			spin_lock(&current->mm->arg_lock);
 			mm_flags_set(MMF_USER_HWCAP, mm);
+			memcpy(mm->saved_auxv, current->mm->saved_auxv,
+			       sizeof(mm->saved_auxv));
+			spin_unlock(&current->mm->arg_lock);
+		}
 	} else {
 		__mm_flags_overwrite_word(mm, default_dump_filter);
 		mm->def_flags = 0;
diff --git a/kernel/sys.c b/kernel/sys.c
index e4b0fa2f6845..c679b5797e73 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2147,20 +2147,11 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	mm->arg_end	= prctl_map.arg_end;
 	mm->env_start	= prctl_map.env_start;
 	mm->env_end	= prctl_map.env_end;
-	spin_unlock(&mm->arg_lock);
-
-	/*
-	 * Note this update of @saved_auxv is lockless thus
-	 * if someone reads this member in procfs while we're
-	 * updating -- it may get partly updated results. It's
-	 * known and acceptable trade off: we leave it as is to
-	 * not introduce additional locks here making the kernel
-	 * more complex.
-	 */
 	if (prctl_map.auxv_size) {
-		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
 		mm_flags_set(MMF_USER_HWCAP, mm);
+		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
 	}
+	spin_unlock(&mm->arg_lock);
 
 	mmap_read_unlock(mm);
 	return 0;
@@ -2190,10 +2181,10 @@ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr,
 
 	BUILD_BUG_ON(sizeof(user_auxv) != sizeof(mm->saved_auxv));
 
-	task_lock(current);
-	memcpy(mm->saved_auxv, user_auxv, len);
+	spin_lock(&mm->arg_lock);
 	mm_flags_set(MMF_USER_HWCAP, mm);
-	task_unlock(current);
+	memcpy(mm->saved_auxv, user_auxv, len);
+	spin_unlock(&mm->arg_lock);
 
 	return 0;
 }
@@ -2481,9 +2472,17 @@ static inline int prctl_get_mdwe(unsigned long arg2, unsigned long arg3,
 static int prctl_get_auxv(void __user *addr, unsigned long len)
 {
 	struct mm_struct *mm = current->mm;
+	unsigned long auxv[AT_VECTOR_SIZE];
 	unsigned long size = min_t(unsigned long, sizeof(mm->saved_auxv), len);
 
-	if (size && copy_to_user(addr, mm->saved_auxv, size))
+	if (!size)
+		return sizeof(mm->saved_auxv);
+
+	spin_lock(&mm->arg_lock);
+	memcpy(auxv, mm->saved_auxv, size);
+	spin_unlock(&mm->arg_lock);
+
+	if (copy_to_user(addr, auxv, size))
 		return -EFAULT;
 	return sizeof(mm->saved_auxv);
 }
-- 
2.53.0.983.g0bb29b3bc5-goog



* [PATCH 4/4] selftests/exec: add test for HWCAP inheritance
  2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
                   ` (2 preceding siblings ...)
  2026-03-23 17:53 ` [PATCH 3/4] mm: synchronize saved_auxv access with arg_lock Andrei Vagin
@ 2026-03-23 17:53 ` Andrei Vagin
  3 siblings, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-23 17:53 UTC
  To: Kees Cook, Andrew Morton
  Cc: Marek Szyprowski, Cyrill Gorcunov, Mike Rapoport,
	Alexander Mikhalitsyn, linux-kernel, linux-fsdevel, linux-mm,
	criu, Catalin Marinas, Will Deacon, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Andrei Vagin,
	Alexander Mikhalitsyn

Verify that HWCAPs are correctly inherited/preserved across execve() when
modified via prctl(PR_SET_MM_AUXV).

The test performs the following steps:
* reads the current AUXV using prctl(PR_GET_AUXV);
* finds the last non-zero HWCAP entry and toggles its most significant set bit;
* replaces the AUXV of the current process with the modified one using
  prctl(PR_SET_MM, PR_SET_MM_AUXV);
* executes itself to verify that the new program sees the modified HWCAP
  value.
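The selection step can be restated as the following standalone function. This is a simplified restatement, not the selftest itself. It only considers AT_HWCAP and AT_HWCAP2, because some libc versions of <elf.h> may not define AT_HWCAP3 or AT_HWCAP4. The name `pick_and_toggle()` is hypothetical, and `find_msb()` is copied from the test below.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>	/* AT_HWCAP, AT_HWCAP2 */

static int find_msb(unsigned long v)
{
	assert(v != 0);	/* __builtin_clzl(0) is undefined */
	return (int)(sizeof(v) * 8) - __builtin_clzl(v) - 1;
}

/*
 * Return the index of the value word of the last non-zero AT_HWCAP or
 * AT_HWCAP2 entry, or -1 if there is none. On success, *new_val gets
 * that value with its most significant set bit flipped (i.e. cleared).
 */
static long pick_and_toggle(const unsigned long *auxv, size_t nwords,
			    unsigned long *new_val)
{
	long idx = -1;
	size_t i;

	for (i = 0; i + 1 < nwords; i += 2) {
		if (auxv[i + 1] == 0)
			continue;
		if (auxv[i] == AT_HWCAP || auxv[i] == AT_HWCAP2)
			idx = (long)(i + 1);
	}
	if (idx >= 0)
		*new_val = auxv[idx] ^ (1UL << find_msb(auxv[idx]));
	return idx;
}
```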

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
 tools/testing/selftests/exec/.gitignore      |   1 +
 tools/testing/selftests/exec/Makefile        |   1 +
 tools/testing/selftests/exec/hwcap_inherit.c | 105 +++++++++++++++++++
 3 files changed, 107 insertions(+)
 create mode 100644 tools/testing/selftests/exec/hwcap_inherit.c

diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
index 7f3d1ae762ec..2ff245fd0ba6 100644
--- a/tools/testing/selftests/exec/.gitignore
+++ b/tools/testing/selftests/exec/.gitignore
@@ -19,3 +19,4 @@ null-argv
 xxxxxxxx*
 pipe
 S_I*.test
+hwcap_inherit
\ No newline at end of file
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index 45a3cfc435cf..e73005965e05 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -20,6 +20,7 @@ TEST_FILES := Makefile
 TEST_GEN_PROGS += recursion-depth
 TEST_GEN_PROGS += null-argv
 TEST_GEN_PROGS += check-exec
+TEST_GEN_PROGS += hwcap_inherit
 
 EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx*	\
 	       $(OUTPUT)/S_I*.test
diff --git a/tools/testing/selftests/exec/hwcap_inherit.c b/tools/testing/selftests/exec/hwcap_inherit.c
new file mode 100644
index 000000000000..1b43b2dbb1d0
--- /dev/null
+++ b/tools/testing/selftests/exec/hwcap_inherit.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <sys/auxv.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <linux/prctl.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <elf.h>
+#include <linux/auxvec.h>
+
+#include "../kselftest.h"
+
+static int find_msb(unsigned long v)
+{
+	return sizeof(v)*8 - __builtin_clzl(v) - 1;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long auxv[1024], hwcap, new_hwcap, hwcap_idx;
+	int size, hwcap_type = 0, hwcap_feature, count, status;
+	char hwcap_str[32], hwcap_type_str[32];
+	pid_t pid;
+
+	if (argc > 1 && strcmp(argv[1], "verify") == 0) {
+		unsigned long type = strtoul(argv[2], NULL, 16);
+		unsigned long expected = strtoul(argv[3], NULL, 16);
+		unsigned long hwcap = getauxval(type);
+
+		if (hwcap != expected) {
+			ksft_print_msg("HWCAP mismatch: type %lx, expected %lx, got %lx\n",
+					type, expected, hwcap);
+			return 1;
+		}
+		ksft_print_msg("HWCAP matched: %lx\n", hwcap);
+		return 0;
+	}
+
+	ksft_print_header();
+	ksft_set_plan(1);
+
+	size = prctl(PR_GET_AUXV, auxv, sizeof(auxv), 0, 0);
+	if (size == -1)
+		ksft_exit_fail_perror("prctl(PR_GET_AUXV)");
+
+	count = size / sizeof(unsigned long);
+
+	/* Find the "latest" feature and try to mask it out. */
+	for (int i = 0; i < count - 1; i += 2) {
+		hwcap = auxv[i + 1];
+		if (hwcap == 0)
+			continue;
+		switch (auxv[i]) {
+		case AT_HWCAP4:
+		case AT_HWCAP3:
+		case AT_HWCAP2:
+		case AT_HWCAP:
+			hwcap_type = auxv[i];
+			hwcap_feature = find_msb(hwcap);
+			hwcap_idx = i + 1;
+			break;
+		default:
+			continue;
+		}
+	}
+	if (hwcap_type == 0)
+		ksft_exit_skip("No features found, skipping test\n");
+	hwcap = auxv[hwcap_idx];
+	new_hwcap = hwcap ^ (1UL << hwcap_feature);
+	auxv[hwcap_idx] = new_hwcap;
+
+	if (prctl(PR_SET_MM, PR_SET_MM_AUXV, auxv, size, 0) < 0) {
+		if (errno == EPERM)
+			ksft_exit_skip("prctl(PR_SET_MM_AUXV) requires CAP_SYS_RESOURCE\n");
+		ksft_exit_fail_perror("prctl(PR_SET_MM_AUXV)");
+	}
+
+	pid = fork();
+	if (pid < 0)
+		ksft_exit_fail_perror("fork");
+	if (pid == 0) {
+		char *new_argv[] = { argv[0], "verify", hwcap_type_str, hwcap_str, NULL };
+
+		snprintf(hwcap_str, sizeof(hwcap_str), "%lx", new_hwcap);
+		snprintf(hwcap_type_str, sizeof(hwcap_type_str), "%x", hwcap_type);
+
+		execv(argv[0], new_argv);
+		perror("execv");
+		exit(1);
+	}
+
+	if (waitpid(pid, &status, 0) == -1)
+		ksft_exit_fail_perror("waitpid");
+	if (status != 0)
+		ksft_exit_fail_msg("HWCAP inheritance failed (status %d)\n", status);
+
+	ksft_test_result_pass("HWCAP inheritance succeeded\n");
+	ksft_exit_pass();
+	return 0;
+}
-- 
2.53.0.983.g0bb29b3bc5-goog



* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
  2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
@ 2026-03-23 18:21   ` Mark Rutland
  2026-03-24 10:28     ` Will Deacon
  2026-03-23 22:59   ` Marek Szyprowski
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2026-03-23 18:21 UTC
  To: Andrei Vagin
  Cc: Kees Cook, Andrew Morton, Marek Szyprowski, Cyrill Gorcunov,
	Mike Rapoport, Alexander Mikhalitsyn, linux-kernel, linux-fsdevel,
	linux-mm, criu, Catalin Marinas, Will Deacon, linux-arm-kernel,
	Chen Ridong, Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Alexander Mikhalitsyn

On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> AT_HWCAP2, etc.) from a parent process when they have been modified via
> prctl.
> 
> To support C/R operations (snapshots, live migration) in heterogeneous
> clusters, we must ensure that processes utilize CPU features available
> on all potential target nodes. To solve this, we need to advertise a
> common feature set across the cluster.
> 
> This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
> execve() is called, if the current process has MMF_USER_HWCAP set, the
> HWCAP values are extracted from the current auxiliary vector and stored
> in the linux_binprm structure. These values are then used to populate
> the auxiliary vector of the new process, effectively inheriting the
> hardware capabilities.
> 
> The inherited HWCAPs are masked with the hardware capabilities supported
> by the current kernel to ensure that we don't report more features than
> actually supported. This is important to avoid unexpected behavior,
> especially for processes with additional privileges.

At a high level, I don't think that's going to be sufficient:

* On an architecture with other userspace-accessible feature
  identification mechanisms (e.g. ID registers), userspace might read
  those. So you might need to hide stuff there too, and that's going to
  require architecture-specific interfaces to manage.

  It's possible that some code checks HWCAPs and other code checks ID
  registers, and a mismatch between the two could be problematic.

* If the HWCAPs can be inherited by a more privileged task, then a
  malicious user could use this to hide security features (e.g. shadow
  stack or pointer authentication on arm64), and make it easier to
  attack that task. While not a direct attack, it would undermine those
  features.

Mark.

> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> Signed-off-by: Andrei Vagin <avagin@google.com>
> ---
>  fs/binfmt_elf.c          | 13 ++++++---
>  fs/binfmt_elf_fdpic.c    | 13 ++++++---
>  fs/exec.c                | 62 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/binfmts.h  | 11 +++++++
>  include/linux/mm_types.h |  2 ++
>  kernel/fork.c            |  3 ++
>  kernel/sys.c             |  5 +++-
>  7 files changed, 100 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index fb857faaf0d6..d99db73c76f0 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -183,6 +183,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>  	int ei_index;
>  	const struct cred *cred = current_cred();
>  	struct vm_area_struct *vma;
> +	bool user_hwcap = mm_flags_test(MMF_USER_HWCAP, mm);
>  
>  	/*
>  	 * In some cases (e.g. Hyper-Threading), we want to avoid L1
> @@ -247,7 +248,8 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>  	 */
>  	ARCH_DLINFO;
>  #endif
> -	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
> +	NEW_AUX_ENT(AT_HWCAP, user_hwcap ?
> +			      (bprm->hwcap & ELF_HWCAP) : ELF_HWCAP);
>  	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
>  	NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
>  	NEW_AUX_ENT(AT_PHDR, phdr_addr);
> @@ -265,13 +267,16 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
>  	NEW_AUX_ENT(AT_SECURE, bprm->secureexec);
>  	NEW_AUX_ENT(AT_RANDOM, (elf_addr_t)(unsigned long)u_rand_bytes);
>  #ifdef ELF_HWCAP2
> -	NEW_AUX_ENT(AT_HWCAP2, ELF_HWCAP2);
> +	NEW_AUX_ENT(AT_HWCAP2, user_hwcap ?
> +			       (bprm->hwcap2 & ELF_HWCAP2) : ELF_HWCAP2);
>  #endif
>  #ifdef ELF_HWCAP3
> -	NEW_AUX_ENT(AT_HWCAP3, ELF_HWCAP3);
> +	NEW_AUX_ENT(AT_HWCAP3, user_hwcap ?
> +			       (bprm->hwcap3 & ELF_HWCAP3) : ELF_HWCAP3);
>  #endif
>  #ifdef ELF_HWCAP4
> -	NEW_AUX_ENT(AT_HWCAP4, ELF_HWCAP4);
> +	NEW_AUX_ENT(AT_HWCAP4, user_hwcap ?
> +			       (bprm->hwcap4 & ELF_HWCAP4) : ELF_HWCAP4);
>  #endif
>  	NEW_AUX_ENT(AT_EXECFN, bprm->exec);
>  	if (k_platform) {
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index 95b65aab7daa..92c88471455a 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -508,6 +508,7 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
>  	unsigned long flags = 0;
>  	int ei_index;
>  	elf_addr_t *elf_info;
> +	bool user_hwcap = mm_flags_test(MMF_USER_HWCAP, mm);
>  
>  #ifdef CONFIG_MMU
>  	/* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions
> @@ -629,15 +630,19 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
>  	 */
>  	ARCH_DLINFO;
>  #endif
> -	NEW_AUX_ENT(AT_HWCAP,	ELF_HWCAP);
> +	NEW_AUX_ENT(AT_HWCAP,	user_hwcap ?
> +				(bprm->hwcap & ELF_HWCAP) : ELF_HWCAP);
>  #ifdef ELF_HWCAP2
> -	NEW_AUX_ENT(AT_HWCAP2,	ELF_HWCAP2);
> +	NEW_AUX_ENT(AT_HWCAP2,	user_hwcap ?
> +				(bprm->hwcap2 & ELF_HWCAP2) : ELF_HWCAP2);
>  #endif
>  #ifdef ELF_HWCAP3
> -	NEW_AUX_ENT(AT_HWCAP3,	ELF_HWCAP3);
> +	NEW_AUX_ENT(AT_HWCAP3,	user_hwcap ?
> +				(bprm->hwcap3 & ELF_HWCAP3) : ELF_HWCAP3);
>  #endif
>  #ifdef ELF_HWCAP4
> -	NEW_AUX_ENT(AT_HWCAP4,	ELF_HWCAP4);
> +	NEW_AUX_ENT(AT_HWCAP4,	user_hwcap ?
> +				(bprm->hwcap4 & ELF_HWCAP4) : ELF_HWCAP4);
>  #endif
>  	NEW_AUX_ENT(AT_PAGESZ,	PAGE_SIZE);
>  	NEW_AUX_ENT(AT_CLKTCK,	CLOCKS_PER_SEC);
> diff --git a/fs/exec.c b/fs/exec.c
> index 9ea3a775d51e..1cd7d87a0e79 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1775,6 +1775,65 @@ static int bprm_execve(struct linux_binprm *bprm)
>  	return retval;
>  }
>  
> +static void inherit_hwcap(struct linux_binprm *bprm)
> +{
> +	struct mm_struct *mm = current->mm;
> +	bool compat = in_compat_syscall();
> +	int i, n;
> +
> +#ifdef ELF_HWCAP4
> +	n = 4;
> +#elif defined(ELF_HWCAP3)
> +	n = 3;
> +#elif defined(ELF_HWCAP2)
> +	n = 2;
> +#else
> +	n = 1;
> +#endif
> +
> +	for (i = 0; n && i < AT_VECTOR_SIZE; i += 2) {
> +		unsigned long type, val;
> +
> +		if (!compat) {
> +			type = mm->saved_auxv[i];
> +			val = mm->saved_auxv[i + 1];
> +		} else {
> +			compat_uptr_t *auxv = (compat_uptr_t *)mm->saved_auxv;
> +
> +			type = auxv[i];
> +			val = auxv[i + 1];
> +		}
> +
> +		switch (type) {
> +		case AT_NULL:
> +			goto done;
> +		case AT_HWCAP:
> +			bprm->hwcap = val;
> +			break;
> +#ifdef ELF_HWCAP2
> +		case AT_HWCAP2:
> +			bprm->hwcap2 = val;
> +			break;
> +#endif
> +#ifdef ELF_HWCAP3
> +		case AT_HWCAP3:
> +			bprm->hwcap3 = val;
> +			break;
> +#endif
> +#ifdef ELF_HWCAP4
> +		case AT_HWCAP4:
> +			bprm->hwcap4 = val;
> +			break;
> +#endif
> +		default:
> +			continue;
> +		}
> +		n--;
> +	}
> +done:
> +	mm_flags_set(MMF_USER_HWCAP, bprm->mm);
> +}
> +
>  static int do_execveat_common(int fd, struct filename *filename,
>  			      struct user_arg_ptr argv,
>  			      struct user_arg_ptr envp,
> @@ -1843,6 +1902,9 @@ static int do_execveat_common(int fd, struct filename *filename,
>  			     current->comm, bprm->filename);
>  	}
>  
> +	if (mm_flags_test(MMF_USER_HWCAP, current->mm))
> +		inherit_hwcap(bprm);
> +
>  	return bprm_execve(bprm);
>  }
>  
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 65abd5ab8836..94a3dcf9b1d2 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_BINFMTS_H
>  #define _LINUX_BINFMTS_H
>  
> +#include <linux/elf.h>
>  #include <linux/sched.h>
>  #include <linux/unistd.h>
>  #include <asm/exec.h>
> @@ -67,6 +68,16 @@ struct linux_binprm {
>  	unsigned long exec;
>  
>  	struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. */
> +	unsigned long hwcap;
> +#ifdef ELF_HWCAP2
> +	unsigned long hwcap2;
> +#endif
> +#ifdef ELF_HWCAP3
> +	unsigned long hwcap3;
> +#endif
> +#ifdef ELF_HWCAP4
> +	unsigned long hwcap4;
> +#endif
>  
>  	char buf[BINPRM_BUF_SIZE];
>  } __randomize_layout;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3cc8ae722886..62dde645f469 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1919,6 +1919,8 @@ enum {
>  #define MMF_TOPDOWN		31	/* mm searches top down by default */
>  #define MMF_TOPDOWN_MASK	BIT(MMF_TOPDOWN)
>  
> +#define MMF_USER_HWCAP		32	/* user-defined HWCAPs */
> +
>  #define MMF_INIT_LEGACY_MASK	(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
>  				 MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
>  				 MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bc2bf58b93b6..2ac277aa078c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1105,6 +1105,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>  
>  		__mm_flags_overwrite_word(mm, mmf_init_legacy_flags(flags));
>  		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
> +
> +		if (mm_flags_test(MMF_USER_HWCAP, current->mm))
> +			mm_flags_set(MMF_USER_HWCAP, mm);
>  	} else {
>  		__mm_flags_overwrite_word(mm, default_dump_filter);
>  		mm->def_flags = 0;
> diff --git a/kernel/sys.c b/kernel/sys.c
> index cdbf8513caf6..e4b0fa2f6845 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2157,8 +2157,10 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  	 * not introduce additional locks here making the kernel
>  	 * more complex.
>  	 */
> -	if (prctl_map.auxv_size)
> +	if (prctl_map.auxv_size) {
>  		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
> +		mm_flags_set(MMF_USER_HWCAP, mm);
> +	}
>  
>  	mmap_read_unlock(mm);
>  	return 0;
> @@ -2190,6 +2192,7 @@ static int prctl_set_auxv(struct mm_struct *mm, unsigned long addr,
>  
>  	task_lock(current);
>  	memcpy(mm->saved_auxv, user_auxv, len);
> +	mm_flags_set(MMF_USER_HWCAP, mm);
>  	task_unlock(current);
>  
>  	return 0;
> -- 
> 2.53.0.983.g0bb29b3bc5-goog
> 
> 
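The exec-time logic in the quoted patch can be modelled in userspace roughly as below. It covers two steps: inherit_hwcap() walking mm->saved_auxv up to AT_NULL, and create_elf_tables() ANDing the result with ELF_HWCAP. This is a sketch only. struct hwcaps, scan_hwcaps() and advertised() are hypothetical names. Only AT_HWCAP and AT_HWCAP2 are handled, and the native layout of unsigned long type/value pairs is assumed. Compat tasks, which the patch reads through compat_uptr_t, are ignored.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical stand-in for the hwcap fields the patch adds to linux_binprm. */
struct hwcaps {
    unsigned long hwcap;
    unsigned long hwcap2;
};

/* Walk type/value pairs until AT_NULL (as inherit_hwcap() walks
 * mm->saved_auxv) and copy out AT_HWCAP and AT_HWCAP2. Entries that are
 * absent leave the corresponding field at 0. */
struct hwcaps scan_hwcaps(const unsigned long *auxv, size_t n_words)
{
    struct hwcaps out = { 0, 0 };

    for (size_t i = 0; i + 1 < n_words; i += 2) {
        if (auxv[i] == AT_NULL)
            break;
        if (auxv[i] == AT_HWCAP)
            out.hwcap = auxv[i + 1];
        else if (auxv[i] == AT_HWCAP2)
            out.hwcap2 = auxv[i + 1];
    }
    return out;
}

/* Mirrors the create_elf_tables() expression
 *   user_hwcap ? (bprm->hwcap & ELF_HWCAP) : ELF_HWCAP
 * so an inherited value can only clear bits, never add them. */
unsigned long advertised(int user_hwcap, unsigned long inherited,
                         unsigned long kernel_hwcap)
{
    return user_hwcap ? (inherited & kernel_hwcap) : kernel_hwcap;
}
```

In this model a word missing from the saved vector stays 0 and is therefore advertised as 0. That matches the patch only to the extent that linux_binprm starts out zeroed.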

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
  2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
  2026-03-23 18:21   ` Mark Rutland
@ 2026-03-23 22:59   ` Marek Szyprowski
  1 sibling, 0 replies; 9+ messages in thread
From: Marek Szyprowski @ 2026-03-23 22:59 UTC (permalink / raw)
  To: Andrei Vagin, Kees Cook, Andrew Morton
  Cc: Cyrill Gorcunov, Mike Rapoport, Alexander Mikhalitsyn,
	linux-kernel, linux-fsdevel, linux-mm, criu, Catalin Marinas,
	Will Deacon, linux-arm-kernel, Chen Ridong, Christian Brauner,
	David Hildenbrand, Eric Biederman, Lorenzo Stoakes, Michal Koutny,
	Alexander Mikhalitsyn

On 23.03.2026 18:53, Andrei Vagin wrote:
> Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> AT_HWCAP2, etc.) from a parent process when they have been modified via
> prctl.
>
> To support C/R operations (snapshots, live migration) in heterogeneous
> clusters, we must ensure that processes utilize CPU features available
> on all potential target nodes. To solve this, we need to advertise a
> common feature set across the cluster.
>
> This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
> execve() is called, if the current process has MMF_USER_HWCAP set, the
> HWCAP values are extracted from the current auxiliary vector and stored
> in the linux_binprm structure. These values are then used to populate
> the auxiliary vector of the new process, effectively inheriting the
> hardware capabilities.
>
> The inherited HWCAPs are masked with the hardware capabilities supported
> by the current kernel to ensure that we don't report more features than
> actually supported. This is important to avoid unexpected behavior,
> especially for processes with additional privileges.
>
> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
> Signed-off-by: Andrei Vagin <avagin@google.com>

v5 fixed the issue I've observed here:

https://lore.kernel.org/all/aec9c36d-d67a-4b61-9950-57b95afedf75@samsung.com/

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
  2026-03-23 18:21   ` Mark Rutland
@ 2026-03-24 10:28     ` Will Deacon
  2026-03-24 22:19       ` Andrei Vagin
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2026-03-24 10:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Andrei Vagin, Kees Cook, Andrew Morton, Marek Szyprowski,
	Cyrill Gorcunov, Mike Rapoport, Alexander Mikhalitsyn,
	linux-kernel, linux-fsdevel, linux-mm, criu, Catalin Marinas,
	linux-arm-kernel, Chen Ridong, Christian Brauner,
	David Hildenbrand, Eric Biederman, Lorenzo Stoakes, Michal Koutny,
	Alexander Mikhalitsyn

On Mon, Mar 23, 2026 at 06:21:22PM +0000, Mark Rutland wrote:
> On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> > Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> > AT_HWCAP2, etc.) from a parent process when they have been modified via
> > prctl.
> > 
> > To support C/R operations (snapshots, live migration) in heterogeneous
> > clusters, we must ensure that processes utilize CPU features available
> > on all potential target nodes. To solve this, we need to advertise a
> > common feature set across the cluster.
> > 
> > This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> > auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
> > execve() is called, if the current process has MMF_USER_HWCAP set, the
> > HWCAP values are extracted from the current auxiliary vector and stored
> > in the linux_binprm structure. These values are then used to populate
> > the auxiliary vector of the new process, effectively inheriting the
> > hardware capabilities.
> > 
> > The inherited HWCAPs are masked with the hardware capabilities supported
> > by the current kernel to ensure that we don't report more features than
> > actually supported. This is important to avoid unexpected behavior,
> > especially for processes with additional privileges.
> 
> At a high level, I don't think that's going to be sufficient:
> 
> * On an architecture with other userspace accessible feature
>   identification mechanism registers (e.g. ID registers), userspace
>   might read those. So you might need to hide stuff there too, and
>   that's going to require architecture-specific interfaces to manage.
> 
>   It's possible that some code checks HWCAPs and others check ID
>   registers, and mismatch between the two could be problematic.
> 
> * If the HWCAPs can be inherited by a more privileged task, then a
>   malicious user could use this to hide security features (e.g. shadow
>   stack or pointer authentication on arm64), and make it easier to
>   attack that task. While not a direct attack, it would undermine those
>   features.

Yeah, this looks like a non-starter to me on arm64. Even if it was
extended to apply the same treatment to the idregs, many of the hwcap
features can't actually be disabled by the kernel and so you still run
the risk of a task that probes for the presence of a feature using
something like a SIGILL handler or, perhaps more likely, assumes that
the presence of one hwcap implies the presence of another. And then
there are the applications that just base everything off the MIDR...
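Here is a minimal sketch of the SIGILL-based probing Will mentions. It runs a candidate instruction under a temporary SIGILL handler and reports whether it trapped. Nothing in this path consults AT_HWCAP, so masking the HWCAPs does not change the answer for an instruction the CPU still executes. demo_supported() and demo_unsupported() are hypothetical stand-ins: the latter calls raise(SIGILL) rather than executing a real unsupported instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* SIGILL handler: abandon the probe and jump back into probe_feature(). */
static void probe_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Run try_insn() with a temporary SIGILL handler installed.
 * Returns 1 if it completed, 0 if it raised SIGILL, -1 if the handler
 * could not be installed. Not thread-safe (static jump buffer). */
int probe_feature(void (*try_insn)(void))
{
    struct sigaction sa, old;
    int ok;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* Save the signal mask too, so siglongjmp() unblocks SIGILL again. */
    if (sigsetjmp(probe_env, 1) == 0) {
        try_insn();
        ok = 1;
    } else {
        ok = 0; /* reached via siglongjmp() from probe_sigill() */
    }

    sigaction(SIGILL, &old, NULL); /* restore the previous disposition */
    return ok;
}

/* Hypothetical stand-ins for a supported and an unsupported instruction. */
void demo_supported(void) { }
void demo_unsupported(void) { raise(SIGILL); }
```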

There's also kvm, which provides a roundabout way to query some features
of the underlying hardware.

You're probably better off using/extending the idreg overrides we have
in arch/arm64/kernel/pi/idreg-override.c so that you can make your
cluster of heterogeneous machines look alike.

On the other hand, if munging the hwcaps happens to be sufficient for
this particular use-case, can't it be handled entirely in userspace (e.g.
by hacking libc?)
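The userspace alternative would amount to routing every feature check through a wrapper like the one sketched below. HWCAP_MASK is a hypothetical environment variable, not a glibc interface. parse_mask() and masked_hwcap() are illustrative names.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/auxv.h>

/* Parse s as a mask (strtoul base 0: hex, octal or decimal). If s is
 * missing or malformed, return all-ones so that nothing is hidden. */
unsigned long parse_mask(const char *s)
{
    char *end;
    unsigned long v;

    if (!s || !*s)
        return ~0UL;
    errno = 0;
    v = strtoul(s, &end, 0);
    if (errno || *end != '\0')
        return ~0UL;
    return v;
}

/* AT_HWCAP as seen through the hypothetical HWCAP_MASK variable. */
unsigned long masked_hwcap(void)
{
    return getauxval(AT_HWCAP) & parse_mask(getenv("HWCAP_MASK"));
}
```

Only code that calls masked_hwcap() sees the restriction. A runtime that reads the auxiliary vector itself, such as Go's, would not, which is the gap Andrei raises in his reply.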

Will

* Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
  2026-03-24 10:28     ` Will Deacon
@ 2026-03-24 22:19       ` Andrei Vagin
  0 siblings, 0 replies; 9+ messages in thread
From: Andrei Vagin @ 2026-03-24 22:19 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland
  Cc: Kees Cook, Andrew Morton, Marek Szyprowski, Cyrill Gorcunov,
	Mike Rapoport, Alexander Mikhalitsyn, linux-kernel, linux-fsdevel,
	linux-mm, criu, Catalin Marinas, linux-arm-kernel, Chen Ridong,
	Christian Brauner, David Hildenbrand, Eric Biederman,
	Lorenzo Stoakes, Michal Koutny, Alexander Mikhalitsyn

Hi Mark and Will,

Thanks for the feedback. Please see my comments inline.

On Tue, Mar 24, 2026 at 3:28 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 23, 2026 at 06:21:22PM +0000, Mark Rutland wrote:
> > On Mon, Mar 23, 2026 at 05:53:37PM +0000, Andrei Vagin wrote:
> > > Introduces a mechanism to inherit hardware capabilities (AT_HWCAP,
> > > AT_HWCAP2, etc.) from a parent process when they have been modified via
> > > prctl.
> > >
> > > To support C/R operations (snapshots, live migration) in heterogeneous
> > > clusters, we must ensure that processes utilize CPU features available
> > > on all potential target nodes. To solve this, we need to advertise a
> > > common feature set across the cluster.
> > >
> > > This patch adds a new mm flag MMF_USER_HWCAP, which is set when the
> > > auxiliary vector is modified via prctl(PR_SET_MM, PR_SET_MM_AUXV).  When
> > > execve() is called, if the current process has MMF_USER_HWCAP set, the
> > > HWCAP values are extracted from the current auxiliary vector and stored
> > > in the linux_binprm structure. These values are then used to populate
> > > the auxiliary vector of the new process, effectively inheriting the
> > > hardware capabilities.
> > >
> > > The inherited HWCAPs are masked with the hardware capabilities supported
> > > by the current kernel to ensure that we don't report more features than
> > > actually supported. This is important to avoid unexpected behavior,
> > > especially for processes with additional privileges.
> >
> > At a high level, I don't think that's going to be sufficient:
> >
> > * On an architecture with other userspace accessible feature
> >   identification mechanism registers (e.g. ID registers), userspace
> >   might read those. So you might need to hide stuff there too, and
> >   that's going to require architecture-specific interfaces to manage.
> >
> >   It's possible that some code checks HWCAPs and others check ID
> >   registers, and mismatch between the two could be problematic.
> >
> > * If the HWCAPs can be inherited by a more privileged task, then a
> >   malicious user could use this to hide security features (e.g. shadow
> >   stack or pointer authentication on arm64), and make it easier to
> >   attack that task. While not a direct attack, it would undermine those
> >   features.

I agree with Mark that only a privileged process should be able to mask
certain hardware features. Currently, PR_SET_MM_AUXV is guarded by
CAP_SYS_RESOURCE, but PR_SET_MM_MAP allows changing the auxiliary vector
without specific capabilities. This is definitely an issue. To address
it, we could introduce a new prctl command that enables HWCAP
inheritance explicitly.
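The sketch below shows how a privileged C/R tool could restrict the HWCAPs in its own saved auxiliary vector through the existing PR_SET_MM_AUXV path. The 128-word buffer, auxv_find() and restrict_own_hwcap() are assumptions for illustration. Under the patch as posted, a successful call would also set MMF_USER_HWCAP, so later execve() calls from this process and its children would advertise the masked value. The capability-free PR_SET_MM_MAP path is not used here.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical buffer size; /proc/self/auxv is normally much smaller. */
#define AUXV_WORDS 128

/* Return a pointer to the value slot of the first entry of the given type
 * in an auxv array (type/value pairs ending in AT_NULL), or NULL. */
unsigned long *auxv_find(unsigned long *auxv, size_t n_words,
                         unsigned long type)
{
    for (size_t i = 0; i + 1 < n_words; i += 2) {
        if (auxv[i] == AT_NULL)
            break;
        if (auxv[i] == type)
            return &auxv[i + 1];
    }
    return NULL;
}

/* Clear every AT_HWCAP bit not in keep in this process's saved auxv.
 * PR_SET_MM_AUXV requires CAP_SYS_RESOURCE; without it prctl() fails
 * with EPERM. Returns 0 on success, -1 with errno set on failure. */
int restrict_own_hwcap(unsigned long keep)
{
    unsigned long auxv[AUXV_WORDS];
    size_t len = 0;
    ssize_t r;
    int fd = open("/proc/self/auxv", O_RDONLY);

    if (fd < 0)
        return -1;
    while ((r = read(fd, (char *)auxv + len, sizeof(auxv) - len)) > 0)
        len += (size_t)r;
    close(fd);
    if (r < 0)
        return -1;

    unsigned long *hwcap = auxv_find(auxv, len / sizeof(auxv[0]), AT_HWCAP);
    if (!hwcap) {
        errno = ENOENT;
        return -1;
    }
    *hwcap &= keep;
    return prctl(PR_SET_MM, PR_SET_MM_AUXV, (unsigned long)auxv,
                 (unsigned long)len, 0UL);
}
```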

>
> Yeah, this looks like a non-starter to me on arm64. Even if it was
> extended to apply the same treatment to the idregs, many of the hwcap
> features can't actually be disabled by the kernel and so you still run
> the risk of a task that probes for the presence of a feature using
> something like a SIGILL handler or, perhaps more likely, assumes that
> the presence of one hwcap implies the presence of another. And then
> there are the applications that just base everything off the MIDR...

The goal of this mechanism is not to provide strict architectural
enforcement or to trap the use of hardware features; rather, it is to
provide a consistent discovery interface for applications. I chose the
HWCAP vector because it mirrors the existing behavior of running an
older kernel on newer hardware: while ID registers might report a
feature as physically present, the HWCAPs will omit it if the kernel
lacks support. Applications are generally expected to treat HWCAPs as
the source of truth for which features are safe to use, even if the
underlying hardware is technically capable of more.
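On the consumer side, treating HWCAPs as the source of truth usually means gating a code path on getauxval(AT_HWCAP), as in the sketch below. hwcap_has() and process_hwcap() are illustrative helpers. Real callers would pass architecture bit constants, such as HWCAP_SHA2 from <asm/hwcap.h> on arm64.

```c
#include <sys/auxv.h>

/* Nonzero if every bit in mask is set in hwcap. */
int hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}

/* The AT_HWCAP word the kernel placed in this process's auxiliary vector. */
unsigned long process_hwcap(void)
{
    return getauxval(AT_HWCAP);
}
```

Code written this way picks up an inherited, masked AT_HWCAP without modification. Code that probes instructions or reads ID registers directly does not.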

Another significant advantage of using HWCAPs is that many
applications already rely on them for feature detection. This interface
allows these applications to work correctly "out-of-the-box" in a
migrated environment without requiring any userspace modifications. I
understand that some apps may use other detection methods; however,
there is no guarantee that such applications will work correctly after
migration to another machine.

>
> There's also kvm, which provides a roundabout way to query some features
> of the underlying hardware.
>
> You're probably better off using/extending the idreg overrides we have
> in arch/arm64/kernel/pi/idreg-override.c so that you can make your
> cluster of heterogeneous machines look alike.

IIRC, idreg-override/cpuid-masking usually works for an entire machine.
We actually need to have a mechanism that will work on a per-container
basis. Workloads inside one cluster can have different
migration/snapshot requirements. Some are pinned to a specific node,
others are never migrated, while others need to be migratable across a
cluster or even between clusters. We need a mechanism that can be
tuned on a per-container/per-process basis.
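The per-container scoping follows from the flag being per-mm. The kernel/fork.c hunk copies MMF_USER_HWCAP into the child, and the exec path re-derives AT_HWCAP from the saved vector. The hypothetical model below traces that flow for a single HWCAP word. struct mm_model and the model_* functions are not kernel APIs. The model assumes, as in create_elf_tables(), that the value written for the new process becomes its saved auxv.

```c
#include <stdbool.h>

/* Hypothetical model of the per-mm state the patch touches. */
struct mm_model {
    bool user_hwcap;           /* models MMF_USER_HWCAP */
    unsigned long saved_hwcap; /* models AT_HWCAP in mm->saved_auxv */
};

/* prctl(PR_SET_MM_AUXV): store the new vector and set the flag. */
void model_set_auxv(struct mm_model *mm, unsigned long hwcap)
{
    mm->saved_hwcap = hwcap;
    mm->user_hwcap = true;
}

/* fork(): saved_auxv is duplicated with the mm, and the patch
 * explicitly carries MMF_USER_HWCAP over to the child. */
struct mm_model model_fork(const struct mm_model *parent)
{
    return *parent;
}

/* execve(): if the flag is set, advertise saved & kernel bits and keep the
 * flag on the new mm; otherwise advertise the kernel's full set. */
struct mm_model model_exec(const struct mm_model *old,
                           unsigned long kernel_hwcap)
{
    struct mm_model new_mm;

    new_mm.user_hwcap = old->user_hwcap;
    new_mm.saved_hwcap = old->user_hwcap ? (old->saved_hwcap & kernel_hwcap)
                                         : kernel_hwcap;
    return new_mm;
}
```

A process outside the container tree that never made the prctl call keeps the kernel's full value. Every descendant of the configured process keeps the restricted one across further fork() and execve() calls.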

>
> On the other hand, if munging the hwcaps happens to be sufficient for
> this particular use-case, can't it be handled entirely in userspace (e.g.
> by hacking libc?)

CRIU often handles workloads with a mix of runtimes: some linked against
glibc, some against musl, and others like Go that bypass libc entirely.
CRIU is mostly used for containers that can run multiple processes,
possibly based on different runtimes. This means the set of available
CPU features cannot be specified for just one runtime; it has to be
propagated across all of them. I think a pure userspace solution is
nearly infeasible in this case.

Thanks,
Andrei

end of thread

Thread overview: 9+ messages
2026-03-23 17:53 [PATCH 0/4 v5] exec: inherit HWCAPs from the parent process Andrei Vagin
2026-03-23 17:53 ` [PATCH 1/4] " Andrei Vagin
2026-03-23 18:21   ` Mark Rutland
2026-03-24 10:28     ` Will Deacon
2026-03-24 22:19       ` Andrei Vagin
2026-03-23 22:59   ` Marek Szyprowski
2026-03-23 17:53 ` [PATCH 2/4] arm64: elf: clear MMF_USER_HWCAP on architecture switch Andrei Vagin
2026-03-23 17:53 ` [PATCH 3/4] mm: synchronize saved_auxv access with arg_lock Andrei Vagin
2026-03-23 17:53 ` [PATCH 4/4] selftests/exec: add test for HWCAP inheritance Andrei Vagin