linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v3 00/10] extensible syscalls: CHECK_FIELDS to allow for easier feature detection
@ 2024-10-09 20:40 Aleksa Sarai
  2024-10-09 20:40 ` [PATCH RFC v3 01/10] uaccess: add copy_struct_to_user helper Aleksa Sarai
                   ` (11 more replies)
  0 siblings, 12 replies; 22+ messages in thread
From: Aleksa Sarai @ 2024-10-09 20:40 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, Alexander Viro, Christian Brauner, Jan Kara,
	Arnd Bergmann, Shuah Khan
  Cc: Kees Cook, Florian Weimer, Arnd Bergmann, Mark Rutland,
	linux-kernel, linux-api, linux-fsdevel, linux-arch,
	linux-kselftest, Aleksa Sarai, stable

This is something that I've been thinking about for a while. We had a
discussion at LPC 2020 about this[1] but the proposals suggested there
never materialised.

In short, it is quite difficult for userspace to detect the feature
capability of syscalls at runtime. This is something a lot of programs
want to do, but they are forced to create elaborate scenarios to try to
figure out if a feature is supported without causing damage to the
system. For the vast majority of cases, each individual feature also
needs to be tested individually (because syscall results are
all-or-nothing), so testing even a single syscall's feature set can
easily inflate the startup time of programs.

This patchset implements the fairly minimal design I proposed in this
talk[2] and in some old LKML threads (though I can't find the exact
references ATM). The general flow looks like:

 1. Userspace will indicate to the kernel that a syscall should a be
    no-op by setting the top bit of the extensible struct size argument.

    We will almost certainly never support exabyte sized structs, so the
    top bits are free for us to use as makeshift flag bits. This is
    preferable to using the per-syscall flag field inside the structure
    because seccomp can easily detect the bit in the flag and allow the
    probe or forcefully return -EEXTSYS_NOOP.

 2. The kernel will then fill the provided structure with every valid
    bit pattern that the current kernel understands.

    For flags or other bitflag-like fields, this is the set of valid
    flags or bits. For pointer fields or fields that take an arbitrary
    value, the field has every bit set (0xFF... to fill the field) to
    indicate that any value is valid in the field.

 3. The syscall then returns -EEXTSYS_NOOP which is an errno that will
    only ever be used for this purpose (so userspace can be sure that
    the request succeeded).

    On older kernels, the syscall will return a different error (usually
    -E2BIG or -EFAULT) and userspace can do their old-fashioned checks.

 4. Userspace can then check which flags and fields are supported by
    looking at the fields in the returned structure. Flags are checked
    by doing an AND with the flags field, and field support can checked
    by comparing to 0. In principle you could just AND the entire
    structure if you wanted to do this check generically without caring
    about the structure contents (this is what libraries might consider
    doing).

    Userspace can even find out the internal kernel structure size by
    passing a PAGE_SIZE buffer and seeing how many bytes are non-zero.

    As with copy_struct_from_user(), this is designed to be forward- and
    backwards- compatible.

This allows programas to get a one-shot understanding of what features a
syscall supports without having to do any elaborate setups or tricks to
detect support for destructive features. Flags can simply be ANDed to
check if they are in the supported set, and fields can just be checked
to see if they are non-zero.

This patchset is IMHO the simplest way we can add the ability to
introspect the feature set of extensible struct (copy_struct_from_user)
syscalls. It doesn't preclude the chance of a more generic mechanism
being added later.

The intended way of using this interface to get feature information
looks something like the following (imagine that openat2 has gained a
new field and a new flag in the future):

  static bool openat2_no_automount_supported;
  static bool openat2_cwd_fd_supported;

  int check_openat2_support(void)
  {
      int err;
      struct open_how how = {};

      err = openat2(AT_FDCWD, ".", &how, CHECK_FIELDS | sizeof(how));
      assert(err < 0);
      switch (errno) {
      case EFAULT: case E2BIG:
          /* Old kernel... */
          check_support_the_old_way();
          break;
      case EEXTSYS_NOOP:
          openat2_no_automount_supported = (how.flags & RESOLVE_NO_AUTOMOUNT);
          openat2_cwd_fd_supported = (how.cwd_fd != 0);
          break;
      }
  }

This series adds CHECK_FIELDS support for the following extensible
struct syscalls, as they are quite likely to grow flags in the near
future:

 * openat2
 * clone3
 * mount_setattr

[1]: https://lwn.net/Articles/830666/
[2]: https://youtu.be/ggD-eb3yPVs

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
Changes in v3:
- Fix copy_struct_to_user() return values in case of clear_user() failure.
- v2: <https://lore.kernel.org/r/20240906-extensible-structs-check_fields-v2-0-0f46d2de9bad@cyphar.com>
Changes in v2:
- Add CHECK_FIELDS support to mount_setattr(2).
- Fix build failure on architectures with custom errno values.
- Rework selftests to use the tools/ uAPI headers rather than custom
  defining EEXTSYS_NOOP.
- Make sure we return -EINVAL and -E2BIG for invalid sizes even if
  CHECK_FIELDS is set, and add some tests for that.
- v1: <https://lore.kernel.org/r/20240902-extensible-structs-check_fields-v1-0-545e93ede2f2@cyphar.com>

---
Aleksa Sarai (10):
      uaccess: add copy_struct_to_user helper
      sched_getattr: port to copy_struct_to_user
      openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
      openat2: add CHECK_FIELDS flag to usize argument
      selftests: openat2: add 0xFF poisoned data after misaligned struct
      selftests: openat2: add CHECK_FIELDS selftests
      clone3: add CHECK_FIELDS flag to usize argument
      selftests: clone3: add CHECK_FIELDS selftests
      mount_setattr: add CHECK_FIELDS flag to usize argument
      selftests: mount_setattr: add CHECK_FIELDS selftest

 arch/alpha/include/uapi/asm/errno.h                |   3 +
 arch/mips/include/uapi/asm/errno.h                 |   3 +
 arch/parisc/include/uapi/asm/errno.h               |   3 +
 arch/sparc/include/uapi/asm/errno.h                |   3 +
 fs/namespace.c                                     |  17 ++
 fs/open.c                                          |  18 ++
 include/linux/uaccess.h                            |  97 ++++++++
 include/uapi/asm-generic/errno.h                   |   3 +
 include/uapi/linux/openat2.h                       |   2 +
 kernel/fork.c                                      |  30 ++-
 kernel/sched/syscalls.c                            |  42 +---
 tools/arch/alpha/include/uapi/asm/errno.h          |   3 +
 tools/arch/mips/include/uapi/asm/errno.h           |   3 +
 tools/arch/parisc/include/uapi/asm/errno.h         |   3 +
 tools/arch/sparc/include/uapi/asm/errno.h          |   3 +
 tools/include/uapi/asm-generic/errno.h             |   3 +
 tools/include/uapi/asm-generic/posix_types.h       | 101 ++++++++
 tools/testing/selftests/clone3/.gitignore          |   1 +
 tools/testing/selftests/clone3/Makefile            |   4 +-
 .../testing/selftests/clone3/clone3_check_fields.c | 264 +++++++++++++++++++++
 tools/testing/selftests/mount_setattr/Makefile     |   2 +-
 .../selftests/mount_setattr/mount_setattr_test.c   |  53 ++++-
 tools/testing/selftests/openat2/Makefile           |   2 +
 tools/testing/selftests/openat2/openat2_test.c     | 165 ++++++++++++-
 24 files changed, 777 insertions(+), 51 deletions(-)
---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20240803-extensible-structs-check_fields-a47e94cef691

Best regards,
-- 
Aleksa Sarai <cyphar@cyphar.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-01-20  9:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-09 20:40 [PATCH RFC v3 00/10] extensible syscalls: CHECK_FIELDS to allow for easier feature detection Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 01/10] uaccess: add copy_struct_to_user helper Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 02/10] sched_getattr: port to copy_struct_to_user Aleksa Sarai
2024-12-10 18:14   ` Florian Weimer
2024-12-11 10:23     ` Christian Brauner
2025-01-18 13:02       ` Xi Ruoyao
2025-01-20  5:28         ` Florian Weimer
2025-01-20  9:21           ` Xi Ruoyao
2025-01-20  9:51             ` Florian Weimer
2024-10-09 20:40 ` [PATCH RFC v3 03/10] openat2: explicitly return -E2BIG for (usize > PAGE_SIZE) Aleksa Sarai
2024-10-10  6:24   ` Greg KH
2024-10-10 10:09   ` (subset) " Christian Brauner
2024-10-09 20:40 ` [PATCH RFC v3 04/10] openat2: add CHECK_FIELDS flag to usize argument Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 05/10] selftests: openat2: add 0xFF poisoned data after misaligned struct Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 06/10] selftests: openat2: add CHECK_FIELDS selftests Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 07/10] clone3: add CHECK_FIELDS flag to usize argument Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 08/10] selftests: clone3: add CHECK_FIELDS selftests Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 09/10] mount_setattr: add CHECK_FIELDS flag to usize argument Aleksa Sarai
2024-10-09 20:40 ` [PATCH RFC v3 10/10] selftests: mount_setattr: add CHECK_FIELDS selftest Aleksa Sarai
2024-10-10  6:26 ` [PATCH RFC v3 00/10] extensible syscalls: CHECK_FIELDS to allow for easier feature detection Florian Weimer
2024-10-21 14:51 ` (subset) " Christian Brauner
2024-10-21 21:38   ` Aleksa Sarai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).