From: Jan Kara <jack@suse.cz>
To: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>,
Amir Goldstein <amir73il@gmail.com>,
Alexander Aring <alex.aring@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
Christoph Hellwig <hch@infradead.org>,
Josef Bacik <josef@toxicpanda.com>,
linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
linux-perf-users@vger.kernel.org
Subject: Re: [PATCH RESEND v3 1/2] uapi: explain how per-syscall AT_* flags should be allocated
Date: Mon, 2 Sep 2024 14:46:42 +0200 [thread overview]
Message-ID: <20240902124642.rnd763njngu6qsg2@quack3> (raw)
In-Reply-To: <20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com>
On Wed 28-08-24 20:19:42, Aleksa Sarai wrote:
> Unfortunately, the way we have gone about adding new AT_* flags has
> been a little messy. In the beginning, all of the AT_* flags had generic
> meanings and so it made sense to share the flag bits indiscriminately.
> However, we inevitably ran into syscalls that needed their own
> syscall-specific flags. Due to the lack of a planned out policy, we
> ended up with the following situations:
>
> * Existing syscalls adding new features tended to use new AT_* bits,
> with some effort taken to try to re-use bits for flags that were so
> obviously syscall specific that they only make sense for a single
> syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).
>
> Given the constraints of bitflags, this works well in practice, but
> ideally (to avoid future confusion) we would plan ahead and define a
> set of "per-syscall bits" ahead of time so that when allocating new
> bits we don't end up with a complete mish-mash of which bits are
> supposed to be per-syscall and which aren't.
>
> * New syscalls dealt with this in several ways:
>
> - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
> fspick(2)) created their separate own flag spaces that have no
> overlap with the AT_* flags. Most of these ended up allocating
> their bits sequentually.
>
> In the case of move_mount(2) and fspick(2), several flags have
> identical meanings to AT_* flags but were allocated in their own
> flag space.
>
> This makes sense for syscalls that will never share AT_* flags, but
> for some syscalls this leads to duplication with AT_* flags in a
> way that could cause confusion (if renameat2(2) grew a
> RENAME_EMPTY_PATH it seems likely that users could mistake it for
> AT_EMPTY_PATH since it is an *at(2) syscall).
>
> - Some syscalls unfortunately ended up both creating their own flag
> space while also using bits from other flag spaces. The most
> obvious example is open_tree(2), where the standard usage ends up
> using flags from *THREE* separate flag spaces:
>
> open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);
>
> (Note that O_CLOEXEC is also platform-specific, so several future
> OPEN_TREE_* bits are also made unusable in one fell swoop.)
>
> It's not entirely clear to me what the "right" choice is for new
> syscalls. Just saying that all future VFS syscalls should use AT_* flags
> doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
> don't make much sense to burn generic AT_* flags for) and move_mount(2)
> has separate AT_*-like flags for both the source and target so separate
> flags are needed anyway (though it seems possible that renameat2(2)
> could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
> they can't be reused).
>
> But at least for syscalls that _do_ choose to use AT_* flags, we should
> explicitly state the policy that 0x2ff is currently intended for
> per-syscall flags and that new flags should err on the side of
> overlapping with existing flag bits (so we can extend the scope of
> generic flags in the future if necessary).
>
> And add AT_* aliases for the RENAME_* flags to further cement that
> renameat2(2) is an *at(2) flag, just with its own per-syscall flags.
>
> Suggested-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> include/uapi/linux/fcntl.h | 80 ++++++++++++++-------
> tools/perf/trace/beauty/include/uapi/linux/fcntl.h | 83 +++++++++++++++-------
> 2 files changed, 115 insertions(+), 48 deletions(-)
>
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index e55a3314bcb0..38a6d66d9e88 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -90,37 +90,69 @@
> #define DN_ATTRIB 0x00000020 /* File changed attibutes */
> #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
>
> +#define AT_FDCWD -100 /* Special value for dirfd used to
> + indicate openat should use the
> + current working directory. */
> +
> +
> +/* Generic flags for the *at(2) family of syscalls. */
> +
> +/* Reserved for per-syscall flags 0xff. */
> +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic
> + links. */
> +/* Reserved for per-syscall flags 0x200 */
> +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount
> + traversal. */
> +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative
> + pathname to operate on dirfd
> + directly. */
> /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> - * unlinkat. The two functions do completely different things and therefore,
> - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
> - * faccessat would be undefined behavior and thus treating it equivalent to
> - * AT_EACCESS is valid undefined behavior.
> + * These flags are currently statx(2)-specific, but they could be made generic
> + * in the future and so they should not be used for other per-syscall flags.
> */
> -#define AT_FDCWD -100 /* Special value used to indicate
> - openat should use the current
> - working directory. */
> -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> +
> +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +
> +/*
> + * Per-syscall flags for the *at(2) family of syscalls.
> + *
> + * These are flags that are so syscall-specific that a user passing these flags
> + * to the wrong syscall is so "clearly wrong" that we can safely call such
> + * usage "undefined behaviour".
> + *
> + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value.
> + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful
> + * only to unlinkat. The two functions do completely different things and
> + * therefore, the flags can be allowed to overlap. For example, passing
> + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it
> + * equivalent to AT_EACCESS is valid undefined behavior.
> + *
> + * Note for implementers: When picking a new per-syscall AT_* flag, try to
> + * reuse already existing flags first. This leaves us with as many unused bits
> + * as possible, so we can use them for generic bits in the future if necessary.
> + */
> +
> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +#define AT_RENAME_NOREPLACE 0x0001
> +#define AT_RENAME_EXCHANGE 0x0002
> +#define AT_RENAME_WHITEOUT 0x0004
> +
> +/* Flag for faccessat(2). */
> #define AT_EACCESS 0x200 /* Test access permitted for
> effective IDs, not real IDs. */
> +/* Flag for unlinkat(2). */
> #define AT_REMOVEDIR 0x200 /* Remove directory instead of
> unlinking file. */
> -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
> -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> -
> -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> -
> -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +/* Flags for name_to_handle_at(2). */
> +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare
> + object identity and may not be
> + usable with open_by_handle_at(2). */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
> - compare object identity and may not
> - be usable to open_by_handle_at(2) */
> #if defined(__KERNEL__)
> #define AT_GETATTR_NOSEC 0x80000000
> #endif
> diff --git a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> index c0bcc185fa48..38a6d66d9e88 100644
> --- a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> +++ b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h
> @@ -16,6 +16,9 @@
>
> #define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3)
>
> +/* Was the file just created? */
> +#define F_CREATED_QUERY (F_LINUX_SPECIFIC_BASE + 4)
> +
> /*
> * Cancel a blocking posix lock; internal use only until we expose an
> * asynchronous lock api to userspace:
> @@ -87,37 +90,69 @@
> #define DN_ATTRIB 0x00000020 /* File changed attibutes */
> #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */
>
> +#define AT_FDCWD -100 /* Special value for dirfd used to
> + indicate openat should use the
> + current working directory. */
> +
> +
> +/* Generic flags for the *at(2) family of syscalls. */
> +
> +/* Reserved for per-syscall flags 0xff. */
> +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic
> + links. */
> +/* Reserved for per-syscall flags 0x200 */
> +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount
> + traversal. */
> +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative
> + pathname to operate on dirfd
> + directly. */
> +/*
> + * These flags are currently statx(2)-specific, but they could be made generic
> + * in the future and so they should not be used for other per-syscall flags.
> + */
> +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> +
> +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +
> /*
> - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is
> - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to
> - * unlinkat. The two functions do completely different things and therefore,
> - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to
> - * faccessat would be undefined behavior and thus treating it equivalent to
> - * AT_EACCESS is valid undefined behavior.
> + * Per-syscall flags for the *at(2) family of syscalls.
> + *
> + * These are flags that are so syscall-specific that a user passing these flags
> + * to the wrong syscall is so "clearly wrong" that we can safely call such
> + * usage "undefined behaviour".
> + *
> + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value.
> + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful
> + * only to unlinkat. The two functions do completely different things and
> + * therefore, the flags can be allowed to overlap. For example, passing
> + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it
> + * equivalent to AT_EACCESS is valid undefined behavior.
> + *
> + * Note for implementers: When picking a new per-syscall AT_* flag, try to
> + * reuse already existing flags first. This leaves us with as many unused bits
> + * as possible, so we can use them for generic bits in the future if necessary.
> */
> -#define AT_FDCWD -100 /* Special value used to indicate
> - openat should use the current
> - working directory. */
> -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
> +
> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +#define AT_RENAME_NOREPLACE 0x0001
> +#define AT_RENAME_EXCHANGE 0x0002
> +#define AT_RENAME_WHITEOUT 0x0004
> +
> +/* Flag for faccessat(2). */
> #define AT_EACCESS 0x200 /* Test access permitted for
> effective IDs, not real IDs. */
> +/* Flag for unlinkat(2). */
> #define AT_REMOVEDIR 0x200 /* Remove directory instead of
> unlinking file. */
> -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
> -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
> -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
> -
> -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */
> -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */
> -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */
> -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
> -
> -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> +/* Flags for name_to_handle_at(2). */
> +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare
> + object identity and may not be
> + usable with open_by_handle_at(2). */
>
> -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */
> -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to
> - compare object identity and may not
> - be usable to open_by_handle_at(2) */
> #if defined(__KERNEL__)
> #define AT_GETATTR_NOSEC 0x80000000
> #endif
>
> --
> 2.46.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2024-09-02 12:47 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-28 10:19 [PATCH RESEND v3 0/2] fhandle: expose u64 mount id to name_to_handle_at(2) Aleksa Sarai
2024-08-28 10:19 ` [PATCH RESEND v3 1/2] uapi: explain how per-syscall AT_* flags should be allocated Aleksa Sarai
2024-09-02 12:46 ` Jan Kara [this message]
2024-08-28 10:19 ` [PATCH RESEND v3 2/2] fhandle: expose u64 mount id to name_to_handle_at(2) Aleksa Sarai
2024-09-02 12:54 ` Jan Kara
2024-08-28 10:37 ` [PATCH xfstests v1 1/2] statx: update headers to include newer statx fields Aleksa Sarai
2024-08-28 10:37 ` [PATCH xfstests v1 2/2] open_by_handle: add tests for u64 mount ID Aleksa Sarai
2024-08-30 17:10 ` Amir Goldstein
2024-09-01 12:57 ` Aleksa Sarai
2024-09-02 14:16 ` [PATCH RESEND v3 0/2] fhandle: expose u64 mount id to name_to_handle_at(2) Christian Brauner
2024-09-02 16:45 ` [PATCH xfstests v2 1/2] statx: update headers to include newer statx fields Aleksa Sarai
2024-09-02 16:45 ` [PATCH xfstests v2 2/2] open_by_handle: add tests for u64 mount ID Aleksa Sarai
2024-09-02 17:21 ` Amir Goldstein
2024-09-03 6:41 ` Aleksa Sarai
2024-09-03 9:08 ` Amir Goldstein
2024-09-04 16:30 ` Aleksa Sarai
2024-09-04 16:44 ` Amir Goldstein
2024-09-04 17:53 ` Aleksa Sarai
2024-09-03 7:54 ` Amir Goldstein
2024-09-03 9:11 ` Aleksa Sarai
2024-09-03 6:49 ` [PATCH xfstests v2 1/2] statx: update headers to include newer statx fields Amir Goldstein
2024-09-04 17:56 ` [PATCH xfstests v3 1/2] open_by_handle: verify u32 and u64 mount IDs Aleksa Sarai
2024-09-04 17:56 ` [PATCH xfstests v3 2/2] generic/756: test name_to_handle_at(AT_HANDLE_MNT_ID_UNIQUE) explicitly Aleksa Sarai
2024-09-04 18:49 ` Amir Goldstein
2024-09-04 19:00 ` Aleksa Sarai
2024-09-04 18:29 ` [PATCH xfstests v3 1/2] open_by_handle: verify u32 and u64 mount IDs Amir Goldstein
2024-09-04 19:48 ` [PATCH xfstests v4 " Aleksa Sarai
2024-09-04 19:48 ` [PATCH xfstests v4 2/2] generic/756: test name_to_handle_at(AT_HANDLE_MNT_ID_UNIQUE) explicitly Aleksa Sarai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240902124642.rnd763njngu6qsg2@quack3 \
--to=jack@suse.cz \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alex.aring@gmail.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=amir73il@gmail.com \
--cc=brauner@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=cyphar@cyphar.com \
--cc=hch@infradead.org \
--cc=irogers@google.com \
--cc=jlayton@kernel.org \
--cc=jolsa@kernel.org \
--cc=josef@toxicpanda.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox