* [PATCH v2 00/11] man2: add man pages for 'new' mount API
@ 2025-08-06 17:44 Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
` (11 more replies)
0 siblings, 12 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
Back in 2019, the new mount API was merged into mainline[1]. David Howells
then set about writing man pages for these new APIs, and sent some
patches back in 2020[2]. Unfortunately, these patches were never merged,
which meant that these APIs were practically undocumented for many
years -- arguably this may have been a contributing factor to the
relatively slow adoption of these new (far better) APIs. I have often
discovered that many folks are unaware of the read(2)-based message
retrieval interface provided by filesystem context file descriptors.
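For anyone who has not run into it, the interface is small enough to
sketch. The following is only an illustration and is not part of this
series: msg_class() and drain_messages() are hypothetical helper names,
the 1024-byte buffer is an arbitrary choice, and the ENODATA and
single-letter prefix behaviour is taken from the fsopen(2) page added in
patch 03.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map the one-letter class prefix to a label. */
static const char *
msg_class(const char *msg)
{
    assert(msg != NULL);
    if (msg[0] != '\0' && msg[1] == ' ') {
        switch (msg[0]) {
        case 'e': return "error";
        case 'w': return "warning";
        case 'i': return "info";
        }
    }
    return "unknown";
}

/*
 * Hypothetical helper: read(2) messages from fd until the queue is empty.
 * Returns the number of messages printed, or -1 on an unexpected error
 * (for example EMSGSIZE if a message does not fit in the buffer).
 */
static int
drain_messages(int fd, FILE *out)
{
    char buf[1024];
    int count = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf) - 1);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            /* ENODATA means the message queue is empty. */
            return (errno == ENODATA) ? count : -1;
        }
        if (n == 0)     /* EOF, so the helper also works on ordinary fds */
            return count;

        buf[n] = '\0';
        buf[strcspn(buf, "\n")] = '\0';     /* drop a trailing newline */
        fprintf(out, "[%s] %s\n", msg_class(buf),
                buf[1] == ' ' ? buf + 2 : buf);
        count++;
    }
}
```

With a real filesystem context, one would call something like
drain_messages(fsfd, stderr) after a failed fsconfig(2) or fsmount(2).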
In 2024, Christian Brauner set aside some time to provide some
documentation of these new APIs and so adapted David Howells's original
man pages into the easier-to-edit Markdown format and published them on
GitHub[3]. These have been maintained since, including updated
information on new features added since David Howells's 2020 draft pages
(such as MOVE_MOUNT_BENEATH).
While this was a welcome improvement to the previous status quo (that
had lasted over 6 years), speaking personally my experience is that not
having access to these man pages from the terminal has been a fairly
common pain point.
So, this is a modern version of the man pages for these APIs, in the hopes
that we can finally (7 years later) get proper documentation for these
APIs in the man-pages project.
One important thing to note is that most of these were re-written by me,
with very minimal copying from the versions available from Christian[3].
The reasons for this are two-fold:
* Both Howells's original version and Christian's maintained versions
contain crucial mistakes that I have been bitten by in the past (the
most obvious being that all of these APIs were merged in Linux 5.2,
but the man pages all claim they were merged in different versions.)
* As the man pages appear to have been written from Howells's
perspective while implementing them, some of the wording is a little
too tied to the implementation (or appears to describe features that
don't really exist in the merged versions of these APIs).
I decided that the best way to resolve these issues is to rewrite them
from the perspective of an actual user of these APIs (me), and check
that we do not repeat the mistakes I found in the originals.
I have also done my best to resolve the issues raised by Michael Kerrisk
on the original patchset sent by Howells[2].
In addition, I have also included a man page for open_tree_attr(2) (as a
subsection of the new open_tree(2) man page), which was merged in Linux
6.15.
[1]: https://lore.kernel.org/all/20190507204921.GL23075@ZenIV.linux.org.uk/
[2]: https://lore.kernel.org/linux-man/159680892602.29015.6551860260436544999.stgit@warthog.procyon.org.uk/
[3]: https://github.com/brauner/man-pages-md
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
Changes in v2:
- `make -R lint-man`. [Alejandro Colomar]
- `sed -i s|Glibc|glibc|g`. [Alejandro Colomar]
- `sed -i s|pathname|path|g` [Alejandro Colomar]
- Clean up macro usage, example code, and synopsis. [Alejandro Colomar]
- Try to use semantic newlines. [Alejandro Colomar]
- Make sure the usage of "filesystem context", "filesystem instance",
and "mount object" are consistent. [Askar Safin]
- Avoid referring to these syscalls without an "at" suffix as "*at()
syscalls". [Askar Safin]
- Use \% to avoid hyphenation of constants. [Askar Safin, G. Branden Robinson]
- Add a new subsection to mount_setattr(2) to describe the distinction
between mount attributes and filesystem parameters.
- (Under protest) double-space-after-period formatted commit messages.
- v1: <https://lore.kernel.org/r/20250806-new-mount-api-v1-0-8678f56c6ee0@cyphar.com>
---
Aleksa Sarai (11):
mount_setattr.2: document glibc >= 2.36 syscall wrappers
mount_setattr.2: move mount_attr struct to mount_attr.2type
fsopen.2: document 'new' mount api
fspick.2: document 'new' mount api
fsconfig.2: document 'new' mount api
fsmount.2: document 'new' mount api
move_mount.2: document 'new' mount api
open_tree.2: document 'new' mount api
mount_setattr.2: mirror opening sentence from fsopen(2)
open_tree_attr.2, open_tree.2: document new open_tree_attr() api
fsconfig.2, mount_setattr.2: add note about attribute-parameter distinction
man/man2/fsconfig.2 | 566 +++++++++++++++++++++++++++++++++++++++
man/man2/fsmount.2 | 209 +++++++++++++++
man/man2/fsopen.2 | 319 ++++++++++++++++++++++
man/man2/fspick.2 | 305 +++++++++++++++++++++
man/man2/mount_setattr.2 | 105 ++++----
man/man2/move_mount.2 | 609 ++++++++++++++++++++++++++++++++++++++++++
man/man2/open_tree.2 | 479 +++++++++++++++++++++++++++++++++
man/man2/open_tree_attr.2 | 1 +
man/man2type/mount_attr.2type | 58 ++++
9 files changed, 2600 insertions(+), 51 deletions(-)
---
base-commit: f23e8249a6dcf695d38055483802779c36aedbba
change-id: 20250802-new-mount-api-436db984f432
Best regards,
--
Aleksa Sarai <cyphar@cyphar.com>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-07 10:39 ` Alejandro Colomar
2025-08-08 9:23 ` Askar Safin
2025-08-06 17:44 ` [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type Aleksa Sarai
` (10 subsequent siblings)
11 siblings, 2 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
glibc 2.36 added syscall wrappers for the entire family of fd-based
mount syscalls, including mount_setattr(2). Thus it's no longer
necessary to instruct users to do raw syscall(2) operations.
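As a rough sketch of what this means for callers (not part of the patch
itself): set_mount_readonly() is a hypothetical helper, and the
preprocessor fallback to syscall(2) is only an assumption about how a
program might keep building against an older glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>          /* AT_* constants */
#include <string.h>
#include <unistd.h>

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 36)
#  define HAVE_GLIBC_MOUNT_SETATTR 1
# endif
#endif

#ifdef HAVE_GLIBC_MOUNT_SETATTR
# include <sys/mount.h>     /* mount_setattr(), struct mount_attr */
#else
/* Assumed fallback for older C libraries: issue the raw system call. */
# include <linux/mount.h>
# include <sys/syscall.h>
static int
mount_setattr(int dirfd, const char *path, unsigned int flags,
              struct mount_attr *attr, size_t size)
{
    return (int) syscall(SYS_mount_setattr, dirfd, path, flags, attr, size);
}
#endif

/* Hypothetical helper: mark the mount at dirfd/path read-only. */
static int
set_mount_readonly(int dirfd, const char *path)
{
    struct mount_attr attr;

    assert(path != NULL);
    memset(&attr, 0, sizeof(attr));     /* unused fields must be zero */
    attr.attr_set = MOUNT_ATTR_RDONLY;

    return mount_setattr(dirfd, path, 0, &attr, sizeof(attr));
}
```

A privileged caller would use it as set_mount_readonly(AT_FDCWD, "/mnt"),
where "/mnt" is just a placeholder; on failure it returns -1 with errno
set, as the wrapper does.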
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/mount_setattr.2 | 45 +++++++--------------------------------------
1 file changed, 7 insertions(+), 38 deletions(-)
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index 60d9cf9de8aa..c96f0657f046 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -10,21 +10,12 @@ .SH LIBRARY
.RI ( libc ,\~ \-lc )
.SH SYNOPSIS
.nf
-.BR "#include <linux/fcntl.h>" " /* Definition of " AT_* " constants */"
-.BR "#include <linux/mount.h>" " /* Definition of " MOUNT_ATTR_* " constants */"
-.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
-.B #include <unistd.h>
+.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */"
+.B #include <sys/mount.h>
.P
-.BI "int syscall(SYS_mount_setattr, int " dirfd ", const char *" path ,
-.BI " unsigned int " flags ", struct mount_attr *" attr \
-", size_t " size );
+.BI "int mount_setattr(int " dirfd ", const char *" path ", unsigned int " flags ","
+.BI " struct mount_attr *" attr ", size_t " size );
.fi
-.P
-.IR Note :
-glibc provides no wrapper for
-.BR mount_setattr (),
-necessitating the use of
-.BR syscall (2).
.SH DESCRIPTION
The
.BR mount_setattr ()
@@ -586,6 +577,7 @@ .SH HISTORY
.\" commit 7d6beb71da3cc033649d641e1e608713b8220290
.\" commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
.\" commit 9caccd41541a6f7d6279928d9f971f6642c361af
+glibc 2.36.
.SH NOTES
.SS ID-mapped mounts
Creating an ID-mapped mount makes it possible to
@@ -914,37 +906,14 @@ .SH EXAMPLES
#include <err.h>
#include <fcntl.h>
#include <getopt.h>
-#include <linux/mount.h>
-#include <linux/types.h>
+#include <sys/mount.h>
+#include <sys/types.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
-#include <sys/syscall.h>
#include <unistd.h>
\&
-static inline int
-mount_setattr(int dirfd, const char *path, unsigned int flags,
- struct mount_attr *attr, size_t size)
-{
- return syscall(SYS_mount_setattr, dirfd, path, flags,
- attr, size);
-}
-\&
-static inline int
-open_tree(int dirfd, const char *filename, unsigned int flags)
-{
- return syscall(SYS_open_tree, dirfd, filename, flags);
-}
-\&
-static inline int
-move_mount(int from_dirfd, const char *from_path,
- int to_dirfd, const char *to_path, unsigned int flags)
-{
- return syscall(SYS_move_mount, from_dirfd, from_path,
- to_dirfd, to_path, flags);
-}
-\&
static const struct option longopts[] = {
{"map\-mount", required_argument, NULL, \[aq]a\[aq]},
{"recursive", no_argument, NULL, \[aq]b\[aq]},
--
2.50.1
* [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-07 11:11 ` Alejandro Colomar
2025-08-06 17:44 ` [PATCH v2 03/11] fsopen.2: document 'new' mount api Aleksa Sarai
` (9 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
As with open_how(2type), it makes sense to move this to a separate man
page. In addition, future man pages added in this patchset will want to
reference mount_attr(2type).
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/mount_setattr.2 | 17 ++++---------
man/man2type/mount_attr.2type | 58 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+), 12 deletions(-)
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index c96f0657f046..d44fafc93a20 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -114,18 +114,11 @@ .SH DESCRIPTION
.I attr
argument of
.BR mount_setattr ()
-is a structure of the following form:
-.P
-.in +4n
-.EX
-struct mount_attr {
- __u64 attr_set; /* Mount properties to set */
- __u64 attr_clr; /* Mount properties to clear */
- __u64 propagation; /* Mount propagation type */
- __u64 userns_fd; /* User namespace file descriptor */
-};
-.EE
-.in
+is a pointer to a
+.I mount_attr
+structure,
+described in
+.BR mount_attr (2type).
.P
The
.I attr_set
diff --git a/man/man2type/mount_attr.2type b/man/man2type/mount_attr.2type
new file mode 100644
index 000000000000..b7a3ace6b3b9
--- /dev/null
+++ b/man/man2type/mount_attr.2type
@@ -0,0 +1,58 @@
+
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH mount_attr 2type (date) "Linux man-pages (unreleased)"
+.SH NAME
+mount_attr \- what mount properties to set and clear
+.SH LIBRARY
+Linux kernel headers
+.SH SYNOPSIS
+.EX
+.B #include <sys/mount.h>
+.P
+.B struct mount_attr {
+.BR " __u64 attr_set;" " /* Mount properties to set */"
+.BR " __u64 attr_clr;" " /* Mount properties to clear */"
+.BR " __u64 propagation;" " /* Mount propagation type */"
+.BR " __u64 userns_fd;" " /* User namespace file descriptor */"
+ /* ... */
+.B };
+.EE
+.SH DESCRIPTION
+Specifies which mount properties should be changed with
+.BR mount_setattr (2).
+.P
+The fields are as follows:
+.TP
+.I .attr_set
+This field specifies which
+.BI MOUNT_ATTR_ *
+attribute flags to set.
+.TP
+.I .attr_clr
+This field specifies which
+.BI MOUNT_ATTR_ *
+attribute flags to clear.
+.TP
+.I .propagation
+This field specifies what mount propagation will be applied.
+The valid values of this field are the same propagation types described in
+.BR mount_namespaces (7).
+.TP
+.I .userns_fd
+This field specifies a file descriptor that indicates which user namespace to
+use as a reference for ID-mapped mounts with
+.BR MOUNT_ATTR_IDMAP .
+.SH VERSIONS
+Extra fields may be appended to the structure,
+with a zero value in a new field resulting in
+the kernel behaving as though that extension field was not present.
+Therefore, a user
+.I must
+zero-fill this structure on initialization.
+.SH STANDARDS
+Linux.
+.SH SEE ALSO
+.BR mount_setattr (2)
--
2.50.1
* [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-07 11:38 ` Alejandro Colomar
2025-08-08 9:07 ` Askar Safin
2025-08-06 17:44 ` [PATCH v2 04/11] fspick.2: " Aleksa Sarai
` (8 subsequent siblings)
11 siblings, 2 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 319 insertions(+)
diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
new file mode 100644
index 000000000000..ad38ef0782be
--- /dev/null
+++ b/man/man2/fsopen.2
@@ -0,0 +1,319 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsopen 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsopen \- create a new filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fsopen ()
+system call is part of the suite of file descriptor based mount facilities in
+Linux.
+.P
+.BR fsopen ()
+creates a blank filesystem configuration context within the kernel
+for the filesystem named by
+.IR fsname ,
+puts the context into creation mode and attaches it to a file descriptor,
+which is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+A filesystem configuration context is an in-kernel representation of a pending
+transaction,
+containing a set of configuration parameters that are to be applied
+when creating a new instance of a filesystem
+(or modifying the configuration of an existing filesystem instance,
+such as when using
+.BR fspick (2)).
+.P
+After obtaining a filesystem configuration context with
+.BR fsopen (),
+the general workflow for operating on the context looks like the following:
+.IP (1) 5
+Pass the filesystem context file descriptor to
+.BR fsconfig (2)
+to specify any desired filesystem parameters.
+This may be done as many times as necessary.
+.IP (2)
+Pass the same filesystem context file descriptor to
+.BR fsconfig (2)
+with
+.B \%FSCONFIG_CMD_CREATE
+to create an instance of the configured filesystem.
+.IP (3)
+Pass the same filesystem context file descriptor to
+.BR fsmount (2)
+to create a new mount object for the root of the filesystem,
+which is then attached to a new file descriptor.
+This also places the filesystem context file descriptor into reconfiguration
+mode,
+similar to the mode produced by
+.BR fspick (2).
+.IP (4)
+Use the mount object file descriptor as a
+.I dirfd
+argument to "*at()" system calls;
+or attach the mount object to a mount point
+by passing the mount object file descriptor to
+.BR move_mount (2).
+.P
+A filesystem context will move between different modes throughout its
+lifecycle
+(such as the creation phase when created with
+.BR fsopen (),
+the reconfiguration phase when an existing filesystem instance is selected by
+.BR fspick (2),
+and the intermediate "needs-mount" phase between
+.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
+.B \%FSCONFIG_CMD_CREATE
+and
+.BR fsmount (2)),
+which has an impact on what operations are permitted on the filesystem context.
+.P
+The file descriptor returned by
+.BR fsopen ()
+also acts as a channel for filesystem drivers to provide more comprehensive
+error, warning, and information messages
+than are normally provided through the standard
+.BR errno (3)
+interface for system calls.
+If an error occurs at any time during the workflow mentioned above,
+calling
+.BR read (2)
+on the filesystem context file descriptor will retrieve any ancillary
+information about the encountered errors.
+(See the "Message retrieval interface" section for more details on the message
+format.)
+.P
+.I flags
+can be used to control aspects of the creation of the filesystem configuration
+context file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSOPEN_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.RE
+.P
+A list of filesystems supported by the running kernel
+(and thus a list of valid values for
+.IR fsname )
+can be obtained from
+.IR /proc/filesystems .
+(See also
+.BR proc_filesystems (5).)
+.SS Message retrieval interface
+When doing operations on a filesystem configuration context,
+the filesystem driver may choose to provide ancillary information to userspace
+in the form of message strings.
+.P
+The filesystem context file descriptors returned by
+.BR fsopen ()
+and
+.BR fspick (2)
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+Each call to
+.BR read (2)
+will return a single message,
+prefixed to indicate its class:
+.RS
+.TP
+.B "e <message>"
+An error message was logged.
+This is usually associated with an error being returned from the corresponding
+system call which triggered this message.
+.TP
+.B "w <message>"
+A warning message was logged.
+.TP
+.B "i <message>"
+An informational message was logged.
+.RE
+.P
+Messages are removed from the queue as they are read.
+Note that the message queue has limited depth,
+so it is possible for messages to get lost.
+If there are no messages in the message queue,
+.BR read (2)
+will return no data and
+.I errno
+will be set to
+.BR \%ENODATA .
+If the
+.I buf
+argument to
+.BR read (2)
+is not large enough to contain the message,
+.BR read (2)
+will return no data and
+.I errno
+will be set to
+.BR \%EMSGSIZE .
+.P
+If there are multiple filesystem context file descriptors referencing the same
+filesystem instance
+(such as if you call
+.BR fspick (2)
+multiple times for the same mount),
+each one gets its own independent message queue.
+This does not apply to file descriptors that were duplicated with
+.BR dup (2).
+.P
+Message strings will usually be prefixed by the filesystem driver that logged
+the message, though this may not always be the case.
+See the Linux kernel source code for details.
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EFAULT
+.I fsname
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+.I flags
+had an invalid flag set.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENODEV
+The filesystem named by
+.I fsname
+is not supported by the kernel.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
+glibc 2.36.
+.SH EXAMPLES
+To illustrate the workflow for creating a new mount,
+the following is an example of how to mount an
+.BR ext4 (5)
+filesystem stored on
+.I /dev/sdb1
+onto
+.IR /mnt .
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+First, an ext4 configuration context is created and attached to the file
+descriptor
+.IR fsfd .
+Then, a series of parameters
+(such as the source of the filesystem)
+are provided using
+.BR fsconfig (2),
+followed by the filesystem instance being created with
+.BR \%FSCONFIG_CMD_CREATE .
+.BR fsmount (2)
+is then used to create a new mount object attached to the file descriptor
+.IR mntfd ,
+which is then attached to the intended mount point using
+.BR move_mount (2).
+.P
+The above procedure is functionally equivalent to the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
+ "ro,noatime,acl,user_xattr,iversion");
+.EE
+.in
+.P
+And here's an example of creating a mount object
+of an NFS server share
+and setting a Smack security module label.
+However, instead of attaching it to a mount point,
+the program uses the mount object directly
+to open a file from the NFS share.
+.P
+.in +4n
+.EX
+int fsfd, mntfd, fd;
+\&
+fsfd = fsopen("nfs", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com:/pub/linux", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
+fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
+.EE
+.in
+.P
+Unlike the previous example,
+this operation has no trivial equivalent with
+.BR mount (2),
+as it was not previously possible to create a mount object
+that is not attached to any mount point.
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
--
2.50.1
* [PATCH v2 04/11] fspick.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (2 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 03/11] fsopen.2: document 'new' mount api Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 05/11] fsconfig.2: " Aleksa Sarai
` (7 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/fspick.2 | 305 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 305 insertions(+)
diff --git a/man/man2/fspick.2 b/man/man2/fspick.2
new file mode 100644
index 000000000000..5215d706428b
--- /dev/null
+++ b/man/man2/fspick.2
@@ -0,0 +1,305 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fspick 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fspick \- select filesystem for reconfiguration
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" \
+" /* Definition of " AT_* " constants */"
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fspick(int " dirfd ", const char *" path ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fspick ()
+system call is part of
+the suite of file descriptor based mount facilities
+in Linux.
+.P
+.BR fspick ()
+creates a new filesystem configuration context
+for the filesystem instance
+associated with the path described by
+.I dirfd
+and
+.IR path ,
+places it into reconfiguration mode
+(similar to
+.BR mount (8)
+with the
+.I \-o remount
+option),
+and attaches it to a new file descriptor,
+which is then returned.
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to create a new filesystem configuration context.
+.P
+The resultant file descriptor can be used with
+.BR fsconfig (2)
+to specify the desired set of changes
+to mount parameters
+of the filesystem instance.
+Once the desired set of changes has been configured,
+the changes can be effectuated by calling
+.BR fsconfig (2)
+with the
+.B \%FSCONFIG_CMD_RECONFIGURE
+command.
+.P
+As with "*at()" system calls,
+.BR fspick ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%FSPICK_EMPTY_PATH ,
+then the file descriptor referenced by
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+can refer to any type of file,
+not just a directory.
+.P
+.I flags
+can be used to control aspects of how
+.I path
+is resolved
+and properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSPICK_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B FSPICK_EMPTY_PATH
+If
+.I path
+is an empty string,
+operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or
+.BR open_tree (2)).
+In this case,
+.I dirfd
+can refer to any type of file, not just a directory.
+If
+.I dirfd
+is
+.BR \%AT_FDCWD ,
+the call operates on the current working directory
+of the calling process.
+.TP
+.B FSPICK_SYMLINK_NOFOLLOW
+Do not follow symbolic links
+in the terminal component of
+.IR path .
+If
+.I path
+references a symbolic link,
+the returned filesystem context will reference
+the filesystem that the symbolic link itself resides on.
+.TP
+.B FSPICK_NO_AUTOMOUNT
+Do not follow automounts in the terminal component of
+.IR path .
+This allows you to configure the underlying automount point
+without triggering the automount.
+This flag has no effect if the automount point has already been mounted over.
+.RE
+.P
+As with filesystem contexts created with
+.BR fsopen (2),
+the file descriptor returned by
+.BR fspick ()
+may be queried for message strings at any time by calling
+.BR read (2)
+on the file descriptor.
+(See the "Message retrieval interface" subsection in
+.BR fsopen (2)
+for more details on the message format.)
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied
+for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist,
+or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B \%FSPICK_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory, or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
+glibc 2.36.
+.SH EXAMPLES
+The following example sets the read-only flag
+on the filesystem instance referenced by
+the mount object attached at
+.IR /tmp .
+.P
+.in +4n
+.EX
+int fsfd = fspick(AT_FDCWD, "/tmp", FSPICK_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);
+.EE
+.in
+.P
+The above procedure is functionally equivalent to the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount(NULL, "/tmp", NULL, MS_REMOUNT | MS_RDONLY, NULL);
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
+
--
2.50.1
* [PATCH v2 05/11] fsconfig.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (3 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 04/11] fspick.2: " Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-08 14:00 ` Askar Safin
2025-08-06 17:44 ` [PATCH v2 06/11] fsmount.2: " Aleksa Sarai
` (6 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/fsconfig.2 | 559 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 559 insertions(+)
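
Illustration only (not text proposed for the page): the ERRORS section
says that a failed fsconfig() leaves the context usable and that the
driver's diagnostics can be read back from the same file descriptor.
A minimal sketch of how a caller might combine the two follows. The
helper names drain_fs_messages() and set_flag_or_report() are
hypothetical. The system call is invoked through syscall(2) so that the
sketch also builds against C libraries older than glibc 2.36; with a
newer glibc, fsconfig() from <sys/mount.h> can be called directly. The
sketch assumes, per the fsopen(2) page in this series, that each
read(2) returns one message and that read(2) fails once the queue is
empty.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mount.h>    /* FSCONFIG_SET_FLAG */
#include <stdio.h>
#include <sys/syscall.h>    /* SYS_fsconfig */
#include <unistd.h>

/*
 * Hypothetical helper: print every message queued on fd and return the
 * number of messages read.  Assumes one message per read(2).
 */
int drain_fs_messages(int fd, FILE *out)
{
    char buf[4096];
    ssize_t n;
    int count = 0;

    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        /* Drop the trailing newline, if any, then terminate. */
        if (buf[n - 1] == '\n')
            n--;
        buf[n] = '\0';
        fprintf(out, "fsconfig: %s\n", buf);
        count++;
    }
    return count;
}

/*
 * Hypothetical helper: try to set the flag parameter named by key.
 * On failure, print the queued diagnostics and return -1 with errno
 * preserved; the context itself can still be used afterwards.
 */
int set_flag_or_report(int fsfd, const char *key)
{
    int saved;

    if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_FLAG, key, NULL, 0) == 0)
        return 0;

    saved = errno;
    drain_fs_messages(fsfd, stderr);
    errno = saved;
    return -1;
}
```

A caller probing an optional parameter could check the return value of
set_flag_or_report(fsfd, "casefold"), fall back if it is -1, and then
carry on configuring the same fsfd.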
diff --git a/man/man2/fsconfig.2 b/man/man2/fsconfig.2
new file mode 100644
index 000000000000..e2121b7a6b68
--- /dev/null
+++ b/man/man2/fsconfig.2
@@ -0,0 +1,559 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsconfig 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsconfig \- configure new or existing filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fsconfig(int " fd ", unsigned int " cmd ","
+.BI " const char *" key ", const void *" value ", int " aux ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fsconfig ()
+system call is part of the suite of file descriptor based mount facilities
+in Linux.
+.P
+.BR fsconfig ()
+is used to supply parameters to
+and issue commands against
+the filesystem configuration context
+associated with the file descriptor
+.IR fd .
+Filesystem configuration contexts can be created with
+.BR fsopen (2)
+or instantiated from an extant filesystem instance with
+.BR fspick (2).
+.P
+The
+.I cmd
+argument indicates the command to be issued.
+Some commands supply parameters to the context
+(equivalent to mount options specified with
+.BR mount (8)),
+while others are meta-operations on the filesystem context.
+The list of valid
+.I cmd
+values is:
+.RS
+.TP
+.B FSCONFIG_SET_FLAG
+Set the flag parameter named by
+.IR key .
+.I value
+must be NULL,
+and
+.I aux
+must be 0.
+.TP
+.B FSCONFIG_SET_STRING
+Set the string parameter named by
+.I key
+to the value specified by
+.IR value .
+.I value
+points to a null-terminated string,
+and
+.I aux
+must be 0.
+.TP
+.B FSCONFIG_SET_BINARY
+Set the blob parameter named by
+.I key
+to the contents of the binary blob
+specified by
+.IR value .
+.I value
+points to
+the start of a buffer
+that is
+.I aux
+bytes in length.
+.TP
+.B FSCONFIG_SET_FD
+Set the file parameter named by
+.I key
+to the file referenced by the file descriptor
+.IR aux .
+.I value
+must be NULL.
+.IP
+You may also use
+.B \%FSCONFIG_SET_STRING
+for file parameters,
+with
+.I value
+set to a null-terminated string
+containing a base-10 representation
+of the file descriptor number.
+This mechanism is primarily intended for compatibility with older
+.BR mount (2)-based
+programs.
+.TP
+.B FSCONFIG_SET_PATH
+Set the path parameter named by
+.I key
+to the object at a provided path,
+resolved in a similar manner to
+.BR openat (2).
+.I value
+points to a null-terminated pathname string,
+and
+.I aux
+is equivalent to the
+.I dirfd
+argument to
+.BR openat (2).
+.IP
+You may also use
+.B \%FSCONFIG_SET_STRING
+for path parameters,
+the behaviour of which is equivalent to
+.B \%FSCONFIG_SET_PATH
+with
+.I aux
+set to
+.BR AT_FDCWD .
+.TP
+.B FSCONFIG_SET_PATH_EMPTY
+As with
+.BR \%FSCONFIG_SET_PATH ,
+except that if
+.I value
+is an empty string,
+the file descriptor specified by
+.I aux
+may be any type of file
+(not just a directory)
+and will be used as the path parameter value,
+equivalent to the behaviour of
+.B \%AT_EMPTY_PATH
+with most "*at()" system calls.
+If
+.I aux
+is
+.BR AT_FDCWD ,
+the call operates on the current working directory
+of the calling process.
+.IP
+Note that this behaviour with empty paths differs in some subtle ways from
+.BR \%FSCONFIG_SET_FD .
+.B \%FSCONFIG_SET_FD
+indicates that the underlying file
+for the file descriptor
+.I aux
+should be used as the parameter value directly;
+.B \%FSCONFIG_SET_PATH_EMPTY
+indicates that the underlying file
+for the file descriptor
+.I aux
+should be re-opened by the filesystem driver,
+and the newly created file description
+should be used as the parameter value.
+This can result in slightly different behaviour
+when dealing with special files
+or files sourced from pseudofilesystems.
+Filesystems may also choose to only support one kind of parameter,
+and so a parameter that accepts
+.B FSCONFIG_SET_FD
+may not work with
+.BR FSCONFIG_SET_PATH ( _EMPTY )
+(or vice-versa).
+.TP
+.B FSCONFIG_CMD_CREATE
+This command instructs the filesystem driver
+to instantiate an instance of the filesystem in the kernel
+with the parameters set in the filesystem configuration context
+referenced by the file descriptor
+.IR fd .
+.IR key " and " value
+must be NULL,
+and
+.I aux
+must be 0.
+.IP
+If this operation succeeds,
+the filesystem context
+associated with file descriptor
+.I fd
+now references the created filesystem instance,
+and is placed into a special "needs-mount" mode
+that allows you to use
+.BR fsmount (2)
+to create a mount object from the filesystem instance.
+.IP
+This is intended for use with filesystem configuration contexts created with
+.BR fsopen (2).
+In order to create a filesystem instance,
+the calling process must have the
+.B \%CAP_SYS_ADMIN
+capability.
+.IP
+Note that the Linux kernel reuses filesystem instances
+for many filesystems,
+so (depending on the filesystem being configured and parameters used)
+it is possible for the filesystem instance "created" by
+.B \%FSCONFIG_CMD_CREATE
+to, in fact, be a reference
+to an existing filesystem instance in the kernel.
+The kernel will attempt to merge the specified parameters
+of this filesystem configuration context
+with those of the filesystem instance being reused,
+but some parameters may be
+.IR "silently ignored" .
+.IP
+Programs that need to ensure
+that they create a new filesystem instance
+with specific parameters
+(notably, security-related parameters such as "acl" to enable
+POSIX ACLs as described in
+.BR acl (5))
+should use
+.B \%FSCONFIG_CMD_CREATE_EXCL
+instead.
+.TP
+.BR FSCONFIG_CMD_CREATE_EXCL " (since Linux 6.6)"
+.\" commit 22ed7ecdaefe0cac0c6e6295e83048af60435b13
+As with
+.BR \%FSCONFIG_CMD_CREATE ,
+except that the kernel is instructed
+to create a new filesystem instance
+("superblock" in kernel developer parlance)
+rather than reusing an existing one.
+.IP
+If this is not possible
+(such as with disk-backed filesystems
+where multiple filesystem instances
+using the same filesystem driver
+and writing to the same underlying device
+could result in data corruption),
+this operation fails
+with an
+.B EBUSY
+error.
+.IP
+As a result (unlike
+.BR \%FSCONFIG_CMD_CREATE ),
+if this command succeeds
+then the calling process can be sure that
+all of the parameters successfully configured with
+.BR fsconfig ()
+will actually be applied
+to the created filesystem instance.
+.TP
+.B FSCONFIG_CMD_RECONFIGURE
+This command instructs the filesystem driver
+to apply the parameters set in this filesystem configuration context
+to an already existing filesystem instance.
+.IP
+This is primarily intended for use with
+.BR fspick (2),
+but may also be used to modify the parameters of a filesystem instance after
+.B \%FSCONFIG_CMD_CREATE
+was used to create it
+and a mount object was created using
+.BR fsmount (2).
+In order to reconfigure an extant filesystem instance,
+the calling process must have the
+.B CAP_SYS_ADMIN
+capability.
+.IP
+Once this operation succeeds, the filesystem context is reset
+but remains in reconfiguration mode
+and thus can be used for subsequent
+.B \%FSCONFIG_CMD_RECONFIGURE
+commands.
+.SH RETURN VALUE
+On success,
+.BR fsconfig ()
+returns 0.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+If an error occurs, the filesystem driver may provide
+additional information about the error
+through the message retrieval interface for filesystem configuration contexts.
+This additional information can be retrieved at any time by calling
+.BR read (2)
+on the filesystem instance or filesystem configuration context referenced by
+the file descriptor
+.IR fd .
+(See the "Message retrieval interface" subsection in
+.BR fsopen (2)
+for more details on the message format.)
+.P
+Even after an error occurs,
+the filesystem configuration context is
+.I not
+invalidated,
+and thus can still be used with other
+.BR fsconfig ()
+commands.
+This means that users can probe support for mount parameters
+on a per-parameter basis,
+and adjust which parameters they wish to set.
+.P
+The error values given below result from filesystem type independent errors.
+Each filesystem type may have its own special errors
+and its own special behavior.
+See the Linux kernel source code for details.
+.TP
+.B EACCES
+A component of a path
+provided as a path parameter
+was not searchable.
+(See also
+.BR path_resolution (7).)
+.TP
+.B EACCES
+.B \%FSCONFIG_CMD_CREATE
+was attempted
+for a read-only filesystem
+without specifying the
+.RB ' ro '
+flag parameter.
+.TP
+.B EACCES
+A specified block device parameter
+is located on a filesystem
+mounted with the
+.B \%MS_NODEV
+option.
+.TP
+.B EBADF
+The file descriptor given by
+.I fd
+(or possibly by
+.IR aux ,
+depending on the command)
+is invalid.
+.TP
+.B EBUSY
+The filesystem context attached to
+.I fd
+is in the wrong state
+for the given command.
+.TP
+.B EBUSY
+The filesystem instance cannot be reconfigured as read-only
+with
+.B \%FSCONFIG_CMD_RECONFIGURE
+because some programs
+still hold files open for writing.
+.TP
+.B EBUSY
+A new filesystem instance was requested with
+.B \%FSCONFIG_CMD_CREATE_EXCL
+but a matching superblock already existed.
+.TP
+.B EFAULT
+One of the pointer arguments
+points to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+.I fd
+does not refer to
+a filesystem configuration context
+or filesystem instance.
+.TP
+.B EINVAL
+One or more of
+.IR key ,
+.IR value ,
+and
+.I aux
+were set to a non-zero value when
+.I cmd
+required that they be zero
+(or NULL).
+.TP
+.B EINVAL
+The parameter named by
+.I key
+cannot be set
+using the type specified with
+.IR cmd .
+.TP
+.B EINVAL
+One of the source parameters
+referred to
+an invalid superblock.
+.TP
+.B ELOOP
+Too many symbolic links were encountered
+during pathname resolution
+of a path argument.
+.TP
+.B ENAMETOOLONG
+A path argument was longer than
+.BR PATH_MAX .
+.TP
+.B ENOENT
+A path argument had a non-existent component.
+.TP
+.B ENOENT
+A path argument is an empty string,
+but
+.I cmd
+is not
+.BR \%FSCONFIG_SET_PATH_EMPTY .
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B ENOTBLK
+The parameter named by
+.I key
+must be a block device,
+but the provided parameter value was not a block device.
+.TP
+.B ENOTDIR
+A component of the path prefix
+of a path argument
+was not a directory.
+.TP
+.B EOPNOTSUPP
+The command given by
+.I cmd
+is not valid.
+.TP
+.B ENXIO
+The major number
+of a block device parameter
+is out of range.
+.TP
+.B EPERM
+The command given by
+.I cmd
+was
+.BR \%FSCONFIG_CMD_CREATE ,
+.BR \%FSCONFIG_CMD_CREATE_EXCL ,
+or
+.BR \%FSCONFIG_CMD_RECONFIGURE ,
+but the calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit ecdab150fddb42fe6a739335257949220033b782
+glibc 2.36.
+.SH EXAMPLES
+To illustrate the different kinds of parameters that can be configured with
+.BR fsconfig (),
+here are a few examples of some different filesystems being created:
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "inode64", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "uid", "1234", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "huge", "never", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "casefold", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOEXEC);
+move_mount(mntfd, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+fsfd = fsopen("erofs", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/loop0", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE_EXCL, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOSUID);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+Some filesystems have different behaviour
+when using
+.BR fsconfig ()
+to set the same parameter
+named by
+.I key
+multiple times:
+.P
+.in +4n
+.EX
+\&
+int fsfd, mntfd, lowerdirfd;
+\&
+lowerdirfd = open("/o/ctr/lower1", O_DIRECTORY | O_CLOEXEC);
+fsfd = fsopen("overlay", FSOPEN_CLOEXEC);
+/* "lowerdir+" appends to the lower dir stack */
+fsconfig(fsfd, FSCONFIG_SET_FD, "lowerdir+", NULL, lowerdirfd);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "lowerdir+", "/o/ctr/lower2", 0);
+.\" fsconfig(fsfd, FSCONFIG_SET_PATH, "lowerdir+", "/o/ctr/lower3", AT_FDCWD);
+.\" fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "lowerdir+", "", lowerdirfd);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "xino", "auto", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "nfs_export", "off", 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
+move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+Other filesystems allow you to use different
+.BI FSCONFIG_SET_ *
+commands for the same parameter
+named by
+.IR key :
+.P
+.in +4n
+.EX
+\&
+int fsfd, mntfd, nsfd, nsdirfd;
+\&
+nsfd = open("/proc/self/ns/pid", O_PATH);
+nsdirfd = open("/proc/1/ns", O_DIRECTORY);
+\&
+fsfd = fsopen("proc", FSOPEN_CLOEXEC);
+/* "pidns" changes the value each time. */
+fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "/proc/self/ns/pid", AT_FDCWD);
+fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "pid", nsdirfd);
+fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "pidns", "", nsfd);
+fsconfig(fsfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
+move_mount(mntfd, "", AT_FDCWD, "/proc", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+And here is an example of how
+.BR fspick (2)
+can be used with
+.BR fsconfig ()
+to reconfigure the parameters
+of an extant filesystem instance
+attached to
+.IR /proc :
+.P
+.in +4n
+.EX
+int fsfd = fspick(AT_FDCWD, "/proc", FSPICK_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "hidepid", "ptraceable", 0);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "subset", "pid", 0);
+fsconfig(fsfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);
+.EE
+.in
+.SH SEE ALSO
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
+
--
2.50.1
* [PATCH v2 06/11] fsmount.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (4 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 05/11] fsconfig.2: " Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 07/11] move_mount.2: " Aleksa Sarai
` (5 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/fsmount.2 | 209 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 209 insertions(+)
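
Illustration only (not text proposed for the page): the EXAMPLES
section omits error handling for brevity. A minimal sketch of the same
fsopen(2), fsconfig(2), fsmount() sequence with the failure paths
spelled out follows. The function name detached_tmpfs() is
hypothetical. The system calls are invoked through syscall(2) so that
the sketch also builds against C libraries older than glibc 2.36; with
a newer glibc, the wrappers from <sys/mount.h> can be called directly.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mount.h>    /* FSOPEN_*, FSCONFIG_*, FSMOUNT_*, MOUNT_ATTR_* */
#include <sys/syscall.h>    /* SYS_fsopen, SYS_fsconfig, SYS_fsmount */
#include <unistd.h>

/*
 * Hypothetical helper: create a tmpfs instance and return a file
 * descriptor for a detached mount object of it, or -1 with errno set.
 */
int detached_tmpfs(void)
{
    int fsfd, mntfd, saved;

    fsfd = syscall(SYS_fsopen, "tmpfs", FSOPEN_CLOEXEC);
    if (fsfd < 0)
        return -1;

    if (syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
        mntfd = -1;
    else
        mntfd = syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC,
                        MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC);

    /* The context is not needed here once the mount object exists. */
    saved = errno;
    close(fsfd);
    errno = saved;
    return mntfd;
}
```

The returned file descriptor can be used as a dirfd straight away,
attached with move_mount(2) and MOVE_MOUNT_F_EMPTY_PATH, or closed to
discard the mount object. Without CAP_SYS_ADMIN the helper is expected
to fail at the first step.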
diff --git a/man/man2/fsmount.2 b/man/man2/fsmount.2
new file mode 100644
index 000000000000..8c264c3d5aba
--- /dev/null
+++ b/man/man2/fsmount.2
@@ -0,0 +1,209 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsmount 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsmount \- instantiate mount object from filesystem context
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.P
+.BI "int fsmount(int " fsfd ", unsigned int " flags ", \
+unsigned int " attr_flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR fsmount ()
+system call is part of the suite of file descriptor based mount facilities
+in Linux.
+.P
+.BR fsmount ()
+takes the created filesystem instance
+referenced by the filesystem context
+associated with the file descriptor
+.I fsfd
+and creates a new mount object
+for the root of the filesystem instance,
+which is then attached to a new file descriptor
+and returned.
+In order to create a mount object with
+.BR fsmount (),
+the calling process must have the
+.B \%CAP_SYS_ADMIN
+capability.
+.P
+The filesystem context must have been created with a call to
+.BR fsopen (2)
+and then had a filesystem instance instantiated with a call to
+.BR fsconfig (2)
+with
+.B \%FSCONFIG_CMD_CREATE
+or
+.B \%FSCONFIG_CMD_CREATE_EXCL
+in order to be in the correct state
+for this operation.
+.P
+As with file descriptors returned from
+.BR open_tree (2)
+called with
+.BR OPEN_TREE_CLONE ,
+the returned file descriptor
+can then be used with
+.BR move_mount (2),
+.BR mount_setattr (2),
+or other such system calls
+to do further mount operations.
+This mount object will be unmounted and destroyed
+when the file descriptor is closed
+if it was not otherwise mounted somewhere else
+by calling
+.BR move_mount (2).
+The returned file descriptor
+also acts the same as one produced by
+.BR open (2)
+with
+.BR O_PATH ,
+meaning it can also be used as a
+.I dirfd
+argument
+to "*at()" system calls.
+.P
+.I flags
+controls the creation of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B FSMOUNT_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.RE
+.P
+.I attr_flags
+specifies the mount attributes
+for the created mount object
+and accepts the same set of
+.BI \%MOUNT_ATTR_ *
+flags as
+.BR mount_setattr (2),
+except for flags such as
+.B \%MOUNT_ATTR_IDMAP
+which require specifying additional fields in
+.BR mount_attr (2type).
+.P
+If the
+.BR fsmount ()
+operation is successful,
+the filesystem context
+associated with the file descriptor
+.I fsfd
+is reset
+and placed into a reconfiguration state,
+similar to the one produced by
+.BR fspick (2).
+You may continue to use
+.BR fsconfig (2)
+with the
+.B \%FSCONFIG_CMD_RECONFIGURE
+command
+to reconfigure the filesystem instance.
+.P
+Unlike
+.BR open_tree (2),
+.BR fsmount ()
+can only be called once
+to produce a mount object
+for a given filesystem configuration context.
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EBUSY
+The filesystem context attached to
+.I fsfd
+is not in the right state
+to be used by
+.BR fsmount ().
+.TP
+.B EINVAL
+.I flags
+had an invalid flag set.
+.TP
+.B EINVAL
+.I attr_flags
+had an invalid
+.BI MOUNT_ATTR_ *
+flag set.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.TP
+.B ENOSPC
+The "anonymous" mount namespace
+necessary to contain the new mount object
+could not be allocated,
+as doing so would
+exceed the configured per-user limit
+on the number of mount namespaces
+in the current user namespace.
+(See also
+.BR namespaces (7).)
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d
+glibc 2.36.
+.SH EXAMPLES
+.in +4n
+.EX
+int fsfd, mntfd, tmpfd;
+\&
+fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC);
+\&
+/* Create a new file without attaching the mount object. */
+tmpfd = openat(mntfd, "tmpfile", O_CREAT | O_EXCL | O_RDWR, 0600);
+unlinkat(mntfd, "tmpfile", 0);
+\&
+/* Attach the mount object to "/tmp". */
+move_mount(mntfd, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
+
--
2.50.1
* [PATCH v2 07/11] move_mount.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (5 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 06/11] fsmount.2: " Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 08/11] open_tree.2: " Aleksa Sarai
` (4 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/move_mount.2 | 609 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 609 insertions(+)
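
Illustration only (not text proposed for the page): the DESCRIPTION
distinguishes between attaching a detached mount object through an
empty from_path and moving an already attached mount by pathname.
A minimal sketch of both call shapes follows. The helper names
attach_mount() and relocate_mount() are hypothetical. The system call
is invoked through syscall(2) so that the sketch also builds against
C libraries older than glibc 2.36; with a newer glibc, move_mount()
from <sys/mount.h> can be called directly.

```c
#define _GNU_SOURCE
#include <fcntl.h>          /* AT_FDCWD */
#include <linux/mount.h>    /* MOVE_MOUNT_F_EMPTY_PATH */
#include <sys/syscall.h>    /* SYS_move_mount */
#include <unistd.h>

/*
 * Hypothetical helper: attach the mount object referred to by mntfd
 * (for example, one returned by fsmount(2)) at the path target.
 * Returns 0 on success, or -1 with errno set.
 */
int attach_mount(int mntfd, const char *target)
{
    if (syscall(SYS_move_mount, mntfd, "", AT_FDCWD, target,
                MOVE_MOUNT_F_EMPTY_PATH) < 0)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: move the mount attached at the path from to the
 * path to.  Relative paths are resolved against the current working
 * directory.  Returns 0 on success, or -1 with errno set.
 */
int relocate_mount(const char *from, const char *to)
{
    if (syscall(SYS_move_mount, AT_FDCWD, from, AT_FDCWD, to, 0) < 0)
        return -1;
    return 0;
}
```

Both helpers require CAP_SYS_ADMIN to succeed; callers should check the
return value and inspect errno on failure.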
diff --git a/man/man2/move_mount.2 b/man/man2/move_mount.2
new file mode 100644
index 000000000000..6a944198f620
--- /dev/null
+++ b/man/man2/move_mount.2
@@ -0,0 +1,609 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH move_mount 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+move_mount \- move or attach mount object to filesystem
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" \
+" /* Definition of " AT_* " constants */"
+.BR "#include <sys/mount.h>"
+.P
+.BI "int move_mount(int " from_dirfd ", const char *" from_path ","
+.BI " int " to_dirfd ", const char *" to_path ","
+.BI " unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR move_mount ()
+system call is part of the suite of file descriptor based mount facilities
+in Linux.
+.P
+.BR move_mount ()
+moves the mount object indicated by
+.I from_dirfd
+and
+.I from_path
+to the path indicated by
+.I to_dirfd
+and
+.IR to_path .
+The mount object being moved
+could be an existing mount point in the current mount namespace,
+or it could be a detached mount object created by
+.BR fsmount (2)
+or
+.BR open_tree (2)
+with
+.BR \%OPEN_TREE_CLONE .
+.P
+To access the source mount object
+or the destination mount point,
+no permissions are required on the object itself,
+but if either pathname is supplied,
+execute (search) permission is required
+on all of the directories specified in
+.I from_path
+or
+.IR to_path .
+.P
+The calling process must have the
+.B \%CAP_SYS_ADMIN
+capability in order to attach or move a mount object.
+.P
+As with "*at()" system calls,
+.BR move_mount ()
+uses the
+.I from_dirfd
+and
+.I to_dirfd
+arguments
+in conjunction with the
+.I from_path
+and
+.I to_path
+arguments to determine the source and destination objects to operate on
+(respectively), as follows:
+.IP \[bu] 3
+If the pathname given in
+.I *_path
+is absolute, then
+the corresponding
+.I *_dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I *_path
+is relative and
+the corresponding
+.I *_dirfd
+is the special value
+.BR AT_FDCWD ,
+then
+.I *_path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I *_path
+is relative,
+then it is interpreted relative to
+the directory referred to by
+the corresponding file descriptor
+.I *_dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+the corresponding
+.I *_dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I *_path
+is an empty string,
+and
+.I flags
+contains the appropriate
+.BI \%MOVE_MOUNT_ * _EMPTY_PATH
+flag,
+then the file descriptor referenced by
+the corresponding
+.I *_dirfd
+is operated on directly.
+In this case,
+the corresponding
+.I *_dirfd
+can refer to any type of file,
+not just a directory.
+.IP
+This is the most common mechanism
+used to attach detached mount objects
+to a mount point target.
+.P
+.I flags
+can be used to control aspects of the path lookup
+for both the source and destination objects,
+as well as other properties of the mount operation.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B MOVE_MOUNT_F_EMPTY_PATH
+If
+.I from_path
+is an empty string, operate on the file referred to by
+.I from_dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or
+.BR open_tree (2)).
+In this case,
+.I from_dirfd
+can refer to any type of file,
+not just a directory.
+If
+.I from_dirfd
+is
+.BR AT_FDCWD ,
+the call operates on the current working directory.
+.TP
+.B MOVE_MOUNT_T_EMPTY_PATH
+As with
+.BR \%MOVE_MOUNT_F_EMPTY_PATH ,
+except operating on
+.IR to_dirfd " and " to_path .
+.TP
+.B MOVE_MOUNT_F_SYMLINKS
+If
+.I from_path
+references a symbolic link,
+then dereference it.
+The default behaviour for
+.BR move_mount ()
+is to
+.I not follow
+symbolic links.
+.TP
+.B MOVE_MOUNT_T_SYMLINKS
+As with
+.BR \%MOVE_MOUNT_F_SYMLINKS ,
+except operating on
+.I to_dirfd
+and
+.IR to_path .
+.TP
+.B MOVE_MOUNT_F_NO_AUTOMOUNT
+Don't automount the terminal ("basename") component of
+.I from_path
+if it is a directory that is an automount point.
+This allows a mount object
+that has an automount point at its root
+to be moved
+and prevents unintended triggering of an automount point.
+This flag has no effect
+if the automount point has already been mounted over.
+.TP
+.B MOVE_MOUNT_T_NO_AUTOMOUNT
+As with
+.BR \%MOVE_MOUNT_F_NO_AUTOMOUNT ,
+except operating on
+.IR to_dirfd " and " to_path .
+This allows an automount point to be manually mounted over.
+.TP
+.BR MOVE_MOUNT_SET_GROUP " (since Linux 5.15)"
+Add the attached (private) mount indicated by
+.I to_dirfd
+and
+.I to_path
+into the mount propagation "peer group"
+of the attached (non-private) mount
+indicated by
+.IR from_dirfd " and " from_path .
+.IP
+Unlike other
+.BR move_mount ()
+operations,
+this operation does not move any actual mount objects.
+Instead, it only updates the metadata
+of existing (attached) mount objects.
+.IP
+This makes it possible to first create a mount tree
+consisting only of private mounts
+and then configure the desired propagation layout afterwards.
+(See the "SHARED SUBTREES" section of
+.BR mount_namespaces (7)
+for more information about mount propagation and peer groups.)
+.TP
+.BR MOVE_MOUNT_BENEATH " (since Linux 6.5)"
+If the path indicated by
+.I to_dirfd
+and
+.I to_path
+is an existing mount object,
+rather than placing the mount object indicated by
+.I from_dirfd
+and
+.I from_path
+on top of the mount stack,
+place it below the current top mount
+on the mount stack.
+.IP
+After using
+.BR \%MOVE_MOUNT_BENEATH ,
+it is possible to
+.BR umount (2)
+the top mount
+in order to reveal the mount
+which was moved beneath it earlier.
+This allows for the seamless (and atomic) replacement
+of intricate mount trees,
+which can further be used
+to "upgrade" a mount tree with a newer version.
+.IP
+This operation has several restrictions:
+.RS
+.IP \[bu] 3
+Mounts cannot be moved beneath the rootfs,
+including the rootfs as configured by
+.BR chroot (2)
+and
+.BR pivot_root (2).
+To mount beneath the rootfs,
+.BR pivot_root (2)
+must be used.
+.IP \[bu]
+The target path indicated by
+.I to_dirfd
+and
+.I to_path
+must be an attached mount object.
+It must not be a detached mount object given by
+.BR open_tree (2)
+with
+.B \%OPEN_TREE_CLONE
+or
+.BR fsmount (2).
+.IP \[bu]
+The current top mount
+of the target path's mount stack
+and its parent mount
+must be in the calling process's mount namespace.
+.IP \[bu]
+The caller must have sufficient privileges
+to unmount the top mount
+of the target path's mount stack,
+to prove they have privileges
+to reveal the underlying mount.
+.IP \[bu]
+Mount propagation events triggered by this
+.BR move_mount ()
+operation
+are calculated based on the parent mount
+of the current top mount
+of the target path's mount stack.
+.IP \[bu]
+The target path's mount
+cannot be a parent mount
+of the source mount object.
+.IP \[bu]
+The source mount object
+must not have any overmounts,
+otherwise it would be possible to create "shadow mounts"
+(i.e., two mounts mounted on the same parent mount at the same mount point).
+.IP \[bu]
+It is not possible to move a mount
+beneath a top mount
+if the parent mount
+of the current top mount
+propagates to the top mount itself.
+Otherwise,
+.B \%MOVE_MOUNT_BENEATH
+would cause the mount object
+to be propagated
+to the top mount
+from the parent mount,
+defeating the purpose of using
+.BR \%MOVE_MOUNT_BENEATH .
+.IP \[bu]
+It is not possible to move a mount
+beneath a top mount
+if the parent mount
+of the current top mount
+propagates to the mount object
+being mounted beneath.
+Otherwise, this would cause a similar propagation issue
+to the previous point,
+also defeating the purpose of using
+.BR \%MOVE_MOUNT_BENEATH .
+.RE
+.RE
+.P
+If
+.BR move_mount ()
+is called repeatedly
+with a file descriptor
+that refers to a mount object,
+then the object will be attached (or moved)
+the first time
+and then moved again and again,
+detaching it from the previous mount point each time.
+.SH RETURN VALUE
+On success,
+.BR move_mount ()
+returns 0.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied
+for one of the directories
+in the path prefix of one of
+.IR from_path " or " to_path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EBADF
+One of
+.I from_dirfd
+or
+.I to_dirfd
+is not a valid file descriptor.
+.TP
+.B EFAULT
+One of
+.I from_path
+or
+.I to_path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B EINVAL
+The path indicated by
+.IR from_dirfd " and " from_path
+is not a mount object.
+.TP
+.B EINVAL
+The mount object type
+of the source mount object and target inode
+are not compatible
+(i.e., the source is a file but the target is a directory, or vice-versa).
+.TP
+.B EINVAL
+The source mount object or target path
+is not in the calling process's mount namespace
+(or an anonymous mount namespace of the calling process).
+.TP
+.B EINVAL
+The source mount object's parent mount
+has shared mount propagation,
+and thus the source mount object cannot be moved
+(as described in
+.BR mount_namespaces (7)).
+.TP
+.B EINVAL
+The source mount has
+.B MS_UNBINDABLE
+child mounts
+but the target path
+resides on a mount tree with shared mount propagation,
+which would otherwise cause the unbindable mounts to be propagated
+(as described in
+.BR mount_namespaces (7)).
+.TP
+.B EINVAL
+.B \%MOVE_MOUNT_BENEATH
+was attempted,
+but one of the listed restrictions was violated.
+.TP
+.B ELOOP
+Too many symbolic links encountered
+when resolving one of
+.I from_path
+or
+.IR to_path .
+.TP
+.B ENAMETOOLONG
+One of
+.I from_path
+or
+.I to_path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENOENT
+A component of one of
+.I from_path
+or
+.I to_path
+does not exist.
+.TP
+.B ENOENT
+One of
+.I from_path
+or
+.I to_path
+is an empty string,
+but the corresponding
+.BI MOVE_MOUNT_ * _EMPTY_PATH
+flag is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of one of
+.I from_path
+or
+.I to_path
+is not a directory,
+or one of
+.I from_path
+or
+.I to_path
+is relative
+and the corresponding
+.I from_dirfd
+or
+.I to_dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EPERM
+The calling process does not have the required
+.B \%CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae
+glibc 2.36.
+.SH EXAMPLES
+.BR move_mount ()
+can be used to move attached mounts like the following:
+.P
+.in +4n
+.EX
+move_mount(AT_FDCWD, "/a", AT_FDCWD, "/b", 0);
+.EE
+.in
+.P
+This would move the mount object mounted on
+.I /a
+to
+.IR /b .
+The above procedure is functionally equivalent to
+the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/a", "/b", NULL, MS_MOVE, NULL);
+.EE
+.in
+.P
+.BR move_mount ()
+can also be used in conjunction with file descriptors returned from
+.BR open_tree (2)
+or
+.BR open (2):
+.P
+.in +4n
+.EX
+int fd = open_tree(AT_FDCWD, "/mnt", 0); /* or open("/mnt", O_PATH); */
+move_mount(fd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);
+move_mount(fd, "", AT_FDCWD, "/mnt3", MOVE_MOUNT_F_EMPTY_PATH);
+move_mount(fd, "", AT_FDCWD, "/mnt4", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+This would move the mount object mounted at
+.I /mnt
+to
+.IR /mnt2 ,
+then
+.IR /mnt3 ,
+then
+.IR /mnt4 .
+.P
+If the source mount object
+indicated by
+.I from_dirfd
+and
+.I from_path
+is a detached mount object,
+.BR move_mount ()
+can be used to attach it to a mount point:
+.P
+.in +4n
+.EX
+int fsfd, mntfd;
+\&
+fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
+fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/sda1", 0);
+fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NODEV);
+move_mount(mntfd, "", AT_FDCWD, "/home", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+This creates a new filesystem configuration context for ext4,
+configures it,
+creates a mount object,
+and then attaches it to
+.IR /home .
+The above procedure is functionally equivalent to
+the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/dev/sda1", "/home", "ext4", MS_NODEV, "user_xattr");
+.EE
+.in
+.P
+This also works with detached bind-mounts created with
+.BR open_tree (2)
+with
+.BR OPEN_TREE_CLONE :
+.P
+.in +4n
+.EX
+int mntfd = open_tree(AT_FDCWD, "/home/cyphar", OPEN_TREE_CLONE);
+move_mount(mntfd, "", AT_FDCWD, "/root", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+This creates a new detached bind-mount mount object of
+.IR /home/cyphar ,
+and then attaches it to
+.IR /root .
+The above procedure is functionally equivalent to
+the following mount operation
+using
+.BR mount (2):
+.P
+.in +4n
+.EX
+mount("/home/cyphar", "/root", NULL, MS_BIND, NULL);
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR open_tree (2),
+.BR mount_namespaces (7)
+
--
2.50.1
* [PATCH v2 08/11] open_tree.2: document 'new' mount api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (6 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 07/11] move_mount.2: " Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-08 12:32 ` Askar Safin
2025-08-06 17:44 ` [PATCH v2 09/11] mount_setattr.2: mirror opening sentence from fsopen(2) Aleksa Sarai
` (3 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).
Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/open_tree.2 | 405 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 405 insertions(+)
diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
new file mode 100644
index 000000000000..3d38e27b5254
--- /dev/null
+++ b/man/man2/open_tree.2
@@ -0,0 +1,405 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH open_tree 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+open_tree \- open path or create detached mount object and attach to fd
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <fcntl.h>" \
+" /* Definition of " AT_* " constants */"
+.BR "#include <sys/mount.h>"
+.P
+.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags ");"
+.fi
+.SH DESCRIPTION
+The
+.BR open_tree ()
+system call is part of the suite of file descriptor based mount facilities
+in Linux.
+.P
+If
+.I flags
+contains
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+creates a detached mount object
+consisting of a bind-mount of the path
+specified by
+.IR path ,
+and attaches it to a new file descriptor,
+which is then returned.
+The mount object is equivalent to a bind-mount
+that would be created by
+.BR mount (2)
+called with
+.BR MS_BIND ,
+except that it is tied to a file descriptor
+and is not mounted onto the filesystem.
+.P
+As with file descriptors returned from
+.BR fsmount (2),
+the resultant file descriptor can then be used with
+.BR move_mount (2),
+.BR mount_setattr (2),
+or other such system calls
+to do further mount operations.
+This mount object will be unmounted and destroyed
+when the file descriptor is closed
+if it was not otherwise attached to a mount point
+by calling
+.BR move_mount (2).
+.P
+If
+.I flags
+does not contain
+.BR \%OPEN_TREE_CLONE ,
+.BR open_tree ()
+returns a file descriptor
+that is exactly equivalent to
+one produced by
+.BR open (2).
+.P
+In either case, the resultant file descriptor
+acts the same as one produced by
+.BR open (2)
+with
+.BR O_PATH ,
+meaning it can also be used as a
+.I dirfd
+argument to
+"*at()" system calls.
+.P
+As with "*at()" system calls,
+.BR open_tree ()
+uses the
+.I dirfd
+argument in conjunction with the
+.I path
+argument to determine the path to operate on, as follows:
+.IP \[bu] 3
+If the pathname given in
+.I path
+is absolute, then
+.I dirfd
+is ignored.
+.IP \[bu]
+If the pathname given in
+.I path
+is relative and
+.I dirfd
+is the special value
+.BR AT_FDCWD ,
+then
+.I path
+is interpreted relative to
+the current working directory
+of the calling process (like
+.BR open (2)).
+.IP \[bu]
+If the pathname given in
+.I path
+is relative,
+then it is interpreted relative to
+the directory referred to by the file descriptor
+.I dirfd
+(rather than relative to
+the current working directory
+of the calling process,
+as is done by
+.BR open (2)
+for a relative pathname).
+In this case,
+.I dirfd
+must be a directory
+that was opened for reading
+.RB ( O_RDONLY )
+or using the
+.B O_PATH
+flag.
+.IP \[bu]
+If
+.I path
+is an empty string,
+and
+.I flags
+contains
+.BR \%AT_EMPTY_PATH ,
+then the file descriptor referenced by
+.I dirfd
+is operated on directly.
+In this case,
+.I dirfd
+can refer to any type of file,
+not just a directory.
+.P
+.I flags
+can be used to control aspects of the path lookup
+and properties of the returned file descriptor.
+A value for
+.I flags
+is constructed by bitwise ORing
+zero or more of the following constants:
+.RS
+.TP
+.B AT_EMPTY_PATH
+If
+.I path
+is an empty string, operate on the file referred to by
+.I dirfd
+(which may have been obtained from
+.BR open (2),
+.BR fsmount (2),
+or from another
+.BR open_tree ()
+call).
+In this case,
+.I dirfd
+can refer to any type of file, not just a directory.
+If
+.I dirfd
+is
+.BR AT_FDCWD ,
+the call operates on the current working directory
+of the calling process.
+This flag is Linux-specific; define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B AT_NO_AUTOMOUNT
+Don't automount the terminal ("basename") component of
+.I path
+if it is a directory that is an automount point.
+This allows the caller to gather attributes of an automount point
+(rather than the location it would mount).
+This flag has no effect if the mount point has already been mounted over.
+This flag is Linux-specific; define
+.B \%_GNU_SOURCE
+to obtain its definition.
+.TP
+.B AT_SYMLINK_NOFOLLOW
+If
+.I path
+is a symbolic link, do not dereference it; instead,
+create either a handle to the link itself
+or a bind-mount of it.
+The resultant file descriptor is indistinguishable from one produced by
+.BR openat (2)
+with
+.BR \%O_PATH | O_NOFOLLOW .
+.TP
+.B OPEN_TREE_CLOEXEC
+Set the close-on-exec
+.RB ( FD_CLOEXEC )
+flag on the new file descriptor.
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+for reasons why this may be useful.
+.TP
+.B OPEN_TREE_CLONE
+Rather than opening the path as a regular file
+(a la
+.BR openat (2)),
+create a detached bind-mount mount object
+and attach it to the file descriptor.
+In order to do this operation,
+the calling process must have the
+.B \%CAP_SYS_ADMIN
+capability.
+.TP
+.B AT_RECURSIVE
+Create a recursive bind-mount of the path
+(a la
+.BR mount (2)
+with
+.BR MS_BIND | MS_REC ),
+and attach it to the file descriptor.
+This flag is only permitted in conjunction with
+.BR \%OPEN_TREE_CLONE .
+.SH RETURN VALUE
+On success, a new file descriptor is returned.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EACCES
+Search permission is denied for one of the directories
+in the path prefix of
+.IR path .
+(See also
+.BR path_resolution (7).)
+.TP
+.B EPERM
+.I flags
+contains
+.B \%OPEN_TREE_CLONE
+but the calling process does not have the required
+.B CAP_SYS_ADMIN
+capability.
+.TP
+.B EBADF
+.I path
+is relative but
+.I dirfd
+is neither
+.B AT_FDCWD
+nor a valid file descriptor.
+.TP
+.B EFAULT
+.I path
+is NULL
+or a pointer to a location
+outside the calling process's accessible address space.
+.TP
+.B EINVAL
+Invalid flag specified in
+.IR flags .
+.TP
+.B ELOOP
+Too many symbolic links encountered when resolving
+.IR path .
+.TP
+.B ENAMETOOLONG
+.I path
+is longer than
+.BR PATH_MAX .
+.TP
+.B ENOENT
+A component of
+.I path
+does not exist, or is a dangling symbolic link.
+.TP
+.B ENOENT
+.I path
+is an empty string, but
+.B AT_EMPTY_PATH
+is not specified in
+.IR flags .
+.TP
+.B ENOTDIR
+A component of the path prefix of
+.I path
+is not a directory, or
+.I path
+is relative and
+.I dirfd
+is a file descriptor referring to a file other than a directory.
+.TP
+.B ENOSPC
+The "anonymous" mount namespace
+necessary to contain the
+.B \%OPEN_TREE_CLONE
+detached bind-mount mount object
+could not be allocated,
+as doing so would
+exceed the configured per-user limit
+on the number of mount namespaces
+in the current user namespace.
+(See also
+.BR namespaces (7).)
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B EMFILE
+The calling process has too many open files to create more.
+.TP
+.B ENFILE
+The system has too many open files to create more.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit a07b20004793d8926f78d63eb5980559f7813404
+glibc 2.36.
+.SH NOTES
+.SS Anonymous mount namespaces
+The bind-mount mount objects created by
+.BR open_tree ()
+with
+.B \%OPEN_TREE_CLONE
+are not attached to the mount namespace of the calling process.
+Instead, each mount object is attached to
+a newly allocated "anonymous" mount namespace
+associated with the calling process.
+.P
+One of the side-effects of this is that
+(unlike bind-mounts created with
+.BR mount (2)),
+mount propagation
+(as described in
+.BR mount_namespaces (7))
+will not be applied to bind-mounts created by
+.BR open_tree ()
+until the bind-mount is attached with
+.BR move_mount (2),
+at which point the mount
+will be associated with the mount namespace
+where it was mounted
+and mount propagation will resume.
+.SH EXAMPLES
+The following examples show how
+.BR open_tree ()
+can be used in place of more traditional
+.BR mount (2)
+calls with
+.BR MS_BIND .
+.P
+.in +4n
+.EX
+int srcfd;
+\&
+/* mount --bind /var /mnt */
+mount("/var", "/mnt", NULL, MS_BIND, NULL);
+/* ... is equivalent to ... */
+srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+/* mount --rbind /var /mnt */
+mount("/var", "/mnt", NULL, MS_BIND|MS_REC, NULL);
+/* ... is equivalent to ... */
+srcfd = open_tree(AT_FDCWD, "/var", OPEN_TREE_CLONE | AT_RECURSIVE);
+move_mount(srcfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+/* mount --bind /proc/self/fd/100 /proc/self/fd/200/foo */
+mount("/proc/self/fd/100", "/proc/self/fd/200/foo", NULL, MS_BIND, NULL);
+/* ... is equivalent to ... */
+srcfd = open_tree(100, "", AT_EMPTY_PATH | OPEN_TREE_CLONE);
+move_mount(srcfd, "", 200, "foo", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+In addition, you can use the file descriptor returned by
+.BR open_tree ()
+as the
+.I dirfd
+argument to any "*at()" system calls:
+.P
+.in +4n
+.EX
+int dirfd, fd;
+\&
+dirfd = open_tree(AT_FDCWD, "/etc", OPEN_TREE_CLONE);
+fd = openat(dirfd, "passwd", O_RDONLY);
+fchmodat(dirfd, "shadow", 0000, 0);
+close(dirfd);
+close(fd);
+/* The bind-mount is now destroyed. */
+.EE
+.in
+.SH SEE ALSO
+.BR fsconfig (2),
+.BR fsmount (2),
+.BR fsopen (2),
+.BR fspick (2),
+.BR mount (2),
+.BR mount_setattr (2),
+.BR move_mount (2),
+.BR mount_namespaces (7)
--
2.50.1
* [PATCH v2 09/11] mount_setattr.2: mirror opening sentence from fsopen(2)
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (7 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 08/11] open_tree.2: " Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 10/11] open_tree_attr.2, open_tree.2: document new open_tree_attr() api Aleksa Sarai
` (2 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
All of the other new mount API docs have this lead-in sentence in order
to make this set of APIs feel a little bit more cohesive. Despite being
a bit of a latecomer, mount_setattr(2) is definitely part of this family
of APIs and so deserves the same treatment.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/mount_setattr.2 | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index d44fafc93a20..b9afc21035b8 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -19,7 +19,11 @@ .SH SYNOPSIS
.SH DESCRIPTION
The
.BR mount_setattr ()
-system call changes the mount properties of a mount or an entire mount tree.
+system call is part of the suite of file descriptor based mount facilities
+in Linux.
+.P
+.BR mount_setattr ()
+changes the mount properties of a mount or an entire mount tree.
If
.I path
is relative,
--
2.50.1
* [PATCH v2 10/11] open_tree_attr.2, open_tree.2: document new open_tree_attr() api
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (8 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 09/11] mount_setattr.2: mirror opening sentence from fsopen(2) Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 11/11] fsconfig.2, mount_setattr.2: add note about attribute-parameter distinction Aleksa Sarai
2025-08-08 12:53 ` [PATCH v2 00/11] man2: add man pages for 'new' mount API Christian Brauner
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This is a new API added in Linux 6.15, and is effectively just a minor
expansion of open_tree(2) in order to allow for MOUNT_ATTR_IDMAP to be
changed for an existing ID-mapped mount. Glibc does not yet have a
wrapper for this.
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/open_tree.2 | 74 +++++++++++++++++++++++++++++++++++++++++++++++
man/man2/open_tree_attr.2 | 1 +
2 files changed, 75 insertions(+)
diff --git a/man/man2/open_tree.2 b/man/man2/open_tree.2
index 3d38e27b5254..6e7ec4998d42 100644
--- a/man/man2/open_tree.2
+++ b/man/man2/open_tree.2
@@ -15,7 +15,19 @@ .SH SYNOPSIS
.BR "#include <sys/mount.h>"
.P
.BI "int open_tree(int " dirfd ", const char *" path ", unsigned int " flags ");"
+.P
+.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
+.P
+.BI "int syscall(SYS_open_tree_attr, int " dirfd ", const char *" path ","
+.BI " unsigned int " flags ", struct mount_attr *" attr ", \
+size_t " size ");"
.fi
+.P
+.IR Note :
+glibc provides no wrapper for
+.BR open_tree_attr (),
+necessitating the use of
+.BR syscall (2).
.SH DESCRIPTION
The
.BR open_tree ()
@@ -222,6 +234,64 @@ .SH DESCRIPTION
and attach it to the file descriptor.
This flag is only permitted in conjunction with
.BR \%OPEN_TREE_CLONE .
+.SS open_tree_attr()
+The
+.BR open_tree_attr ()
+system call operates in exactly the same way as
+.BR open_tree (),
+except for the differences described here.
+.P
+After performing the same operation as
+.BR open_tree ()
+(but before returning the resulting file descriptor),
+.BR open_tree_attr ()
+will apply the mount attributes requested in
+.I attr
+to the mount object.
+(See
+.BR mount_attr (2type)
+for a description of the
+.I mount_attr
+structure.
+As described in
+.BR mount_setattr (2),
+.I size
+must be set to
+.I sizeof(struct mount_attr)
+in order to support future extensions.)
+.P
+For the most part, the application of
+.I attr
+has identical semantics to
+.BR mount_setattr (2),
+except that it is possible to change the
+.B \%MOUNT_ATTR_IDMAP
+attribute for a mount object
+that is already configured as an ID-mapped mount.
+This is usually forbidden by
+.BR mount_setattr (2)
+and thus
+.BR open_tree_attr ()
+is currently the only permitted mechanism to change this attribute.
+Changing an ID-mapped mount is only permitted
+if a new detached mount object is being created with
+.I flags
+including
+.BR \%OPEN_TREE_CLONE .
+.P
+If
+.I flags
+contains
+.BR \%AT_RECURSIVE ,
+then the attributes are applied recursively
+(just as when
+.BR mount_setattr (2)
+is called with
+.BR \%AT_RECURSIVE ).
+This applies in addition to the
+.BR open_tree ()-specific
+behavior regarding
+.BR \%AT_RECURSIVE .
.SH RETURN VALUE
On success, a new file descriptor is returned.
On error, \-1 is returned, and
@@ -316,9 +386,13 @@ .SH ERRORS
.SH STANDARDS
Linux.
.SH HISTORY
+.SS open_tree()
Linux 5.2.
.\" commit a07b20004793d8926f78d63eb5980559f7813404
glibc 2.36.
+.SS open_tree_attr()
+Linux 6.15.
+.\" commit c4a16820d90199409c9bf01c4f794e1e9e8d8fd8
.SH NOTES
.SS Anonymous mount namespaces
The bind-mount mount objects created by
diff --git a/man/man2/open_tree_attr.2 b/man/man2/open_tree_attr.2
new file mode 100644
index 000000000000..e57269bbd269
--- /dev/null
+++ b/man/man2/open_tree_attr.2
@@ -0,0 +1 @@
+.so man2/open_tree.2
--
2.50.1
* [PATCH v2 11/11] fsconfig.2, mount_setattr.2: add note about attribute-parameter distinction
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (9 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 10/11] open_tree_attr.2, open_tree.2: document new open_tree_attr() api Aleksa Sarai
@ 2025-08-06 17:44 ` Aleksa Sarai
2025-08-08 12:53 ` [PATCH v2 00/11] man2: add man pages for 'new' mount API Christian Brauner
11 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-06 17:44 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner, Aleksa Sarai
This was not particularly well documented in either mount(8) or mount(2), and
since this is a fairly notable aspect of the new mount API, we should
probably add some words about it.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
man/man2/fsconfig.2 | 7 +++++++
man/man2/mount_setattr.2 | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)
diff --git a/man/man2/fsconfig.2 b/man/man2/fsconfig.2
index e2121b7a6b68..9e0e25acff3b 100644
--- a/man/man2/fsconfig.2
+++ b/man/man2/fsconfig.2
@@ -448,6 +448,13 @@ .SH HISTORY
Linux 5.2.
.\" commit ecdab150fddb42fe6a739335257949220033b782
glibc 2.36.
+.SH NOTES
+.SS Mount attributes and filesystem parameters
+For a description of the distinction between
+mount attributes and filesystem parameters,
+see the "Mount attributes and filesystem parameters" subsection
+of
+.BR mount_setattr (2).
.SH EXAMPLES
To illustrate the different kinds of flags that can be configured with
.BR fsconfig (),
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index b9afc21035b8..3e6b59e5b57a 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -790,6 +790,43 @@ .SS ID-mapped mounts
.BR chown (2)
system call changes the ownership globally and permanently.
.\"
+.SS Mount attributes and filesystem parameters
+Some mount attributes
+(traditionally associated with
+.BR mount (8)-style
+options)
+are also filesystem parameters.
+For example, the
+.I \-o ro
+option to
+.BR mount (8)
+can refer to the
+"read-only" filesystem parameter,
+or the "read-only" mount attribute.
+.P
+The distinction between these two kinds of option is that
+mount object attributes are applied per-mount-object
+(allowing different mount objects
+derived from a given filesystem instance
+to have different attributes),
+while filesystem instance parameters
+("superblock flags" in kernel developer parlance)
+apply to all mount objects
+derived from the same filesystem instance.
+.P
+When using
+.BR mount (2),
+the line between these two types of mount options is blurred.
+However, with
+.BR mount_setattr ()
+and
+.BR fsconfig (2),
+the distinction is made much clearer.
+Mount attributes are configured with
+.BR mount_setattr (),
+while filesystem parameters can be configured using
+.BR fsconfig (2).
+.\"
.SS Extensibility
In order to allow for future extensibility,
.BR mount_setattr ()
--
2.50.1
* Re: [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
@ 2025-08-07 10:39 ` Alejandro Colomar
2025-08-08 9:23 ` Askar Safin
1 sibling, 0 replies; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 10:39 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 03:44:35AM +1000, Aleksa Sarai wrote:
> Glibc 2.36 added syscall wrappers for the entire family of fd-based
> mount syscalls, including mount_setattr(2). Thus it's no longer
> necessary to instruct users to do raw syscall(2) operations.
>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Thanks! I've applied and pushed the patch.
Have a lovely day!
Alex
> ---
> man/man2/mount_setattr.2 | 45 +++++++--------------------------------------
> 1 file changed, 7 insertions(+), 38 deletions(-)
>
> diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
> index 60d9cf9de8aa..c96f0657f046 100644
> --- a/man/man2/mount_setattr.2
> +++ b/man/man2/mount_setattr.2
> @@ -10,21 +10,12 @@ .SH LIBRARY
> .RI ( libc ,\~ \-lc )
> .SH SYNOPSIS
> .nf
> -.BR "#include <linux/fcntl.h>" " /* Definition of " AT_* " constants */"
> -.BR "#include <linux/mount.h>" " /* Definition of " MOUNT_ATTR_* " constants */"
> -.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
> -.B #include <unistd.h>
> +.BR "#include <fcntl.h>" " /* Definition of " AT_* " constants */"
> +.B #include <sys/mount.h>
> .P
> -.BI "int syscall(SYS_mount_setattr, int " dirfd ", const char *" path ,
> -.BI " unsigned int " flags ", struct mount_attr *" attr \
> -", size_t " size );
> +.BI "int mount_setattr(int " dirfd ", const char *" path ", unsigned int " flags ","
> +.BI " struct mount_attr *" attr ", size_t " size );"
> .fi
> -.P
> -.IR Note :
> -glibc provides no wrapper for
> -.BR mount_setattr (),
> -necessitating the use of
> -.BR syscall (2).
> .SH DESCRIPTION
> The
> .BR mount_setattr ()
> @@ -586,6 +577,7 @@ .SH HISTORY
> .\" commit 7d6beb71da3cc033649d641e1e608713b8220290
> .\" commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
> .\" commit 9caccd41541a6f7d6279928d9f971f6642c361af
> +glibc 2.36.
> .SH NOTES
> .SS ID-mapped mounts
> Creating an ID-mapped mount makes it possible to
> @@ -914,37 +906,14 @@ .SH EXAMPLES
> #include <err.h>
> #include <fcntl.h>
> #include <getopt.h>
> -#include <linux/mount.h>
> -#include <linux/types.h>
> +#include <sys/mount.h>
> +#include <sys/types.h>
> #include <stdbool.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> -#include <sys/syscall.h>
> #include <unistd.h>
> \&
> -static inline int
> -mount_setattr(int dirfd, const char *path, unsigned int flags,
> - struct mount_attr *attr, size_t size)
> -{
> - return syscall(SYS_mount_setattr, dirfd, path, flags,
> - attr, size);
> -}
> -\&
> -static inline int
> -open_tree(int dirfd, const char *filename, unsigned int flags)
> -{
> - return syscall(SYS_open_tree, dirfd, filename, flags);
> -}
> -\&
> -static inline int
> -move_mount(int from_dirfd, const char *from_path,
> - int to_dirfd, const char *to_path, unsigned int flags)
> -{
> - return syscall(SYS_move_mount, from_dirfd, from_path,
> - to_dirfd, to_path, flags);
> -}
> -\&
> static const struct option longopts[] = {
> {"map\-mount", required_argument, NULL, \[aq]a\[aq]},
> {"recursive", no_argument, NULL, \[aq]b\[aq]},
>
> --
> 2.50.1
>
>
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type
2025-08-06 17:44 ` [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type Aleksa Sarai
@ 2025-08-07 11:11 ` Alejandro Colomar
2025-08-07 12:38 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 11:11 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 03:44:36AM +1000, Aleksa Sarai wrote:
> As with open_how(2type), it makes sense to move this to a separate man
> page. In addition, future man pages added in this patchset will want to
> reference mount_attr(2type).
>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
> man/man2/mount_setattr.2 | 17 ++++---------
> man/man2type/mount_attr.2type | 58 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 63 insertions(+), 12 deletions(-)
>
> diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
> index c96f0657f046..d44fafc93a20 100644
> --- a/man/man2/mount_setattr.2
> +++ b/man/man2/mount_setattr.2
> @@ -114,18 +114,11 @@ .SH DESCRIPTION
> .I attr
> argument of
> .BR mount_setattr ()
> -is a structure of the following form:
> -.P
> -.in +4n
> -.EX
> -struct mount_attr {
> - __u64 attr_set; /* Mount properties to set */
> - __u64 attr_clr; /* Mount properties to clear */
> - __u64 propagation; /* Mount propagation type */
> - __u64 userns_fd; /* User namespace file descriptor */
> -};
> -.EE
> -.in
> +is a pointer to a
> +.I mount_attr
> +structure,
> +described in
> +.BR mount_attr (2type).
> .P
> The
> .I attr_set
> diff --git a/man/man2type/mount_attr.2type b/man/man2type/mount_attr.2type
> new file mode 100644
> index 000000000000..b7a3ace6b3b9
> --- /dev/null
> +++ b/man/man2type/mount_attr.2type
> @@ -0,0 +1,58 @@
> +
Please remove this blank. It is not diagnosed by groff(1), but I think
it should be diagnosed (blank lines are diagnosed elsewhere). I've
reported a bug to groff(1) (Branden will be reading this, anyway).
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH mount_attr 2type (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +mount_attr \- what mount properties to set and clear
> +.SH LIBRARY
> +Linux kernel headers
> +.SH SYNOPSIS
> +.EX
> +.B #include <sys/mount.h>
> +.P
> +.B struct mount_attr {
> +.BR " __u64 attr_set;" " /* Mount properties to set */"
> +.BR " __u64 attr_clr;" " /* Mount properties to clear */"
> +.BR " __u64 propagation;" " /* Mount propagation type */"
> +.BR " __u64 userns_fd;" " /* User namespace file descriptor */"
> + /* ... */
> +.B };
> +.EE
> +.SH DESCRIPTION
> +Specifies which mount properties should be changed with
> +.BR mount_setattr (2).
> +.P
> +The fields are as follows:
> +.TP
> +.I .attr_set
> +This field specifies which
> +.BI MOUNT_ATTR_ *
> +attribute flags to set.
> +.TP
> +.I .attr_clr
> +This fields specifies which
> +.BI MOUNT_ATTR_ *
> +attribute flags to clear.
> +.TP
> +.I .propagation
> +This field specifies what mount propagation will be applied.
> +The valid values of this field are the same propagation types described in
> +.BR mount_namespaces (7).
> +.TP
> +.I .userns_fd
> +This fields specifies a file descriptor that indicates which user namespace to
s/fields/field/
> +use as a reference for ID-mapped mounts with
> +.BR MOUNT_ATTR_IDMAP .
> +.SH VERSIONS
> +Extra fields may be appended to the structure,
> +with a zero value in a new field resulting in
> +the kernel behaving as though that extension field was not present.
> +Therefore, a user
> +.I must
> +zero-fill this structure on initialization.
I think this would be more appropriate for HISTORY. In VERSIONS, we
usually document differences with the BSDs or other systems.
While moving this to HISTORY, it would also be useful to mention the
glibc version that added the structure. In the future, we'd document
the versions of glibc and Linux that have added members.
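As an illustration of the zero-fill requirement quoted above (not part of
the patch), here is a minimal sketch. It assumes kernel headers new enough
to provide struct mount_attr and MOUNT_ATTR_RDONLY in <linux/mount.h>; the
helper name mount_attr_init_rdonly() is hypothetical.

```c
#include <assert.h>
#include <linux/mount.h>   /* struct mount_attr, MOUNT_ATTR_RDONLY */
#include <string.h>

/*
 * Hypothetical helper (not from the patch): prepare a mount_attr that
 * only asks for MOUNT_ATTR_RDONLY to be set.
 */
void mount_attr_init_rdonly(struct mount_attr *attr)
{
	assert(attr != NULL);

	/*
	 * Zero-fill first, so that any field this code does not know
	 * about is seen by the kernel as "extension not present".
	 */
	memset(attr, 0, sizeof(*attr));
	attr->attr_set = MOUNT_ATTR_RDONLY;
}
```

The result would then be passed to mount_setattr(2) together with
sizeof(struct mount_attr) as the size argument.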
> +.SH STANDARDS
> +Linux.
> +.SH SEE ALSO
> +.BR mount_setattr (2)
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-06 17:44 ` [PATCH v2 03/11] fsopen.2: document 'new' mount api Aleksa Sarai
@ 2025-08-07 11:38 ` Alejandro Colomar
2025-08-07 12:50 ` Aleksa Sarai
2025-08-07 13:27 ` Aleksa Sarai
2025-08-08 9:07 ` Askar Safin
1 sibling, 2 replies; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 11:38 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> This is loosely based on the original documentation written by David
> Howells and later maintained by Christian Brauner, but has been
> rewritten to be more from a user perspective (as well as fixing a few
> critical mistakes).
>
> Co-developed-by: David Howells <dhowells@redhat.com>
> Co-developed-by: Christian Brauner <brauner@kernel.org>
Please use Co-authored-by. It's documented under CONTRIBUTING.d/:
$ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
Trailer
Sign your patch with "Signed-off-by:". Read about the
"Developer's Certificate of Origin" at
<https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
When appropriate, other tags documented in that file, such as
"Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
can be added to the patch. We use "Co-authored-by:" instead of
"Co-developed-by:". Example:
Signed-off-by: Alejandro Colomar <alx@kernel.org>
I think 'author' is more appropriate than 'developer' for documentation.
It is also more consistent with the Copyright notice, which assigns
copyright to the authors (documented in AUTHORS). And ironically, even
the kernel documentation about Co-developed-by talks about authorship
instead of development:
Co-developed-by: states that the patch was co-created by
multiple developers; it is used to give attribution to
co-authors (in addition to the author attributed by the From:
tag) when several people work on a single patch.
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
> man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 319 insertions(+)
>
> diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> new file mode 100644
> index 000000000000..ad38ef0782be
> --- /dev/null
> +++ b/man/man2/fsopen.2
> @@ -0,0 +1,319 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +fsopen \- create a new filesystem context
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <sys/mount.h>"
> +.P
> +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR fsopen ()
> +system call is part of the suite of file descriptor based mount facilities in
> +Linux.
> +.P
> +.BR fsopen ()
> +creates a blank filesystem configuration context within the kernel
> +for the filesystem named by
> +.IR fsname ,
> +puts the context into creation mode and attaches it to a file descriptor,
> +which is then returned.
> +The calling process must have the
> +.B \%CAP_SYS_ADMIN
> +capability in order to create a new filesystem configuration context.
> +.P
> +A filesystem configuration context is an in-kernel representation of a pending
> +transaction,
This page still needs semantic newlines. (Please review all pages
regarding that.) (In this specific sentence, I'd break after 'is'.)
> +containing a set of configuration parameters that are to be applied
> +when creating a new instance of a filesystem
> +(or modifying the configuration of an existing filesystem instance,
> +such as when using
> +.BR fspick (2)).
> +.P
> +After obtaining a filesystem configuration context with
> +.BR fsopen (),
> +the general workflow for operating on the context looks like the following:
> +.IP (1) 5
> +Pass the filesystem context file descriptor to
> +.BR fsconfig (2)
> +to specify any desired filesystem parameters.
> +This may be done as many times as necessary.
> +.IP (2)
> +Pass the same filesystem context file descriptor to
Do we need to say "same"? I guess it's obvious. Or do you expect
any confusion if we don't?
> +.BR fsconfig (2)
> +with
> +.B \%FSCONFIG_CMD_CREATE
> +to create an instance of the configured filesystem.
> +.IP (3)
> +Pass the same filesystem context file descriptor to
> +.BR fsmount (2)
> +to create a new mount object for the root of the filesystem,
> +which is then attached to a new file descriptor.
> +This also places the filesystem context file descriptor into reconfiguration
> +mode,
> +similar to the mode produced by
> +.BR fspick (2).
> +.IP (4)
> +Use the mount object file descriptor as a
> +.I dirfd
> +argument to "*at()" system calls;
> +or attach the mount object to a mount point
> +by passing the mount object file descriptor to
> +.BR move_mount (2).
> +.P
> +A filesystem context will move between different modes throughout its
> +lifecycle
> +(such as the creation phase when created with
> +.BR fsopen (),
> +the reconfiguration phase when an existing filesystem instance is selected by
> +.BR fspick (2),
> +and the intermediate "needs-mount" phase between
> +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> +.BR \%FSCONFIG_CMD_CREATE
> +and
> +.BR fsmount (2)),
> +which has an impact on what operations are permitted on the filesystem context.
> +.P
> +The file descriptor returned by
> +.BR fsopen ()
> +also acts as a channel for filesystem drivers to provide more comprehensive
> +error, warning, and information messages
Should we just say "diagnostic messages" to avoid explicitly mentioning
all the levels?
> +than are normally provided through the standard
> +.BR errno (3)
> +interface for system calls.
> +If an error occurs at any time during the workflow mentioned above,
> +calling
> +.BR read (2)
> +on the filesystem context file descriptor will retrieve any ancillary
> +information about the encountered errors.
> +(See the "Message retrieval interface" section for more details on the message
> +format.)
> +.P
> +.I flags
> +can be used to control aspects of the creation of the filesystem configuration
> +context file descriptor.
> +A value for
> +.I flags
> +is constructed by bitwise ORing
> +zero or more of the following constants:
> +.RS
> +.TP
> +.B FSOPEN_CLOEXEC
> +Set the close-on-exec
> +.RB ( FD_CLOEXEC )
> +flag on the new file descriptor.
> +See the description of the
> +.B O_CLOEXEC
> +flag in
> +.BR open (2)
> +for reasons why this may be useful.
> +.RE
> +.P
> +A list of filesystems supported by the running kernel
> +(and thus a list of valid values for
> +.IR fsname )
> +can be obtained from
> +.IR /proc/filesystems .
> +(See also
> +.BR proc_filesystems (5).)
> +.SS Message retrieval interface
> +When doing operations on a filesystem configuration context,
> +the filesystem driver may choose to provide ancillary information to userspace
> +in the form of message strings.
> +.P
> +The filesystem context file descriptors returned by
> +.BR fsopen ()
> +and
> +.BR fspick (2)
> +may be queried for message strings at any time by calling
> +.BR read (2)
> +on the file descriptor.
> +Each call to
> +.BR read (2)
> +will return a single message,
> +prefixed to indicate its class:
> +.RS
> +.TP
> +.B "e <message>"
> +An error message was logged.
> +This is usually associated with an error being returned from the corresponding
> +system call which triggered this message.
> +.TP
> +.B "w <message>"
> +A warning message was logged.
> +.TP
> +.B "i <message>"
> +An informational message was logged.
> +.RE
> +.P
> +Messages are removed from the queue as they are read.
> +Note that the message queue has limited depth,
> +so it is possible for messages to get lost.
> +If there are no messages in the message queue,
> +.BR read (2)
> +will return no data and
> +.I errno
> +will be set to
> +.BR \%ENODATA .
> +If the
> +.I buf
> +argument to
> +.BR read (2)
> +is not large enough to contain the message,
> +.BR read (2)
> +will return no data and
> +.I errno
> +will be set to
> +.BR \%EMSGSIZE .
> +.P
> +If there are multiple filesystem context file descriptors referencing the same
> +filesystem instance
> +(such as if you call
> +.BR fspick (2)
> +multiple times for the same mount),
> +each one gets its own independent message queue.
> +This does not apply to file descriptors that were duplicated with
> +.BR dup (2).
> +.P
> +Messages strings will usually be prefixed by the filesystem driver that logged
s/Messages/Message/
BTW, here, I'd break after 'prefixed', and then after the ','.
> +the message, though this may not always be the case.
> +See the Linux kernel source code for details.
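To make the message retrieval interface described above concrete, here is
a rough sketch (not part of the patch) of draining the queue with read(2).
The helper names fsmsg_parse() and fsmsg_drain(), the 4096-byte buffer, and
the stripping of a trailing newline are all assumptions made for the
example; only the "e"/"w"/"i" prefixes and the ENODATA behaviour come from
the text under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: split "e <message>" into its class character and
 * message text.  Returns 0 on success, -1 if the prefix is not one of
 * the documented classes.
 */
int fsmsg_parse(const char *raw, char *cls, const char **text)
{
	if (raw == NULL || raw[0] == '\0')
		return -1;
	if (strchr("ewi", raw[0]) == NULL || raw[1] != ' ')
		return -1;
	*cls = raw[0];
	*text = raw + 2;
	return 0;
}

/*
 * Hypothetical helper: read messages from a filesystem context fd until
 * the queue is empty (ENODATA).  Returns the number of messages printed,
 * or -1 on any other read(2) error (including EMSGSIZE).
 */
int fsmsg_drain(int fd, FILE *out)
{
	char buf[4096];		/* assumed to be large enough */
	int count = 0;

	for (;;) {
		char cls;
		const char *text;
		ssize_t n = read(fd, buf, sizeof(buf) - 1);

		if (n < 0)
			return errno == ENODATA ? count : -1;
		if (n == 0)
			return count;	/* not expected here; avoid spinning */
		assert((size_t)n < sizeof(buf));
		buf[n] = '\0';
		if (buf[n - 1] == '\n')	/* assumption: may end in a newline */
			buf[n - 1] = '\0';

		if (fsmsg_parse(buf, &cls, &text) == 0)
			fprintf(out, "[%c] %s\n", cls, text);
		else
			fprintf(out, "[?] %s\n", buf);
		count++;
	}
}
```

A caller would typically invoke fsmsg_drain(fsfd, stderr) right after a
failing fsconfig(2) or fsmount(2) call, before closing the context.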
> +.SH RETURN VALUE
> +On success, a new file descriptor is returned.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EFAULT
> +.I fsname
> +is NULL
> +or a pointer to a location
> +outside the calling process's accessible address space.
> +.TP
> +.B EINVAL
> +.I flags
> +had an invalid flag set.
> +.TP
> +.B EMFILE
> +The calling process has too many open files to create more.
> +.TP
> +.B ENFILE
> +The system has too many open files to create more.
> +.TP
> +.B ENODEV
> +The filesystem named by
> +.I fsname
> +is not supported by the kernel.
> +.TP
> +.B ENOMEM
> +The kernel could not allocate sufficient memory to complete the operation.
> +.TP
> +.B EPERM
> +The calling process does not have the required
> +.B \%CAP_SYS_ADMIN
> +capability.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 5.2.
> +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> +glibc 2.36.
> +.SH EXAMPLES
> +To illustrate the workflow for creating a new mount,
> +the following is an example of how to mount an
> +.BR ext4 (5)
> +filesystem stored on
> +.I /dev/sdb1
> +onto
> +.IR /mnt .
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd;
> +\&
> +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> +.EE
> +.in
> +.P
> +First, an ext4 configuration context is created and attached to the file
Here, I'd break after the ',', and if you need to break again, after
'created'.
> +descriptor
> +.IR fsfd .
> +Then, a series of parameters
> +(such as the source of the filesystem)
> +are provided using
> +.BR fsconfig (2),
> +followed by the filesystem instance being created with
> +.BR \%FSCONFIG_CMD_CREATE .
> +.BR fsmount (2)
> +is then used to create a new mount object attached to the file descriptor
> +.IR mntfd ,
> +which is then attached to the intended mount point using
> +.BR move_mount (2).
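For comparison with the unchecked example above, here is a hedged sketch
(not part of the patch) of the same four-step workflow with error handling.
It makes several assumptions: it calls the system calls through syscall(2)
with constants from <linux/mount.h>, so that it does not depend on the
glibc 2.36 wrappers; it sets "source" with FSCONFIG_SET_STRING rather than
the FSCONFIG_SET_PATH used in the patch; and the helper name
mount_ro_sketch() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>		/* AT_FDCWD */
#include <linux/mount.h>	/* FSOPEN_*, FSCONFIG_*, FSMOUNT_*, MOVE_MOUNT_* */
#include <stddef.h>
#include <sys/syscall.h>	/* SYS_fsopen and friends */
#include <unistd.h>		/* syscall(), close() */

/*
 * Hypothetical helper: mount `source` read-only on `target` using the
 * fd-based mount API.  Returns 0 on success, -1 with errno set on error.
 */
int mount_ro_sketch(const char *fstype, const char *source,
                    const char *target)
{
	int fsfd, mntfd = -1, ret = -1, saved;

	assert(fstype != NULL && source != NULL && target != NULL);

	/* Step 0: create the filesystem context. */
	fsfd = (int)syscall(SYS_fsopen, fstype, FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	/* Step 1: configure parameters on the context. */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING,
	            "source", source, 0) < 0)
		goto out;
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_FLAG,
	            "ro", NULL, 0) < 0)
		goto out;

	/* Step 2: create the filesystem instance. */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE,
	            NULL, NULL, 0) < 0)
		goto out;

	/* Step 3: create a mount object from the same context fd. */
	mntfd = (int)syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);
	if (mntfd < 0)
		goto out;

	/* Step 4: attach the mount object to the mount point. */
	if (syscall(SYS_move_mount, mntfd, "", AT_FDCWD, target,
	            MOVE_MOUNT_F_EMPTY_PATH) < 0)
		goto out;
	ret = 0;
out:
	saved = errno;		/* close() must not clobber the real error */
	if (mntfd >= 0)
		close(mntfd);
	close(fsfd);
	errno = saved;
	return ret;
}
```

On failure, a real program would probably read(2) the pending messages
from fsfd before closing it; that is left out here to keep the sketch
short.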
> +.P
> +The above procedure is functionally equivalent to the following mount operation
> +using
> +.BR mount (2):
> +.P
> +.in +4n
> +.EX
> +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> + "ro,noatime,acl,user_xattr,iversion");
> +.EE
> +.in
> +.P
> +And here's an example of creating a mount object
> +of an NFS server share
> +and setting a Smack security module label.
> +However, instead of attaching it to a mount point,
> +the program uses the mount object directly
> +to open a file from the NFS share.
> +.P
> +.in +4n
> +.EX
> +int fsfd, mntfd, fd;
> +\&
> +fsfd = fsopen("nfs", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> +.EE
> +.in
> +.P
> +Unlike the previous example,
> +this operation has no trivial equivalent with
> +.BR mount (2),
> +as it was not previously possible to create a mount object
> +that is not attached to any mount point.
> +.SH SEE ALSO
> +.BR fsconfig (2),
> +.BR fsmount (2),
> +.BR fspick (2),
> +.BR mount (2),
> +.BR mount_setattr (2),
> +.BR move_mount (2),
> +.BR open_tree (2),
> +.BR mount_namespaces (7)
Other than those minor comments, the text LGTM.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type
2025-08-07 11:11 ` Alejandro Colomar
@ 2025-08-07 12:38 ` Aleksa Sarai
2025-08-07 13:33 ` Alejandro Colomar
0 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-07 12:38 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
>
> On Thu, Aug 07, 2025 at 03:44:36AM +1000, Aleksa Sarai wrote:
> > As with open_how(2type), it makes sense to move this to a separate man
> > page. In addition, future man pages added in this patchset will want to
> > reference mount_attr(2type).
> >
> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> > man/man2/mount_setattr.2 | 17 ++++---------
> > man/man2type/mount_attr.2type | 58 +++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 63 insertions(+), 12 deletions(-)
> >
> > diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
> > index c96f0657f046..d44fafc93a20 100644
> > --- a/man/man2/mount_setattr.2
> > +++ b/man/man2/mount_setattr.2
> > @@ -114,18 +114,11 @@ .SH DESCRIPTION
> > .I attr
> > argument of
> > .BR mount_setattr ()
> > -is a structure of the following form:
> > -.P
> > -.in +4n
> > -.EX
> > -struct mount_attr {
> > - __u64 attr_set; /* Mount properties to set */
> > - __u64 attr_clr; /* Mount properties to clear */
> > - __u64 propagation; /* Mount propagation type */
> > - __u64 userns_fd; /* User namespace file descriptor */
> > -};
> > -.EE
> > -.in
> > +is a pointer to a
> > +.I mount_attr
> > +structure,
> > +described in
> > +.BR mount_attr (2type).
> > .P
> > The
> > .I attr_set
> > diff --git a/man/man2type/mount_attr.2type b/man/man2type/mount_attr.2type
> > new file mode 100644
> > index 000000000000..b7a3ace6b3b9
> > --- /dev/null
> > +++ b/man/man2type/mount_attr.2type
> > @@ -0,0 +1,58 @@
> > +
>
> Please remove this blank. It is not diagnosed by groff(1), but I think
> it should be diagnosed (blank lines are diagnosed elsewhere). I've
> reported a bug to groff(1) (Branden will be reading this, anyway).
>
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH mount_attr 2type (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +mount_attr \- what mount properties to set and clear
> > +.SH LIBRARY
> > +Linux kernel headers
> > +.SH SYNOPSIS
> > +.EX
> > +.B #include <sys/mount.h>
> > +.P
> > +.B struct mount_attr {
> > +.BR " __u64 attr_set;" " /* Mount properties to set */"
> > +.BR " __u64 attr_clr;" " /* Mount properties to clear */"
> > +.BR " __u64 propagation;" " /* Mount propagation type */"
> > +.BR " __u64 userns_fd;" " /* User namespace file descriptor */"
> > + /* ... */
> > +.B };
> > +.EE
> > +.SH DESCRIPTION
> > +Specifies which mount properties should be changed with
> > +.BR mount_setattr (2).
> > +.P
> > +The fields are as follows:
> > +.TP
> > +.I .attr_set
> > +This field specifies which
> > +.BI MOUNT_ATTR_ *
> > +attribute flags to set.
> > +.TP
> > +.I .attr_clr
> > +This fields specifies which
> > +.BI MOUNT_ATTR_ *
> > +attribute flags to clear.
> > +.TP
> > +.I .propagation
> > +This field specifies what mount propagation will be applied.
> > +The valid values of this field are the same propagation types described in
> > +.BR mount_namespaces (7).
> > +.TP
> > +.I .userns_fd
> > +This fields specifies a file descriptor that indicates which user namespace to
>
> s/fields/field/
>
> > +use as a reference for ID-mapped mounts with
> > +.BR MOUNT_ATTR_IDMAP .
> > +.SH VERSIONS
> > +Extra fields may be appended to the structure,
> > +with a zero value in a new field resulting in
> > +the kernel behaving as though that extension field was not present.
> > +Therefore, a user
> > +.I must
> > +zero-fill this structure on initialization.
>
> I think this would be more appropriate for HISTORY. In VERSIONS, we
> usually document differences with the BSDs or other systems.
>
> While moving this to HISTORY, it would also be useful to mention the
> glibc version that added the structure. In the future, we'd document
> the versions of glibc and Linux that have added members.
Sure, though I just copied this section from open_how(2type).
> > +.SH STANDARDS
> > +Linux.
> > +.SH SEE ALSO
> > +.BR mount_setattr (2)
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 11:38 ` Alejandro Colomar
@ 2025-08-07 12:50 ` Aleksa Sarai
2025-08-07 13:42 ` Alejandro Colomar
2025-08-07 13:27 ` Aleksa Sarai
1 sibling, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-07 12:50 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
>
> On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> > This is loosely based on the original documentation written by David
> > Howells and later maintained by Christian Brauner, but has been
> > rewritten to be more from a user perspective (as well as fixing a few
> > critical mistakes).
> >
> > Co-developed-by: David Howells <dhowells@redhat.com>
> > Co-developed-by: Christian Brauner <brauner@kernel.org>
>
> Please use Co-authored-by. It's documented under CONTRIBUTING.d/:
>
> $ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
> Trailer
> Sign your patch with "Signed-off-by:". Read about the
> "Developer's Certificate of Origin" at
> <https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
> When appropriate, other tags documented in that file, such as
> "Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
> can be added to the patch. We use "Co-authored-by:" instead of
> "Co-developed-by:". Example:
>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
>
> I think 'author' is more appropriate than 'developer' for documentation.
> It is also more consistent with the Copyright notice, which assigns
> copyright to the authors (documented in AUTHORS). And ironically, even
> the kernel documentation about Co-developed-by talks about authorship
> instead of development:
>
> Co-developed-by: states that the patch was co-created by
> multiple developers; it is used to give attribution to
> co-authors (in addition to the author attributed by the From:
> tag) when several people work on a single patch.
>
> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> > man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 319 insertions(+)
> >
> > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> > new file mode 100644
> > index 000000000000..ad38ef0782be
> > --- /dev/null
> > +++ b/man/man2/fsopen.2
> > @@ -0,0 +1,319 @@
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +fsopen \- create a new filesystem context
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ,\~ \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <sys/mount.h>"
> > +.P
> > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR fsopen ()
> > +system call is part of the suite of file descriptor based mount facilities in
> > +Linux.
> > +.P
> > +.BR fsopen ()
> > +creates a blank filesystem configuration context within the kernel
> > +for the filesystem named by
> > +.IR fsname ,
> > +puts the context into creation mode and attaches it to a file descriptor,
> > +which is then returned.
> > +The calling process must have the
> > +.B \%CAP_SYS_ADMIN
> > +capability in order to create a new filesystem configuration context.
> > +.P
> > +A filesystem configuration context is an in-kernel representation of a pending
> > +transaction,
>
> This page still needs semantic newlines. (Please review all pages
> regarding that.) (In this specific sentence, I'd break after 'is'.)
I did try adding them to this page (and all of the other pages -- I
suspect the pages later in the patchset have more aggressive newlining).
If you compare the newline placement between v1 and v2 you'll see that I
have added a lot of newlines in all of the man-pages, but it's possible
I missed a couple of sentences like this one.
To be honest I feel quite lost as to where the "semantic newlines" school
would deem it appropriate to place newlines, and man-pages(7) is very terse
on the topic. Outside of very obvious examples,
it just feels wrong
to have such choppy
line break usage.
I understand
the argument that
this helps
with reviewing diffs,
but I really find it
incredibly unnatural.
(And this tongue-in-cheek example
is probably wrong too.)
> > +containing a set of configuration parameters that are to be applied
> > +when creating a new instance of a filesystem
> > +(or modifying the configuration of an existing filesystem instance,
> > +such as when using
> > +.BR fspick (2)).
> > +.P
> > +After obtaining a filesystem configuration context with
> > +.BR fsopen (),
> > +the general workflow for operating on the context looks like the following:
> > +.IP (1) 5
> > +Pass the filesystem context file descriptor to
> > +.BR fsconfig (2)
> > +to specify any desired filesystem parameters.
> > +This may be done as many times as necessary.
> > +.IP (2)
> > +Pass the same filesystem context file descriptor to
>
> Do we need to say "same"? I guess it's obvious. Or do you expect
> any confusion if we don't?
The first time I saw this interface I was confused about which file
descriptor you pass when (especially around the FSCONFIG_CMD_CREATE stage),
so I felt it better to make it clear which file descriptor we are
talking about.
> > +.BR fsconfig (2)
> > +with
> > +.B \%FSCONFIG_CMD_CREATE
> > +to create an instance of the configured filesystem.
> > +.IP (3)
> > +Pass the same filesystem context file descriptor to
> > +.BR fsmount (2)
> > +to create a new mount object for the root of the filesystem,
> > +which is then attached to a new file descriptor.
> > +This also places the filesystem context file descriptor into reconfiguration
> > +mode,
> > +similar to the mode produced by
> > +.BR fspick (2).
> > +.IP (4)
> > +Use the mount object file descriptor as a
> > +.I dirfd
> > +argument to "*at()" system calls;
> > +or attach the mount object to a mount point
> > +by passing the mount object file descriptor to
> > +.BR move_mount (2).
> > +.P
> > +A filesystem context will move between different modes throughout its
> > +lifecycle
> > +(such as the creation phase when created with
> > +.BR fsopen (),
> > +the reconfiguration phase when an existing filesystem instance is selected by
> > +.BR fspick (2),
> > +and the intermediate "needs-mount" phase between
> > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> > +.BR \%FSCONFIG_CMD_CREATE
> > +and
> > +.BR fsmount (2)),
> > +which has an impact on what operations are permitted on the filesystem context.
> > +.P
> > +The file descriptor returned by
> > +.BR fsopen ()
> > +also acts as a channel for filesystem drivers to provide more comprehensive
> > +error, warning, and information messages
>
> Should we just say "diagnostic messages" to avoid explicitly mentioning
> all the levels?
Sure.
> > +than are normally provided through the standard
> > +.BR errno (3)
> > +interface for system calls.
> > +If an error occurs at any time during the workflow mentioned above,
> > +calling
> > +.BR read (2)
> > +on the filesystem context file descriptor will retrieve any ancillary
> > +information about the encountered errors.
> > +(See the "Message retrieval interface" section for more details on the message
> > +format.)
> > +.P
> > +.I flags
> > +can be used to control aspects of the creation of the filesystem configuration
> > +context file descriptor.
> > +A value for
> > +.I flags
> > +is constructed by bitwise ORing
> > +zero or more of the following constants:
> > +.RS
> > +.TP
> > +.B FSOPEN_CLOEXEC
> > +Set the close-on-exec
> > +.RB ( FD_CLOEXEC )
> > +flag on the new file descriptor.
> > +See the description of the
> > +.B O_CLOEXEC
> > +flag in
> > +.BR open (2)
> > +for reasons why this may be useful.
> > +.RE
> > +.P
> > +A list of filesystems supported by the running kernel
> > +(and thus a list of valid values for
> > +.IR fsname )
> > +can be obtained from
> > +.IR /proc/filesystems .
> > +(See also
> > +.BR proc_filesystems (5).)
> > +.SS Message retrieval interface
> > +When doing operations on a filesystem configuration context,
> > +the filesystem driver may choose to provide ancillary information to userspace
> > +in the form of message strings.
> > +.P
> > +The filesystem context file descriptors returned by
> > +.BR fsopen ()
> > +and
> > +.BR fspick (2)
> > +may be queried for message strings at any time by calling
> > +.BR read (2)
> > +on the file descriptor.
> > +Each call to
> > +.BR read (2)
> > +will return a single message,
> > +prefixed to indicate its class:
> > +.RS
> > +.TP
> > +.B "e <message>"
> > +An error message was logged.
> > +This is usually associated with an error being returned from the corresponding
> > +system call which triggered this message.
> > +.TP
> > +.B "w <message>"
> > +A warning message was logged.
> > +.TP
> > +.B "i <message>"
> > +An informational message was logged.
> > +.RE
> > +.P
> > +Messages are removed from the queue as they are read.
> > +Note that the message queue has limited depth,
> > +so it is possible for messages to get lost.
> > +If there are no messages in the message queue,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%ENODATA .
> > +If the
> > +.I buf
> > +argument to
> > +.BR read (2)
> > +is not large enough to contain the message,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%EMSGSIZE .
> > +.P
> > +If there are multiple filesystem context file descriptors referencing the same
> > +filesystem instance
> > +(such as if you call
> > +.BR fspick (2)
> > +multiple times for the same mount),
> > +each one gets its own independent message queue.
> > +This does not apply to file descriptors that were duplicated with
> > +.BR dup (2).
> > +.P
> > +Messages strings will usually be prefixed by the filesystem driver that logged
>
> s/Messages/Message/
>
> BTW, here, I'd break after 'prefixed', and then after the ','.
>
> > +the message, though this may not always be the case.
> > +See the Linux kernel source code for details.
> > +.SH RETURN VALUE
> > +On success, a new file descriptor is returned.
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EFAULT
> > +.I fsname
> > +is NULL
> > +or a pointer to a location
> > +outside the calling process's accessible address space.
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +had an invalid flag set.
> > +.TP
> > +.B EMFILE
> > +The calling process has too many open files to create more.
> > +.TP
> > +.B ENFILE
> > +The system has too many open files to create more.
> > +.TP
> > +.B ENODEV
> > +The filesystem named by
> > +.I fsname
> > +is not supported by the kernel.
> > +.TP
> > +.B ENOMEM
> > +The kernel could not allocate sufficient memory to complete the operation.
> > +.TP
> > +.B EPERM
> > +The calling process does not have the required
> > +.B \%CAP_SYS_ADMIN
> > +capability.
> > +.SH STANDARDS
> > +Linux.
> > +.SH HISTORY
> > +Linux 5.2.
> > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> > +glibc 2.36.
> > +.SH EXAMPLES
> > +To illustrate the workflow for creating a new mount,
> > +the following is an example of how to mount an
> > +.BR ext4 (5)
> > +filesystem stored on
> > +.I /dev/sdb1
> > +onto
> > +.IR /mnt .
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd;
> > +\&
> > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > +.EE
> > +.in
> > +.P
> > +First, an ext4 configuration context is created and attached to the file
>
> Here, I'd break after the ',', and if you need to break again, after
> 'created'.
Okay, I wanted to avoid having lines with single words due to semantic
newlines, but if that's what you prefer I can update that everywhere...
> > +descriptor
> > +.IR fsfd .
> > +Then, a series of parameters
> > +(such as the source of the filesystem)
> > +are provided using
> > +.BR fsconfig (2),
> > +followed by the filesystem instance being created with
> > +.BR \%FSCONFIG_CMD_CREATE .
> > +.BR fsmount (2)
> > +is then used to create a new mount object attached to the file descriptor
> > +.IR mntfd ,
> > +which is then attached to the intended mount point using
> > +.BR move_mount (2).
> > +.P
> > +The above procedure is functionally equivalent to the following mount operation
> > +using
> > +.BR mount (2):
> > +.P
> > +.in +4n
> > +.EX
> > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> > + "ro,noatime,acl,user_xattr,iversion");
> > +.EE
> > +.in
> > +.P
> > +And here's an example of creating a mount object
> > +of an NFS server share
> > +and setting a Smack security module label.
> > +However, instead of attaching it to a mount point,
> > +the program uses the mount object directly
> > +to open a file from the NFS share.
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd, fd;
> > +\&
> > +fsfd = fsopen("nfs", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> > +.EE
> > +.in
> > +.P
> > +Unlike the previous example,
> > +this operation has no trivial equivalent with
> > +.BR mount (2),
> > +as it was not previously possible to create a mount object
> > +that is not attached to any mount point.
> > +.SH SEE ALSO
> > +.BR fsconfig (2),
> > +.BR fsmount (2),
> > +.BR fspick (2),
> > +.BR mount (2),
> > +.BR mount_setattr (2),
> > +.BR move_mount (2),
> > +.BR open_tree (2),
> > +.BR mount_namespaces (7)
>
> Other than those minor comments, the text LGTM.
>
>
> Cheers,
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 11:38 ` Alejandro Colomar
2025-08-07 12:50 ` Aleksa Sarai
@ 2025-08-07 13:27 ` Aleksa Sarai
2025-08-07 13:52 ` Alejandro Colomar
1 sibling, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-07 13:27 UTC (permalink / raw)
To: Alejandro Colomar
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
>
> On Thu, Aug 07, 2025 at 03:44:37AM +1000, Aleksa Sarai wrote:
> > This is loosely based on the original documentation written by David
> > Howells and later maintained by Christian Brauner, but has been
> > rewritten to be more from a user perspective (as well as fixing a few
> > critical mistakes).
> >
> > Co-developed-by: David Howells <dhowells@redhat.com>
> > Co-developed-by: Christian Brauner <brauner@kernel.org>
>
> Please use Co-authored-by. It's documented under CONTRIBUTING.d/:
>
> $ cat CONTRIBUTING.d/patches/description | grep -A99 Trailer;
> Trailer
> Sign your patch with "Signed-off-by:". Read about the
> "Developer's Certificate of Origin" at
> <https://www.kernel.org/doc/Documentation/process/submitting-patches.rst>.
> When appropriate, other tags documented in that file, such as
> "Reported-by:", "Reviewed-by:", "Acked-by:", and "Suggested-by:"
> can be added to the patch. We use "Co-authored-by:" instead of
> "Co-developed-by:". Example:
>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
>
> I think 'author' is more appropriate than 'developer' for documentation.
> It is also more consistent with the Copyright notice, which assigns
> copyright to the authors (documented in AUTHORS). And ironically, even
> the kernel documentation about Co-authored-by talks about authorship
> instead of development:
>
> Co-developed-by: states that the patch was co-created by
> multiple developers; it is used to give attribution to
> co-authors (in addition to the author attributed by the From:
> tag) when several people work on a single patch.
Sure, fixed.
Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
required for submissions? I don't think b4 supports including it (and I
really would prefer to not have to use raw git-send-email again just for
man-pages -- b4 has so many benefits over raw git-send-email). Is the
b4-style changelog I include in the cover-letter sufficient?
I like to think of myself as a fairly prolific git user, but I don't
think I've ever seen --range-diff= output in a git-send-email patch
before...
> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> > ---
> > man/man2/fsopen.2 | 319 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 319 insertions(+)
> >
> > diff --git a/man/man2/fsopen.2 b/man/man2/fsopen.2
> > new file mode 100644
> > index 000000000000..ad38ef0782be
> > --- /dev/null
> > +++ b/man/man2/fsopen.2
> > @@ -0,0 +1,319 @@
> > +.\" Copyright, the authors of the Linux man-pages project
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH fsopen 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +fsopen \- create a new filesystem context
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ,\~ \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <sys/mount.h>"
> > +.P
> > +.BI "int fsopen(const char *" fsname ", unsigned int " flags ");"
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR fsopen ()
> > +system call is part of the suite of file descriptor based mount facilities in
> > +Linux.
> > +.P
> > +.BR fsopen ()
> > +creates a blank filesystem configuration context within the kernel
> > +for the filesystem named by
> > +.IR fsname ,
> > +puts the context into creation mode and attaches it to a file descriptor,
> > +which is then returned.
> > +The calling process must have the
> > +.B \%CAP_SYS_ADMIN
> > +capability in order to create a new filesystem configuration context.
> > +.P
> > +A filesystem configuration context is an in-kernel representation of a pending
> > +transaction,
>
> This page still needs semantic newlines. (Please review all pages
> regarding that.) (In this specific sentence, I'd break after 'is'.)
>
> > +containing a set of configuration parameters that are to be applied
> > +when creating a new instance of a filesystem
> > +(or modifying the configuration of an existing filesystem instance,
> > +such as when using
> > +.BR fspick (2)).
> > +.P
> > +After obtaining a filesystem configuration context with
> > +.BR fsopen (),
> > +the general workflow for operating on the context looks like the following:
> > +.IP (1) 5
> > +Pass the filesystem context file descriptor to
> > +.BR fsconfig (2)
> > +to specify any desired filesystem parameters.
> > +This may be done as many times as necessary.
> > +.IP (2)
> > +Pass the same filesystem context file descriptor to
>
> Do we need to say "same"? I guess it's obvious. Or do you expect
> any confusion if we don't?
>
> > +.BR fsconfig (2)
> > +with
> > +.B \%FSCONFIG_CMD_CREATE
> > +to create an instance of the configured filesystem.
> > +.IP (3)
> > +Pass the same filesystem context file descriptor to
> > +.BR fsmount (2)
> > +to create a new mount object for the root of the filesystem,
> > +which is then attached to a new file descriptor.
> > +This also places the filesystem context file descriptor into reconfiguration
> > +mode,
> > +similar to the mode produced by
> > +.BR fspick (2).
> > +.IP (4)
> > +Use the mount object file descriptor as a
> > +.I dirfd
> > +argument to "*at()" system calls;
> > +or attach the mount object to a mount point
> > +by passing the mount object file descriptor to
> > +.BR move_mount (2).
> > +.P
> > +A filesystem context will move between different modes throughout its
> > +lifecycle
> > +(such as the creation phase when created with
> > +.BR fsopen (),
> > +the reconfiguration phase when an existing filesystem instance is selected by
> > +.BR fspick (2),
> > +and the intermediate "needs-mount" phase between
> > +.\" FS_CONTEXT_NEEDS_MOUNT is the term the kernel uses for this.
> > +.BR \%FSCONFIG_CMD_CREATE
> > +and
> > +.BR fsmount (2)),
> > +which has an impact on what operations are permitted on the filesystem context.
> > +.P
> > +The file descriptor returned by
> > +.BR fsopen ()
> > +also acts as a channel for filesystem drivers to provide more comprehensive
> > +error, warning, and information messages
>
> Should we just say "diagnostic messages" to avoid explicitly mentioning
> all the levels?
>
> > +than are normally provided through the standard
> > +.BR errno (3)
> > +interface for system calls.
> > +If an error occurs at any time during the workflow mentioned above,
> > +calling
> > +.BR read (2)
> > +on the filesystem context file descriptor will retrieve any ancillary
> > +information about the encountered errors.
> > +(See the "Message retrieval interface" section for more details on the message
> > +format.)
> > +.P
> > +.I flags
> > +can be used to control aspects of the creation of the filesystem configuration
> > +context file descriptor.
> > +A value for
> > +.I flags
> > +is constructed by bitwise ORing
> > +zero or more of the following constants:
> > +.RS
> > +.TP
> > +.B FSOPEN_CLOEXEC
> > +Set the close-on-exec
> > +.RB ( FD_CLOEXEC )
> > +flag on the new file descriptor.
> > +See the description of the
> > +.B O_CLOEXEC
> > +flag in
> > +.BR open (2)
> > +for reasons why this may be useful.
> > +.RE
> > +.P
> > +A list of filesystems supported by the running kernel
> > +(and thus a list of valid values for
> > +.IR fsname )
> > +can be obtained from
> > +.IR /proc/filesystems .
> > +(See also
> > +.BR proc_filesystems (5).)
> > +.SS Message retrieval interface
> > +When doing operations on a filesystem configuration context,
> > +the filesystem driver may choose to provide ancillary information to userspace
> > +in the form of message strings.
> > +.P
> > +The filesystem context file descriptors returned by
> > +.BR fsopen ()
> > +and
> > +.BR fspick (2)
> > +may be queried for message strings at any time by calling
> > +.BR read (2)
> > +on the file descriptor.
> > +Each call to
> > +.BR read (2)
> > +will return a single message,
> > +prefixed to indicate its class:
> > +.RS
> > +.TP
> > +.B "e <message>"
> > +An error message was logged.
> > +This is usually associated with an error being returned from the corresponding
> > +system call which triggered this message.
> > +.TP
> > +.B "w <message>"
> > +A warning message was logged.
> > +.TP
> > +.B "i <message>"
> > +An informational message was logged.
> > +.RE
> > +.P
> > +Messages are removed from the queue as they are read.
> > +Note that the message queue has limited depth,
> > +so it is possible for messages to get lost.
> > +If there are no messages in the message queue,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%ENODATA .
> > +If the
> > +.I buf
> > +argument to
> > +.BR read (2)
> > +is not large enough to contain the message,
> > +.BR read (2)
> > +will return no data and
> > +.I errno
> > +will be set to
> > +.BR \%EMSGSIZE .
> > +.P
> > +If there are multiple filesystem context file descriptors referencing the same
> > +filesystem instance
> > +(such as if you call
> > +.BR fspick (2)
> > +multiple times for the same mount),
> > +each one gets its own independent message queue.
> > +This does not apply to file descriptors that were duplicated with
> > +.BR dup (2).
> > +.P
> > +Messages strings will usually be prefixed by the filesystem driver that logged
>
> s/Messages/Message/
>
> BTW, here, I'd break after 'prefixed', and then after the ','.
>
> > +the message, though this may not always be the case.
> > +See the Linux kernel source code for details.
> > +.SH RETURN VALUE
> > +On success, a new file descriptor is returned.
> > +On error, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EFAULT
> > +.I fsname
> > +is NULL
> > +or a pointer to a location
> > +outside the calling process's accessible address space.
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +had an invalid flag set.
> > +.TP
> > +.B EMFILE
> > +The calling process has too many open files to create more.
> > +.TP
> > +.B ENFILE
> > +The system has too many open files to create more.
> > +.TP
> > +.B ENODEV
> > +The filesystem named by
> > +.I fsname
> > +is not supported by the kernel.
> > +.TP
> > +.B ENOMEM
> > +The kernel could not allocate sufficient memory to complete the operation.
> > +.TP
> > +.B EPERM
> > +The calling process does not have the required
> > +.B \%CAP_SYS_ADMIN
> > +capability.
> > +.SH STANDARDS
> > +Linux.
> > +.SH HISTORY
> > +Linux 5.2.
> > +.\" commit 24dcb3d90a1f67fe08c68a004af37df059d74005
> > +glibc 2.36.
> > +.SH EXAMPLES
> > +To illustrate the workflow for creating a new mount,
> > +the following is an example of how to mount an
> > +.BR ext4 (5)
> > +filesystem stored on
> > +.I /dev/sdb1
> > +onto
> > +.IR /mnt .
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd;
> > +\&
> > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > +.EE
> > +.in
> > +.P
> > +First, an ext4 configuration context is created and attached to the file
>
> Here, I'd break after the ',', and if you need to break again, after
> 'created'.
>
> > +descriptor
> > +.IR fsfd .
> > +Then, a series of parameters
> > +(such as the source of the filesystem)
> > +are provided using
> > +.BR fsconfig (2),
> > +followed by the filesystem instance being created with
> > +.BR \%FSCONFIG_CMD_CREATE .
> > +.BR fsmount (2)
> > +is then used to create a new mount object attached to the file descriptor
> > +.IR mntfd ,
> > +which is then attached to the intended mount point using
> > +.BR move_mount (2).
> > +.P
> > +The above procedure is functionally equivalent to the following mount operation
> > +using
> > +.BR mount (2):
> > +.P
> > +.in +4n
> > +.EX
> > +mount("/dev/sdb1", "/mnt", "ext4", MS_RELATIME,
> > + "ro,noatime,acl,user_xattr,iversion");
> > +.EE
> > +.in
> > +.P
> > +And here's an example of creating a mount object
> > +of an NFS server share
> > +and setting a Smack security module label.
> > +However, instead of attaching it to a mount point,
> > +the program uses the mount object directly
> > +to open a file from the NFS share.
> > +.P
> > +.in +4n
> > +.EX
> > +int fsfd, mntfd, fd;
> > +\&
> > +fsfd = fsopen("nfs", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "example.com/pub/linux", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "nfsvers", "3", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "rsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "wsize", "65536", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_STRING, "smackfsdef", "foolabel", 0);
> > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "rdma", NULL, 0);
> > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > +mntfd = fsmount(fsfd, 0, MOUNT_ATTR_NODEV);
> > +fd = openat(mntfd, "src/linux-5.2.tar.xz", O_RDONLY);
> > +.EE
> > +.in
> > +.P
> > +Unlike the previous example,
> > +this operation has no trivial equivalent with
> > +.BR mount (2),
> > +as it was not previously possible to create a mount object
> > +that is not attached to any mount point.
> > +.SH SEE ALSO
> > +.BR fsconfig (2),
> > +.BR fsmount (2),
> > +.BR fspick (2),
> > +.BR mount (2),
> > +.BR mount_setattr (2),
> > +.BR move_mount (2),
> > +.BR open_tree (2),
> > +.BR mount_namespaces (7)
>
> Other than those minor comments, the text LGTM.
>
>
> Cheers,
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type
2025-08-07 12:38 ` Aleksa Sarai
@ 2025-08-07 13:33 ` Alejandro Colomar
0 siblings, 0 replies; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 13:33 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 10:38:36PM +1000, Aleksa Sarai wrote:
> On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> > > +.SH VERSIONS
> > > +Extra fields may be appended to the structure,
> > > +with a zero value in a new field resulting in
> > > +the kernel behaving as though that extension field was not present.
> > > +Therefore, a user
> > > +.I must
> > > +zero-fill this structure on initialization.
> >
> > I think this would be more appropriate for HISTORY. In VERSIONS, we
> > usually document differences with the BSDs or other systems.
> >
> > While moving this to HISTORY, it would also be useful to mention the
> > glibc version that added the structure. In the future, we'd document
> > the versions of glibc and Linux that have added members.
>
> Sure, though I just copied this section from open_how(2type).
Thanks! I should fix that.
Cheers,
Alex
>
> > > +.SH STANDARDS
> > > +Linux.
> > > +.SH SEE ALSO
> > > +.BR mount_setattr (2)
> >
> > Have a lovely day!
> > Alex
> >
> > --
> > <https://www.alejandro-colomar.es/>
>
>
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 12:50 ` Aleksa Sarai
@ 2025-08-07 13:42 ` Alejandro Colomar
0 siblings, 0 replies; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 13:42 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 10:50:17PM +1000, Aleksa Sarai wrote:
> > > +A filesystem configuration context is an in-kernel representation of a pending
> > > +transaction,
> >
> > This page still needs semantic newlines. (Please review all pages
> > regarding that.) (In this specific sentence, I'd break after 'is'.)
>
> I did try adding them to this page (and all of the other pages -- I
> suspect the pages later in the patchset have more aggressive newlining).
> If you compare the newline placement between v1 and v2 you'll see that I
> have added a lot of newlines in all of the man-pages, but it's possible
> I missed a couple of sentences like this one.
Yup, it's much better. Thanks!
> To be honest I feel quite lost where the "semantic newlines" school
> would deem appropriate to place newlines, and man-pages(7) is very terse
> on the topic. Outside of very obvious examples,
> it just feels wrong
> to have such choppy
> line break usage.
> I understand
> the argument that
> this helps
> with reviewing diffs,
> but I really find it
> incredibly unnatural.
> (And this tongue-in-cheek example
> is probably wrong too.)
I understand. The guidelines I use are:
If there's punctuation, break.
If there isn't punctuation, but the sentence would go past the
80-char right margin, try to find the best point to break (this
is sometimes hard or subjective).
Other than that, there's no need to break.
Does that seem reasonable? (I can always amend a few cases that you
don't know where to split.)
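To make the first rule a bit more concrete, here is a toy sketch in C.
It is only an illustration: break_at_punctuation() is a made-up name,
it implements nothing but "if there's punctuation, break", and it
ignores the 80-column rule entirely, since that one needs human
judgment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `in` to `out`, turning the space that follows a punctuation
 * mark into a newline.  Hypothetical helper, for illustration only.
 * Returns 0 on success, or -1 if `out` is too small.
 */
int
break_at_punctuation(const char *in, char *out, size_t outsz)
{
	size_t len = strlen(in);

	if (len + 1 > outsz)
		return -1;

	for (size_t i = 0; i < len; i++) {
		/* Was the previous character one of , ; : . ! ? */
		int after_punct = i > 0 && strchr(",;:.!?", in[i - 1]) != NULL;

		out[i] = (in[i] == ' ' && after_punct) ? '\n' : in[i];
	}
	out[len] = '\0';
	return 0;
}
```

It will happily break after abbreviations such as "e.g.", so treat it
as a picture of the rule rather than a tool.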
>
> > > +containing a set of configuration parameters that are to be applied
> > > +when creating a new instance of a filesystem
> > > +(or modifying the configuration of an existing filesystem instance,
> > > +such as when using
> > > +.BR fspick (2)).
> > > +.P
> > > +After obtaining a filesystem configuration context with
> > > +.BR fsopen (),
> > > +the general workflow for operating on the context looks like the following:
> > > +.IP (1) 5
> > > +Pass the filesystem context file descriptor to
> > > +.BR fsconfig (2)
> > > +to specify any desired filesystem parameters.
> > > +This may be done as many times as necessary.
> > > +.IP (2)
> > > +Pass the same filesystem context file descriptor to
> >
> > Do we need to say "same"? I guess it's obvious. Or do you expect
> > any confusion if we don't?
>
> The first time I saw this interface I was confused about which file
> descriptor you pass at which step (especially around the
> FSCONFIG_CMD_CREATE stage), so I felt it better to make it clear which
> file descriptor we are talking about.
Okay.
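FWIW, a rough sketch of that hand-off with error checking might look
like the following. It rests on a few assumptions that are mine and
not from the page: mount_ro() is a made-up helper name, raw syscall(2)
is used so that it also builds against glibc older than 2.36, and
"source" is set with FSCONFIG_SET_STRING (as in the NFS example).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD */
#include <linux/mount.h>    /* FSOPEN_CLOEXEC, FSCONFIG_*, FSMOUNT_CLOEXEC, ... */
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_fsopen and friends */
#include <unistd.h>

/*
 * Hypothetical helper: mount `source` read-only on `target`.
 * Returns 0 on success, or -1 with errno set by the failing call.
 */
int
mount_ro(const char *fstype, const char *source, const char *target)
{
	int fsfd, mntfd, saved, ret = -1;

	fsfd = syscall(SYS_fsopen, fstype, FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	/* Every configuration step takes the same context fd (fsfd). */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING,
		    "source", source, 0) < 0)
		goto out;
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_FLAG,
		    "ro", NULL, 0) < 0)
		goto out;
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE,
		    NULL, NULL, 0) < 0)
		goto out;

	/* fsmount() still takes fsfd, but hands back a different fd. */
	mntfd = syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC,
			MOUNT_ATTR_RELATIME);
	if (mntfd < 0)
		goto out;

	/* Only move_mount() operates on the mount object fd (mntfd). */
	if (syscall(SYS_move_mount, mntfd, "", AT_FDCWD, target,
		    MOVE_MOUNT_F_EMPTY_PATH) == 0)
		ret = 0;

	saved = errno;
	close(mntfd);
	errno = saved;
out:
	saved = errno;      /* keep the errno of the failing call */
	close(fsfd);
	errno = saved;
	return ret;
}
```

A caller would use it as mount_ro("ext4", "/dev/sdb1", "/mnt"). On
failure it would make sense to read(2) from fsfd before closing it, to
pick up any message the filesystem driver logged.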
> > > +.EX
> > > +int fsfd, mntfd;
> > > +\&
> > > +fsfd = fsopen("ext4", FSOPEN_CLOEXEC);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_PATH, "source", "/dev/sdb1", AT_FDCWD);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_SET_FLAG, "iversion", NULL, 0);
> > > +fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > > +mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_RELATIME);
> > > +move_mount(mntfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
> > > +.EE
> > > +.in
> > > +.P
> > > +First, an ext4 configuration context is created and attached to the file
> >
> > Here, I'd break after the ',', and if you need to break again, after
> > 'created'.
>
> Okay, I wanted to avoid having lines with single words due to semantic
> newlines, but if that's what you prefer I can update that everywhere...
I don't have a strong opinion on that. I sometimes avoid the break if
the rest of the sentence is short and all fits in one line, but if you
already need to break, that'd be the first obvious place to look at.
Other times, I have a more pedantic day, and split at every comma, even
unnecessarily.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 13:27 ` Aleksa Sarai
@ 2025-08-07 13:52 ` Alejandro Colomar
2025-08-07 14:26 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-07 13:52 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa,
On Thu, Aug 07, 2025 at 11:27:04PM +1000, Aleksa Sarai wrote:
> > I think 'author' is more appropriate than 'developer' for documentation.
> > It is also more consistent with the Copyright notice, which assigns
> > copyright to the authors (documented in AUTHORS). And ironically, even
> > the kernel documentation about Co-authored-by talks about authorship
(Oops, s/Co-authored-by/Co-developed-by/)
> > instead of development:
> >
> > Co-developed-by: states that the patch was co-created by
> > multiple developers; it is used to give attribution to
> > co-authors (in addition to the author attributed by the From:
> > tag) when several people work on a single patch.
>
> Sure, fixed.
>
> Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
> required for submissions? I don't think b4 supports including it (and I
> really would prefer to not have to use raw git-send-email again just for
> man-pages -- b4 has so many benefits over raw git-send-email). Is the
> b4-style changelog I include in the cover-letter sufficient?
Yes, that's sufficient. As Captain Barbossa would say, "the code is
more what you'd call 'guidelines' than actual rules". ;)
> I like to think of myself as a fairly prolific git user, but I don't
> think I've ever seen --range-diff= output in a git-send-email patch
> before...
Yup, I only learnt about it a few years ago. I have to say it's great as
a reviewer; it changed my efficiency reviewing code when we started
using it at $dayjob-1.
And even as a submitter, it has also saved me a few times, when I
introduced a regression in some revision of a patch set, and I could
easily trace back to the revision where I had introduced it by reading
the range diffs, which are much shorter than the actual code.
Maybe we could ping Konstantin to add this to b4?
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 13:52 ` Alejandro Colomar
@ 2025-08-07 14:26 ` Aleksa Sarai
2025-08-07 19:27 ` Konstantin Ryabitsev
0 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-07 14:26 UTC (permalink / raw)
To: Alejandro Colomar, Konstantin Ryabitsev
Cc: Michael T. Kerrisk, Alexander Viro, Jan Kara, Askar Safin,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-07, Alejandro Colomar <alx@kernel.org> wrote:
> Hi Aleksa,
>
> On Thu, Aug 07, 2025 at 11:27:04PM +1000, Aleksa Sarai wrote:
> > > I think 'author' is more appropriate than 'developer' for documentation.
> > > It is also more consistent with the Copyright notice, which assigns
> > > copyright to the authors (documented in AUTHORS). And ironically, even
> > > the kernel documentation about Co-authored-by talks about authorship
>
> (Oops, s/Co-authored-by/Co-developed-by/)
>
> > > instead of development:
> > >
> > > Co-developed-by: states that the patch was co-created by
> > > multiple developers; it is used to give attribution to
> > > co-authors (in addition to the author attributed by the From:
> > > tag) when several people work on a single patch.
> >
> > Sure, fixed.
> >
> > Can you also clarify whether CONTRIBUTING.d/patches/range-diff is
> > required for submissions? I don't think b4 supports including it (and I
> > really would prefer to not have to use raw git-send-email again just for
> > man-pages -- b4 has so many benefits over raw git-send-email). Is the
> > b4-style changelog I include in the cover-letter sufficient?
>
> Yes, that's sufficient. As Captain Barbossa would say, "the code is
> more what you'd call 'guidelines' than actual rules". ;)
>
> > I like to think of myself as a fairly prolific git user, but I don't
> > think I've ever seen --range-diff= output in a git-send-email patch
> > before...
>
> Yup, I only learnt about it a few years ago. I have to say it's great as
> a reviewer; it changed my efficiency reviewing code when we started
> using it at $dayjob-1.
>
> And even as a submitter, it has also saved me a few times, when I
> introduced a regression in some revision of a patch set, and I could
> easily trace back to the revision where I had introduced it by reading
> the range diffs, which are much shorter than the actual code.
>
> Maybe we could ping Konstantin to add this to b4?
Konstantin, would you be interested in a patch to add --range-diff to
the trailing bits of cover letters? I would guess that b4 already has
all of the necessary metadata to reference the right commits.
It seems like a fairly neat way of providing some more metadata about
changes between patchsets, for folks that care about that information.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 14:26 ` Aleksa Sarai
@ 2025-08-07 19:27 ` Konstantin Ryabitsev
2025-08-07 19:39 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Konstantin Ryabitsev @ 2025-08-07 19:27 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
Askar Safin, G. Branden Robinson, linux-man, linux-api,
linux-fsdevel, linux-kernel, David Howells, Christian Brauner
On Fri, Aug 08, 2025 at 12:26:48AM +1000, Aleksa Sarai wrote:
> Konstantin, would you be interested in a patch to add --range-diff to
> the trailing bits of cover letters? I would guess that b4 already has
> all of the necessary metadata to reference the right commits.
>
> It seems like a fairly neat way of providing some more metadata about
> changes between patchsets, for folks that care about that information.
It's already there, just add ${range_diff} to your cover letter template.
Cheers,
-K
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-07 19:27 ` Konstantin Ryabitsev
@ 2025-08-07 19:39 ` Aleksa Sarai
0 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-07 19:39 UTC (permalink / raw)
To: Konstantin Ryabitsev
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
Askar Safin, G. Branden Robinson, linux-man, linux-api,
linux-fsdevel, linux-kernel, David Howells, Christian Brauner
On 2025-08-07, Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Fri, Aug 08, 2025 at 12:26:48AM +1000, Aleksa Sarai wrote:
> > Konstantin, would you be interested in a patch to add --range-diff to
> > the trailing bits of cover letters? I would guess that b4 already has
> > all of the necessary metadata to reference the right commits.
> >
> > It seems like a fairly neat way of providing some more metadata about
> > changes between patchsets, for folks that care about that information.
>
> It's already there, just add ${range_diff} to your cover letter template.
Oh, my bad... Time to go re-read the b4 docs again.
> Cheers,
> -K
>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-06 17:44 ` [PATCH v2 03/11] fsopen.2: document 'new' mount api Aleksa Sarai
2025-08-07 11:38 ` Alejandro Colomar
@ 2025-08-08 9:07 ` Askar Safin
2025-08-08 11:57 ` Aleksa Sarai
1 sibling, 1 reply; 36+ messages in thread
From: Askar Safin @ 2025-08-08 9:07 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
> If there are no messages in the message queue,
> read(2) will return no data and errno will be set to ENODATA.
> If the buf argument to read(2) is not large enough to contain the message,
> read(2) will return no data and errno will be set to EMSGSIZE.
Will read(2) return -1 in these cases? If so, please state that explicitly.
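For context, this is roughly how I would expect a caller to drain the
queue. It is a minimal sketch that assumes read(2) really does return
-1 with errno set in both cases; msg_class() and drain_messages() are
names I made up, and the 4096-byte buffer is an arbitrary choice.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Map the one-character class prefix ("e", "w" or "i", as listed in
 * the page) to a label.  Hypothetical helper, for illustration only.
 */
const char *
msg_class(const char *msg)
{
	if (msg[0] == '\0' || msg[1] != ' ')
		return "unknown";
	switch (msg[0]) {
	case 'e': return "error";
	case 'w': return "warning";
	case 'i': return "info";
	default:  return "unknown";
	}
}

/*
 * Read messages from a filesystem context fd until the queue is empty.
 * Returns the number of messages printed, or -1 on any other error
 * (including EMSGSIZE, if the buffer is too small for a message).
 */
int
drain_messages(int fsfd)
{
	char buf[4096];
	int n = 0;

	for (;;) {
		ssize_t len = read(fsfd, buf, sizeof(buf) - 1);

		if (len < 0)
			return errno == ENODATA ? n : -1;
		if (len == 0)           /* defensive: nothing more to read */
			return n;

		buf[len] = '\0';
		if (buf[len - 1] == '\n')   /* drop a trailing newline, if any */
			buf[len - 1] = '\0';
		fprintf(stderr, "[%s] %s\n", msg_class(buf), buf);
		n++;
	}
}
```

If the return value were 0 rather than -1, a loop like this could not
tell an empty queue apart from end-of-file, which is why I think the
page should be explicit.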
Also, I see that you addressed all my requests. Thank you!
And thank you again for writing all these manpages!
--
Askar Safin
https://types.pl/@safinaskar
* Re: [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
2025-08-07 10:39 ` Alejandro Colomar
@ 2025-08-08 9:23 ` Askar Safin
2025-08-08 11:55 ` Aleksa Sarai
1 sibling, 1 reply; 36+ messages in thread
From: Askar Safin @ 2025-08-08 9:23 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
When I render "mount_setattr" from this (v2) patchset, I see a weird quote mark, i.e.:
$ MANWIDTH=10000 man /path/to/mount_setattr.2
...
SYNOPSIS
#include <fcntl.h> /* Definition of AT_* constants */
#include <sys/mount.h>
int mount_setattr(int dirfd, const char *path, unsigned int flags,
struct mount_attr *attr, size_t size);"
...
--
Askar Safin
https://types.pl/@safinaskar
* Re: [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-08 9:23 ` Askar Safin
@ 2025-08-08 11:55 ` Aleksa Sarai
2025-08-09 10:42 ` Alejandro Colomar
0 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-08 11:55 UTC (permalink / raw)
To: Askar Safin
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> When I render "mount_setattr" from this (v2) patchset, I see a weird quote mark, i.e.:
>
> $ MANWIDTH=10000 man /path/to/mount_setattr.2
> ...
> SYNOPSIS
> #include <fcntl.h> /* Definition of AT_* constants */
> #include <sys/mount.h>
>
> int mount_setattr(int dirfd, const char *path, unsigned int flags,
> struct mount_attr *attr, size_t size);"
> ...
Ah, my bad. "make -R lint-man" told me to put end quotes on the synopsis
lines, but I missed that there was a separate quote missing. This should
fix it:
diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
index d44fafc93a20..46fcba927dd8 100644
--- a/man/man2/mount_setattr.2
+++ b/man/man2/mount_setattr.2
@@ -14,7 +14,7 @@ .SH SYNOPSIS
.B #include <sys/mount.h>
.P
.BI "int mount_setattr(int " dirfd ", const char *" path ", unsigned int " flags ","
-.BI " struct mount_attr *" attr ", size_t " size );"
+.BI " struct mount_attr *" attr ", size_t " size ");"
.fi
.SH DESCRIPTION
The
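To illustrate why the stray quote was rendered, here is a minimal sketch of
roff-style macro argument splitting. It is a simplified model written for this
explanation, not groff's actual parser: split_roff_args() is a hypothetical
helper, and it assumes only that an argument starting with '"' runs to the
closing quote, while a '"' appearing inside an unquoted argument is kept as a
literal character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of roff macro argument splitting.
 * Each argument is copied into out[i] (at most 63 characters).
 * Returns the number of arguments found (at most max).
 */
static size_t split_roff_args(const char *s, char out[][64], size_t max)
{
    size_t n = 0;

    assert(s != NULL);
    while (n < max) {
        size_t len = 0;

        while (*s == ' ')
            s++;
        if (*s == '\0')
            break;
        if (*s == '"') {
            /* Quoted argument: runs up to the closing quote. */
            s++;
            while (*s != '\0' && *s != '"' && len < 63)
                out[n][len++] = *s++;
            if (*s == '"')
                s++;
        } else {
            /* Unquoted argument: ends at a space; a '"' here is literal. */
            while (*s != '\0' && *s != ' ' && len < 63)
                out[n][len++] = *s++;
        }
        out[n][len] = '\0';
        n++;
    }
    return n;
}
```

Under this model, the old line ends in the unquoted argument `);"` (three
characters, so the quote is printed), while the fixed line ends in the quoted
argument `);`.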
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 03/11] fsopen.2: document 'new' mount api
2025-08-08 9:07 ` Askar Safin
@ 2025-08-08 11:57 ` Aleksa Sarai
0 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-08 11:57 UTC (permalink / raw)
To: Askar Safin
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> > If there are no messages in the message queue,
> > read(2) will return no data and errno will be set to ENODATA.
> > If the buf argument to read(2) is not large enough to contain the message,
> > read(2) will return no data and errno will be set to EMSGSIZE.
>
> Will read(2) return -1 in these cases? If so, please say so explicitly.
Yes (well, the raw syscall returns -ENODATA or -EMSGSIZE, which the libc
wrapper reports as -1 with errno set). I'll try to add a note
without making the paragraph too wordy...
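As a rough sketch of the convention being discussed (the helper names below
are hypothetical and exist only for illustration, they are not part of the man
pages or of any library), the two error cases could be told apart like this.
wrap_raw_result() only models the usual libc wrapper behaviour of turning a
negative raw syscall result into -1 plus errno:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of a read(2) on a filesystem context fd. */
enum fs_msg_status {
    FS_MSG_OK,         /* read(2) succeeded */
    FS_MSG_EMPTY,      /* -1 with ENODATA: no messages in the queue */
    FS_MSG_TOO_SMALL,  /* -1 with EMSGSIZE: buf too small for the message */
    FS_MSG_ERROR       /* any other failure */
};

/* ret is read(2)'s return value, err is errno sampled right after the call. */
static enum fs_msg_status classify_fs_read(ssize_t ret, int err)
{
    if (ret >= 0)
        return FS_MSG_OK;
    if (err == ENODATA)
        return FS_MSG_EMPTY;
    if (err == EMSGSIZE)
        return FS_MSG_TOO_SMALL;
    return FS_MSG_ERROR;
}

/*
 * Model of the usual libc wrapper convention: a raw syscall result in
 * [-4095, -1] becomes -1, with the negated value stored as the error code.
 */
static long wrap_raw_result(long raw, int *err_out)
{
    assert(err_out != NULL);
    if (raw < 0 && raw >= -4095) {
        *err_out = (int)-raw;
        return -1;
    }
    return raw;
}
```

A caller would typically do `n = read(fsfd, buf, sizeof(buf));` followed by
`classify_fs_read(n, errno)`, where fsfd is a descriptor returned by fsopen().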
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 08/11] open_tree.2: document 'new' mount api
2025-08-06 17:44 ` [PATCH v2 08/11] open_tree.2: " Aleksa Sarai
@ 2025-08-08 12:32 ` Askar Safin
2025-08-08 13:26 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Askar Safin @ 2025-08-08 12:32 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
In "man open_tree":
> As with "*at()" system calls, fspick() uses the dirfd argument in conjunction
You meant "open_tree"
> If flags does not contain OPEN_TREE_CLONE, open_tree() returns
> a file descriptor that is exactly equivalent to one produced by open(2).
Please, change "by open(2)" to "by openat(2) with O_PATH" (and other similar places).
--
Askar Safin
https://types.pl/@safinaskar
* Re: [PATCH v2 00/11] man2: add man pages for 'new' mount API
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
` (10 preceding siblings ...)
2025-08-06 17:44 ` [PATCH v2 11/11] fsconfig.2, mount_setattr.2: add note about attribute-parameter distinction Aleksa Sarai
@ 2025-08-08 12:53 ` Christian Brauner
11 siblings, 0 replies; 36+ messages in thread
From: Christian Brauner @ 2025-08-08 12:53 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
Askar Safin, G. Branden Robinson, linux-man, linux-api,
linux-fsdevel, linux-kernel, David Howells
On Thu, Aug 07, 2025 at 03:44:34AM +1000, Aleksa Sarai wrote:
> Back in 2019, the new mount API was merged into mainline[1]. David Howells
> then set about writing man pages for these new APIs, and sent some
> patches back in 2020[2]. Unfortunately, these patches were never merged,
> which meant that these APIs were practically undocumented for many
> years -- arguably this may have been a contributing factor to the
> relatively slow adoption of these new (far better) APIs. I have often
> discovered that many folks are unaware of the read(2)-based message
> retrieval interface provided by filesystem context file descriptors.
>
> In 2024, Christian Brauner set aside some time to provide some
> documentation of these new APIs and so adapted David Howells's original
> man pages into the easier-to-edit Markdown format and published them on
> GitHub[3]. These have been maintained since, including updated
> information on new features added since David Howells's 2020 draft pages
> (such as MOVE_MOUNT_BENEATH).
>
> While this was a welcome improvement to the previous status quo (that
> had lasted over 6 years), speaking personally my experience is that not
> having access to these man pages from the terminal has been a fairly
> common painpoint.
>
> So, this is a modern version of the man pages for these APIs, in the hopes
> that we can finally (7 years later) get proper documentation for these
> APIs in the man-pages project.
>
> One important thing to note is that most of these were re-written by me,
> with very minimal copying from the versions available from Christian[3].
> The reasons for this are two-fold:
>
> * Both Howells's original version and Christian's maintained versions
> contain crucial mistakes that I have been bitten by in the past (the
"Lies, damned lies, and statistics."
> most obvious being that all of these APIs were merged in Linux 5.2,
> but the man pages all claim they were merged in different versions.)
>
> * As the man pages appear to have been written from Howells's
> perspective while implementing them, some of the wording is a little
> too tied to the implementation (or appears to describe features that
> don't really exist in the merged versions of these APIs).
>
> I decided that the best way to resolve these issues is to rewrite them
> from the perspective of an actual user of these APIs (me), and check
> that we do not repeat the mistakes I found in the originals.
>
> I have also done my best to resolve the issues raised by Michael Kerrisk
> on the original patchset sent by Howells[2].
>
> In addition, I have also included a man page for open_tree_attr(2) (as a
> subsection of the new open_tree(2) man page), which was merged in Linux
> 6.15.
>
> [1]: https://lore.kernel.org/all/20190507204921.GL23075@ZenIV.linux.org.uk/
> [2]: https://lore.kernel.org/linux-man/159680892602.29015.6551860260436544999.stgit@warthog.procyon.org.uk/
> [3]: https://github.com/brauner/man-pages-md
>
> Co-developed-by: David Howells <dhowells@redhat.com>
> Co-developed-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
> ---
Thanks for doing this! Just a point of order: if you add Co-developed-by
(CdB) tags, you also need to add a Signed-off-by (SoB) for each of them.
* Re: [PATCH v2 08/11] open_tree.2: document 'new' mount api
2025-08-08 12:32 ` Askar Safin
@ 2025-08-08 13:26 ` Aleksa Sarai
0 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-08 13:26 UTC (permalink / raw)
To: Askar Safin
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> In "man open_tree":
>
> > As with "*at()" system calls, fspick() uses the dirfd argument in conjunction
>
> You meant "open_tree"
>
> > If flags does not contain OPEN_TREE_CLONE, open_tree() returns
> > a file descriptor that is exactly equivalent to one produced by open(2).
>
> Please, change "by open(2)" to "by openat(2) with O_PATH" (and other similar places).
I think the more common pattern in man-pages is to prefer to refer to
open(2) unless you are explicitly talking about openat(2) features (like
passing a dirfd). If it's just "a file descriptor with O_PATH" then most
man-pages I've seen reference open(2) even if they were written
post-openat(2).
Though in this case, since we are talking about open_tree(2) as an open
operation that takes a dirfd, you're right that openat(2) might be
better.
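For reference, a minimal sketch of the openat(2) side of that comparison. The
helper names are hypothetical, and the O_PATH fallback value is an assumption
(the generic Linux value) used only if the headers do not expose the flag:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_PATH
#define O_PATH 010000000 /* assumption: generic Linux value */
#endif

/* Hypothetical helper: open path relative to dirfd as an O_PATH descriptor. */
static int open_path_fd(int dirfd, const char *path)
{
    assert(path != NULL);
    return openat(dirfd, path, O_PATH | O_CLOEXEC);
}

/* Returns 1 if fd was opened with O_PATH, 0 if not, -1 on error. */
static int is_path_fd(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl < 0)
        return -1;
    return (fl & O_PATH) ? 1 : 0;
}
```

A descriptor obtained this way only names a location: read(2) on it fails
with EBADF, but it can be passed as a dirfd to the *at() family.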
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 05/11] fsconfig.2: document 'new' mount api
2025-08-06 17:44 ` [PATCH v2 05/11] fsconfig.2: " Aleksa Sarai
@ 2025-08-08 14:00 ` Askar Safin
2025-08-08 15:22 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Askar Safin @ 2025-08-08 14:00 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Let's consider this example:
int fsfd, mntfd, nsfd, nsdirfd;
nsfd = open("/proc/self/ns/pid", O_PATH);
nsdirfd = open("/proc/1/ns", O_DIRECTORY);
fsfd = fsopen("proc", FSOPEN_CLOEXEC);
/* "pidns" changes the value each time. */
fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "/proc/self/ns/pid", AT_FDCWD);
fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "pid", nsdirfd);
fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "pidns", "", nsfd);
fsconfig(fsfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
move_mount(mntfd, "", AT_FDCWD, "/proc", MOVE_MOUNT_F_EMPTY_PATH);
I don't like it. /proc/self/ns/pid is our own namespace, which is the default anyway.
I.e. setting pidns to /proc/self/ns/pid is a no-op (assuming that the "pidns" option is implemented in our kernel, of course).
Moreover, if /proc is mounted properly, then /proc/1/ns/pid refers to our namespace, too!
Thus, *all* these fsconfig(FSCONFIG_SET_...) calls are no-ops.
Thus it is a bad example.
I suggest using, say, /proc/2/ns/pid. It has an actual chance of referring to some other namespace.
Also, the sentence '"pidns" changes the value each time' is a lie: as I explained, all these calls are no-ops,
they don't really change anything.
--
Askar Safin
https://types.pl/@safinaskar
* Re: [PATCH v2 05/11] fsconfig.2: document 'new' mount api
2025-08-08 14:00 ` Askar Safin
@ 2025-08-08 15:22 ` Aleksa Sarai
2025-08-08 19:07 ` Aleksa Sarai
0 siblings, 1 reply; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-08 15:22 UTC (permalink / raw)
To: Askar Safin
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> Let's consider this example:
>
> int fsfd, mntfd, nsfd, nsdirfd;
>
> nsfd = open("/proc/self/ns/pid", O_PATH);
> nsdirfd = open("/proc/1/ns", O_DIRECTORY);
>
> fsfd = fsopen("proc", FSOPEN_CLOEXEC);
> /* "pidns" changes the value each time. */
> fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "/proc/self/ns/pid", AT_FDCWD);
> fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "pid", nsdirfd);
> fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "pidns", "", nsfd);
> fsconfig(fsfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
> fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
> move_mount(mntfd, "", AT_FDCWD, "/proc", MOVE_MOUNT_F_EMPTY_PATH);
>
> I don't like it. /proc/self/ns/pid is our own namespace, which is the default anyway.
> I.e. setting pidns to /proc/self/ns/pid is a no-op (assuming that the "pidns" option is implemented in our kernel, of course).
> Moreover, if /proc is mounted properly, then /proc/1/ns/pid refers to our namespace, too!
> Thus, *all* these fsconfig(FSCONFIG_SET_...) calls are no-ops.
> Thus it is a bad example.
>
> I suggest using, say, /proc/2/ns/pid. It has an actual chance of referring to some other namespace.
>
> Also, the sentence '"pidns" changes the value each time' is a lie: as I explained, all these calls are no-ops,
> they don't really change anything.
Right, I see your point.
One other problem with this example is that there is no
currently-existing parameter which accepts all of FSCONFIG_SET_PATH,
FSCONFIG_SET_PATH_EMPTY, FSCONFIG_SET_FD, and FSCONFIG_SET_STRING so
this example is by necessity a little contrived. I suspect that it'd be
better to remove this and re-add it once we actually have something that
works this way...
You've replied to the pidns parameter patchset so I shouldn't repeat
myself here too much, but supporting this completely is my plan for the
next version I send. It's just not a thing that exists today (ditto for
overlayfs).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 05/11] fsconfig.2: document 'new' mount api
2025-08-08 15:22 ` Aleksa Sarai
@ 2025-08-08 19:07 ` Aleksa Sarai
0 siblings, 0 replies; 36+ messages in thread
From: Aleksa Sarai @ 2025-08-08 19:07 UTC (permalink / raw)
To: Askar Safin
Cc: Alejandro Colomar, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
On 2025-08-09, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> > Let's consider this example:
> >
> > int fsfd, mntfd, nsfd, nsdirfd;
> >
> > nsfd = open("/proc/self/ns/pid", O_PATH);
> > nsdirfd = open("/proc/1/ns", O_DIRECTORY);
> >
> > fsfd = fsopen("proc", FSOPEN_CLOEXEC);
> > /* "pidns" changes the value each time. */
> > fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "/proc/self/ns/pid", AT_FDCWD);
> > fsconfig(fsfd, FSCONFIG_SET_PATH, "pidns", "pid", nsdirfd);
> > fsconfig(fsfd, FSCONFIG_SET_PATH_EMPTY, "pidns", "", nsfd);
> > fsconfig(fsfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
> > fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
> > mntfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
> > move_mount(mntfd, "", AT_FDCWD, "/proc", MOVE_MOUNT_F_EMPTY_PATH);
> >
> > I don't like it. /proc/self/ns/pid is our own namespace, which is the default anyway.
> > I.e. setting pidns to /proc/self/ns/pid is a no-op (assuming that the "pidns" option is implemented in our kernel, of course).
> > Moreover, if /proc is mounted properly, then /proc/1/ns/pid refers to our namespace, too!
This slightly depends on what you mean by "properly". If you deal with
namespaces a lot, running into a situation where the current process's
pidns doesn't match /proc is quite common (we run into it with container
runtimes all the time).
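Whether two nsfs paths actually name the same namespace can be checked by
comparing device and inode numbers, as described in namespaces(7). The
following is only a sketch, and same_namespace() is a hypothetical helper
name:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they differ, and -1 if either path cannot be stat'ed (for example
 * because of missing permissions on /proc/<pid>/ns).
 */
static int same_namespace(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    /* stat(2) follows the /proc/<pid>/ns/* magic links to the nsfs inode. */
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For instance, same_namespace("/proc/self/ns/pid", "/proc/1/ns/pid") returns 1
only when pid 1 as seen through that /proc mount shares the caller's pidns.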
A proper example with provably different pidns values (such as the
selftests for the pidns parameter) would make for a very lengthy example
program with very little use for readers.
I'm tempted to just delete this example.
> > Thus, *all* these fsconfig(FSCONFIG_SET_...) calls are no-ops.
> > Thus it is a bad example.
> >
> > I suggest using, say, /proc/2/ns/pid. It has an actual chance of referring to some other namespace.
> >
> > Also, the sentence '"pidns" changes the value each time' is a lie: as I explained, all these calls are no-ops,
> > they don't really change anything.
>
> Right, I see your point.
>
> One other problem with this example is that there is no
> currently-existing parameter which accepts all of FSCONFIG_SET_PATH,
> FSCONFIG_SET_PATH_EMPTY, FSCONFIG_SET_FD, and FSCONFIG_SET_STRING so
> this example is by necessity a little contrived. I suspect that it'd be
> better to remove this and re-add it once we actually have something that
> works this way...
>
> You've replied to the pidns parameter patchset so I shouldn't repeat
> myself here too much, but supporting this completely is my plan for the
> next version I send. It's just not a thing that exists today (ditto for
> overlayfs).
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> https://www.cyphar.com/
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
* Re: [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-08 11:55 ` Aleksa Sarai
@ 2025-08-09 10:42 ` Alejandro Colomar
2025-08-09 10:44 ` Alejandro Colomar
0 siblings, 1 reply; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-09 10:42 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Askar Safin, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa, Askar,
On Fri, Aug 08, 2025 at 09:55:10PM +1000, Aleksa Sarai wrote:
> On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> > When I render "mount_setattr" from this (v2) patchset, I see a weird quote mark, i.e.:
> >
> > $ MANWIDTH=10000 man /path/to/mount_setattr.2
> > ...
> > SYNOPSIS
> > #include <fcntl.h> /* Definition of AT_* constants */
> > #include <sys/mount.h>
> >
> > int mount_setattr(int dirfd, const char *path, unsigned int flags,
> > struct mount_attr *attr, size_t size);"
> > ...
>
> Ah, my bad. "make -R lint-man" told me to put end quotes on the synopsis
> lines, but I missed that there was a separate quote missing. This should
> fix it:
>
> diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
> index d44fafc93a20..46fcba927dd8 100644
> --- a/man/man2/mount_setattr.2
> +++ b/man/man2/mount_setattr.2
> @@ -14,7 +14,7 @@ .SH SYNOPSIS
> .B #include <sys/mount.h>
> .P
> .BI "int mount_setattr(int " dirfd ", const char *" path ", unsigned int " flags ","
> -.BI " struct mount_attr *" attr ", size_t " size );"
> +.BI " struct mount_attr *" attr ", size_t " size ");"
Actually, I'd use
.BI " struct mount_attr *" attr ", size_t " size );
> .fi
> .SH DESCRIPTION
> The
Hmmm, thanks for the catch! My CI server is down until I come back home
and have a chance to fix it.
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
* Re: [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers
2025-08-09 10:42 ` Alejandro Colomar
@ 2025-08-09 10:44 ` Alejandro Colomar
0 siblings, 0 replies; 36+ messages in thread
From: Alejandro Colomar @ 2025-08-09 10:44 UTC (permalink / raw)
To: Aleksa Sarai
Cc: Askar Safin, Michael T. Kerrisk, Alexander Viro, Jan Kara,
G. Branden Robinson, linux-man, linux-api, linux-fsdevel,
linux-kernel, David Howells, Christian Brauner
Hi Aleksa, Askar,
On Sat, Aug 09, 2025 at 12:42:58PM +0200, Alejandro Colomar wrote:
> Hi Aleksa, Askar,
>
> On Fri, Aug 08, 2025 at 09:55:10PM +1000, Aleksa Sarai wrote:
> > On 2025-08-08, Askar Safin <safinaskar@zohomail.com> wrote:
> > > When I render "mount_setattr" from this (v2) patchset, I see a weird quote mark, i.e.:
> > >
> > > $ MANWIDTH=10000 man /path/to/mount_setattr.2
> > > ...
> > > SYNOPSIS
> > > #include <fcntl.h> /* Definition of AT_* constants */
> > > #include <sys/mount.h>
> > >
> > > int mount_setattr(int dirfd, const char *path, unsigned int flags,
> > > struct mount_attr *attr, size_t size);"
> > > ...
> >
> > Ah, my bad. "make -R lint-man" told me to put end quotes on the synopsis
> > lines, but I missed that there was a separate quote missing. This should
> > fix it:
> >
> > diff --git a/man/man2/mount_setattr.2 b/man/man2/mount_setattr.2
> > index d44fafc93a20..46fcba927dd8 100644
> > --- a/man/man2/mount_setattr.2
> > +++ b/man/man2/mount_setattr.2
> > @@ -14,7 +14,7 @@ .SH SYNOPSIS
> > .B #include <sys/mount.h>
> > .P
> > .BI "int mount_setattr(int " dirfd ", const char *" path ", unsigned int " flags ","
> > -.BI " struct mount_attr *" attr ", size_t " size );"
> > +.BI " struct mount_attr *" attr ", size_t " size ");"
>
> Actually, I'd use
>
> .BI " struct mount_attr *" attr ", size_t " size );
I've pushed this as a fix. As a sanity check:
$ diffman-git HEAD
--- HEAD^:man/man2/mount_setattr.2
+++ HEAD:man/man2/mount_setattr.2
@@ -11,7 +11,7 @@
#include <sys/mount.h>
int mount_setattr(int dirfd, const char *path, unsigned int flags,
- struct mount_attr *attr, size_t size);"
+ struct mount_attr *attr, size_t size);
DESCRIPTION
The mount_setattr() system call changes the mount properties of a mount
Have a lovely day!
Alex
>
> > .fi
> > .SH DESCRIPTION
> > The
>
> Hmmm, thanks for the catch! My CI server is down until I come back home
> and have a chance to fix it.
>
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>
Thread overview: 36+ messages
2025-08-06 17:44 [PATCH v2 00/11] man2: add man pages for 'new' mount API Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 01/11] mount_setattr.2: document glibc >= 2.36 syscall wrappers Aleksa Sarai
2025-08-07 10:39 ` Alejandro Colomar
2025-08-08 9:23 ` Askar Safin
2025-08-08 11:55 ` Aleksa Sarai
2025-08-09 10:42 ` Alejandro Colomar
2025-08-09 10:44 ` Alejandro Colomar
2025-08-06 17:44 ` [PATCH v2 02/11] mount_setattr.2: move mount_attr struct to mount_attr.2type Aleksa Sarai
2025-08-07 11:11 ` Alejandro Colomar
2025-08-07 12:38 ` Aleksa Sarai
2025-08-07 13:33 ` Alejandro Colomar
2025-08-06 17:44 ` [PATCH v2 03/11] fsopen.2: document 'new' mount api Aleksa Sarai
2025-08-07 11:38 ` Alejandro Colomar
2025-08-07 12:50 ` Aleksa Sarai
2025-08-07 13:42 ` Alejandro Colomar
2025-08-07 13:27 ` Aleksa Sarai
2025-08-07 13:52 ` Alejandro Colomar
2025-08-07 14:26 ` Aleksa Sarai
2025-08-07 19:27 ` Konstantin Ryabitsev
2025-08-07 19:39 ` Aleksa Sarai
2025-08-08 9:07 ` Askar Safin
2025-08-08 11:57 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 04/11] fspick.2: " Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 05/11] fsconfig.2: " Aleksa Sarai
2025-08-08 14:00 ` Askar Safin
2025-08-08 15:22 ` Aleksa Sarai
2025-08-08 19:07 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 06/11] fsmount.2: " Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 07/11] move_mount.2: " Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 08/11] open_tree.2: " Aleksa Sarai
2025-08-08 12:32 ` Askar Safin
2025-08-08 13:26 ` Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 09/11] mount_setattr.2: mirror opening sentence from fsopen(2) Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 10/11] open_tree_attr.2, open_tree.2: document new open_tree_attr() api Aleksa Sarai
2025-08-06 17:44 ` [PATCH v2 11/11] fsconfig.2, mount_setattr.2: add note about attribute-parameter distinction Aleksa Sarai
2025-08-08 12:53 ` [PATCH v2 00/11] man2: add man pages for 'new' mount API Christian Brauner