* [PATCH 00/18] VFS: Filesystem information [ver #21]
@ 2020-08-03 13:36 David Howells
2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
0 siblings, 2 replies; 5+ messages in thread
From: David Howells @ 2020-08-03 13:36 UTC (permalink / raw)
To: viro
Cc: Theodore Ts'o, Andreas Dilger, Eric Biggers, Jeff Layton,
linux-ext4, Carlos Maiolino, Darrick J. Wong, linux-api, dhowells,
torvalds, raven, mszeredi, christian, jannh, darrick.wong, kzak,
jlayton, linux-api, linux-fsdevel, linux-security-module,
linux-kernel
Here's a set of patches that adds a system call, fsinfo(), that allows
information about the VFS, mount topology, superblock and files to be
retrieved.
The patchset is based on top of the notifications patchset and allows event
counters implemented in the latter to be retrieved to allow overruns to be
efficiently managed.
=======
THE WHY
=======
Why do we want this?
Using /proc/mounts (or similar) has problems:
(1) Reading from it holds a global lock (namespace_sem) that prevents
mounting and unmounting. Lots of data is encoded and mangled into
text whilst the lock is held, including superblock option strings and
mount point paths. This causes performance problems when there are a
lot of mount objects in a system.
(2) Even though namespace_sem is held during a read, reading the whole
file isn't necessarily atomic with respect to mount-type operations.
If a read isn't satisfied in one go, then it may return to userspace
briefly and then continue reading some way into the file. But changes
can occur in the interval that may then go unseen.
(3) Determining what has changed means parsing and comparing consecutive
outputs of /proc/mounts.
(4) Querying a specific mount or superblock means searching through
/proc/mounts and searching by path or mount ID - but we might have an
fd we want to query.
(5) Whilst you can poll() it for events, it only tells you that something
changed in the namespace, not what or whether you can even see the
change.
To fix the notification issues, the preceding notifications patchset added
mount watch notifications whereby you can watch for notifications in a
specific mount subtree. The notification messages include the ID(s) of the
affected mounts.
To support notifications, however, we need to be able to handle overruns in
the notification queue. I added a number of event counters to struct
super_block and struct mount to allow you to pin down the changes, but
there needs to be a way to retrieve them. Exposing them through /proc
would require adding yet another /proc/mounts-type file. We could add
per-mount directories full of attributes in sysfs, but that has issues also
(see below).
Adding an extensible system call interface for retrieving filesystem
information also allows other things to be exposed:
(1) Jeff Layton's error handling changes need a way to allow error event
information to be retrieved.
(2) Bits in masks returned by things like statx() and FS_IOC_GETFLAGS are
actually 3-state { Set, Unset, Not supported }. It could be useful to
provide a way to expose information like this[*].
(3) Limits of the numerical metadata values in a filesystem[*].
(4) Filesystem capability information[*]. Filesystems don't all have the
same capabilities, and even different instances may have different
capabilities, particularly with network filesystems where the set of
may be server-dependent. Capabilities might even vary at file
granularity - though possibly such information should be conveyed
through statx() instead.
(5) ID mapping/shifting tables in use for a superblock.
(6) Filesystem-specific information. I need something for AFS so that I
can do pioctl()-emulation, thereby allowing me to implement certain of
the AFS command line utilities that query state of a particular file.
This could also have application for other filesystems, such as NFS,
CIFS and ext4.
[*] In a lot of cases these are probably invariant and can be memcpy'd
from static data.
There's a further consideration: I want to make it possible to have
fsconfig(fd, FSCONFIG_CMD_CREATE) be intercepted by a container manager
such that the manager can supervise a mount attempted inside the container.
The manager would be given an fd pointing to the fs_context struct and
would then need some way to query it (fsinfo()) and modify it (fsconfig()).
This could also be used to arbitrate user-requested mounts when containers
are not in play.
================
DESIGN DECISIONS
================
(1) Information is partitioned into sets of attributes.
(2) Attribute IDs are integers as they're fast to compare.
(3) Attribute values are typed (struct, list of structs, string, opaque
blob). They type is fixed for a particular attribute.
(4) For structure types, the length is also a version. New fields can be
tacked onto the end.
(5) When copying a versioned struct to userspace, the core handles a
version mismatch by truncating or zero-padding the data as necessary.
This is transparent to the filesystem.
(6) The core handles all the buffering and buffer resizing.
(7) The filesystem never gets any access to the userspace parameter buffer
or result buffer.
(8) "Meta" attributes can describe other attributes.
========
OVERVIEW
========
fsinfo() is a system call that allows information about the filesystem at a
particular path point to be queried as a set of attributes.
Attribute values are of four basic types:
(1) Structure with version-dependent length (the length is the version).
(2) Variable-length string.
(3) List of structures (all the same length).
(4) Opaque blob.
Attributes can have multiple values either as a sequence of values or a
sequence-of-sequences of values and all the values of a particular
attribute must be of the same type. Values can be up to INT_MAX size,
subject to memory availability.
Note that the values of an attribute *are* allowed to vary between dentries
within a single superblock, depending on the specific dentry that you're
looking at, but the values still have to be of the type for that attribute.
I've tried to make the interface as light as possible, so integer attribute
IDs rather than string and the core does all the buffer allocation and
expansion and all the extensibility support work rather than leaving that
to the filesystems. This also means that userspace pointers are not
exposed to the filesystem.
fsinfo() allows a variety of information to be retrieved about a filesystem
and the mount topology:
(1) General superblock attributes:
- Filesystem identifiers (UUID, volume label, device numbers, ...)
- The limits on a filesystem's capabilities
- Information on supported statx fields and attributes and IOC flags.
- A variety single-bit flags indicating supported capabilities.
- Timestamp resolution and range.
- The amount of space/free space in a filesystem (as statfs()).
- Superblock notification counter.
(2) Filesystem-specific superblock attributes:
- Superblock-level timestamps.
- Cell name, workgroup or other netfs grouping concept.
- Server names and addresses.
(3) VFS information:
- Mount topology information.
- Mount attributes.
- Mount notification counter.
- Mount point path.
(4) Information about what the fsinfo() syscall itself supports, including
the type and struct size of attributes.
The system is extensible:
(1) New attributes can be added. There is no requirement that a
filesystem implement every attribute. A helper function is provided
to scan a list of attributes and a filesystem can have multiple such
lists.
(2) Version length-dependent structure attributes can be made larger and
have additional information tacked on the end, provided it keeps the
layout of the existing fields. If an older process asks for a shorter
structure, it will only be given the bits it asks for. If a newer
process asks for a longer structure on an older kernel, the extra
space will be set to 0. In all cases, the size of the data actually
available is returned.
In essence, the size of a structure is that structure's version: a
smaller size is an earlier version and a later version includes
everything that the earlier version did.
(3) New single-bit capability flags can be added. This is a structure-typed
attribute and, as such, (2) applies. Any bits you wanted but the kernel
doesn't support are automatically set to 0.
fsinfo() may be called like the following, for example:
struct fsinfo_params params = {
.at_flags = AT_SYMLINK_NOFOLLOW,
.flags = FSINFO_FLAGS_QUERY_PATH,
.request = FSINFO_ATTR_AFS_SERVER_ADDRESSES,
.Nth = 2,
};
struct fsinfo_server_address address;
len = fsinfo(AT_FDCWD, "/afs/grand.central.org/doc",
¶ms, sizeof(params),
&address, sizeof(address));
The above example would query an AFS filesystem to retrieve the address
list for the 3rd server, and:
struct fsinfo_params params = {
.at_flags = AT_SYMLINK_NOFOLLOW,
.flags = FSINFO_FLAGS_QUERY_PATH,
.request = FSINFO_ATTR_NFS_SERVER_NAME;
};
char server_name[256];
len = fsinfo(AT_FDCWD, "/home/dhowells/",
¶ms, sizeof(params),
&server_name, sizeof(server_name));
would retrieve the name of the NFS server as a string.
In future, I want to make fsinfo() capable of querying a context created by
fsopen() or fspick(), e.g.:
fd = fsopen("ext4", 0);
struct fsinfo_params params = {
.flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
.request = FSINFO_ATTR_CONFIGURATION;
};
char buffer[65536];
fsinfo(fd, NULL, ¶ms, sizeof(params), &buffer, sizeof(buffer));
even if that context doesn't currently have a superblock attached.
The patches can be found here also:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
on branch:
fsinfo-core
===================
SIGNIFICANT CHANGES
===================
ver #21:
(*) Moved the mount event counters here from the mount notifications
patchset. Made the counters in the kernel atomic_long_t and the UAPI
counters __u64.
(*) Added Jeff Layton's patches to allow userspace to retrieve writeback
error information through fsinfo().
ver #20:
(*) Changed MOUNT_PROPAGATION_SLAVE to MOUNT_PROPAGATION_DEPENDENT and
renamed the fields in the fsinfo_mount_topology struct. The
MOUNT_PROPAGATION_* settings have been turned into an enum and will
also be passed to mount_setattr().
(*) Adjusted the Ext4 patch from feedback and removed the example status
from it.
(*) Dropped the NFS patch.
(*) I've dropped the superblock notifications for now.
ver #19:
(*) Split FSINFO_ATTR_MOUNT_TOPOLOGY from FSINFO_ATTR_MOUNT_INFO. The
latter requires no locking as it looks no further than the mount
object it's dealing with. The topology attribute, however, has to
take the namespace lock. That said, the info attribute includes a
counter that indicates how many times a mount object's position in the
topology has changed.
(*) A bit of patch rearrangement to put the mount topology-exposing
attributes into one patch.
(*) Pass both AT_* and RESOLVE_* flags to fsinfo() as suggested by Linus,
rather than adding missing RESOLVE_* flags.
David
---
David Howells (15):
fsinfo: Introduce a non-repeating system-unique superblock ID
fsinfo: Add fsinfo() syscall to query filesystem information
fsinfo: Provide a bitmap of the features a filesystem supports
fsinfo: Allow retrieval of superblock devname, options and stats
fsinfo: Allow fsinfo() to look up a mount object by ID
fsinfo: Add a uniquifier ID to struct mount
fsinfo: Allow mount information to be queried
fsinfo: Allow mount topology and propagation info to be retrieved
watch_queue: Mount event counters
fsinfo: Provide notification overrun handling support
fsinfo: sample: Mount listing program
fsinfo: Add API documentation
fsinfo: Add support for AFS
fsinfo: Add support to ext4
fsinfo: Add an attribute that lists all the visible mounts in a namespace
Jeff Layton (3):
errseq: add a new errseq_scrape function
vfs: allow fsinfo to fetch the current state of s_wb_err
samples: add error state information to test-fsinfo.c
Documentation/filesystems/fsinfo.rst | 574 +++++++++++++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/Kconfig | 7 +
fs/Makefile | 1 +
fs/afs/internal.h | 1 +
fs/afs/super.c | 216 ++++-
fs/d_path.c | 2 +-
fs/ext4/Makefile | 1 +
fs/ext4/ext4.h | 6 +
fs/ext4/fsinfo.c | 97 +++
fs/ext4/super.c | 3 +
fs/fsinfo.c | 748 +++++++++++++++++
fs/internal.h | 15 +
fs/mount.h | 6 +
fs/mount_notify.c | 10 +-
fs/namespace.c | 427 +++++++++-
include/linux/errseq.h | 1 +
include/linux/fs.h | 4 +
include/linux/fsinfo.h | 112 +++
include/linux/syscalls.h | 4 +
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/fsinfo.h | 344 ++++++++
include/uapi/linux/mount.h | 13 +-
kernel/sys_ni.c | 1 +
lib/errseq.c | 33 +-
samples/vfs/Makefile | 6 +-
samples/vfs/test-fsinfo.c | 883 ++++++++++++++++++++
samples/vfs/test-mntinfo.c | 277 ++++++
45 files changed, 3802 insertions(+), 14 deletions(-)
create mode 100644 Documentation/filesystems/fsinfo.rst
create mode 100644 fs/ext4/fsinfo.c
create mode 100644 fs/fsinfo.c
create mode 100644 include/linux/fsinfo.h
create mode 100644 include/uapi/linux/fsinfo.h
create mode 100644 samples/vfs/test-fsinfo.c
create mode 100644 samples/vfs/test-mntinfo.c
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH 14/18] fsinfo: Add support to ext4 [ver #21]
2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
@ 2020-08-03 13:38 ` David Howells
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
1 sibling, 0 replies; 5+ messages in thread
From: David Howells @ 2020-08-03 13:38 UTC (permalink / raw)
To: viro
Cc: Darrick J. Wong, Theodore Ts'o, Andreas Dilger, Eric Biggers,
linux-ext4, dhowells, torvalds, raven, mszeredi, christian, jannh,
darrick.wong, kzak, jlayton, linux-api, linux-fsdevel,
linux-security-module, linux-kernel
Add support to ext4, including the following:
(1) FSINFO_ATTR_SUPPORTS: Information about supported STATX attributes and
support for ioctls like FS_IOC_[GS]ETFLAGS and FS_IOC_FS[GS]ETXATTR.
(2) FSINFO_ATTR_FEATURES: Information about features supported by an ext4
filesystem, such as whether version counting, birth time and name case
folding are in operation.
(3) FSINFO_ATTR_VOLUME_NAME: The volume name from the superblock.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
cc: "Theodore Ts'o" <tytso@mit.edu>
cc: Andreas Dilger <adilger.kernel@dilger.ca>
cc: Eric Biggers <ebiggers@kernel.org>
cc: linux-ext4@vger.kernel.org
---
fs/ext4/Makefile | 1 +
fs/ext4/ext4.h | 6 +++
fs/ext4/fsinfo.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/super.c | 3 ++
4 files changed, 107 insertions(+)
create mode 100644 fs/ext4/fsinfo.c
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 2e42f47a7f98..ad67812bf7d0 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -17,3 +17,4 @@ ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
ext4-inode-test-objs += inode-test.o
obj-$(CONFIG_EXT4_KUNIT_TESTS) += ext4-inode-test.o
ext4-$(CONFIG_FS_VERITY) += verity.o
+ext4-$(CONFIG_FSINFO) += fsinfo.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 42f5060f3cdf..99a737cf6308 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -43,6 +43,7 @@
#include <linux/fscrypt.h>
#include <linux/fsverity.h>
+#include <linux/fsinfo.h>
#include <linux/compiler.h>
@@ -3233,6 +3234,11 @@ extern const struct inode_operations ext4_file_inode_operations;
extern const struct file_operations ext4_file_operations;
extern loff_t ext4_llseek(struct file *file, loff_t offset, int origin);
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+extern int ext4_fsinfo(struct path *path, struct fsinfo_context *ctx);
+#endif
+
/* inline.c */
extern int ext4_get_max_inline_size(struct inode *inode);
extern int ext4_find_inline_data_nolock(struct inode *inode);
diff --git a/fs/ext4/fsinfo.c b/fs/ext4/fsinfo.c
new file mode 100644
index 000000000000..1d4093ef32e7
--- /dev/null
+++ b/fs/ext4/fsinfo.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information for ext4
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include "ext4.h"
+
+static int ext4_fsinfo_supports(struct path *path, struct fsinfo_context *ctx)
+{
+ struct fsinfo_supports *p = ctx->buffer;
+ struct inode *inode = d_inode(path->dentry);
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ struct ext4_inode *raw_inode;
+ u32 flags;
+
+ fsinfo_generic_supports(path, ctx);
+ p->stx_attributes |= (STATX_ATTR_APPEND |
+ STATX_ATTR_COMPRESSED |
+ STATX_ATTR_ENCRYPTED |
+ STATX_ATTR_IMMUTABLE |
+ STATX_ATTR_NODUMP |
+ STATX_ATTR_VERITY);
+ if (EXT4_FITS_IN_INODE(raw_inode, ei, i_crtime))
+ p->stx_mask |= STATX_BTIME;
+
+ flags = EXT4_FL_USER_VISIBLE;
+ if (S_ISREG(inode->i_mode))
+ flags &= ~EXT4_PROJINHERIT_FL;
+ p->fs_ioc_getflags = flags;
+ flags &= EXT4_FL_USER_MODIFIABLE;
+ p->fs_ioc_setflags_set = flags;
+ p->fs_ioc_setflags_clear = flags;
+
+ p->fs_ioc_fsgetxattr_xflags = EXT4_FL_XFLAG_VISIBLE;
+ p->fs_ioc_fssetxattr_xflags_set = EXT4_FL_XFLAG_VISIBLE;
+ p->fs_ioc_fssetxattr_xflags_clear = EXT4_FL_XFLAG_VISIBLE;
+ return sizeof(*p);
+}
+
+static int ext4_fsinfo_features(struct path *path, struct fsinfo_context *ctx)
+{
+ struct fsinfo_features *p = ctx->buffer;
+ struct super_block *sb = path->dentry->d_sb;
+ struct inode *inode = d_inode(path->dentry);
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ struct ext4_inode *raw_inode;
+
+ fsinfo_generic_features(path, ctx);
+ fsinfo_set_unix_features(p);
+ fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_UUID);
+ fsinfo_set_feature(p, FSINFO_FEAT_VOLUME_NAME);
+ fsinfo_set_feature(p, FSINFO_FEAT_O_SYNC);
+ fsinfo_set_feature(p, FSINFO_FEAT_O_DIRECT);
+ fsinfo_set_feature(p, FSINFO_FEAT_ADV_LOCKS);
+
+ if (test_opt(sb, XATTR_USER))
+ fsinfo_set_feature(p, FSINFO_FEAT_XATTRS);
+ if (ext4_has_feature_journal(sb))
+ fsinfo_set_feature(p, FSINFO_FEAT_JOURNAL);
+ if (ext4_has_feature_casefold(sb))
+ fsinfo_set_feature(p, FSINFO_FEAT_NAME_CASE_INDEP);
+
+ if (sb->s_flags & SB_I_VERSION &&
+ !test_opt2(sb, HURD_COMPAT) &&
+ EXT4_INODE_SIZE(sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+ fsinfo_set_feature(p, FSINFO_FEAT_IVER_DATA_CHANGE);
+ fsinfo_set_feature(p, FSINFO_FEAT_IVER_MONO_INCR);
+ }
+
+ if (EXT4_FITS_IN_INODE(raw_inode, ei, i_crtime))
+ fsinfo_set_feature(p, FSINFO_FEAT_HAS_BTIME);
+ return sizeof(*p);
+}
+
+static int ext4_fsinfo_get_volume_name(struct path *path, struct fsinfo_context *ctx)
+{
+ const struct ext4_sb_info *sbi = EXT4_SB(path->mnt->mnt_sb);
+ const struct ext4_super_block *es = sbi->s_es;
+
+ memcpy(ctx->buffer, es->s_volume_name, sizeof(es->s_volume_name));
+ return strlen(ctx->buffer) + 1;
+}
+
+static const struct fsinfo_attribute ext4_fsinfo_attributes[] = {
+ FSINFO_VSTRUCT (FSINFO_ATTR_SUPPORTS, ext4_fsinfo_supports),
+ FSINFO_VSTRUCT (FSINFO_ATTR_FEATURES, ext4_fsinfo_features),
+ FSINFO_STRING (FSINFO_ATTR_VOLUME_NAME, ext4_fsinfo_get_volume_name),
+ {}
+};
+
+int ext4_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+ return fsinfo_get_attribute(path, ctx, ext4_fsinfo_attributes);
+}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 330957ed1f05..47f349620176 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1481,6 +1481,9 @@ static const struct super_operations ext4_sops = {
.freeze_fs = ext4_freeze,
.unfreeze_fs = ext4_unfreeze,
.statfs = ext4_statfs,
+#ifdef CONFIG_FSINFO
+ .fsinfo = ext4_fsinfo,
+#endif
.remount_fs = ext4_remount,
.show_options = ext4_show_options,
#ifdef CONFIG_QUOTA
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
@ 2020-08-04 15:39 ` James Bottomley
2020-08-04 19:18 ` Miklos Szeredi
2020-08-05 17:13 ` David Howells
1 sibling, 2 replies; 5+ messages in thread
From: James Bottomley @ 2020-08-04 15:39 UTC (permalink / raw)
To: David Howells, viro
Cc: Theodore Ts'o, Andreas Dilger, Eric Biggers, Jeff Layton,
linux-ext4, Carlos Maiolino, Darrick J. Wong, linux-api, torvalds,
raven, mszeredi, christian, jannh, kzak, jlayton, linux-fsdevel,
linux-security-module, linux-kernel
On Mon, 2020-08-03 at 14:36 +0100, David Howells wrote:
> Here's a set of patches that adds a system call, fsinfo(), that
> allows information about the VFS, mount topology, superblock and
> files to be retrieved.
>
> The patchset is based on top of the notifications patchset and allows
> event counters implemented in the latter to be retrieved to allow
> overruns to be efficiently managed.
Could I repeat the question I asked about six months back that never
got answered:
https://lore.kernel.org/linux-api/1582316494.3376.45.camel@HansenPartnership.com/
It sort of petered out into a long winding thread about why not use
sysfs instead, which really doesn't look like a good idea to me.
I'll repeat the information for those who want to quote it easily on
reply without having to use a web interface:
---
Could I make a suggestion about how this should be done in a way that
doesn't actually require the fsinfo syscall at all: it could just be
done with fsconfig. The idea is based on something I've wanted to do
for configfd but couldn't because otherwise it wouldn't substitute for
fsconfig, but Christian made me think it was actually essential to the
ability of the seccomp and other verifier tools in the critique of
configfd and I belive the same critique applies here.
Instead of making fsconfig functionally configure ... as in you pass
the attribute name, type and parameters down into the fs specific
handler and the handler does a string match and then verifies the
parameters and then acts on them, make it table configured, so what
each fstype does is register a table of attributes which can be got and
optionally set (with each attribute having a get and optional set
function). We'd have multiple tables per fstype, so the generic VFS
can register a table of attributes it understands for every fstype
(things like name, uuid and the like) and then each fs type would
register a table of fs specific attributes following the same pattern.
The system would examine the fs specific table before the generic one,
allowing overrides. fsconfig would have the ability to both get and
set attributes, permitting retrieval as well as setting (which is how I
get rid of the fsinfo syscall), we'd have a global parameter, which
would retrieve the entire table by name and type so the whole thing is
introspectable because the upper layer knows a-priori all the
attributes which can be set for a given fs type and what type they are
(so we can make more of the parsing generic). Any attribute which
doesn't have a set routine would be read only and all attributes would
have to have a get routine meaning everything is queryable.
I think I know how to code this up in a way that would be fully
transparent to the existing syscalls.
---
James
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
@ 2020-08-04 19:18 ` Miklos Szeredi
2020-08-05 17:13 ` David Howells
1 sibling, 0 replies; 5+ messages in thread
From: Miklos Szeredi @ 2020-08-04 19:18 UTC (permalink / raw)
To: James Bottomley
Cc: David Howells, Al Viro, Theodore Ts'o, Andreas Dilger,
Eric Biggers, Jeff Layton, linux-ext4, Carlos Maiolino,
Darrick J. Wong, Linux API, Linus Torvalds, Ian Kent,
Miklos Szeredi, Christian Brauner, Jann Horn, Karel Zak,
Jeff Layton, linux-fsdevel, LSM, linux-kernel
On Tue, Aug 4, 2020 at 5:40 PM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Mon, 2020-08-03 at 14:36 +0100, David Howells wrote:
> > Here's a set of patches that adds a system call, fsinfo(), that
> > allows information about the VFS, mount topology, superblock and
> > files to be retrieved.
> >
> > The patchset is based on top of the notifications patchset and allows
> > event counters implemented in the latter to be retrieved to allow
> > overruns to be efficiently managed.
>
> Could I repeat the question I asked about six months back that never
> got answered:
>
> https://lore.kernel.org/linux-api/1582316494.3376.45.camel@HansenPartnership.com/
>
> It sort of petered out into a long winding thread about why not use
> sysfs instead, which really doesn't look like a good idea to me.
>
> I'll repeat the information for those who want to quote it easily on
> reply without having to use a web interface:
>
> ---
> Could I make a suggestion about how this should be done in a way that
> doesn't actually require the fsinfo syscall at all: it could just be
> done with fsconfig. The idea is based on something I've wanted to do
> for configfd but couldn't because otherwise it wouldn't substitute for
> fsconfig, but Christian made me think it was actually essential to the
> ability of the seccomp and other verifier tools in the critique of
> configfd and I belive the same critique applies here.
>
> Instead of making fsconfig functionally configure ... as in you pass
> the attribute name, type and parameters down into the fs specific
> handler and the handler does a string match and then verifies the
> parameters and then acts on them, make it table configured, so what
> each fstype does is register a table of attributes which can be got and
> optionally set (with each attribute having a get and optional set
> function). We'd have multiple tables per fstype, so the generic VFS
> can register a table of attributes it understands for every fstype
> (things like name, uuid and the like) and then each fs type would
> register a table of fs specific attributes following the same pattern.
> The system would examine the fs specific table before the generic one,
> allowing overrides. fsconfig would have the ability to both get and
> set attributes, permitting retrieval as well as setting (which is how I
> get rid of the fsinfo syscall), we'd have a global parameter, which
> would retrieve the entire table by name and type so the whole thing is
> introspectable because the upper layer knows a-priori all the
> attributes which can be set for a given fs type and what type they are
> (so we can make more of the parsing generic). Any attribute which
> doesn't have a set routine would be read only and all attributes would
> have to have a get routine meaning everything is queryable.
fsconfig(2) takes an fd referring to an fs_context, that in turn
refers to a super_block.
So using fsconfig() for retrieving super_block attributes would be
fine (modulo value being const, and lack of buffer size).
But what about mount attributes?
I don't buy the argument that an API needs to be designed around the
requirements of seccomp and the like. It should be the other way
round. In that, I think your configfd idea was fine, and would answer
the above question.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 00/18] VFS: Filesystem information [ver #21]
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
2020-08-04 19:18 ` Miklos Szeredi
@ 2020-08-05 17:13 ` David Howells
1 sibling, 0 replies; 5+ messages in thread
From: David Howells @ 2020-08-05 17:13 UTC (permalink / raw)
To: James Bottomley
Cc: dhowells, viro, Theodore Ts'o, Andreas Dilger, Eric Biggers,
Jeff Layton, linux-ext4, Carlos Maiolino, Darrick J. Wong,
linux-api, torvalds, raven, mszeredi, christian, jannh, kzak,
jlayton, linux-fsdevel, linux-security-module, linux-kernel
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> It sort of petered out into a long winding thread about why not use
> sysfs instead, which really doesn't look like a good idea to me.
It seemed to turn into a set of procfs symlinks that pointed at a bunch of
sysfs stuff - or possibly some special filesystem.
> Could I make a suggestion about how this should be done in a way that
> doesn't actually require the fsinfo syscall at all: it could just be
> done with fsconfig.
I'd prefer to keep it separate. The interface for fsconfig() is intended to
move stuff into the kernel, not out of it. Better to add a parallel syscall
to go the other way (kind of like we have setxattr/getxattr, sendmsg/recvmsg).
Further, fsinfo() can refer directly to a file/fd/mount/whatever, but
fsconfig() doesn't do that. You have to use fspick() to get a context before
you can use fsconfig(). Now, that's fine if you want to gather several pieces
of information from a particular object, but it's not so good if you want to
get one piece of information from each of several objects.
> ... make it table configured...
I did, kind of (though I didn't call it that). Al rewrote the code to get rid
of it.
David
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-08-05 19:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-03 13:36 [PATCH 00/18] VFS: Filesystem information [ver #21] David Howells
2020-08-03 13:38 ` [PATCH 14/18] fsinfo: Add support to ext4 " David Howells
2020-08-04 15:39 ` [PATCH 00/18] VFS: Filesystem information " James Bottomley
2020-08-04 19:18 ` Miklos Szeredi
2020-08-05 17:13 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox