public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
@ 2024-11-11 15:09 Jeff Layton
  2024-11-11 15:09 ` [PATCH v4 1/3] fs: don't let statmount return empty strings Jeff Layton
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Jeff Layton @ 2024-11-11 15:09 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Jeff Layton

Meta has some internal logging that scrapes /proc/self/mountinfo today.
I'd like to convert it to use listmount()/statmount(), so we can do a
better job of monitoring with containers. We're missing some fields
though. This patchset adds them.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v4:
- Rename mnt_devname to sb_source
- Break statmount_string() behavior changes out into separate patch
- Link to v3: https://lore.kernel.org/r/20241107-statmount-v3-0-da5b9744c121@kernel.org

Changes in v3:
- Unescape the result of ->show_devname
- Move handling of nothing being emitted out of the switch statement
- Link to v2: https://lore.kernel.org/r/20241106-statmount-v2-0-93ba2aad38d1@kernel.org

Changes in v2:
- make statmount_fs_subtype
- return fast if no subtype is emitted
- new patch to allow statmount() to return devicename
- Link to v1: https://lore.kernel.org/r/20241106-statmount-v1-1-b93bafd97621@kernel.org

---
Jeff Layton (3):
      fs: don't let statmount return empty strings
      fs: add the ability for statmount() to report the fs_subtype
      fs: add the ability for statmount() to report the sb_source

 fs/namespace.c             | 68 ++++++++++++++++++++++++++++++++++++++++++----
 include/uapi/linux/mount.h |  6 +++-
 2 files changed, 67 insertions(+), 7 deletions(-)
---
base-commit: 26213e1a6caa5a7f508b919059b0122b451f4dfe
change-id: 20241106-statmount-3f91a7ed75fa

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 1/3] fs: don't let statmount return empty strings
  2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
@ 2024-11-11 15:09 ` Jeff Layton
  2024-11-11 17:44   ` Jan Kara
  2024-11-11 15:09 ` [PATCH v4 2/3] fs: add the ability for statmount() to report the fs_subtype Jeff Layton
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2024-11-11 15:09 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Jeff Layton

When one of the statmount_string() handlers doesn't emit anything to
seq, the kernel currently sets the corresponding flag and emits an empty
string.

Given that statmount() returns a mask of accessible fields, just leave
the bit unset in this case, and skip any NULL termination. If nothing
was emitted to the seq, then the EOVERFLOW and EAGAIN cases aren't
applicable and the function can just return immediately.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/namespace.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ba77ce1c6788dfe461814b5826fcbb3aab68fad4..28ad153b1fb6f49653c0a85d12da457c4650a87e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5046,22 +5046,23 @@ static int statmount_string(struct kstatmount *s, u64 flag)
 	size_t kbufsize;
 	struct seq_file *seq = &s->seq;
 	struct statmount *sm = &s->sm;
+	u32 start = seq->count;
 
 	switch (flag) {
 	case STATMOUNT_FS_TYPE:
-		sm->fs_type = seq->count;
+		sm->fs_type = start;
 		ret = statmount_fs_type(s, seq);
 		break;
 	case STATMOUNT_MNT_ROOT:
-		sm->mnt_root = seq->count;
+		sm->mnt_root = start;
 		ret = statmount_mnt_root(s, seq);
 		break;
 	case STATMOUNT_MNT_POINT:
-		sm->mnt_point = seq->count;
+		sm->mnt_point = start;
 		ret = statmount_mnt_point(s, seq);
 		break;
 	case STATMOUNT_MNT_OPTS:
-		sm->mnt_opts = seq->count;
+		sm->mnt_opts = start;
 		ret = statmount_mnt_opts(s, seq);
 		break;
 	default:
@@ -5069,6 +5070,12 @@ static int statmount_string(struct kstatmount *s, u64 flag)
 		return -EINVAL;
 	}
 
+	/*
+	 * If nothing was emitted, return to avoid setting the flag
+	 * and terminating the buffer.
+	 */
+	if (seq->count == start)
+		return ret;
 	if (unlikely(check_add_overflow(sizeof(*sm), seq->count, &kbufsize)))
 		return -EOVERFLOW;
 	if (kbufsize >= s->bufsize)

-- 
2.47.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 2/3] fs: add the ability for statmount() to report the fs_subtype
  2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
  2024-11-11 15:09 ` [PATCH v4 1/3] fs: don't let statmount return empty strings Jeff Layton
@ 2024-11-11 15:09 ` Jeff Layton
  2024-11-11 15:09 ` [PATCH v4 3/3] fs: add the ability for statmount() to report the sb_source Jeff Layton
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2024-11-11 15:09 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Jeff Layton

/proc/self/mountinfo prints out the sb->s_subtype after the type. This
is particularly useful for disambiguating FUSE mounts (at least when the
userland driver bothers to set it). Add STATMOUNT_FS_SUBTYPE and claim
one of the __spare2 fields to point to the offset into the str[] array.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/namespace.c             | 19 +++++++++++++++++--
 include/uapi/linux/mount.h |  5 ++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 28ad153b1fb6f49653c0a85d12da457c4650a87e..fc4f81891d544305caf863904c0a6e16562fab49 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5006,6 +5006,14 @@ static int statmount_fs_type(struct kstatmount *s, struct seq_file *seq)
 	return 0;
 }
 
+static void statmount_fs_subtype(struct kstatmount *s, struct seq_file *seq)
+{
+	struct super_block *sb = s->mnt->mnt_sb;
+
+	if (sb->s_subtype)
+		seq_puts(seq, sb->s_subtype);
+}
+
 static void statmount_mnt_ns_id(struct kstatmount *s, struct mnt_namespace *ns)
 {
 	s->sm.mask |= STATMOUNT_MNT_NS_ID;
@@ -5042,7 +5050,7 @@ static int statmount_mnt_opts(struct kstatmount *s, struct seq_file *seq)
 
 static int statmount_string(struct kstatmount *s, u64 flag)
 {
-	int ret;
+	int ret = 0;
 	size_t kbufsize;
 	struct seq_file *seq = &s->seq;
 	struct statmount *sm = &s->sm;
@@ -5065,6 +5073,10 @@ static int statmount_string(struct kstatmount *s, u64 flag)
 		sm->mnt_opts = start;
 		ret = statmount_mnt_opts(s, seq);
 		break;
+	case STATMOUNT_FS_SUBTYPE:
+		sm->fs_subtype = start;
+		statmount_fs_subtype(s, seq);
+		break;
 	default:
 		WARN_ON_ONCE(true);
 		return -EINVAL;
@@ -5210,6 +5222,9 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 	if (!err && s->mask & STATMOUNT_MNT_OPTS)
 		err = statmount_string(s, STATMOUNT_MNT_OPTS);
 
+	if (!err && s->mask & STATMOUNT_FS_SUBTYPE)
+		err = statmount_string(s, STATMOUNT_FS_SUBTYPE);
+
 	if (!err && s->mask & STATMOUNT_MNT_NS_ID)
 		statmount_mnt_ns_id(s, ns);
 
@@ -5231,7 +5246,7 @@ static inline bool retry_statmount(const long ret, size_t *seq_size)
 }
 
 #define STATMOUNT_STRING_REQ (STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | \
-			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS)
+			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | STATMOUNT_FS_SUBTYPE)
 
 static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
 			      struct statmount __user *buf, size_t bufsize,
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 225bc366ffcbf0319929e2f55f1fbea88e4d7b81..2e939dddf9cbabe574dafdb6cff9ad4cf9298a74 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -173,7 +173,9 @@ struct statmount {
 	__u32 mnt_root;		/* [str] Root of mount relative to root of fs */
 	__u32 mnt_point;	/* [str] Mountpoint relative to current root */
 	__u64 mnt_ns_id;	/* ID of the mount namespace */
-	__u64 __spare2[49];
+	__u32 fs_subtype;	/* [str] Subtype of fs_type (if any) */
+	__u32 __spare1[1];
+	__u64 __spare2[48];
 	char str[];		/* Variable size part containing strings */
 };
 
@@ -207,6 +209,7 @@ struct mnt_id_req {
 #define STATMOUNT_FS_TYPE		0x00000020U	/* Want/got fs_type */
 #define STATMOUNT_MNT_NS_ID		0x00000040U	/* Want/got mnt_ns_id */
 #define STATMOUNT_MNT_OPTS		0x00000080U	/* Want/got mnt_opts */
+#define STATMOUNT_FS_SUBTYPE		0x00000100U	/* Want/got fs_subtype */
 
 /*
  * Special @mnt_id values that can be passed to listmount

-- 
2.47.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 3/3] fs: add the ability for statmount() to report the sb_source
  2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
  2024-11-11 15:09 ` [PATCH v4 1/3] fs: don't let statmount return empty strings Jeff Layton
  2024-11-11 15:09 ` [PATCH v4 2/3] fs: add the ability for statmount() to report the fs_subtype Jeff Layton
@ 2024-11-11 15:09 ` Jeff Layton
  2024-11-12 10:32 ` [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Miklos Szeredi
  2024-11-12 13:39 ` Christian Brauner
  4 siblings, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2024-11-11 15:09 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Jan Kara
  Cc: Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Jeff Layton

/proc/self/mountinfo displays the source for the mount, but statmount()
doesn't yet have a way to return it. Add a new STATMOUNT_SB_SOURCE flag,
claim the 32-bit __spare1 field to hold the offset into the str[] array.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/namespace.c             | 36 +++++++++++++++++++++++++++++++++++-
 include/uapi/linux/mount.h |  3 ++-
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index fc4f81891d544305caf863904c0a6e16562fab49..4f034dba5884ce3641b8e4048e21879e4bda896c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5014,6 +5014,32 @@ static void statmount_fs_subtype(struct kstatmount *s, struct seq_file *seq)
 		seq_puts(seq, sb->s_subtype);
 }
 
+static int statmount_sb_source(struct kstatmount *s, struct seq_file *seq)
+{
+	struct super_block *sb = s->mnt->mnt_sb;
+	struct mount *r = real_mount(s->mnt);
+
+	if (sb->s_op->show_devname) {
+		size_t start = seq->count;
+		int ret;
+
+		ret = sb->s_op->show_devname(seq, s->mnt->mnt_root);
+		if (ret)
+			return ret;
+
+		if (unlikely(seq_has_overflowed(seq)))
+			return -EAGAIN;
+
+		/* Unescape the result */
+		seq->buf[seq->count] = '\0';
+		seq->count = start;
+		seq_commit(seq, string_unescape_inplace(seq->buf + start, UNESCAPE_OCTAL));
+	} else if (r->mnt_devname) {
+		seq_puts(seq, r->mnt_devname);
+	}
+	return 0;
+}
+
 static void statmount_mnt_ns_id(struct kstatmount *s, struct mnt_namespace *ns)
 {
 	s->sm.mask |= STATMOUNT_MNT_NS_ID;
@@ -5077,6 +5103,10 @@ static int statmount_string(struct kstatmount *s, u64 flag)
 		sm->fs_subtype = start;
 		statmount_fs_subtype(s, seq);
 		break;
+	case STATMOUNT_SB_SOURCE:
+		sm->sb_source = seq->count;
+		ret = statmount_sb_source(s, seq);
+		break;
 	default:
 		WARN_ON_ONCE(true);
 		return -EINVAL;
@@ -5225,6 +5255,9 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 	if (!err && s->mask & STATMOUNT_FS_SUBTYPE)
 		err = statmount_string(s, STATMOUNT_FS_SUBTYPE);
 
+	if (!err && s->mask & STATMOUNT_SB_SOURCE)
+		err = statmount_string(s, STATMOUNT_SB_SOURCE);
+
 	if (!err && s->mask & STATMOUNT_MNT_NS_ID)
 		statmount_mnt_ns_id(s, ns);
 
@@ -5246,7 +5279,8 @@ static inline bool retry_statmount(const long ret, size_t *seq_size)
 }
 
 #define STATMOUNT_STRING_REQ (STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | \
-			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | STATMOUNT_FS_SUBTYPE)
+			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | \
+			      STATMOUNT_FS_SUBTYPE | STATMOUNT_SB_SOURCE)
 
 static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
 			      struct statmount __user *buf, size_t bufsize,
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 2e939dddf9cbabe574dafdb6cff9ad4cf9298a74..2b49e9131d77165899d8e3c17366c6afaa8b7795 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -174,7 +174,7 @@ struct statmount {
 	__u32 mnt_point;	/* [str] Mountpoint relative to current root */
 	__u64 mnt_ns_id;	/* ID of the mount namespace */
 	__u32 fs_subtype;	/* [str] Subtype of fs_type (if any) */
-	__u32 __spare1[1];
+	__u32 sb_source;	/* [str] Source string of the mount */
 	__u64 __spare2[48];
 	char str[];		/* Variable size part containing strings */
 };
@@ -210,6 +210,7 @@ struct mnt_id_req {
 #define STATMOUNT_MNT_NS_ID		0x00000040U	/* Want/got mnt_ns_id */
 #define STATMOUNT_MNT_OPTS		0x00000080U	/* Want/got mnt_opts */
 #define STATMOUNT_FS_SUBTYPE		0x00000100U	/* Want/got fs_subtype */
+#define STATMOUNT_SB_SOURCE		0x00000200U	/* Want/got sb_source */
 
 /*
  * Special @mnt_id values that can be passed to listmount

-- 
2.47.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 1/3] fs: don't let statmount return empty strings
  2024-11-11 15:09 ` [PATCH v4 1/3] fs: don't let statmount return empty strings Jeff Layton
@ 2024-11-11 17:44   ` Jan Kara
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2024-11-11 17:44 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Miklos Szeredi,
	Ian Kent, Josef Bacik, linux-fsdevel, linux-kernel

On Mon 11-11-24 10:09:55, Jeff Layton wrote:
> When one of the statmount_string() handlers doesn't emit anything to
> seq, the kernel currently sets the corresponding flag and emits an empty
> string.
> 
> Given that statmount() returns a mask of accessible fields, just leave
> the bit unset in this case, and skip any NULL termination. If nothing
> was emitted to the seq, then the EOVERFLOW and EAGAIN cases aren't
> applicable and the function can just return immediately.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/namespace.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ba77ce1c6788dfe461814b5826fcbb3aab68fad4..28ad153b1fb6f49653c0a85d12da457c4650a87e 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -5046,22 +5046,23 @@ static int statmount_string(struct kstatmount *s, u64 flag)
>  	size_t kbufsize;
>  	struct seq_file *seq = &s->seq;
>  	struct statmount *sm = &s->sm;
> +	u32 start = seq->count;
>  
>  	switch (flag) {
>  	case STATMOUNT_FS_TYPE:
> -		sm->fs_type = seq->count;
> +		sm->fs_type = start;
>  		ret = statmount_fs_type(s, seq);
>  		break;
>  	case STATMOUNT_MNT_ROOT:
> -		sm->mnt_root = seq->count;
> +		sm->mnt_root = start;
>  		ret = statmount_mnt_root(s, seq);
>  		break;
>  	case STATMOUNT_MNT_POINT:
> -		sm->mnt_point = seq->count;
> +		sm->mnt_point = start;
>  		ret = statmount_mnt_point(s, seq);
>  		break;
>  	case STATMOUNT_MNT_OPTS:
> -		sm->mnt_opts = seq->count;
> +		sm->mnt_opts = start;
>  		ret = statmount_mnt_opts(s, seq);
>  		break;
>  	default:
> @@ -5069,6 +5070,12 @@ static int statmount_string(struct kstatmount *s, u64 flag)
>  		return -EINVAL;
>  	}
>  
> +	/*
> +	 * If nothing was emitted, return to avoid setting the flag
> +	 * and terminating the buffer.
> +	 */
> +	if (seq->count == start)
> +		return ret;
>  	if (unlikely(check_add_overflow(sizeof(*sm), seq->count, &kbufsize)))
>  		return -EOVERFLOW;
>  	if (kbufsize >= s->bufsize)
> 
> -- 
> 2.47.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
                   ` (2 preceding siblings ...)
  2024-11-11 15:09 ` [PATCH v4 3/3] fs: add the ability for statmount() to report the sb_source Jeff Layton
@ 2024-11-12 10:32 ` Miklos Szeredi
  2024-11-12 11:12   ` Jeff Layton
  2024-11-12 13:39 ` Christian Brauner
  4 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2024-11-12 10:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Ian Kent,
	Josef Bacik, linux-fsdevel, linux-kernel

On Mon, 11 Nov 2024 at 16:10, Jeff Layton <jlayton@kernel.org> wrote:
>
> Meta has some internal logging that scrapes /proc/self/mountinfo today.
> I'd like to convert it to use listmount()/statmount(), so we can do a
> better job of monitoring with containers. We're missing some fields
> though. This patchset adds them.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

When thinking of the naming for the unescaped options, I realized that
maybe "source" is better than "sb_soure" which just adds redundant
info.   Just a thought, don't bother if you don't agree.

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-12 10:32 ` [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Miklos Szeredi
@ 2024-11-12 11:12   ` Jeff Layton
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Layton @ 2024-11-12 11:12 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Ian Kent,
	Josef Bacik, linux-fsdevel, linux-kernel

On Tue, 2024-11-12 at 11:32 +0100, Miklos Szeredi wrote:
> On Mon, 11 Nov 2024 at 16:10, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > Meta has some internal logging that scrapes /proc/self/mountinfo today.
> > I'd like to convert it to use listmount()/statmount(), so we can do a
> > better job of monitoring with containers. We're missing some fields
> > though. This patchset adds them.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> When thinking of the naming for the unescaped options, I realized that
> maybe "source" is better than "sb_soure" which just adds redundant
> info.   Just a thought, don't bother if you don't agree.
> 
> Acked-by: Miklos Szeredi <mszeredi@redhat.com>
> 

I think sb_source fits better with the convention that the other fields
in the struct use. I say we stick with that.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
                   ` (3 preceding siblings ...)
  2024-11-12 10:32 ` [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Miklos Szeredi
@ 2024-11-12 13:39 ` Christian Brauner
  2024-11-13 11:27   ` Karel Zak
  4 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2024-11-12 13:39 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christian Brauner, Miklos Szeredi, Ian Kent, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro, Jan Kara

On Mon, 11 Nov 2024 10:09:54 -0500, Jeff Layton wrote:
> Meta has some internal logging that scrapes /proc/self/mountinfo today.
> I'd like to convert it to use listmount()/statmount(), so we can do a
> better job of monitoring with containers. We're missing some fields
> though. This patchset adds them.
> 
> 

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/3] fs: don't let statmount return empty strings
      https://git.kernel.org/vfs/vfs/c/75ead69a7173
[2/3] fs: add the ability for statmount() to report the fs_subtype
      https://git.kernel.org/vfs/vfs/c/ed9d95f691c2
[3/3] fs: add the ability for statmount() to report the sb_source
      https://git.kernel.org/vfs/vfs/c/d31cea0ab403

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-12 13:39 ` Christian Brauner
@ 2024-11-13 11:27   ` Karel Zak
  2024-11-13 13:08     ` Miklos Szeredi
  2024-11-13 13:45     ` Jeff Layton
  0 siblings, 2 replies; 26+ messages in thread
From: Karel Zak @ 2024-11-13 11:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jeff Layton, Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> On Mon, 11 Nov 2024 10:09:54 -0500, Jeff Layton wrote:
> > Meta has some internal logging that scrapes /proc/self/mountinfo today.
> > I'd like to convert it to use listmount()/statmount(), so we can do a
> > better job of monitoring with containers. We're missing some fields
> > though. This patchset adds them.
> > 
> > 
> 
> Applied to the vfs.misc branch of the vfs/vfs.git tree.
> Patches in the vfs.misc branch should appear in linux-next soon.

Jeff, thank you for this!

I have already implemented support for statmount() and listmount() in
libmount (PR: https://github.com/util-linux/util-linux/pull/3092). The
only remaining issue was the mount source and incomplete file system
type.

Currently, the library does not use these syscalls when mounting (it
still uses /proc/#/mountinfo). However, there is already a library API
to fetch mount information from the kernel using listmount+statmount,
and it can be accessed from the command line using:

   findmnt --kernel=listmount

Next on the wish list is a notification (a file descriptor that can be
used in epoll) that returns a 64-bit ID when there is a change in the
mount node. This will enable us to enhance systemd so that it does not
have to read the entire mount table after every change.


    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 11:27   ` Karel Zak
@ 2024-11-13 13:08     ` Miklos Szeredi
  2024-11-13 13:45     ` Jeff Layton
  1 sibling, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2024-11-13 13:08 UTC (permalink / raw)
  To: Karel Zak
  Cc: Christian Brauner, Jeff Layton, Ian Kent, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro, Jan Kara,
	Amir Goldstein

On Wed, 13 Nov 2024 at 12:28, Karel Zak <kzak@redhat.com> wrote:

> Next on the wish list is a notification (a file descriptor that can be
> used in epoll) that returns a 64-bit ID when there is a change in the
> mount node. This will enable us to enhance systemd so that it does not
> have to read the entire mount table after every change.

IIRC Amir said he'd take this up (this was at LSFMM 2023).  I'm also
happy to take part in the design and implementation.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 11:27   ` Karel Zak
  2024-11-13 13:08     ` Miklos Szeredi
@ 2024-11-13 13:45     ` Jeff Layton
  2024-11-13 15:18       ` Jan Kara
  2024-11-14 11:29       ` Christian Brauner
  1 sibling, 2 replies; 26+ messages in thread
From: Jeff Layton @ 2024-11-13 13:45 UTC (permalink / raw)
  To: Karel Zak, Christian Brauner
  Cc: Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > On Mon, 11 Nov 2024 10:09:54 -0500, Jeff Layton wrote:
> > > Meta has some internal logging that scrapes /proc/self/mountinfo today.
> > > I'd like to convert it to use listmount()/statmount(), so we can do a
> > > better job of monitoring with containers. We're missing some fields
> > > though. This patchset adds them.
> > > 
> > > 
> > 
> > Applied to the vfs.misc branch of the vfs/vfs.git tree.
> > Patches in the vfs.misc branch should appear in linux-next soon.
> 
> Jeff, thank you for this!
> 
> I have already implemented support for statmount() and listmount() in
> libmount (PR: https://github.com/util-linux/util-linux/pull/3092). The
> only remaining issue was the mount source and incomplete file system
> type.
> 

Unfortunately, I think we might be missing something else:

The mountinfo (and "mounts") file generator calls show_sb_opts() which
generates some strings from the sb->s_flags field and then calls
security_sb_show_options(). statmount() will give you the s_flags field
(or an equivalent), but it doesn't give you the security options
string. So, those aren't currently visible from statmount().

How should we expose those? Should we create a new statmount string
field and populate it, or is it better to just tack them onto the end
of the statmount.mnt_opts string?

> Currently, the library does not use these syscalls when mounting (it
> still uses /proc/#/mountinfo). However, there is already a library API
> to fetch mount information from the kernel using listmount+statmount,
> and it can be accessed from the command line using:
> 
>    findmnt --kernel=listmount
> 

Very cool. I need to play with findmnt more.

> Next on the wish list is a notification (a file descriptor that can be
> used in epoll) that returns a 64-bit ID when there is a change in the
> mount node. This will enable us to enhance systemd so that it does not
> have to read the entire mount table after every change.
> 

New fanotify events for mount table changes, perhaps?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 13:45     ` Jeff Layton
@ 2024-11-13 15:18       ` Jan Kara
  2024-11-13 15:49         ` Miklos Szeredi
                           ` (2 more replies)
  2024-11-14 11:29       ` Christian Brauner
  1 sibling, 3 replies; 26+ messages in thread
From: Jan Kara @ 2024-11-13 15:18 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Karel Zak, Christian Brauner, Miklos Szeredi, Ian Kent,
	Josef Bacik, linux-fsdevel, linux-kernel, Alexander Viro,
	Jan Kara

On Wed 13-11-24 08:45:06, Jeff Layton wrote:
> On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > Next on the wish list is a notification (a file descriptor that can be
> > used in epoll) that returns a 64-bit ID when there is a change in the
> > mount node. This will enable us to enhance systemd so that it does not
> > have to read the entire mount table after every change.
> > 
> 
> New fanotify events for mount table changes, perhaps?

Now that I'm looking at it I'm not sure fanotify is a great fit for this
usecase. A lot of fanotify functionality does not really work for virtual
filesystems such as proc and hence we generally try to discourage use of
fanotify for them. So just supporting one type of event (like FAN_MODIFY)
on one file inside proc looks as rather inconsistent interface. But I
vaguely remember we were discussing some kind of mount event, weren't we?
Or was that for something else?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 15:18       ` Jan Kara
@ 2024-11-13 15:49         ` Miklos Szeredi
  2024-11-13 16:00           ` Jan Kara
  2024-11-14  1:45         ` Ian Kent
  2024-11-14  1:51         ` Ian Kent
  2 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2024-11-13 15:49 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Karel Zak, Christian Brauner, Ian Kent, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro, Lennart Poettering

On Wed, 13 Nov 2024 at 16:18, Jan Kara <jack@suse.cz> wrote:
>
> On Wed 13-11-24 08:45:06, Jeff Layton wrote:
> > On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > Next on the wish list is a notification (a file descriptor that can be
> > > used in epoll) that returns a 64-bit ID when there is a change in the
> > > mount node. This will enable us to enhance systemd so that it does not
> > > have to read the entire mount table after every change.
> > >
> >
> > New fanotify events for mount table changes, perhaps?
>
> Now that I'm looking at it I'm not sure fanotify is a great fit for this
> usecase. A lot of fanotify functionality does not really work for virtual
> filesystems such as proc and hence we generally try to discourage use of
> fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> on one file inside proc looks as rather inconsistent interface. But I
> vaguely remember we were discussing some kind of mount event, weren't we?
> Or was that for something else?

Yeah, if memory serves right what we agreed on was that placing a
watch on a mount would result in events being generated for
mount/umount/move_mount directly under that mount.  So this would not
be monitoring the entire namespace as poll on /proc/$$/mountinfo does.
IIRC Lennart said that this is okay and even desirable for systemd,
since it's only interested in a particular set of mounts.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 15:49         ` Miklos Szeredi
@ 2024-11-13 16:00           ` Jan Kara
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2024-11-13 16:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jan Kara, Jeff Layton, Karel Zak, Christian Brauner, Ian Kent,
	Josef Bacik, linux-fsdevel, linux-kernel, Alexander Viro,
	Lennart Poettering

On Wed 13-11-24 16:49:52, Miklos Szeredi wrote:
> On Wed, 13 Nov 2024 at 16:18, Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 13-11-24 08:45:06, Jeff Layton wrote:
> > > On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > > > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > > Next on the wish list is a notification (a file descriptor that can be
> > > > used in epoll) that returns a 64-bit ID when there is a change in the
> > > > mount node. This will enable us to enhance systemd so that it does not
> > > > have to read the entire mount table after every change.
> > > >
> > >
> > > New fanotify events for mount table changes, perhaps?
> >
> > Now that I'm looking at it I'm not sure fanotify is a great fit for this
> > usecase. A lot of fanotify functionality does not really work for virtual
> > filesystems such as proc and hence we generally try to discourage use of
> > fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> > on one file inside proc looks as rather inconsistent interface. But I
> > vaguely remember we were discussing some kind of mount event, weren't we?
> > Or was that for something else?
> 
> Yeah, if memory serves right what we agreed on was that placing a
> watch on a mount would result in events being generated for
> mount/umount/move_mount directly under that mount.  So this would not
> be monitoring the entire namespace as poll on /proc/$$/mountinfo does.
> IIRC Lennart said that this is okay and even desirable for systemd,
> since it's only interested in a particular set of mounts.

Oh, right. Thanks for reminding me. And yes, this would fit within what
fanotify supports quite nicely.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 15:18       ` Jan Kara
  2024-11-13 15:49         ` Miklos Szeredi
@ 2024-11-14  1:45         ` Ian Kent
  2024-11-14 11:56           ` Jan Kara
  2024-11-14  1:51         ` Ian Kent
  2 siblings, 1 reply; 26+ messages in thread
From: Ian Kent @ 2024-11-14  1:45 UTC (permalink / raw)
  To: Jan Kara, Jeff Layton
  Cc: Karel Zak, Christian Brauner, Miklos Szeredi, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro

On 13/11/24 23:18, Jan Kara wrote:
> On Wed 13-11-24 08:45:06, Jeff Layton wrote:
>> On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
>>> On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
>>> Next on the wish list is a notification (a file descriptor that can be
>>> used in epoll) that returns a 64-bit ID when there is a change in the
>>> mount node. This will enable us to enhance systemd so that it does not
>>> have to read the entire mount table after every change.
>>>
>> New fanotify events for mount table changes, perhaps?
> Now that I'm looking at it I'm not sure fanotify is a great fit for this
> usecase. A lot of fanotify functionality does not really work for virtual
> filesystems such as proc and hence we generally try to discourage use of
> fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> on one file inside proc looks as rather inconsistent interface. But I
> vaguely remember we were discussing some kind of mount event, weren't we?
> Or was that for something else?

I still need to have a look at the existing notifications sub-systems but,

tbh, I also don't think they offer the needed functionality.


The thing that was most useful with David's notifications when I was trying

to improve the mounts handling was the queuing interface. It allowed me to

batch notifications up to around a couple of hundred and grab them in one go

for processing. This significantly lowered the overhead of rapid fire event

processing. The ability to go directly to an individual mount and get it's

information only got about half the improvement I saw, the rest come 
from the

notifications improvement.


Ian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 15:18       ` Jan Kara
  2024-11-13 15:49         ` Miklos Szeredi
  2024-11-14  1:45         ` Ian Kent
@ 2024-11-14  1:51         ` Ian Kent
  2 siblings, 0 replies; 26+ messages in thread
From: Ian Kent @ 2024-11-14  1:51 UTC (permalink / raw)
  To: Jan Kara, Jeff Layton
  Cc: Karel Zak, Christian Brauner, Miklos Szeredi, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro


On 13/11/24 23:18, Jan Kara wrote:
> On Wed 13-11-24 08:45:06, Jeff Layton wrote:
>> On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
>>> On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
>>> Next on the wish list is a notification (a file descriptor that can be
>>> used in epoll) that returns a 64-bit ID when there is a change in the
>>> mount node. This will enable us to enhance systemd so that it does not
>>> have to read the entire mount table after every change.
>>>
>> New fanotify events for mount table changes, perhaps?
> Now that I'm looking at it I'm not sure fanotify is a great fit for this
> usecase. A lot of fanotify functionality does not really work for virtual
> filesystems such as proc and hence we generally try to discourage use of
> fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> on one file inside proc looks as rather inconsistent interface. But I
> vaguely remember we were discussing some kind of mount event, weren't we?
> Or was that for something else?

Well, yes, the idea was to be able to avoid the overhead of scanning the

proc mount table(s) pretty much entirely, particularly bad for rapid fire

event handling such as a largish number of rapid mounta or umounts.


Ian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-13 13:45     ` Jeff Layton
  2024-11-13 15:18       ` Jan Kara
@ 2024-11-14 11:29       ` Christian Brauner
  2024-11-14 12:29         ` Jeff Layton
  1 sibling, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2024-11-14 11:29 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Karel Zak, Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Wed, Nov 13, 2024 at 08:45:06AM -0500, Jeff Layton wrote:
> On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > On Mon, 11 Nov 2024 10:09:54 -0500, Jeff Layton wrote:
> > > > Meta has some internal logging that scrapes /proc/self/mountinfo today.
> > > > I'd like to convert it to use listmount()/statmount(), so we can do a
> > > > better job of monitoring with containers. We're missing some fields
> > > > though. This patchset adds them.
> > > > 
> > > > 
> > > 
> > > Applied to the vfs.misc branch of the vfs/vfs.git tree.
> > > Patches in the vfs.misc branch should appear in linux-next soon.
> > 
> > Jeff, thank you for this!
> > 
> > I have already implemented support for statmount() and listmount() in
> > libmount (PR: https://github.com/util-linux/util-linux/pull/3092). The
> > only remaining issue was the mount source and incomplete file system
> > type.
> > 
> 
> Unfortunately, I think we might be missing something else:
> 
> The mountinfo (and "mounts") file generator calls show_sb_opts() which
> generates some strings from the sb->s_flags field and then calls
> security_sb_show_options(). statmount() will give you the s_flags field
> (or an equivalent), but it doesn't give you the security options
> string. So, those aren't currently visible from statmount().
> 
> How should we expose those? Should we create a new statmount string
> field and populate it, or is it better to just tack them onto the end
> of the statmount.mnt_opts string?

I'm leaning towards using a separate field because mnt_opts/opts_array
is about filesystem specific mount options whereas the security mount
options are somewhat generic. So it's easy to tell them apart.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14  1:45         ` Ian Kent
@ 2024-11-14 11:56           ` Jan Kara
  2024-11-14 13:20             ` Miklos Szeredi
  2024-11-17 23:29             ` Ian Kent
  0 siblings, 2 replies; 26+ messages in thread
From: Jan Kara @ 2024-11-14 11:56 UTC (permalink / raw)
  To: Ian Kent
  Cc: Jan Kara, Jeff Layton, Karel Zak, Christian Brauner,
	Miklos Szeredi, Josef Bacik, linux-fsdevel, linux-kernel,
	Alexander Viro

On Thu 14-11-24 09:45:23, Ian Kent wrote:
> On 13/11/24 23:18, Jan Kara wrote:
> > On Wed 13-11-24 08:45:06, Jeff Layton wrote:
> > > On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > > > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > > Next on the wish list is a notification (a file descriptor that can be
> > > > used in epoll) that returns a 64-bit ID when there is a change in the
> > > > mount node. This will enable us to enhance systemd so that it does not
> > > > have to read the entire mount table after every change.
> > > > 
> > > New fanotify events for mount table changes, perhaps?
> > Now that I'm looking at it I'm not sure fanotify is a great fit for this
> > usecase. A lot of fanotify functionality does not really work for virtual
> > filesystems such as proc and hence we generally try to discourage use of
> > fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> > on one file inside proc looks as rather inconsistent interface. But I
> > vaguely remember we were discussing some kind of mount event, weren't we?
> > Or was that for something else?
> 
> I still need to have a look at the existing notifications sub-systems but,
> tbh, I also don't think they offer the needed functionality.
> 
> The thing that was most useful with David's notifications when I was trying
> to improve the mounts handling was the queuing interface. It allowed me to
> batch notifications up to around a couple of hundred and grab them in one go
> for processing. This significantly lowered the overhead of rapid fire event
> processing. The ability to go directly to an individual mount and get it's
> information only got about half the improvement I saw, the rest come from
> the notifications improvement.

Well, if we implemented the mount notification events in fanotify, then the
mount events get queued in the notification group queue and you can process
the whole batch of events in one go if you want. So I don't see batching as
an issue. What I'm more worried about is that watching the whole system
for new mounts is going to be somewhat cumbersome when all you can do is to
watch new mounts attached under an existing mount / filesystem.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 11:29       ` Christian Brauner
@ 2024-11-14 12:29         ` Jeff Layton
  2024-11-14 13:16           ` Miklos Szeredi
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2024-11-14 12:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Karel Zak, Miklos Szeredi, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Thu, 2024-11-14 at 12:29 +0100, Christian Brauner wrote:
> On Wed, Nov 13, 2024 at 08:45:06AM -0500, Jeff Layton wrote:
> > On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > > On Mon, 11 Nov 2024 10:09:54 -0500, Jeff Layton wrote:
> > > > > Meta has some internal logging that scrapes /proc/self/mountinfo today.
> > > > > I'd like to convert it to use listmount()/statmount(), so we can do a
> > > > > better job of monitoring with containers. We're missing some fields
> > > > > though. This patchset adds them.
> > > > > 
> > > > > 
> > > > 
> > > > Applied to the vfs.misc branch of the vfs/vfs.git tree.
> > > > Patches in the vfs.misc branch should appear in linux-next soon.
> > > 
> > > Jeff, thank you for this!
> > > 
> > > I have already implemented support for statmount() and listmount() in
> > > libmount (PR: https://github.com/util-linux/util-linux/pull/3092). The
> > > only remaining issue was the mount source and incomplete file system
> > > type.
> > > 
> > 
> > Unfortunately, I think we might be missing something else:
> > 
> > The mountinfo (and "mounts") file generator calls show_sb_opts() which
> > generates some strings from the sb->s_flags field and then calls
> > security_sb_show_options(). statmount() will give you the s_flags field
> > (or an equivalent), but it doesn't give you the security options
> > string. So, those aren't currently visible from statmount().
> > 
> > How should we expose those? Should we create a new statmount string
> > field and populate it, or is it better to just tack them onto the end
> > of the statmount.mnt_opts string?
> 
> I'm leaning towards using a separate field because mnt_opts/opts_array
> is about filesystem specific mount options whereas the security mount
> options are somewhat generic. So it's easy to tell them apart.

Ordinarily, I might agree, but we're now growing a new mount option
field that has them separated by NULs. Will we need two extra fields
for this? One comma-separated, and one NUL separated?

/proc/#/mountinfo and mounts prepend these to the output of 
->show_options, so the simple solution would be to just prepend those
there instead of adding a new field. FWIW, only SELinux has any extra
mount options to show here.

Tough call -- anyone else have opinions?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 12:29         ` Jeff Layton
@ 2024-11-14 13:16           ` Miklos Szeredi
  2024-11-14 14:48             ` Christian Brauner
  0 siblings, 1 reply; 26+ messages in thread
From: Miklos Szeredi @ 2024-11-14 13:16 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christian Brauner, Karel Zak, Ian Kent, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro, Jan Kara

On Thu, 14 Nov 2024 at 13:29, Jeff Layton <jlayton@kernel.org> wrote:

> Ordinarily, I might agree, but we're now growing a new mount option
> field that has them separated by NULs. Will we need two extra fields
> for this? One comma-separated, and one NUL separated?
>
> /proc/#/mountinfo and mounts prepend these to the output of
> ->show_options, so the simple solution would be to just prepend those
> there instead of adding a new field. FWIW, only SELinux has any extra
> mount options to show here.

Compromise: tack them onto the end of the comma separated list, but
add a new field for the nul separated security options.

I think this would be logical, since the comma separated list is more
useful for having a /proc/$$/mountinfo compatible string than for
actually interpreting what's in there.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 11:56           ` Jan Kara
@ 2024-11-14 13:20             ` Miklos Szeredi
  2024-11-17 23:29             ` Ian Kent
  1 sibling, 0 replies; 26+ messages in thread
From: Miklos Szeredi @ 2024-11-14 13:20 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ian Kent, Jeff Layton, Karel Zak, Christian Brauner, Josef Bacik,
	linux-fsdevel, linux-kernel, Alexander Viro

On Thu, 14 Nov 2024 at 12:56, Jan Kara <jack@suse.cz> wrote:

>   What I'm more worried about is that watching the whole system
> for new mounts is going to be somewhat cumbersome when all you can do is to
> watch new mounts attached under an existing mount / filesystem.

We don't even know if there's a use case for that.  I think it would
make sense to think about it when/if such a use case emerges.  The
inode notification interfaces went through that evolution too, no?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 13:16           ` Miklos Szeredi
@ 2024-11-14 14:48             ` Christian Brauner
  2024-11-14 14:51               ` Jeff Layton
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Brauner @ 2024-11-14 14:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeff Layton, Karel Zak, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Thu, Nov 14, 2024 at 02:16:05PM +0100, Miklos Szeredi wrote:
> On Thu, 14 Nov 2024 at 13:29, Jeff Layton <jlayton@kernel.org> wrote:
> 
> > Ordinarily, I might agree, but we're now growing a new mount option
> > field that has them separated by NULs. Will we need two extra fields
> > for this? One comma-separated, and one NUL separated?
> >
> > /proc/#/mountinfo and mounts prepend these to the output of
> > ->show_options, so the simple solution would be to just prepend those
> > there instead of adding a new field. FWIW, only SELinux has any extra
> > mount options to show here.
> 
> Compromise: tack them onto the end of the comma separated list, but
> add a new field for the nul separated security options.
> 
> I think this would be logical, since the comma separated list is more
> useful for having a /proc/$$/mountinfo compatible string than for
> actually interpreting what's in there.

Fair. Here's an incremental for the array of security options.

diff --git a/fs/namespace.c b/fs/namespace.c
index 4f39c4aba85d..a9065a9ab971 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5072,13 +5072,30 @@ static int statmount_mnt_opts(struct kstatmount *s, struct seq_file *seq)
 	return 0;
 }
 
+static inline int statmount_opt_unescape(struct seq_file *seq, char *buf_start)
+{
+	char *buf_end, *opt_start, *opt_end;
+	int count = 0;
+
+	buf_end = seq->buf + seq->count;
+	*buf_end = '\0';
+	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
+		opt_end = strchrnul(opt_start, ',');
+		*opt_end = '\0';
+		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
+		if (WARN_ON_ONCE(++count == INT_MAX))
+			return -EOVERFLOW;
+	}
+	seq->count = buf_start - 1 - seq->buf;
+	return count;
+}
+
 static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
 {
 	struct vfsmount *mnt = s->mnt;
 	struct super_block *sb = mnt->mnt_sb;
 	size_t start = seq->count;
-	char *buf_start, *buf_end, *opt_start, *opt_end;
-	u32 count = 0;
+	char *buf_start;
 	int err;
 
 	if (!sb->s_op->show_options)
@@ -5095,17 +5112,39 @@ static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
 	if (seq->count == start)
 		return 0;
 
-	buf_end = seq->buf + seq->count;
-	*buf_end = '\0';
-	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
-		opt_end = strchrnul(opt_start, ',');
-		*opt_end = '\0';
-		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
-		if (WARN_ON_ONCE(++count == 0))
-			return -EOVERFLOW;
-	}
-	seq->count = buf_start - 1 - seq->buf;
-	s->sm.opt_num = count;
+	err = statmount_opt_unescape(seq, buf_start);
+	if (err < 0)
+		return err;
+
+	s->sm.opt_num = err;
+	return 0;
+}
+
+static int statmount_opt_sec_array(struct kstatmount *s, struct seq_file *seq)
+{
+	struct vfsmount *mnt = s->mnt;
+	struct super_block *sb = mnt->mnt_sb;
+	size_t start = seq->count;
+	char *buf_start;
+	int err;
+
+	buf_start = seq->buf + start;
+
+	err = security_sb_show_options(seq, sb);
+	if (!err)
+		return err;
+
+	if (unlikely(seq_has_overflowed(seq)))
+		return -EAGAIN;
+
+	if (seq->count == start)
+		return 0;
+
+	err = statmount_opt_unescape(seq, buf_start);
+	if (err < 0)
+		return err;
+
+	s->sm.opt_sec_num = err;
 	return 0;
 }
 
@@ -5138,6 +5177,10 @@ static int statmount_string(struct kstatmount *s, u64 flag)
 		sm->opt_array = start;
 		ret = statmount_opt_array(s, seq);
 		break;
+	case STATMOUNT_OPT_SEC_ARRAY:
+		sm->opt_sec_array = start;
+		ret = statmount_opt_sec_array(s, seq);
+		break;
 	case STATMOUNT_FS_SUBTYPE:
 		sm->fs_subtype = start;
 		statmount_fs_subtype(s, seq);
@@ -5294,6 +5337,9 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 	if (!err && s->mask & STATMOUNT_OPT_ARRAY)
 		err = statmount_string(s, STATMOUNT_OPT_ARRAY);
 
+	if (!err && s->mask & STATMOUNT_OPT_SEC_ARRAY)
+		err = statmount_string(s, STATMOUNT_OPT_SEC_ARRAY);
+
 	if (!err && s->mask & STATMOUNT_FS_SUBTYPE)
 		err = statmount_string(s, STATMOUNT_FS_SUBTYPE);
 
@@ -5323,7 +5369,7 @@ static inline bool retry_statmount(const long ret, size_t *seq_size)
 #define STATMOUNT_STRING_REQ (STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | \
 			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | \
 			      STATMOUNT_FS_SUBTYPE | STATMOUNT_SB_SOURCE | \
-			      STATMOUNT_OPT_ARRAY)
+			      STATMOUNT_OPT_ARRAY | STATMOUNT_OPT_SEC_ARRAY)
 
 static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
 			      struct statmount __user *buf, size_t bufsize,
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index c0fda4604187..569d938a5757 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -177,7 +177,9 @@ struct statmount {
 	__u32 sb_source;	/* [str] Source string of the mount */
 	__u32 opt_num;		/* Number of fs options */
 	__u32 opt_array;	/* [str] Array of nul terminated fs options */
-	__u64 __spare2[47];
+	__u32 opt_sec_num;	/* Number of security options */
+	__u32 opt_sec_array;	/* [str] Array of nul terminated security options */
+	__u64 __spare2[45];
 	char str[];		/* Variable size part containing strings */
 };
 
@@ -214,6 +216,7 @@ struct mnt_id_req {
 #define STATMOUNT_FS_SUBTYPE		0x00000100U	/* Want/got fs_subtype */
 #define STATMOUNT_SB_SOURCE		0x00000200U	/* Want/got sb_source */
 #define STATMOUNT_OPT_ARRAY		0x00000400U	/* Want/got opt_... */
+#define STATMOUNT_OPT_SEC_ARRAY		0x00000800U	/* Want/got opt_sec... */
 
 /*
  * Special @mnt_id values that can be passed to listmount

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 14:48             ` Christian Brauner
@ 2024-11-14 14:51               ` Jeff Layton
  2024-11-14 15:09                 ` Christian Brauner
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff Layton @ 2024-11-14 14:51 UTC (permalink / raw)
  To: Christian Brauner, Miklos Szeredi
  Cc: Karel Zak, Ian Kent, Josef Bacik, linux-fsdevel, linux-kernel,
	Alexander Viro, Jan Kara

On Thu, 2024-11-14 at 15:48 +0100, Christian Brauner wrote:
> On Thu, Nov 14, 2024 at 02:16:05PM +0100, Miklos Szeredi wrote:
> > On Thu, 14 Nov 2024 at 13:29, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > > Ordinarily, I might agree, but we're now growing a new mount option
> > > field that has them separated by NULs. Will we need two extra fields
> > > for this? One comma-separated, and one NUL separated?
> > > 
> > > /proc/#/mountinfo and mounts prepend these to the output of
> > > ->show_options, so the simple solution would be to just prepend those
> > > there instead of adding a new field. FWIW, only SELinux has any extra
> > > mount options to show here.
> > 
> > Compromise: tack them onto the end of the comma separated list, but
> > add a new field for the nul separated security options.
> > 
> > I think this would be logical, since the comma separated list is more
> > useful for having a /proc/$$/mountinfo compatible string than for
> > actually interpreting what's in there.
> 
> Fair. Here's an incremental for the array of security options.
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 4f39c4aba85d..a9065a9ab971 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -5072,13 +5072,30 @@ static int statmount_mnt_opts(struct kstatmount *s, struct seq_file *seq)
>  	return 0;
>  }
>  
> +static inline int statmount_opt_unescape(struct seq_file *seq, char *buf_start)
> +{
> +	char *buf_end, *opt_start, *opt_end;
> +	int count = 0;
> +
> +	buf_end = seq->buf + seq->count;
> +	*buf_end = '\0';
> +	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
> +		opt_end = strchrnul(opt_start, ',');
> +		*opt_end = '\0';
> +		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
> +		if (WARN_ON_ONCE(++count == INT_MAX))
> +			return -EOVERFLOW;
> +	}
> +	seq->count = buf_start - 1 - seq->buf;
> +	return count;
> +}
> +
>  static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
>  {
>  	struct vfsmount *mnt = s->mnt;
>  	struct super_block *sb = mnt->mnt_sb;
>  	size_t start = seq->count;
> -	char *buf_start, *buf_end, *opt_start, *opt_end;
> -	u32 count = 0;
> +	char *buf_start;
>  	int err;
>  
>  	if (!sb->s_op->show_options)
> @@ -5095,17 +5112,39 @@ static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
>  	if (seq->count == start)
>  		return 0;
>  
> -	buf_end = seq->buf + seq->count;
> -	*buf_end = '\0';
> -	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
> -		opt_end = strchrnul(opt_start, ',');
> -		*opt_end = '\0';
> -		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
> -		if (WARN_ON_ONCE(++count == 0))
> -			return -EOVERFLOW;
> -	}
> -	seq->count = buf_start - 1 - seq->buf;
> -	s->sm.opt_num = count;
> +	err = statmount_opt_unescape(seq, buf_start);
> +	if (err < 0)
> +		return err;
> +
> +	s->sm.opt_num = err;
> +	return 0;
> +}
> +
> +static int statmount_opt_sec_array(struct kstatmount *s, struct seq_file *seq)
> +{
> +	struct vfsmount *mnt = s->mnt;
> +	struct super_block *sb = mnt->mnt_sb;
> +	size_t start = seq->count;
> +	char *buf_start;
> +	int err;
> +
> +	buf_start = seq->buf + start;
> +
> +	err = security_sb_show_options(seq, sb);
> +	if (!err)
> +		return err;
> +
> +	if (unlikely(seq_has_overflowed(seq)))
> +		return -EAGAIN;
> +
> +	if (seq->count == start)
> +		return 0;
> +
> +	err = statmount_opt_unescape(seq, buf_start);
> +	if (err < 0)
> +		return err;
> +
> +	s->sm.opt_sec_num = err;
>  	return 0;
>  }
>  
> @@ -5138,6 +5177,10 @@ static int statmount_string(struct kstatmount *s, u64 flag)
>  		sm->opt_array = start;
>  		ret = statmount_opt_array(s, seq);
>  		break;
> +	case STATMOUNT_OPT_SEC_ARRAY:
> +		sm->opt_sec_array = start;
> +		ret = statmount_opt_sec_array(s, seq);
> +		break;
>  	case STATMOUNT_FS_SUBTYPE:
>  		sm->fs_subtype = start;
>  		statmount_fs_subtype(s, seq);
> @@ -5294,6 +5337,9 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
>  	if (!err && s->mask & STATMOUNT_OPT_ARRAY)
>  		err = statmount_string(s, STATMOUNT_OPT_ARRAY);
>  
> +	if (!err && s->mask & STATMOUNT_OPT_SEC_ARRAY)
> +		err = statmount_string(s, STATMOUNT_OPT_SEC_ARRAY);
> +
>  	if (!err && s->mask & STATMOUNT_FS_SUBTYPE)
>  		err = statmount_string(s, STATMOUNT_FS_SUBTYPE);
>  
> @@ -5323,7 +5369,7 @@ static inline bool retry_statmount(const long ret, size_t *seq_size)
>  #define STATMOUNT_STRING_REQ (STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | \
>  			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | \
>  			      STATMOUNT_FS_SUBTYPE | STATMOUNT_SB_SOURCE | \
> -			      STATMOUNT_OPT_ARRAY)
> +			      STATMOUNT_OPT_ARRAY | STATMOUNT_OPT_SEC_ARRAY)
>  
>  static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
>  			      struct statmount __user *buf, size_t bufsize,
> diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> index c0fda4604187..569d938a5757 100644
> --- a/include/uapi/linux/mount.h
> +++ b/include/uapi/linux/mount.h
> @@ -177,7 +177,9 @@ struct statmount {
>  	__u32 sb_source;	/* [str] Source string of the mount */
>  	__u32 opt_num;		/* Number of fs options */
>  	__u32 opt_array;	/* [str] Array of nul terminated fs options */
> -	__u64 __spare2[47];
> +	__u32 opt_sec_num;	/* Number of security options */
> +	__u32 opt_sec_array;	/* [str] Array of nul terminated security options */
> +	__u64 __spare2[45];

shouldn't that be 46 ?

>  	char str[];		/* Variable size part containing strings */
>  };
>  
> @@ -214,6 +216,7 @@ struct mnt_id_req {
>  #define STATMOUNT_FS_SUBTYPE		0x00000100U	/* Want/got fs_subtype */
>  #define STATMOUNT_SB_SOURCE		0x00000200U	/* Want/got sb_source */
>  #define STATMOUNT_OPT_ARRAY		0x00000400U	/* Want/got opt_... */
> +#define STATMOUNT_OPT_SEC_ARRAY		0x00000800U	/* Want/got opt_sec... */
>  
>  /*
>   * Special @mnt_id values that can be passed to listmount

The rest looks good to me though!
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 14:51               ` Jeff Layton
@ 2024-11-14 15:09                 ` Christian Brauner
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Brauner @ 2024-11-14 15:09 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Miklos Szeredi, Karel Zak, Ian Kent, Josef Bacik, linux-fsdevel,
	linux-kernel, Alexander Viro, Jan Kara

On Thu, Nov 14, 2024 at 09:51:42AM -0500, Jeff Layton wrote:
> On Thu, 2024-11-14 at 15:48 +0100, Christian Brauner wrote:
> > On Thu, Nov 14, 2024 at 02:16:05PM +0100, Miklos Szeredi wrote:
> > > On Thu, 14 Nov 2024 at 13:29, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > > Ordinarily, I might agree, but we're now growing a new mount option
> > > > field that has them separated by NULs. Will we need two extra fields
> > > > for this? One comma-separated, and one NUL separated?
> > > > 
> > > > /proc/#/mountinfo and mounts prepend these to the output of
> > > > ->show_options, so the simple solution would be to just prepend those
> > > > there instead of adding a new field. FWIW, only SELinux has any extra
> > > > mount options to show here.
> > > 
> > > Compromise: tack them onto the end of the comma separated list, but
> > > add a new field for the nul separated security options.
> > > 
> > > I think this would be logical, since the comma separated list is more
> > > useful for having a /proc/$$/mountinfo compatible string than for
> > > actually interpreting what's in there.
> > 
> > Fair. Here's an incremental for the array of security options.
> > 
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index 4f39c4aba85d..a9065a9ab971 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -5072,13 +5072,30 @@ static int statmount_mnt_opts(struct kstatmount *s, struct seq_file *seq)
> >  	return 0;
> >  }
> >  
> > +static inline int statmount_opt_unescape(struct seq_file *seq, char *buf_start)
> > +{
> > +	char *buf_end, *opt_start, *opt_end;
> > +	int count = 0;
> > +
> > +	buf_end = seq->buf + seq->count;
> > +	*buf_end = '\0';
> > +	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
> > +		opt_end = strchrnul(opt_start, ',');
> > +		*opt_end = '\0';
> > +		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
> > +		if (WARN_ON_ONCE(++count == INT_MAX))
> > +			return -EOVERFLOW;
> > +	}
> > +	seq->count = buf_start - 1 - seq->buf;
> > +	return count;
> > +}
> > +
> >  static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
> >  {
> >  	struct vfsmount *mnt = s->mnt;
> >  	struct super_block *sb = mnt->mnt_sb;
> >  	size_t start = seq->count;
> > -	char *buf_start, *buf_end, *opt_start, *opt_end;
> > -	u32 count = 0;
> > +	char *buf_start;
> >  	int err;
> >  
> >  	if (!sb->s_op->show_options)
> > @@ -5095,17 +5112,39 @@ static int statmount_opt_array(struct kstatmount *s, struct seq_file *seq)
> >  	if (seq->count == start)
> >  		return 0;
> >  
> > -	buf_end = seq->buf + seq->count;
> > -	*buf_end = '\0';
> > -	for (opt_start = buf_start + 1; opt_start < buf_end; opt_start = opt_end + 1) {
> > -		opt_end = strchrnul(opt_start, ',');
> > -		*opt_end = '\0';
> > -		buf_start += string_unescape(opt_start, buf_start, 0, UNESCAPE_OCTAL) + 1;
> > -		if (WARN_ON_ONCE(++count == 0))
> > -			return -EOVERFLOW;
> > -	}
> > -	seq->count = buf_start - 1 - seq->buf;
> > -	s->sm.opt_num = count;
> > +	err = statmount_opt_unescape(seq, buf_start);
> > +	if (err < 0)
> > +		return err;
> > +
> > +	s->sm.opt_num = err;
> > +	return 0;
> > +}
> > +
> > +static int statmount_opt_sec_array(struct kstatmount *s, struct seq_file *seq)
> > +{
> > +	struct vfsmount *mnt = s->mnt;
> > +	struct super_block *sb = mnt->mnt_sb;
> > +	size_t start = seq->count;
> > +	char *buf_start;
> > +	int err;
> > +
> > +	buf_start = seq->buf + start;
> > +
> > +	err = security_sb_show_options(seq, sb);
> > +	if (!err)
> > +		return err;
> > +
> > +	if (unlikely(seq_has_overflowed(seq)))
> > +		return -EAGAIN;
> > +
> > +	if (seq->count == start)
> > +		return 0;
> > +
> > +	err = statmount_opt_unescape(seq, buf_start);
> > +	if (err < 0)
> > +		return err;
> > +
> > +	s->sm.opt_sec_num = err;
> >  	return 0;
> >  }
> >  
> > @@ -5138,6 +5177,10 @@ static int statmount_string(struct kstatmount *s, u64 flag)
> >  		sm->opt_array = start;
> >  		ret = statmount_opt_array(s, seq);
> >  		break;
> > +	case STATMOUNT_OPT_SEC_ARRAY:
> > +		sm->opt_sec_array = start;
> > +		ret = statmount_opt_sec_array(s, seq);
> > +		break;
> >  	case STATMOUNT_FS_SUBTYPE:
> >  		sm->fs_subtype = start;
> >  		statmount_fs_subtype(s, seq);
> > @@ -5294,6 +5337,9 @@ static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
> >  	if (!err && s->mask & STATMOUNT_OPT_ARRAY)
> >  		err = statmount_string(s, STATMOUNT_OPT_ARRAY);
> >  
> > +	if (!err && s->mask & STATMOUNT_OPT_SEC_ARRAY)
> > +		err = statmount_string(s, STATMOUNT_OPT_SEC_ARRAY);
> > +
> >  	if (!err && s->mask & STATMOUNT_FS_SUBTYPE)
> >  		err = statmount_string(s, STATMOUNT_FS_SUBTYPE);
> >  
> > @@ -5323,7 +5369,7 @@ static inline bool retry_statmount(const long ret, size_t *seq_size)
> >  #define STATMOUNT_STRING_REQ (STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | \
> >  			      STATMOUNT_FS_TYPE | STATMOUNT_MNT_OPTS | \
> >  			      STATMOUNT_FS_SUBTYPE | STATMOUNT_SB_SOURCE | \
> > -			      STATMOUNT_OPT_ARRAY)
> > +			      STATMOUNT_OPT_ARRAY | STATMOUNT_OPT_SEC_ARRAY)
> >  
> >  static int prepare_kstatmount(struct kstatmount *ks, struct mnt_id_req *kreq,
> >  			      struct statmount __user *buf, size_t bufsize,
> > diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
> > index c0fda4604187..569d938a5757 100644
> > --- a/include/uapi/linux/mount.h
> > +++ b/include/uapi/linux/mount.h
> > @@ -177,7 +177,9 @@ struct statmount {
> >  	__u32 sb_source;	/* [str] Source string of the mount */
> >  	__u32 opt_num;		/* Number of fs options */
> >  	__u32 opt_array;	/* [str] Array of nul terminated fs options */
> > -	__u64 __spare2[47];
> > +	__u32 opt_sec_num;	/* Number of security options */
> > +	__u32 opt_sec_array;	/* [str] Array of nul terminated security options */
> > +	__u64 __spare2[45];
> 
> shouldn't that be 46 ?

Yes, apparently I can't count. Thanks for noticing!

> 
> >  	char str[];		/* Variable size part containing strings */
> >  };
> >  
> > @@ -214,6 +216,7 @@ struct mnt_id_req {
> >  #define STATMOUNT_FS_SUBTYPE		0x00000100U	/* Want/got fs_subtype */
> >  #define STATMOUNT_SB_SOURCE		0x00000200U	/* Want/got sb_source */
> >  #define STATMOUNT_OPT_ARRAY		0x00000400U	/* Want/got opt_... */
> > +#define STATMOUNT_OPT_SEC_ARRAY		0x00000800U	/* Want/got opt_sec... */
> >  
> >  /*
> >   * Special @mnt_id values that can be passed to listmount
> 
> The rest looks good to me though!
> -- 
> Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-14 11:56           ` Jan Kara
  2024-11-14 13:20             ` Miklos Szeredi
@ 2024-11-17 23:29             ` Ian Kent
  2024-11-18  9:07               ` Jan Kara
  1 sibling, 1 reply; 26+ messages in thread
From: Ian Kent @ 2024-11-17 23:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Karel Zak, Christian Brauner, Miklos Szeredi,
	Josef Bacik, linux-fsdevel, linux-kernel, Alexander Viro


On 14/11/24 19:56, Jan Kara wrote:
> On Thu 14-11-24 09:45:23, Ian Kent wrote:
>> On 13/11/24 23:18, Jan Kara wrote:
>>> On Wed 13-11-24 08:45:06, Jeff Layton wrote:
>>>> On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
>>>>> On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
>>>>> Next on the wish list is a notification (a file descriptor that can be
>>>>> used in epoll) that returns a 64-bit ID when there is a change in the
>>>>> mount node. This will enable us to enhance systemd so that it does not
>>>>> have to read the entire mount table after every change.
>>>>>
>>>> New fanotify events for mount table changes, perhaps?
>>> Now that I'm looking at it I'm not sure fanotify is a great fit for this
>>> usecase. A lot of fanotify functionality does not really work for virtual
>>> filesystems such as proc and hence we generally try to discourage use of
>>> fanotify for them. So just supporting one type of event (like FAN_MODIFY)
>>> on one file inside proc looks as rather inconsistent interface. But I
>>> vaguely remember we were discussing some kind of mount event, weren't we?
>>> Or was that for something else?
>> I still need to have a look at the existing notifications sub-systems but,
>> tbh, I also don't think they offer the needed functionality.
>>
>> The thing that was most useful with David's notifications when I was trying
>> to improve the mounts handling was the queuing interface. It allowed me to
>> batch notifications up to around a couple of hundred and grab them in one go
>> for processing. This significantly lowered the overhead of rapid fire event
>> processing. The ability to go directly to an individual mount and get it's
>> information only got about half the improvement I saw, the rest come from
>> the notifications improvement.
> Well, if we implemented the mount notification events in fanotify, then the
> mount events get queued in the notification group queue and you can process
> the whole batch of events in one go if you want. So I don't see batching as
> an issue. What I'm more worried about is that watching the whole system
> for new mounts is going to be somewhat cumbersome when all you can do is to
> watch new mounts attached under an existing mount / filesystem.

But, for mounts/unounts for example, isn't it the act of performing the

mount/unmount that triggers the notification if the path in within a file

system that's marked to report such events?


Ian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source
  2024-11-17 23:29             ` Ian Kent
@ 2024-11-18  9:07               ` Jan Kara
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2024-11-18  9:07 UTC (permalink / raw)
  To: Ian Kent
  Cc: Jan Kara, Jeff Layton, Karel Zak, Christian Brauner,
	Miklos Szeredi, Josef Bacik, linux-fsdevel, linux-kernel,
	Alexander Viro

On Mon 18-11-24 07:29:42, Ian Kent wrote:
> On 14/11/24 19:56, Jan Kara wrote:
> > On Thu 14-11-24 09:45:23, Ian Kent wrote:
> > > On 13/11/24 23:18, Jan Kara wrote:
> > > > On Wed 13-11-24 08:45:06, Jeff Layton wrote:
> > > > > On Wed, 2024-11-13 at 12:27 +0100, Karel Zak wrote:
> > > > > > On Tue, Nov 12, 2024 at 02:39:21PM GMT, Christian Brauner wrote:
> > > > > > Next on the wish list is a notification (a file descriptor that can be
> > > > > > used in epoll) that returns a 64-bit ID when there is a change in the
> > > > > > mount node. This will enable us to enhance systemd so that it does not
> > > > > > have to read the entire mount table after every change.
> > > > > > 
> > > > > New fanotify events for mount table changes, perhaps?
> > > > Now that I'm looking at it I'm not sure fanotify is a great fit for this
> > > > usecase. A lot of fanotify functionality does not really work for virtual
> > > > filesystems such as proc and hence we generally try to discourage use of
> > > > fanotify for them. So just supporting one type of event (like FAN_MODIFY)
> > > > on one file inside proc looks as rather inconsistent interface. But I
> > > > vaguely remember we were discussing some kind of mount event, weren't we?
> > > > Or was that for something else?
> > > I still need to have a look at the existing notifications sub-systems but,
> > > tbh, I also don't think they offer the needed functionality.
> > > 
> > > The thing that was most useful with David's notifications when I was trying
> > > to improve the mounts handling was the queuing interface. It allowed me to
> > > batch notifications up to around a couple of hundred and grab them in one go
> > > for processing. This significantly lowered the overhead of rapid fire event
> > > processing. The ability to go directly to an individual mount and get it's
> > > information only got about half the improvement I saw, the rest come from
> > > the notifications improvement.
> > Well, if we implemented the mount notification events in fanotify, then the
> > mount events get queued in the notification group queue and you can process
> > the whole batch of events in one go if you want. So I don't see batching as
> > an issue. What I'm more worried about is that watching the whole system
> > for new mounts is going to be somewhat cumbersome when all you can do is to
> > watch new mounts attached under an existing mount / filesystem.
> 
> But, for mounts/unounts for example, isn't it the act of performing the
> mount/unmount that triggers the notification if the path in within a file
> system that's marked to report such events?

Obviously it is the act of mounting / unmounting that will trigger the
generation of the event. But I guess I don't understand what are you
getting at...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-11-18  9:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-11 15:09 [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Jeff Layton
2024-11-11 15:09 ` [PATCH v4 1/3] fs: don't let statmount return empty strings Jeff Layton
2024-11-11 17:44   ` Jan Kara
2024-11-11 15:09 ` [PATCH v4 2/3] fs: add the ability for statmount() to report the fs_subtype Jeff Layton
2024-11-11 15:09 ` [PATCH v4 3/3] fs: add the ability for statmount() to report the sb_source Jeff Layton
2024-11-12 10:32 ` [PATCH v4 0/3] fs: allow statmount to fetch the fs_subtype and sb_source Miklos Szeredi
2024-11-12 11:12   ` Jeff Layton
2024-11-12 13:39 ` Christian Brauner
2024-11-13 11:27   ` Karel Zak
2024-11-13 13:08     ` Miklos Szeredi
2024-11-13 13:45     ` Jeff Layton
2024-11-13 15:18       ` Jan Kara
2024-11-13 15:49         ` Miklos Szeredi
2024-11-13 16:00           ` Jan Kara
2024-11-14  1:45         ` Ian Kent
2024-11-14 11:56           ` Jan Kara
2024-11-14 13:20             ` Miklos Szeredi
2024-11-17 23:29             ` Ian Kent
2024-11-18  9:07               ` Jan Kara
2024-11-14  1:51         ` Ian Kent
2024-11-14 11:29       ` Christian Brauner
2024-11-14 12:29         ` Jeff Layton
2024-11-14 13:16           ` Miklos Szeredi
2024-11-14 14:48             ` Christian Brauner
2024-11-14 14:51               ` Jeff Layton
2024-11-14 15:09                 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox