* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 14:39 [RFC PATCH] fanotify: notify on mount attach and detach Miklos Szeredi
@ 2024-11-28 16:43 ` Amir Goldstein
2024-11-29 7:16 ` Miklos Szeredi
2024-11-28 19:57 ` Al Viro
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2024-11-28 16:43 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, Jan Kara, Karel Zak, Lennart Poettering, Ian Kent,
Alexander Viro, Christian Brauner
On Thu, Nov 28, 2024 at 3:40 PM Miklos Szeredi <mszeredi@redhat.com> wrote:
>
> Add notifications for attaching and detaching mounts. The following new
> event masks are added:
>
> FAN_MNT_ATTACH - Mount was attached
> FAN_MNT_DETACH - Mount was detached
>
> These events add an info record of type FAN_EVENT_INFO_TYPE_MNT containing
> these fields identifying the affected mounts:
>
> __u64 mnt_id - the attached or detached mount
> __u64 parent_id - where the mount was attached to or detached from
>
Nice! some comments below.
FYI, this conflicts with almost every part of Jan's fsnotify_hsm branch
including event macro assignments etc.
> The mountpoint object (file or directory) is also contained in the event
> either as an FD or a FID.
>
This sounds good, but do the watchers actually need this information
or is it redundant to parent_id?
How is the mount monitoring application expected to be using this information?
If it is not going to be useful or is redundant maybe we do not send it??
> Adding marks with FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM both work.
>
> FAN_MARK_INODE doesn't, not sure why. I think it might make sense to make
> it work, i.e. allow watching a single mountpoint.
The inode that is queried for fsnotify marks is the inode argument to
fsnotify(), so you need to pass non NULL in that argument.
Watching a single mountpoint makes sense, as long as it is clear that
FAN_EVENT_ON_CHILD for watching the parent of several mountpoints
is not supported (I don't think we want that).
This makes the interface somewhat confusing when mixing non-mount
events that are supported in parent watches.
Another confusing thing is whether the FAN_ONDIR filter is applicable
to FAN_MNT_ events. The way you implemented them it does not
because the event is never reported with FS_ISDIR.
I think this is correct, just needs to be documented.
If we are NOT reporting fd/fid, then this point is less confusing IMO.
To reduce API complexity and test matrix, my suggestion is that:
FAN_MNT_* event cannot be mixed with other events in the same group
i.e. the role of mount namespace monitoring application is not related to
filesystem namespace monitoring and the two should not be using the same
group/event-queue.
The simplest way to implement this (if there is an agreement) is to require
opt-in on fanotify_init with FAN_REPORT_MNTID and require that only
FAN_MARK_* events can be set in such a group and not in other groups.
If we are not sure that reporting fd/fid is needed, then we can limit
FAN_REPORT_MNTID | FAN_REPORT_*FID now and consider adding it later.
WDYT?
>
> Prior to this patch mount namespace changes could be monitored by polling
> /proc/self/mountinfo, which did not convey any information about what
> changed.
>
> To monitor an entire mount namespace with this new interface, watches need
> to be added to all existing mounts. This can be done by performing
> listmount()/statmount() recursively at startup and when a new mount is
> added.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
> fs/namespace.c | 23 +++++++++++++++++
> fs/notify/fanotify/fanotify.c | 10 +++++++-
> fs/notify/fanotify/fanotify.h | 7 +++++
> fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++++++++++++++++++++
> fs/notify/fsnotify.c | 2 +-
> include/linux/fanotify.h | 7 +++--
> include/linux/fsnotify.h | 18 +++++++++++++
> include/linux/fsnotify_backend.h | 30 +++++++++++++++++++++-
> include/uapi/linux/fanotify.h | 9 +++++++
> 9 files changed, 142 insertions(+), 5 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 6b0a17487d0f..7724c78df945 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -988,12 +988,24 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
> }
> }
>
> +static inline void __fsnotify_mnt_detach(struct mount *mnt)
> +{
> + struct path mountpoint = {
> + .mnt = &mnt->mnt_parent->mnt,
> + .dentry = mnt->mnt_mountpoint,
> + };
> + fsnotify_mnt_detach(&mountpoint, &mnt->mnt);
> +}
> +
> /*
> * vfsmount lock must be held for write
> */
> static struct mountpoint *unhash_mnt(struct mount *mnt)
> {
> struct mountpoint *mp;
> +
> + __fsnotify_mnt_detach(mnt);
> +
> mnt->mnt_parent = mnt;
> mnt->mnt_mountpoint = mnt->mnt.mnt_root;
> list_del_init(&mnt->mnt_child);
> @@ -1027,6 +1039,15 @@ void mnt_set_mountpoint(struct mount *mnt,
> hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list);
> }
>
> +static inline void __fsnotify_mnt_attach(struct mount *mnt)
> +{
> + struct path mountpoint = {
> + .mnt = &mnt->mnt_parent->mnt,
> + .dentry = mnt->mnt_mountpoint,
> + };
> + fsnotify_mnt_attach(&mountpoint, &mnt->mnt);
> +}
> +
> /**
> * mnt_set_mountpoint_beneath - mount a mount beneath another one
> *
> @@ -1059,6 +1080,8 @@ static void __attach_mnt(struct mount *mnt, struct mount *parent)
> hlist_add_head_rcu(&mnt->mnt_hash,
> m_hash(&parent->mnt, mnt->mnt_mountpoint));
> list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
> +
> + __fsnotify_mnt_attach(mnt);
> }
>
> /**
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 24c7c5df4998..9a93c9b58bd4 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -16,6 +16,7 @@
> #include <linux/stringhash.h>
>
> #include "fanotify.h"
> +#include "../../mount.h"
>
> static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
> {
> @@ -715,6 +716,9 @@ static struct fanotify_event *fanotify_alloc_event(
> fid_mode);
> struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
> const struct path *path = fsnotify_data_path(data, data_type);
> + struct vfsmount *mnt = fsnotify_data_mnt(data, data_type);
> + u64 mnt_id = mnt ? real_mount(mnt)->mnt_id_unique : 0;
> + u64 parent_id = path ? real_mount(path->mnt)->mnt_id_unique : 0;
> struct mem_cgroup *old_memcg;
> struct dentry *moved = NULL;
> struct inode *child = NULL;
> @@ -824,8 +828,12 @@ static struct fanotify_event *fanotify_alloc_event(
>
> /* Mix event info, FAN_ONDIR flag and pid into event merge key */
> hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS);
> + hash ^= hash_64(mnt_id, FANOTIFY_EVENT_HASH_BITS);
> + hash ^= hash_64(parent_id, FANOTIFY_EVENT_HASH_BITS);
You missed fanotify_should_merge(). IMO FAN_MNT_ events should never be merged
so not sure that mixing this data in the hash is needed.
> fanotify_init_event(event, hash, mask);
> event->pid = pid;
> + event->mnt_id = mnt_id;
> + event->parent_id = parent_id;
>
> out:
> set_active_memcg(old_memcg);
> @@ -910,7 +918,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
> BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 23);
>
> mask = fanotify_group_event_mask(group, iter_info, &match_mask,
> mask, data, data_type, dir);
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index e5ab33cae6a7..71cc9cb2335a 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -261,6 +261,8 @@ struct fanotify_event {
> unsigned int hash : FANOTIFY_EVENT_HASH_BITS;
> };
> struct pid *pid;
> + u64 mnt_id;
> + u64 parent_id;
There can be many fanotify_fid_event and fanotify_name_event in a queue
of filesystem monitor. It is not nice to burden this memory on them.
I think if we do not HAVE TO mix mntid info and fid info, then we better
stick with event->fd + mntid and add those fields to fanotify_path_event.
We can also inherit specialized fanotify_mnt_event, but I don't think this
is critical.??
> };
>
> static inline void fanotify_init_event(struct fanotify_event *event,
> @@ -456,6 +458,11 @@ static inline bool fanotify_is_error_event(u32 mask)
> return mask & FAN_FS_ERROR;
> }
>
> +static inline bool fanotify_is_mnt_event(u32 mask)
> +{
> + return mask & (FAN_MNT_ATTACH | FAN_MNT_DETACH);
mask & FANOTIFY_MOUNT_EVENTS;
> +}
> +
> static inline const struct path *fanotify_event_path(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_PATH)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 2d85c71717d6..adf19e1a10bd 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -122,6 +122,8 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
> sizeof(struct fanotify_event_info_pidfd)
> #define FANOTIFY_ERROR_INFO_LEN \
> (sizeof(struct fanotify_event_info_error))
> +#define FANOTIFY_MNT_INFO_LEN \
> + (sizeof(struct fanotify_event_info_mnt))
>
> static int fanotify_fid_info_len(int fh_len, int name_len)
> {
> @@ -159,6 +161,9 @@ static size_t fanotify_event_len(unsigned int info_mode,
> int fh_len;
> int dot_len = 0;
>
> + if (fanotify_is_mnt_event(event->mask))
> + event_len += FANOTIFY_MNT_INFO_LEN;
> +
> if (!info_mode)
> return event_len;
>
> @@ -380,6 +385,26 @@ static int process_access_response(struct fsnotify_group *group,
> return -ENOENT;
> }
>
> +static size_t copy_mnt_info_to_user(struct fanotify_event *event,
> + char __user *buf, int count)
> +{
> + struct fanotify_event_info_mnt info = { };
> +
> + info.hdr.info_type = FAN_EVENT_INFO_TYPE_MNT;
> + info.hdr.len = FANOTIFY_MNT_INFO_LEN;
> +
> + if (WARN_ON(count < info.hdr.len))
> + return -EFAULT;
> +
> + info.mnt_id = event->mnt_id;
> + info.parent_id = event->parent_id;
> +
> + if (copy_to_user(buf, &info, sizeof(info)))
> + return -EFAULT;
> +
> + return info.hdr.len;
> +}
> +
> static size_t copy_error_info_to_user(struct fanotify_event *event,
> char __user *buf, int count)
> {
> @@ -656,6 +681,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
> struct file *f = NULL, *pidfd_file = NULL;
> int ret, pidfd = -ESRCH, fd = -EBADF;
> + int total_bytes = 0;
>
> pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> @@ -755,14 +781,29 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>
> buf += FAN_EVENT_METADATA_LEN;
> count -= FAN_EVENT_METADATA_LEN;
> + total_bytes += FAN_EVENT_METADATA_LEN;
>
> if (info_mode) {
> ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> buf, count);
> if (ret < 0)
> goto out_close_fd;
> +
> + buf += ret;
> + count -= ret;
> + total_bytes += ret;
> + }
> +
> + if (fanotify_is_mnt_event(event->mask)) {
> + ret = copy_mnt_info_to_user(event, buf, count);
> + if (ret < 0)
> + goto out_close_fd;
> +
> + total_bytes += ret;
> }
See patch "fanotify: don't skip extra event info if no info_mode is set"
in Jan's fsnotify_hsm branch.
This should be inside copy_info_records_to_user().
>
> + WARN_ON(metadata.event_len != total_bytes);
> +
> if (f)
> fd_install(fd, f);
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index f976949d2634..1d5831b127e6 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -627,7 +627,7 @@ static __init int fsnotify_init(void)
> {
> int ret;
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
>
> ret = init_srcu_struct(&fsnotify_mark_srcu);
> if (ret)
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 89ff45bd6f01..84e6f81bdb1b 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -90,7 +90,7 @@
> FAN_RENAME)
>
> /* Events that can be reported with event->fd */
> -#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
> +#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS | FANOTIFY_MOUNT_EVENTS)
>
> /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */
> #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
> @@ -99,10 +99,13 @@
> /* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */
> #define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
>
> +#define FANOTIFY_MOUNT_EVENTS (FAN_MNT_ATTACH | FAN_MNT_DETACH)
> +
> /* Events that user can request to be notified on */
> #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
> FANOTIFY_INODE_EVENTS | \
> - FANOTIFY_ERROR_EVENTS)
> + FANOTIFY_ERROR_EVENTS | \
> + FANOTIFY_MOUNT_EVENTS )
>
> /* Events that require a permission response from user */
> #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 278620e063ab..4129347e4f16 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -463,4 +463,22 @@ static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode,
> NULL, NULL, NULL, 0);
> }
>
> +static inline void fsnotify_mnt_attach(struct path *mountpoint, struct vfsmount *mnt)
> +{
> + struct fsnotify_mnt data = {
> + .path = mountpoint,
> + .mnt = mnt,
> + };
> + fsnotify(FS_MNT_ATTACH, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
> +}
> +
> +static inline void fsnotify_mnt_detach(struct path *mountpoint, struct vfsmount *mnt)
> +{
> + struct fsnotify_mnt data = {
> + .path = mountpoint,
> + .mnt = mnt,
> + };
> + fsnotify(FS_MNT_DETACH, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
> +}
> +
> #endif /* _LINUX_FS_NOTIFY_H */
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 3ecf7768e577..1e9c15ad64b6 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -56,6 +56,9 @@
> #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
> #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
>
> +#define FS_MNT_ATTACH 0x00100000 /* Mount was attached */
> +#define FS_MNT_DETACH 0x00200000 /* Mount was detached */
> +
> /*
> * Set on inode mark that cares about things that happen to its children.
> * Always set for dnotify and inotify.
> @@ -102,7 +105,7 @@
> FS_EVENTS_POSS_ON_CHILD | \
> FS_DELETE_SELF | FS_MOVE_SELF | \
> FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
> - FS_ERROR)
> + FS_ERROR | FS_MNT_ATTACH | FS_MNT_DETACH)
>
> /* Extra flags that may be reported with event or control handling of events */
> #define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
> @@ -288,6 +291,7 @@ enum fsnotify_data_type {
> FSNOTIFY_EVENT_PATH,
> FSNOTIFY_EVENT_INODE,
> FSNOTIFY_EVENT_DENTRY,
> + FSNOTIFY_EVENT_MNT,
> FSNOTIFY_EVENT_ERROR,
> };
>
> @@ -297,6 +301,11 @@ struct fs_error_report {
> struct super_block *sb;
> };
>
> +struct fsnotify_mnt {
> + const struct path *path;
> + struct vfsmount *mnt;
> +};
> +
> static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
> {
> switch (data_type) {
> @@ -306,6 +315,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
> return d_inode(data);
> case FSNOTIFY_EVENT_PATH:
> return d_inode(((const struct path *)data)->dentry);
> + case FSNOTIFY_EVENT_MNT:
> + return d_inode(((struct fsnotify_mnt *)data)->path->dentry);
> case FSNOTIFY_EVENT_ERROR:
> return ((struct fs_error_report *)data)->inode;
> default:
> @@ -321,6 +332,8 @@ static inline struct dentry *fsnotify_data_dentry(const void *data, int data_typ
> return (struct dentry *)data;
> case FSNOTIFY_EVENT_PATH:
> return ((const struct path *)data)->dentry;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->path->dentry;
> default:
> return NULL;
> }
> @@ -332,6 +345,8 @@ static inline const struct path *fsnotify_data_path(const void *data,
> switch (data_type) {
> case FSNOTIFY_EVENT_PATH:
> return data;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->path;
> default:
> return NULL;
> }
> @@ -347,6 +362,8 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
> return ((struct dentry *)data)->d_sb;
> case FSNOTIFY_EVENT_PATH:
> return ((const struct path *)data)->dentry->d_sb;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->mnt->mnt_sb;
> case FSNOTIFY_EVENT_ERROR:
> return ((struct fs_error_report *) data)->sb;
> default:
> @@ -354,6 +371,17 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
> }
> }
>
> +static inline struct vfsmount *fsnotify_data_mnt(const void *data,
> + int data_type)
> +{
> + switch (data_type) {
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->mnt;
> + default:
> + return NULL;
> + }
> +}
> +
> static inline struct fs_error_report *fsnotify_data_error_report(
> const void *data,
> int data_type)
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 34f221d3a1b9..8b2d47947bc2 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -25,6 +25,8 @@
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
> #define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */
> +#define FAN_MNT_ATTACH 0x00100000 /* Mount was attached */
> +#define FAN_MNT_DETACH 0x00200000 /* Mount was detached */
>
> #define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
>
> @@ -143,6 +145,7 @@ struct fanotify_event_metadata {
> #define FAN_EVENT_INFO_TYPE_DFID 3
> #define FAN_EVENT_INFO_TYPE_PIDFD 4
> #define FAN_EVENT_INFO_TYPE_ERROR 5
> +#define FAN_EVENT_INFO_TYPE_MNT 6
>
> /* Special info types for FAN_RENAME */
> #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
> @@ -189,6 +192,12 @@ struct fanotify_event_info_error {
> __u32 error_count;
> };
>
> +struct fanotify_event_info_mnt {
> + struct fanotify_event_info_header hdr;
> + __u64 mnt_id;
> + __u64 parent_id;
> +};
> +
> /*
> * User space may need to record additional information about its decision.
> * The extra information type records what kind of information is included.
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 16:43 ` Amir Goldstein
@ 2024-11-29 7:16 ` Miklos Szeredi
2024-11-29 10:23 ` Amir Goldstein
0 siblings, 1 reply; 13+ messages in thread
From: Miklos Szeredi @ 2024-11-29 7:16 UTC (permalink / raw)
To: Amir Goldstein
Cc: Miklos Szeredi, linux-fsdevel, Jan Kara, Karel Zak,
Lennart Poettering, Ian Kent, Alexander Viro, Christian Brauner
On Thu, 28 Nov 2024 at 17:44, Amir Goldstein <amir73il@gmail.com> wrote:
> This sounds good, but do the watchers actually need this information
> or is it redundant to parent_id?
Everything but mnt_id is redundant, since they can be retrieved with statmount.
I thought, why not use the existing infrastructure in the event? But
it's not strictly needed.
> If we are not sure that reporting fd/fid is needed, then we can limit
> FAN_REPORT_MNTID | FAN_REPORT_*FID now and consider adding it later.
>
> WDYT?
Sounds good.
> You missed fanotify_should_merge(). IMO FAN_MNT_ events should never be merged
> so not sure that mixing this data in the hash is needed.
Okay.
> I think if we do not HAVE TO mix mntid info and fid info, then we better
> stick with event->fd + mntid and add those fields to fanotify_path_event.
Okay.
> See patch "fanotify: don't skip extra event info if no info_mode is set"
> in Jan's fsnotify_hsm branch.
> This should be inside copy_info_records_to_user().
Makes sense. I was wondering why copy_info_records_to_user() was
called conditionally.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-29 7:16 ` Miklos Szeredi
@ 2024-11-29 10:23 ` Amir Goldstein
2024-12-03 15:36 ` Miklos Szeredi
0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2024-11-29 10:23 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Miklos Szeredi, linux-fsdevel, Jan Kara, Karel Zak,
Lennart Poettering, Ian Kent, Alexander Viro, Christian Brauner
On Fri, Nov 29, 2024 at 8:16 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 28 Nov 2024 at 17:44, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > This sounds good, but do the watchers actually need this information
> > or is it redundant to parent_id?
>
> Everything but mnt_id is redundant, since they can be retrieved with statmount.
>
> I thought, why not use the existing infrastructure in the event? But
> it's not strictly needed.
>
> > If we are not sure that reporting fd/fid is needed, then we can limit
> > FAN_REPORT_MNTID | FAN_REPORT_*FID now and consider adding it later.
> >
> > WDYT?
>
> Sounds good.
>
>
> > You missed fanotify_should_merge(). IMO FAN_MNT_ events should never be merged
> > so not sure that mixing this data in the hash is needed.
>
> Okay.
>
> > I think if we do not HAVE TO mix mntid info and fid info, then we better
> > stick with event->fd + mntid and add those fields to fanotify_path_event.
>
> Okay.
>
> > See patch "fanotify: don't skip extra event info if no info_mode is set"
> > in Jan's fsnotify_hsm branch.
> > This should be inside copy_info_records_to_user().
>
> Makes sense. I was wondering why copy_info_records_to_user() was
> called conditionally.
Up to pre-content events and FAN_EVENT_INFO_TYPE_RANGE,
fanotify_event_metadata had preserved its legacy non-extended format,
unless userspace explicitly opted-in to acknowledge the format extension
with one of the FAN_REPORT_ init flags.
If you take the route of FAN_REPORT_MNTID that I suggested, and add this
flag to FANOTIFY_INFO_MODES then copy_info_records_to_user() will be
called anyway.
This brings the question: should the FAN_REPORT_MNT flag be added to
FANOTIFY_ADMIN_INIT_FLAGS or FANOTIFY_USER_INIT_FLAGS?
Currently, watching a sb/mount requires capable(SYS_ADMIN),
but I have a pretty simple patchset [1] to require ns_capable(SYS_ADMIN).
Thing is, I never got feedback from userspace that this is needed [2].
Seeing that statmount/listmount() requires at most ns_capable(SYS_ADMIN),
I am guessing that you would also want mount monitor to require
at most ns_capable(SYS_ADMIN) rather than capable(SYS_ADMIN)?
If that is the case, feel free to pick up my patches and test mount monitor
inside userns.
Which then still leaves the question: do we want to allow an unprivileged
user to setup an inode mark with FAN_MNT_ events on a mount point?
This can be a bit tricky, because currently setting up an inode mark only
requires that the caller has access to that path.
That could lead to sending events with mntid from other namespaces
to unprivileged user watching a mount point inode.
Option #1: do not allow setting FAN_MNT_ events on inode marks (for now)
Option #2: apply the same requirement for sb mark from fanotify_userns patch
to inode mark on group with FAN_REPORT_MNTID.
Thanks,
Amir.
[1] https://github.com/amir73il/linux/commits/fanotify_userns/
[2] https://lore.kernel.org/linux-fsdevel/CAOQ4uxiwGTg=FeO6iiLEwtsP9eTudw-rsLD_0u3NtG8rz5chFg@mail.gmail.com/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-29 10:23 ` Amir Goldstein
@ 2024-12-03 15:36 ` Miklos Szeredi
0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2024-12-03 15:36 UTC (permalink / raw)
To: Amir Goldstein
Cc: Miklos Szeredi, linux-fsdevel, Jan Kara, Karel Zak,
Lennart Poettering, Ian Kent, Alexander Viro, Christian Brauner
On Fri, 29 Nov 2024 at 11:23, Amir Goldstein <amir73il@gmail.com> wrote:
> Currently, watching a sb/mount requires capable(SYS_ADMIN),
> but I have a pretty simple patchset [1] to require ns_capable(SYS_ADMIN).
> Thing is, I never got feedback from userspace that this is needed [2].
> Seeing that statmount/listmount() requires at most ns_capable(SYS_ADMIN),
> I am guessing that you would also want mount monitor to require
> at most ns_capable(SYS_ADMIN) rather than capable(SYS_ADMIN)?
Yes, allowing this to work in a userns makes sense.
> Option #1: do not allow setting FAN_MNT_ events on inode marks (for now)
> Option #2: apply the same requirement for sb mark from fanotify_userns patch
> to inode mark on group with FAN_REPORT_MNTID.
Let's go with #1, as that gives the simplest interface. We can extend
that later if needed.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 14:39 [RFC PATCH] fanotify: notify on mount attach and detach Miklos Szeredi
2024-11-28 16:43 ` Amir Goldstein
@ 2024-11-28 19:57 ` Al Viro
2024-11-29 7:10 ` Miklos Szeredi
2024-11-29 9:25 ` Christian Brauner
2024-12-03 11:40 ` Karel Zak
3 siblings, 1 reply; 13+ messages in thread
From: Al Viro @ 2024-11-28 19:57 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, Jan Kara, Amir Goldstein, Karel Zak,
Lennart Poettering, Ian Kent, Christian Brauner
On Thu, Nov 28, 2024 at 03:39:59PM +0100, Miklos Szeredi wrote:
> Prior to this patch mount namespace changes could be monitored by polling
> /proc/self/mountinfo, which did not convey any information about what
> changed.
>
> To monitor an entire mount namespace with this new interface, watches need
> to be added to all existing mounts. This can be done by performing
> listmount()/statmount() recursively at startup and when a new mount is
> added.
First impression is that it's bloody awful, TBH. You are calling fsnotify()
under mount_lock; in effect, *ANY* path_init() call done during that time
will be spinning in __read_seqcount_begin() until you are done with that
shite.
And it's _very_ easy to generate a lot of such events with a single syscall;
that doesn't even need sroot - a root in container will suffice.
So... why is it not a DoS?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 19:57 ` Al Viro
@ 2024-11-29 7:10 ` Miklos Szeredi
0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2024-11-29 7:10 UTC (permalink / raw)
To: Al Viro
Cc: Miklos Szeredi, linux-fsdevel, Jan Kara, Amir Goldstein,
Karel Zak, Lennart Poettering, Ian Kent, Christian Brauner
On Thu, 28 Nov 2024 at 20:57, Al Viro <viro@zeniv.linux.org.uk> wrote:
> So... why is it not a DoS?
Certainly looks like one. I intentionally didn't give too much
thought to the hook placement to be able to send out a quick proof of
concept patch. Will move them outside of mount_lock.
Thanks,
Mikos
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 14:39 [RFC PATCH] fanotify: notify on mount attach and detach Miklos Szeredi
2024-11-28 16:43 ` Amir Goldstein
2024-11-28 19:57 ` Al Viro
@ 2024-11-29 9:25 ` Christian Brauner
2024-12-03 11:40 ` Karel Zak
3 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2024-11-29 9:25 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, Jan Kara, Amir Goldstein, Karel Zak,
Lennart Poettering, Ian Kent, Alexander Viro
On Thu, Nov 28, 2024 at 03:39:59PM +0100, Miklos Szeredi wrote:
> Add notifications for attaching and detaching mounts. The following new
> event masks are added:
>
> FAN_MNT_ATTACH - Mount was attached
> FAN_MNT_DETACH - Mount was detached
>
> These events add an info record of type FAN_EVENT_INFO_TYPE_MNT containing
> these fields identifying the affected mounts:
>
> __u64 mnt_id - the attached or detached mount
> __u64 parent_id - where the mount was attached to or detached from
>
> The mountpoint object (file or directory) is also contained in the event
> either as an FD or a FID.
>
> Adding marks with FAN_MARK_MOUNT and FAN_MARK_FILESYSTEM both work.
>
> FAN_MARK_INODE doesn't, not sure why. I think it might make sense to make
> it work, i.e. allow watching a single mountpoint.
>
> Prior to this patch mount namespace changes could be monitored by polling
> /proc/self/mountinfo, which did not convey any information about what
> changed.
>
> To monitor an entire mount namespace with this new interface, watches need
> to be added to all existing mounts. This can be done by performing
> listmount()/statmount() recursively at startup and when a new mount is
> added.
>
> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> ---
> fs/namespace.c | 23 +++++++++++++++++
> fs/notify/fanotify/fanotify.c | 10 +++++++-
> fs/notify/fanotify/fanotify.h | 7 +++++
> fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++++++++++++++++++++
> fs/notify/fsnotify.c | 2 +-
> include/linux/fanotify.h | 7 +++--
> include/linux/fsnotify.h | 18 +++++++++++++
> include/linux/fsnotify_backend.h | 30 +++++++++++++++++++++-
> include/uapi/linux/fanotify.h | 9 +++++++
> 9 files changed, 142 insertions(+), 5 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 6b0a17487d0f..7724c78df945 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -988,12 +988,24 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
> }
> }
>
> +static inline void __fsnotify_mnt_detach(struct mount *mnt)
> +{
> + struct path mountpoint = {
> + .mnt = &mnt->mnt_parent->mnt,
> + .dentry = mnt->mnt_mountpoint,
> + };
> + fsnotify_mnt_detach(&mountpoint, &mnt->mnt);
> +}
> +
> /*
> * vfsmount lock must be held for write
> */
> static struct mountpoint *unhash_mnt(struct mount *mnt)
> {
> struct mountpoint *mp;
> +
> + __fsnotify_mnt_detach(mnt);
> +
> mnt->mnt_parent = mnt;
> mnt->mnt_mountpoint = mnt->mnt.mnt_root;
> list_del_init(&mnt->mnt_child);
> @@ -1027,6 +1039,15 @@ void mnt_set_mountpoint(struct mount *mnt,
> hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list);
> }
>
> +static inline void __fsnotify_mnt_attach(struct mount *mnt)
> +{
> + struct path mountpoint = {
> + .mnt = &mnt->mnt_parent->mnt,
> + .dentry = mnt->mnt_mountpoint,
> + };
> + fsnotify_mnt_attach(&mountpoint, &mnt->mnt);
> +}
> +
> /**
> * mnt_set_mountpoint_beneath - mount a mount beneath another one
> *
> @@ -1059,6 +1080,8 @@ static void __attach_mnt(struct mount *mnt, struct mount *parent)
> hlist_add_head_rcu(&mnt->mnt_hash,
> m_hash(&parent->mnt, mnt->mnt_mountpoint));
> list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
> +
> + __fsnotify_mnt_attach(mnt);
> }
It was already mentioned in the thread but unhash_mnt() and
__attach_mnt() are called under lock_mnt_hash() which is problematic.
I do like that we seem to be getting away without a lot of aweful custom
machinery in the mount specific code. So that seems promising to me.
You shouldn't need to call that under lock_mount_hash(). Calling that
under namespace_sem should be perfectly fine as that needs to be held to
add or remove mounts into any namespace anyway.
One possibility would be to use the real_mount(mnt)->mnt_fsnotify_marks
struct and add a list_head or similar in there and link all new mounts
in attach_recursive_mnt() to a list. So basically:
diff --git a/fs/namespace.c b/fs/namespace.c
index dc789f2751d7..b4a150f76bf6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2475,6 +2475,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
{
struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
HLIST_HEAD(tree_list);
+ HLIST_HEAD(fsnotify_list);
struct mnt_namespace *ns = top_mnt->mnt_ns;
struct mountpoint *smp;
struct mount *child, *dest_mnt, *p;
@@ -2508,6 +2509,9 @@ static int attach_recursive_mnt(struct mount *source_mnt,
goto out;
err = propagate_mnt(dest_mnt, dest_mp, source_mnt, &tree_list);
}
+
+ // Add all mounts from tree_list to fsnotify_list.
+
lock_mount_hash();
if (err)
goto out_cleanup_ids;
@@ -2539,6 +2543,8 @@ static int attach_recursive_mnt(struct mount *source_mnt,
commit_tree(source_mnt);
}
+ // Now add "main" mount to the beginning of the list.
+
hlist_for_each_entry_safe(child, n, &tree_list, mnt_hash) {
struct mount *q;
hlist_del_init(&child->mnt_hash);
@@ -2555,6 +2561,9 @@ static int attach_recursive_mnt(struct mount *source_mnt,
put_mountpoint(smp);
unlock_mount_hash();
+ // We know that namespace_sem is held, walk through fsnotify_list and
+ // send mount notifications.
+
return 0;
> /**
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 24c7c5df4998..9a93c9b58bd4 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -16,6 +16,7 @@
> #include <linux/stringhash.h>
>
> #include "fanotify.h"
> +#include "../../mount.h"
>
> static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
> {
> @@ -715,6 +716,9 @@ static struct fanotify_event *fanotify_alloc_event(
> fid_mode);
> struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
> const struct path *path = fsnotify_data_path(data, data_type);
> + struct vfsmount *mnt = fsnotify_data_mnt(data, data_type);
> + u64 mnt_id = mnt ? real_mount(mnt)->mnt_id_unique : 0;
> + u64 parent_id = path ? real_mount(path->mnt)->mnt_id_unique : 0;
> struct mem_cgroup *old_memcg;
> struct dentry *moved = NULL;
> struct inode *child = NULL;
> @@ -824,8 +828,12 @@ static struct fanotify_event *fanotify_alloc_event(
>
> /* Mix event info, FAN_ONDIR flag and pid into event merge key */
> hash ^= hash_long((unsigned long)pid | ondir, FANOTIFY_EVENT_HASH_BITS);
> + hash ^= hash_64(mnt_id, FANOTIFY_EVENT_HASH_BITS);
> + hash ^= hash_64(parent_id, FANOTIFY_EVENT_HASH_BITS);
> fanotify_init_event(event, hash, mask);
> event->pid = pid;
> + event->mnt_id = mnt_id;
> + event->parent_id = parent_id;
>
> out:
> set_active_memcg(old_memcg);
> @@ -910,7 +918,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
> BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 23);
>
> mask = fanotify_group_event_mask(group, iter_info, &match_mask,
> mask, data, data_type, dir);
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index e5ab33cae6a7..71cc9cb2335a 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -261,6 +261,8 @@ struct fanotify_event {
> unsigned int hash : FANOTIFY_EVENT_HASH_BITS;
> };
> struct pid *pid;
> + u64 mnt_id;
> + u64 parent_id;
> };
>
> static inline void fanotify_init_event(struct fanotify_event *event,
> @@ -456,6 +458,11 @@ static inline bool fanotify_is_error_event(u32 mask)
> return mask & FAN_FS_ERROR;
> }
>
> +static inline bool fanotify_is_mnt_event(u32 mask)
> +{
> + return mask & (FAN_MNT_ATTACH | FAN_MNT_DETACH);
> +}
> +
> static inline const struct path *fanotify_event_path(struct fanotify_event *event)
> {
> if (event->type == FANOTIFY_EVENT_TYPE_PATH)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 2d85c71717d6..adf19e1a10bd 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -122,6 +122,8 @@ struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
> sizeof(struct fanotify_event_info_pidfd)
> #define FANOTIFY_ERROR_INFO_LEN \
> (sizeof(struct fanotify_event_info_error))
> +#define FANOTIFY_MNT_INFO_LEN \
> + (sizeof(struct fanotify_event_info_mnt))
>
> static int fanotify_fid_info_len(int fh_len, int name_len)
> {
> @@ -159,6 +161,9 @@ static size_t fanotify_event_len(unsigned int info_mode,
> int fh_len;
> int dot_len = 0;
>
> + if (fanotify_is_mnt_event(event->mask))
> + event_len += FANOTIFY_MNT_INFO_LEN;
> +
> if (!info_mode)
> return event_len;
>
> @@ -380,6 +385,26 @@ static int process_access_response(struct fsnotify_group *group,
> return -ENOENT;
> }
>
> +static size_t copy_mnt_info_to_user(struct fanotify_event *event,
> + char __user *buf, int count)
> +{
> + struct fanotify_event_info_mnt info = { };
> +
> + info.hdr.info_type = FAN_EVENT_INFO_TYPE_MNT;
> + info.hdr.len = FANOTIFY_MNT_INFO_LEN;
> +
> + if (WARN_ON(count < info.hdr.len))
> + return -EFAULT;
> +
> + info.mnt_id = event->mnt_id;
> + info.parent_id = event->parent_id;
> +
> + if (copy_to_user(buf, &info, sizeof(info)))
> + return -EFAULT;
> +
> + return info.hdr.len;
> +}
> +
> static size_t copy_error_info_to_user(struct fanotify_event *event,
> char __user *buf, int count)
> {
> @@ -656,6 +681,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
> struct file *f = NULL, *pidfd_file = NULL;
> int ret, pidfd = -ESRCH, fd = -EBADF;
> + int total_bytes = 0;
>
> pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> @@ -755,14 +781,29 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>
> buf += FAN_EVENT_METADATA_LEN;
> count -= FAN_EVENT_METADATA_LEN;
> + total_bytes += FAN_EVENT_METADATA_LEN;
>
> if (info_mode) {
> ret = copy_info_records_to_user(event, info, info_mode, pidfd,
> buf, count);
> if (ret < 0)
> goto out_close_fd;
> +
> + buf += ret;
> + count -= ret;
> + total_bytes += ret;
> + }
> +
> + if (fanotify_is_mnt_event(event->mask)) {
> + ret = copy_mnt_info_to_user(event, buf, count);
> + if (ret < 0)
> + goto out_close_fd;
> +
> + total_bytes += ret;
> }
>
> + WARN_ON(metadata.event_len != total_bytes);
> +
> if (f)
> fd_install(fd, f);
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index f976949d2634..1d5831b127e6 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -627,7 +627,7 @@ static __init int fsnotify_init(void)
> {
> int ret;
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
>
> ret = init_srcu_struct(&fsnotify_mark_srcu);
> if (ret)
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 89ff45bd6f01..84e6f81bdb1b 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -90,7 +90,7 @@
> FAN_RENAME)
>
> /* Events that can be reported with event->fd */
> -#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS)
> +#define FANOTIFY_FD_EVENTS (FANOTIFY_PATH_EVENTS | FANOTIFY_PERM_EVENTS | FANOTIFY_MOUNT_EVENTS)
>
> /* Events that can only be reported with data type FSNOTIFY_EVENT_INODE */
> #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
> @@ -99,10 +99,13 @@
> /* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */
> #define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
>
> +#define FANOTIFY_MOUNT_EVENTS (FAN_MNT_ATTACH | FAN_MNT_DETACH)
> +
> /* Events that user can request to be notified on */
> #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
> FANOTIFY_INODE_EVENTS | \
> - FANOTIFY_ERROR_EVENTS)
> + FANOTIFY_ERROR_EVENTS | \
> + FANOTIFY_MOUNT_EVENTS )
>
> /* Events that require a permission response from user */
> #define FANOTIFY_PERM_EVENTS (FAN_OPEN_PERM | FAN_ACCESS_PERM | \
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 278620e063ab..4129347e4f16 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -463,4 +463,22 @@ static inline int fsnotify_sb_error(struct super_block *sb, struct inode *inode,
> NULL, NULL, NULL, 0);
> }
>
> +static inline void fsnotify_mnt_attach(struct path *mountpoint, struct vfsmount *mnt)
> +{
> + struct fsnotify_mnt data = {
> + .path = mountpoint,
> + .mnt = mnt,
> + };
> + fsnotify(FS_MNT_ATTACH, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
> +}
> +
> +static inline void fsnotify_mnt_detach(struct path *mountpoint, struct vfsmount *mnt)
> +{
> + struct fsnotify_mnt data = {
> + .path = mountpoint,
> + .mnt = mnt,
> + };
> + fsnotify(FS_MNT_DETACH, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
> +}
> +
> #endif /* _LINUX_FS_NOTIFY_H */
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 3ecf7768e577..1e9c15ad64b6 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -56,6 +56,9 @@
> #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */
> #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */
>
> +#define FS_MNT_ATTACH 0x00100000 /* Mount was attached */
> +#define FS_MNT_DETACH 0x00200000 /* Mount was detached */
> +
> /*
> * Set on inode mark that cares about things that happen to its children.
> * Always set for dnotify and inotify.
> @@ -102,7 +105,7 @@
> FS_EVENTS_POSS_ON_CHILD | \
> FS_DELETE_SELF | FS_MOVE_SELF | \
> FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
> - FS_ERROR)
> + FS_ERROR | FS_MNT_ATTACH | FS_MNT_DETACH)
>
> /* Extra flags that may be reported with event or control handling of events */
> #define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
> @@ -288,6 +291,7 @@ enum fsnotify_data_type {
> FSNOTIFY_EVENT_PATH,
> FSNOTIFY_EVENT_INODE,
> FSNOTIFY_EVENT_DENTRY,
> + FSNOTIFY_EVENT_MNT,
> FSNOTIFY_EVENT_ERROR,
> };
>
> @@ -297,6 +301,11 @@ struct fs_error_report {
> struct super_block *sb;
> };
>
> +struct fsnotify_mnt {
> + const struct path *path;
> + struct vfsmount *mnt;
> +};
> +
> static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
> {
> switch (data_type) {
> @@ -306,6 +315,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
> return d_inode(data);
> case FSNOTIFY_EVENT_PATH:
> return d_inode(((const struct path *)data)->dentry);
> + case FSNOTIFY_EVENT_MNT:
> + return d_inode(((struct fsnotify_mnt *)data)->path->dentry);
> case FSNOTIFY_EVENT_ERROR:
> return ((struct fs_error_report *)data)->inode;
> default:
> @@ -321,6 +332,8 @@ static inline struct dentry *fsnotify_data_dentry(const void *data, int data_typ
> return (struct dentry *)data;
> case FSNOTIFY_EVENT_PATH:
> return ((const struct path *)data)->dentry;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->path->dentry;
> default:
> return NULL;
> }
> @@ -332,6 +345,8 @@ static inline const struct path *fsnotify_data_path(const void *data,
> switch (data_type) {
> case FSNOTIFY_EVENT_PATH:
> return data;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->path;
> default:
> return NULL;
> }
> @@ -347,6 +362,8 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
> return ((struct dentry *)data)->d_sb;
> case FSNOTIFY_EVENT_PATH:
> return ((const struct path *)data)->dentry->d_sb;
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->mnt->mnt_sb;
> case FSNOTIFY_EVENT_ERROR:
> return ((struct fs_error_report *) data)->sb;
> default:
> @@ -354,6 +371,17 @@ static inline struct super_block *fsnotify_data_sb(const void *data,
> }
> }
>
> +static inline struct vfsmount *fsnotify_data_mnt(const void *data,
> + int data_type)
> +{
> + switch (data_type) {
> + case FSNOTIFY_EVENT_MNT:
> + return ((const struct fsnotify_mnt *)data)->mnt;
> + default:
> + return NULL;
> + }
> +}
> +
> static inline struct fs_error_report *fsnotify_data_error_report(
> const void *data,
> int data_type)
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 34f221d3a1b9..8b2d47947bc2 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -25,6 +25,8 @@
> #define FAN_OPEN_PERM 0x00010000 /* File open in perm check */
> #define FAN_ACCESS_PERM 0x00020000 /* File accessed in perm check */
> #define FAN_OPEN_EXEC_PERM 0x00040000 /* File open/exec in perm check */
> +#define FAN_MNT_ATTACH 0x00100000 /* Mount was attached */
> +#define FAN_MNT_DETACH 0x00200000 /* Mount was detached */
>
> #define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
>
> @@ -143,6 +145,7 @@ struct fanotify_event_metadata {
> #define FAN_EVENT_INFO_TYPE_DFID 3
> #define FAN_EVENT_INFO_TYPE_PIDFD 4
> #define FAN_EVENT_INFO_TYPE_ERROR 5
> +#define FAN_EVENT_INFO_TYPE_MNT 6
>
> /* Special info types for FAN_RENAME */
> #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
> @@ -189,6 +192,12 @@ struct fanotify_event_info_error {
> __u32 error_count;
> };
>
> +struct fanotify_event_info_mnt {
> + struct fanotify_event_info_header hdr;
> + __u64 mnt_id;
> + __u64 parent_id;
> +};
Hm, I wonder whether we should indicate to the caller whether the mount
they received a notification for was propagated or not or whether they
should just infer this from statmount(). Probably enough to let them
infer this from statmount() via mnt_ns_id.
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-11-28 14:39 [RFC PATCH] fanotify: notify on mount attach and detach Miklos Szeredi
` (2 preceding siblings ...)
2024-11-29 9:25 ` Christian Brauner
@ 2024-12-03 11:40 ` Karel Zak
2024-12-03 13:03 ` Amir Goldstein
3 siblings, 1 reply; 13+ messages in thread
From: Karel Zak @ 2024-12-03 11:40 UTC (permalink / raw)
To: Miklos Szeredi
Cc: linux-fsdevel, Jan Kara, Amir Goldstein, Lennart Poettering,
Ian Kent, Alexander Viro, Christian Brauner
Thank you for working on this.
On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> To monitor an entire mount namespace with this new interface, watches need
> to be added to all existing mounts. This can be done by performing
> listmount()/statmount() recursively at startup and when a new mount is
> added.
It seems that maintaining a complete tree of nodes on large systems
with thousands of mountpoints is quite costly for userspace. It also
appears to be fragile, as any missed new node (due to a race or other
reason) would result in the loss of the ability to monitor that part
of the hierarchy. Let's imagine that there are new mount nodes added
between the listmount() and fanotify_mark() calls. These nodes
will be invisible.
It would be beneficial to have a "recursive" flag that would allow for
opening only one mount node and receiving notifications for the entire
hierarchy. (I have no knowledge about fanotify, so it is possible that
this may not be feasible due to the internal design of fanotify.)
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-12-03 11:40 ` Karel Zak
@ 2024-12-03 13:03 ` Amir Goldstein
2024-12-03 16:42 ` Jan Kara
0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2024-12-03 13:03 UTC (permalink / raw)
To: Karel Zak
Cc: Miklos Szeredi, linux-fsdevel, Jan Kara, Lennart Poettering,
Ian Kent, Alexander Viro, Christian Brauner
On Tue, Dec 3, 2024 at 12:40 PM Karel Zak <kzak@redhat.com> wrote:
>
>
> Thank you for working on this.
>
> On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> > To monitor an entire mount namespace with this new interface, watches need
> > to be added to all existing mounts. This can be done by performing
> > listmount()/statmount() recursively at startup and when a new mount is
> > added.
>
> It seems that maintaining a complete tree of nodes on large systems
> with thousands of mountpoints is quite costly for userspace. It also
> appears to be fragile, as any missed new node (due to a race or other
> reason) would result in the loss of the ability to monitor that part
> of the hierarchy. Let's imagine that there are new mount nodes added
> between the listmount() and fanotify_mark() calls. These nodes
> will be invisible.
That should not happen if the monitor does:
1. set fanotify_mark() on parent mount to get notified on new child mounts
2. listmount() on parent mount to list existing children mounts
I think that is how Miklos designed the API, but not certain.
>
> It would be beneficial to have a "recursive" flag that would allow for
> opening only one mount node and receiving notifications for the entire
> hierarchy. (I have no knowledge about fanotify, so it is possible that
> this may not be feasible due to the internal design of fanotify.)
>
This can be challenging, but if it is acceptable to hold the namespace
mutex while setting all the marks (?) then maybe.
What should be possible is to set a mark on the mount namespace
to get all the mount attach/detach events in the mount namespace
and let userspace filter out the events that are not relevant to the
subtree of interest.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-12-03 13:03 ` Amir Goldstein
@ 2024-12-03 16:42 ` Jan Kara
2024-12-04 11:30 ` Christian Brauner
0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2024-12-03 16:42 UTC (permalink / raw)
To: Amir Goldstein
Cc: Karel Zak, Miklos Szeredi, linux-fsdevel, Jan Kara,
Lennart Poettering, Ian Kent, Alexander Viro, Christian Brauner
On Tue 03-12-24 14:03:24, Amir Goldstein wrote:
> On Tue, Dec 3, 2024 at 12:40 PM Karel Zak <kzak@redhat.com> wrote:
> > Thank you for working on this.
> >
> > On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> > > To monitor an entire mount namespace with this new interface, watches need
> > > to be added to all existing mounts. This can be done by performing
> > > listmount()/statmount() recursively at startup and when a new mount is
> > > added.
> >
> > It seems that maintaining a complete tree of nodes on large systems
> > with thousands of mountpoints is quite costly for userspace. It also
> > appears to be fragile, as any missed new node (due to a race or other
> > reason) would result in the loss of the ability to monitor that part
> > of the hierarchy. Let's imagine that there are new mount nodes added
> > between the listmount() and fanotify_mark() calls. These nodes
> > will be invisible.
>
> That should not happen if the monitor does:
> 1. set fanotify_mark() on parent mount to get notified on new child mounts
> 2. listmount() on parent mount to list existing children mounts
Right, that works in principle. But it will have all those headaches as
trying to do recursive subtree watching with inotify directory watches
(mounts can also be moved, added, removed, etc. while we are trying to
capture them). It is possible to do but properly handling all the possible
races was challenging to say the least. That's why I have my doubts whether
this is really the interface we want to offer to userspace...
> > It would be beneficial to have a "recursive" flag that would allow for
> > opening only one mount node and receiving notifications for the entire
> > hierarchy. (I have no knowledge about fanotify, so it is possible that
> > this may not be feasible due to the internal design of fanotify.)
>
> This can be challenging, but if it is acceptable to hold the namespace
> mutex while setting all the marks (?) then maybe.
So for mounts, given the relative rarity of mount / umount events and depth
of a mount tree (compared to the situation with ordinary inodes and
standard fanotify events), I think it might be even acceptable to walk up
the mount tree and notify everybody along that path.
> What should be possible is to set a mark on the mount namespace
> to get all the mount attach/detach events in the mount namespace
> and let userspace filter out the events that are not relevant to the
> subtree of interest.
Or this.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-12-03 16:42 ` Jan Kara
@ 2024-12-04 11:30 ` Christian Brauner
2024-12-06 15:18 ` Miklos Szeredi
0 siblings, 1 reply; 13+ messages in thread
From: Christian Brauner @ 2024-12-04 11:30 UTC (permalink / raw)
To: Jan Kara
Cc: Amir Goldstein, Karel Zak, Miklos Szeredi, linux-fsdevel,
Lennart Poettering, Ian Kent, Alexander Viro
On Tue, Dec 03, 2024 at 05:42:04PM +0100, Jan Kara wrote:
> On Tue 03-12-24 14:03:24, Amir Goldstein wrote:
> > On Tue, Dec 3, 2024 at 12:40 PM Karel Zak <kzak@redhat.com> wrote:
> > > Thank you for working on this.
> > >
> > > On Thu, Nov 28, 2024 at 03:39:59PM GMT, Miklos Szeredi wrote:
> > > > To monitor an entire mount namespace with this new interface, watches need
> > > > to be added to all existing mounts. This can be done by performing
> > > > listmount()/statmount() recursively at startup and when a new mount is
> > > > added.
> > >
> > > It seems that maintaining a complete tree of nodes on large systems
> > > with thousands of mountpoints is quite costly for userspace. It also
> > > appears to be fragile, as any missed new node (due to a race or other
> > > reason) would result in the loss of the ability to monitor that part
> > > of the hierarchy. Let's imagine that there are new mount nodes added
> > > between the listmount() and fanotify_mark() calls. These nodes
> > > will be invisible.
> >
> > That should not happen if the monitor does:
> > 1. set fanotify_mark() on parent mount to get notified on new child mounts
> > 2. listmount() on parent mount to list existing children mounts
>
> Right, that works in principle. But it will have all those headaches as
> trying to do recursive subtree watching with inotify directory watches
> (mounts can also be moved, added, removed, etc. while we are trying to
> capture them). It is possible to do but properly handling all the possible
> races was challenging to say the least. That's why I have my doubts whether
> this is really the interface we want to offer to userspace...
>
> > > It would be beneficial to have a "recursive" flag that would allow for
> > > opening only one mount node and receiving notifications for the entire
> > > hierarchy. (I have no knowledge about fanotify, so it is possible that
> > > this may not be feasible due to the internal design of fanotify.)
> >
> > This can be challenging, but if it is acceptable to hold the namespace
> > mutex while setting all the marks (?) then maybe.
>
> So for mounts, given the relative rarity of mount / umount events and depth
> of a mount tree (compared to the situation with ordinary inodes and
> standard fanotify events), I think it might be even acceptable to walk up
> the mount tree and notify everybody along that path.
Mount trees can get pretty massive due to containers and mount
propagation. That's why propagate_umount() is so ugly because it's
optimized to deal with such cases.
But, I think that recursive watches have to be restricted to mount
namespaces anyway. Such that you can get notifications about all mount
and umounts in a specific mount namespace. That reigns in the problem
quite a bit.
>
> > What should be possible is to set a mark on the mount namespace
> > to get all the mount attach/detach events in the mount namespace
> > and let userspace filter out the events that are not relevant to the
> > subtree of interest.
Yes, that's what I've been arguing for at LSFMM.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH] fanotify: notify on mount attach and detach
2024-12-04 11:30 ` Christian Brauner
@ 2024-12-06 15:18 ` Miklos Szeredi
0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2024-12-06 15:18 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, Amir Goldstein, Karel Zak, Miklos Szeredi,
linux-fsdevel, Lennart Poettering, Ian Kent, Alexander Viro
On Wed, 4 Dec 2024 at 13:04, Christian Brauner <brauner@kernel.org> wrote:
> > > What should be possible is to set a mark on the mount namespace
> > > to get all the mount attach/detach events in the mount namespace
> > > and let userspace filter out the events that are not relevant to the
> > > subtree of interest.
>
> Yes, that's what I've been arguing for at LSFMM.
Okay, done this. We can add submount based notification later if it
turns out to be useful.
I think all comments are addressed in v2. There remain a few FIXME
items, selinux in particular is one I have no clue about.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 13+ messages in thread