public inbox for linux-fsdevel@vger.kernel.org
* [RFC][PATCH 0/5] fanotify namespace monitoring
@ 2026-03-07 11:05 Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree Amir Goldstein
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Jan,

Just as mount notifications complement listmount(), this series is the
notification counterpart of listns().

The discussion about FAN_DELETE_SELF events for kernfs [1] for cgroup
tree monitoring got me thinking that this sort of monitoring should not be
tied to vfs inodes.

Monitoring the cgroups tree has some semantic nuances, but Christian
tells me that a similar requirement exists for monitoring the namespace
tree, where the semantics w.r.t. userns are clearer.

I prepared this RFC to see if it meets the requirements of userspace.
If it does, I think the solution could be extended to monitoring
cgroup trees.

IMO monitoring namespace trees and monitoring filesystem objects do not
need to be mixed in the same fanotify group, so I wanted to try using
the high 32 bits for event flags rather than spending more event flags
in the low 32 bits. I remember that I wanted to do that for the mount
monitoring events, but did not insist, so too bad.

However, the code for using the high 32bit in uapi is quite ugly and
hackish ATM, so I kept it as a separate patch, that we can either throw
away or improve later.

Christian/Lennart,

I considered whether "recursive watches" that deliver all events from
descendant namespaces are worthwhile and decided that they are not.

Please let me know if this UAPI meets your requirements.

Amir.

[1] https://lore.kernel.org/r/20260220055449.3073-1-tjmercier@google.com/

Amir Goldstein (5):
  fanotify: add support for watching the namespaces tree
  fanotify: use high bits for FAN_NS_CREATE/FAN_NS_DELETE
  selftests/filesystems: create fanotify test dir
  filesystems/statmount: update mount.h in tools include dir
  selftests/filesystems: add fanotify namespace notifications test

 fs/notify/fanotify/fanotify.c                 |  43 ++-
 fs/notify/fanotify/fanotify.h                 |  19 +
 fs/notify/fanotify/fanotify_user.c            | 102 +++++-
 fs/notify/fdinfo.c                            |  14 +-
 fs/notify/fsnotify.c                          |  28 +-
 fs/notify/fsnotify.h                          |   7 +
 fs/notify/mark.c                              |   7 +
 fs/nsfs.c                                     |  21 ++
 include/linux/fanotify.h                      |  17 +-
 include/linux/fsnotify_backend.h              |  22 ++
 include/linux/proc_fs.h                       |   2 +
 include/linux/user_namespace.h                |   6 +
 include/uapi/linux/fanotify.h                 |  79 +++--
 kernel/nscommon.c                             |  46 +++
 tools/include/uapi/linux/fanotify.h           |  79 +++--
 tools/include/uapi/linux/mount.h              |  13 +-
 tools/testing/selftests/Makefile              |   2 +-
 .../{mount-notify => fanotify}/.gitignore     |   0
 .../{mount-notify => fanotify}/Makefile       |   3 +-
 .../mount-notify_test.c                       |   0
 .../mount-notify_test_ns.c                    |   0
 .../filesystems/fanotify/ns-notify_test.c     | 330 ++++++++++++++++++
 22 files changed, 746 insertions(+), 94 deletions(-)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/.gitignore (100%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/Makefile (67%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test.c (100%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test_ns.c (100%)
 create mode 100644 tools/testing/selftests/filesystems/fanotify/ns-notify_test.c

-- 
2.53.0



* [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
@ 2026-03-07 11:05 ` Amir Goldstein
  2026-03-09 18:07   ` Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 2/5] fanotify: use high bits for FAN_NS_CREATE/FAN_NS_DELETE Amir Goldstein
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Introduce the FAN_MARK_USERNS mark type to mark a user namespace object
via its nsfs path.

Support two events, FAN_CREATE and FAN_DELETE, to report creation
and teardown of namespaces owned by the marked userns.

Introduce FAN_REPORT_NSID to report the self and owner nsid of
the created or torn down namespace.

At this time, an fanotify group initialized with flags
FAN_REPORT_MNT|FAN_REPORT_NSID may add marks on both userns
and mntns objects to mix mount and namespace events, but the same
group cannot also request filesystem events with file handles
(e.g. FAN_REPORT_FID).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.c      | 32 ++++++++++++++
 fs/notify/fanotify/fanotify.h      | 19 ++++++++
 fs/notify/fanotify/fanotify_user.c | 71 +++++++++++++++++++++++++-----
 fs/notify/fdinfo.c                 |  9 +++-
 fs/notify/fsnotify.c               | 28 +++++++++++-
 fs/notify/fsnotify.h               |  7 +++
 fs/notify/mark.c                   |  7 +++
 fs/nsfs.c                          | 21 +++++++++
 include/linux/fanotify.h           | 14 ++++--
 include/linux/fsnotify_backend.h   | 22 +++++++++
 include/linux/proc_fs.h            |  2 +
 include/linux/user_namespace.h     |  6 +++
 include/uapi/linux/fanotify.h      |  9 ++++
 kernel/nscommon.c                  | 46 +++++++++++++++++++
 14 files changed, 276 insertions(+), 17 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index bfe884d624e7b..3818b4d53dcad 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -168,6 +168,8 @@ static bool fanotify_should_merge(struct fanotify_event *old,
 						  FANOTIFY_EE(new));
 	case FANOTIFY_EVENT_TYPE_MNT:
 		return false;
+	case FANOTIFY_EVENT_TYPE_NS:
+		return false;
 	default:
 		WARN_ON_ONCE(1);
 	}
@@ -317,6 +319,9 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 	if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
 		if (data_type != FSNOTIFY_EVENT_MNT)
 			return 0;
+	} else if (FAN_GROUP_FLAG(group, FAN_REPORT_NSID)) {
+		if (data_type != FSNOTIFY_EVENT_NS)
+			return 0;
 	} else if (!fid_mode) {
 		/* Do we have path to open a file descriptor? */
 		if (!path)
@@ -582,6 +587,22 @@ static struct fanotify_event *fanotify_alloc_mnt_event(u64 mnt_id, gfp_t gfp)
 	return &pevent->fae;
 }
 
+static struct fanotify_event *fanotify_alloc_ns_event(const struct fsnotify_ns *ns_data,
+						      gfp_t gfp)
+{
+	struct fanotify_ns_event *pevent;
+
+	pevent = kmem_cache_alloc(fanotify_ns_event_cachep, gfp);
+	if (!pevent)
+		return NULL;
+
+	pevent->fae.type = FANOTIFY_EVENT_TYPE_NS;
+	pevent->self_nsid = ns_data->self_nsid;
+	pevent->owner_nsid = ns_data->owner_nsid;
+
+	return &pevent->fae;
+}
+
 static struct fanotify_event *fanotify_alloc_perm_event(const void *data,
 							int data_type,
 							gfp_t gfp)
@@ -755,6 +776,7 @@ static struct fanotify_event *fanotify_alloc_event(
 	struct inode *id = fanotify_fid_inode(mask, data, data_type, dir,
 					      fid_mode);
 	struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
+	const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
 	const struct path *path = fsnotify_data_path(data, data_type);
 	u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
 	struct mem_cgroup *old_memcg;
@@ -856,6 +878,8 @@ static struct fanotify_event *fanotify_alloc_event(
 		event = fanotify_alloc_path_event(path, &hash, gfp);
 	} else if (mnt_id) {
 		event = fanotify_alloc_mnt_event(mnt_id, gfp);
+	} else if (ns_data) {
+		event = fanotify_alloc_ns_event(ns_data, gfp);
 	} else {
 		WARN_ON_ONCE(1);
 	}
@@ -1064,6 +1088,11 @@ static void fanotify_free_mnt_event(struct fanotify_event *event)
 	kmem_cache_free(fanotify_mnt_event_cachep, FANOTIFY_ME(event));
 }
 
+static void fanotify_free_ns_event(struct fanotify_event *event)
+{
+	kmem_cache_free(fanotify_ns_event_cachep, FANOTIFY_NSE(event));
+}
+
 static void fanotify_free_event(struct fsnotify_group *group,
 				struct fsnotify_event *fsn_event)
 {
@@ -1093,6 +1122,9 @@ static void fanotify_free_event(struct fsnotify_group *group,
 	case FANOTIFY_EVENT_TYPE_MNT:
 		fanotify_free_mnt_event(event);
 		break;
+	case FANOTIFY_EVENT_TYPE_NS:
+		fanotify_free_ns_event(event);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 	}
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 39e60218df7ce..2eaac302ccac0 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -10,6 +10,7 @@ extern struct kmem_cache *fanotify_fid_event_cachep;
 extern struct kmem_cache *fanotify_path_event_cachep;
 extern struct kmem_cache *fanotify_perm_event_cachep;
 extern struct kmem_cache *fanotify_mnt_event_cachep;
+extern struct kmem_cache *fanotify_ns_event_cachep;
 
 /* Possible states of the permission event */
 enum {
@@ -245,6 +246,7 @@ enum fanotify_event_type {
 	FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
 	FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
 	FANOTIFY_EVENT_TYPE_MNT,
+	FANOTIFY_EVENT_TYPE_NS,
 	__FANOTIFY_EVENT_TYPE_NUM
 };
 
@@ -415,6 +417,12 @@ struct fanotify_mnt_event {
 	u64 mnt_id;
 };
 
+struct fanotify_ns_event {
+	struct fanotify_event fae;
+	u64 self_nsid;
+	u64 owner_nsid;
+};
+
 static inline struct fanotify_path_event *
 FANOTIFY_PE(struct fanotify_event *event)
 {
@@ -427,6 +435,12 @@ FANOTIFY_ME(struct fanotify_event *event)
 	return container_of(event, struct fanotify_mnt_event, fae);
 }
 
+static inline struct fanotify_ns_event *
+FANOTIFY_NSE(struct fanotify_event *event)
+{
+	return container_of(event, struct fanotify_ns_event, fae);
+}
+
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
  * fanotify_handle_event() since we wait there for user response. When the
@@ -485,6 +499,11 @@ static inline bool fanotify_is_mnt_event(u32 mask)
 	return mask & (FAN_MNT_ATTACH | FAN_MNT_DETACH);
 }
 
+static inline bool fanotify_is_ns_event(const struct fanotify_event *event)
+{
+	return event->type == FANOTIFY_EVENT_TYPE_NS;
+}
+
 static inline const struct path *fanotify_event_path(struct fanotify_event *event)
 {
 	if (event->type == FANOTIFY_EVENT_TYPE_PATH)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index ae904451dfc09..126069101669a 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -19,6 +19,7 @@
 #include <linux/memcontrol.h>
 #include <linux/statfs.h>
 #include <linux/exportfs.h>
+#include <linux/proc_fs.h>
 
 #include <asm/ioctls.h>
 
@@ -208,6 +209,7 @@ struct kmem_cache *fanotify_fid_event_cachep __ro_after_init;
 struct kmem_cache *fanotify_path_event_cachep __ro_after_init;
 struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
 struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
+struct kmem_cache *fanotify_ns_event_cachep __ro_after_init;
 
 #define FANOTIFY_EVENT_ALIGN 4
 #define FANOTIFY_FID_INFO_HDR_LEN \
@@ -220,6 +222,8 @@ struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
 	(sizeof(struct fanotify_event_info_range))
 #define FANOTIFY_MNT_INFO_LEN \
 	(sizeof(struct fanotify_event_info_mnt))
+#define FANOTIFY_NS_INFO_LEN \
+	(sizeof(struct fanotify_event_info_ns))
 
 static int fanotify_fid_info_len(int fh_len, int name_len)
 {
@@ -277,6 +281,8 @@ static size_t fanotify_event_len(unsigned int info_mode,
 	}
 	if (fanotify_is_mnt_event(event->mask))
 		event_len += FANOTIFY_MNT_INFO_LEN;
+	if (fanotify_is_ns_event(event))
+		event_len += FANOTIFY_NS_INFO_LEN;
 
 	if (info_mode & FAN_REPORT_PIDFD)
 		event_len += FANOTIFY_PIDFD_INFO_LEN;
@@ -523,6 +529,26 @@ static size_t copy_mnt_info_to_user(struct fanotify_event *event,
 	return info.hdr.len;
 }
 
+static size_t copy_ns_info_to_user(struct fanotify_event *event,
+				   char __user *buf, int count)
+{
+	struct fanotify_event_info_ns info = { };
+
+	info.hdr.info_type = FAN_EVENT_INFO_TYPE_NS;
+	info.hdr.len = sizeof(info);
+
+	if (WARN_ON(count < info.hdr.len))
+		return -EFAULT;
+
+	info.self_nsid  = FANOTIFY_NSE(event)->self_nsid;
+	info.owner_nsid = FANOTIFY_NSE(event)->owner_nsid;
+
+	if (copy_to_user(buf, &info, sizeof(info)))
+		return -EFAULT;
+
+	return info.hdr.len;
+}
+
 static size_t copy_error_info_to_user(struct fanotify_event *event,
 				      char __user *buf, int count)
 {
@@ -827,6 +853,15 @@ static int copy_info_records_to_user(struct fanotify_event *event,
 		total_bytes += ret;
 	}
 
+	if (fanotify_is_ns_event(event)) {
+		ret = copy_ns_info_to_user(event, buf, count);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		count -= ret;
+		total_bytes += ret;
+	}
+
 	return total_bytes;
 }
 
@@ -1604,11 +1639,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	/*
 	 * An unprivileged user can setup an fanotify group with limited
 	 * functionality - an unprivileged group is limited to notification
-	 * events with file handles or mount ids and it cannot use unlimited
+	 * events with file handles or mount/ns ids and it cannot use unlimited
 	 * queue/marks.
 	 */
 	if (((flags & FANOTIFY_ADMIN_INIT_FLAGS) ||
-	     !(flags & (FANOTIFY_FID_BITS | FAN_REPORT_MNT))) &&
+	     !(flags & (FANOTIFY_FID_BITS | FAN_REPORT_MNT | FAN_REPORT_NSID))) &&
 	    !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -1636,8 +1671,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
 		return -EINVAL;
 
-	/* Don't allow mixing mnt events with inode events for now */
-	if (flags & FAN_REPORT_MNT) {
+	/* Don't allow mixing mnt/ns events with inode events for now */
+	if (flags & (FAN_REPORT_MNT | FAN_REPORT_NSID)) {
 		if (class != FAN_CLASS_NOTIF)
 			return -EINVAL;
 		if (flags & (FANOTIFY_FID_BITS | FAN_REPORT_FD_ERROR))
@@ -1913,6 +1948,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	case FAN_MARK_MNTNS:
 		obj_type = FSNOTIFY_OBJ_TYPE_MNTNS;
 		break;
+	case FAN_MARK_USERNS:
+		obj_type = FSNOTIFY_OBJ_TYPE_USERNS;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1960,16 +1998,22 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		return -EINVAL;
 	group = fd_file(f)->private_data;
 
-	/* Only report mount events on mnt namespace */
-	if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
+	/* Only report mount events on mnt namespace mark */
+	if (mark_type == FAN_MARK_MNTNS) {
 		if (mask & ~FANOTIFY_MOUNT_EVENTS)
 			return -EINVAL;
-		if (mark_type != FAN_MARK_MNTNS)
+		if (!FAN_GROUP_FLAG(group, FAN_REPORT_MNT))
 			return -EINVAL;
 	} else {
 		if (mask & FANOTIFY_MOUNT_EVENTS)
 			return -EINVAL;
-		if (mark_type == FAN_MARK_MNTNS)
+	}
+
+	/* Only report namespace events on user namespace mark */
+	if (mark_type == FAN_MARK_USERNS) {
+		if (mask & ~FANOTIFY_NS_EVENTS)
+			return -EINVAL;
+		if (!FAN_GROUP_FLAG(group, FAN_REPORT_NSID))
 			return -EINVAL;
 	}
 
@@ -2087,6 +2131,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 			goto path_put_and_out;
 		user_ns = mntns->user_ns;
 		obj = mntns;
+	} else if (obj_type == FSNOTIFY_OBJ_TYPE_USERNS) {
+		ret = -EINVAL;
+		user_ns = userns_from_dentry(path.dentry);
+		if (!user_ns)
+			goto path_put_and_out;
+		obj = user_ns;
 	}
 
 	ret = -EPERM;
@@ -2190,8 +2240,8 @@ static int __init fanotify_user_setup(void)
 				     FANOTIFY_DEFAULT_MAX_USER_MARKS);
 
 	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 14);
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 15);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 12);
 
 	fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
 					 SLAB_PANIC|SLAB_ACCOUNT);
@@ -2204,6 +2254,7 @@ static int __init fanotify_user_setup(void)
 			KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
 	}
 	fanotify_mnt_event_cachep = KMEM_CACHE(fanotify_mnt_event, SLAB_PANIC);
+	fanotify_ns_event_cachep = KMEM_CACHE(fanotify_ns_event, SLAB_PANIC);
 
 	fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
 	init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 9cc7eb8636437..946cffaf16e18 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -130,8 +130,13 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	} else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
 		struct mnt_namespace *mnt_ns = fsnotify_conn_mntns(mark->connector);
 
-		seq_printf(m, "fanotify mnt_ns:%u mflags:%x mask:%x ignored_mask:%x\n",
-			   mnt_ns->ns.inum, mflags, mark->mask, mark->ignore_mask);
+		seq_printf(m, "fanotify mnt_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
+			   mnt_ns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
+	} else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_USERNS) {
+		struct user_namespace *userns = fsnotify_conn_userns(mark->connector);
+
+		seq_printf(m, "fanotify user_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
+			   userns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
 	}
 }
 
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 9995de1710e59..638136c0d6cb9 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -495,6 +495,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
 	const struct path *path = fsnotify_data_path(data, data_type);
 	struct super_block *sb = fsnotify_data_sb(data, data_type);
 	const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
+	const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
 	struct fsnotify_sb_info *sbinfo = sb ? fsnotify_sb_info(sb) : NULL;
 	struct fsnotify_iter_info iter_info = {};
 	struct mount *mnt = NULL;
@@ -536,7 +537,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
 	    (!mnt || !mnt->mnt_fsnotify_marks) &&
 	    (!inode || !inode->i_fsnotify_marks) &&
 	    (!inode2 || !inode2->i_fsnotify_marks) &&
-	    (!mnt_data || !mnt_data->ns->n_fsnotify_marks))
+	    (!mnt_data || !mnt_data->ns->n_fsnotify_marks) &&
+	    (!ns_data || !ns_data->userns->n_fsnotify_marks))
 		return 0;
 
 	if (sb)
@@ -549,6 +551,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
 		marks_mask |= READ_ONCE(inode2->i_fsnotify_mask);
 	if (mnt_data)
 		marks_mask |= READ_ONCE(mnt_data->ns->n_fsnotify_mask);
+	if (ns_data)
+		marks_mask |= READ_ONCE(ns_data->userns->n_fsnotify_mask);
 
 	/*
 	 * If this is a modify event we may need to clear some ignore masks.
@@ -582,6 +586,10 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
 		iter_info.marks[FSNOTIFY_ITER_TYPE_MNTNS] =
 			fsnotify_first_mark(&mnt_data->ns->n_fsnotify_marks);
 	}
+	if (ns_data) {
+		iter_info.marks[FSNOTIFY_ITER_TYPE_USERNS] =
+			fsnotify_first_mark(&ns_data->userns->n_fsnotify_marks);
+	}
 
 	/*
 	 * We need to merge inode/vfsmount/sb mark lists so that e.g. inode mark
@@ -711,6 +719,24 @@ void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
 	fsnotify(mask, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
 }
 
+void fsnotify_ns(__u32 mask, struct user_namespace *userns,
+		 u64 self_nsid, u64 owner_nsid)
+{
+	struct fsnotify_ns data = {
+		.userns = userns,
+		.self_nsid = self_nsid,
+		.owner_nsid = owner_nsid,
+	};
+
+	if (WARN_ON_ONCE(!userns))
+		return;
+
+	if (!READ_ONCE(userns->n_fsnotify_marks))
+		return;
+
+	fsnotify(mask, &data, FSNOTIFY_EVENT_NS, NULL, NULL, NULL, 0);
+}
+
 static __init int fsnotify_init(void)
 {
 	int ret;
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 58c7bb25e5718..f58c69de7f067 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -6,6 +6,7 @@
 #include <linux/fsnotify.h>
 #include <linux/srcu.h>
 #include <linux/types.h>
+#include <linux/user_namespace.h>
 
 #include "../mount.h"
 
@@ -39,6 +40,12 @@ static inline struct mnt_namespace *fsnotify_conn_mntns(
 	return conn->obj;
 }
 
+static inline struct user_namespace *fsnotify_conn_userns(
+				struct fsnotify_mark_connector *conn)
+{
+	return conn->obj;
+}
+
 static inline struct super_block *fsnotify_object_sb(void *obj,
 			enum fsnotify_obj_type obj_type)
 {
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c2ed5b11b0fe6..4086b37637cbe 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -74,6 +74,7 @@
 #include <linux/atomic.h>
 
 #include <linux/fsnotify_backend.h>
+#include <linux/user_namespace.h>
 #include "fsnotify.h"
 
 #define FSNOTIFY_REAPER_DELAY	(1)	/* 1 jiffy */
@@ -110,6 +111,8 @@ static fsnotify_connp_t *fsnotify_object_connp(void *obj,
 		return fsnotify_sb_marks(obj);
 	case FSNOTIFY_OBJ_TYPE_MNTNS:
 		return &((struct mnt_namespace *)obj)->n_fsnotify_marks;
+	case FSNOTIFY_OBJ_TYPE_USERNS:
+		return &((struct user_namespace *)obj)->n_fsnotify_marks;
 	default:
 		return NULL;
 	}
@@ -125,6 +128,8 @@ static __u32 *fsnotify_conn_mask_p(struct fsnotify_mark_connector *conn)
 		return &fsnotify_conn_sb(conn)->s_fsnotify_mask;
 	else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS)
 		return &fsnotify_conn_mntns(conn)->n_fsnotify_mask;
+	else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS)
+		return &fsnotify_conn_userns(conn)->n_fsnotify_mask;
 	return NULL;
 }
 
@@ -356,6 +361,8 @@ static void *fsnotify_detach_connector_from_object(
 		fsnotify_conn_sb(conn)->s_fsnotify_mask = 0;
 	} else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
 		fsnotify_conn_mntns(conn)->n_fsnotify_mask = 0;
+	} else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS) {
+		fsnotify_conn_userns(conn)->n_fsnotify_mask = 0;
 	}
 
 	rcu_assign_pointer(*connp, NULL);
diff --git a/fs/nsfs.c b/fs/nsfs.c
index c215878d55e87..ace17de243f45 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -387,6 +387,27 @@ bool proc_ns_file(const struct file *file)
 	return file->f_op == &ns_file_operations;
 }
 
+/**
+ * userns_from_dentry() - Return the user_namespace referenced by an nsfs dentry.
+ * @dentry: dentry of an open nsfs file
+ *
+ * Returns the user_namespace if @dentry is an nsfs file for a user namespace,
+ * NULL otherwise.  The caller is responsible for ensuring the returned pointer
+ * remains valid (e.g. by holding a reference to the dentry).
+ */
+struct user_namespace *userns_from_dentry(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	struct ns_common *ns;
+
+	if (!inode || inode->i_sb->s_magic != NSFS_MAGIC)
+		return NULL;
+	ns = get_proc_ns(inode);
+	if (!ns || ns->ns_type != CLONE_NEWUSER)
+		return NULL;
+	return to_user_ns(ns);
+}
+
 /**
  * ns_match() - Returns true if current namespace matches dev/ino provided.
  * @ns: current namespace
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 879cff5eccd4e..279082ae40fe2 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -25,7 +25,8 @@
 
 #define FANOTIFY_FID_BITS	(FAN_REPORT_DFID_NAME_TARGET)
 
-#define FANOTIFY_INFO_MODES	(FANOTIFY_FID_BITS | FAN_REPORT_PIDFD | FAN_REPORT_MNT)
+#define FANOTIFY_INFO_MODES	(FANOTIFY_FID_BITS | FAN_REPORT_PIDFD | FAN_REPORT_MNT | \
+				 FAN_REPORT_NSID)
 
 /*
  * fanotify_init() flags that require CAP_SYS_ADMIN.
@@ -47,8 +48,9 @@
  * so one of the flags for reporting file handles is required.
  */
 #define FANOTIFY_USER_INIT_FLAGS	(FAN_CLASS_NOTIF | \
-					 FANOTIFY_FID_BITS | FAN_REPORT_MNT | \
-					 FAN_CLOEXEC | FAN_NONBLOCK)
+				 FANOTIFY_FID_BITS | FAN_REPORT_MNT | \
+				 FAN_REPORT_NSID | \
+				 FAN_CLOEXEC | FAN_NONBLOCK)
 
 #define FANOTIFY_INIT_FLAGS	(FANOTIFY_ADMIN_INIT_FLAGS | \
 				 FANOTIFY_USER_INIT_FLAGS)
@@ -58,7 +60,8 @@
 #define FANOTIFY_INTERNAL_GROUP_FLAGS	(FANOTIFY_UNPRIV)
 
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
-				 FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS)
+				 FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS | \
+				 FAN_MARK_USERNS)
 
 #define FANOTIFY_MARK_CMD_BITS	(FAN_MARK_ADD | FAN_MARK_REMOVE | \
 				 FAN_MARK_FLUSH)
@@ -111,6 +114,9 @@
 
 #define FANOTIFY_MOUNT_EVENTS	(FAN_MNT_ATTACH | FAN_MNT_DETACH)
 
+/* Events that can be reported with data type FSNOTIFY_EVENT_NS */
+#define FANOTIFY_NS_EVENTS	(FAN_CREATE | FAN_DELETE)
+
 /* Events that user can request to be notified on */
 #define FANOTIFY_EVENTS		(FANOTIFY_PATH_EVENTS | \
 				 FANOTIFY_INODE_EVENTS | \
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 95985400d3d8e..2145d2f4262db 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -310,6 +310,7 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_INODE,
 	FSNOTIFY_EVENT_DENTRY,
 	FSNOTIFY_EVENT_MNT,
+	FSNOTIFY_EVENT_NS,
 	FSNOTIFY_EVENT_ERROR,
 };
 
@@ -335,6 +336,12 @@ struct fsnotify_mnt {
 	u64 mnt_id;
 };
 
+struct fsnotify_ns {
+	const struct user_namespace *userns;
+	u64 self_nsid;
+	u64 owner_nsid;
+};
+
 static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 {
 	switch (data_type) {
@@ -411,6 +418,17 @@ static inline const struct fsnotify_mnt *fsnotify_data_mnt(const void *data,
 	}
 }
 
+static inline const struct fsnotify_ns *fsnotify_data_ns(const void *data,
+							 int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_NS:
+		return data;
+	default:
+		return NULL;
+	}
+}
+
 static inline u64 fsnotify_data_mnt_id(const void *data, int data_type)
 {
 	const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
@@ -456,6 +474,7 @@ enum fsnotify_iter_type {
 	FSNOTIFY_ITER_TYPE_PARENT,
 	FSNOTIFY_ITER_TYPE_INODE2,
 	FSNOTIFY_ITER_TYPE_MNTNS,
+	FSNOTIFY_ITER_TYPE_USERNS,
 	FSNOTIFY_ITER_TYPE_COUNT
 };
 
@@ -466,6 +485,7 @@ enum fsnotify_obj_type {
 	FSNOTIFY_OBJ_TYPE_VFSMOUNT,
 	FSNOTIFY_OBJ_TYPE_SB,
 	FSNOTIFY_OBJ_TYPE_MNTNS,
+	FSNOTIFY_OBJ_TYPE_USERNS,
 	FSNOTIFY_OBJ_TYPE_COUNT,
 	FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT
 };
@@ -657,6 +677,8 @@ extern void __fsnotify_mntns_delete(struct mnt_namespace *mntns);
 extern void fsnotify_sb_free(struct super_block *sb);
 extern u32 fsnotify_get_cookie(void);
 extern void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt);
+extern void fsnotify_ns(__u32 mask, struct user_namespace *userns,
+			u64 self_nsid, u64 owner_nsid);
 
 static inline __u32 fsnotify_parent_needed_mask(__u32 mask)
 {
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 19d1c5e5f3350..3b7d2bc88ae6c 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -248,4 +248,6 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
 
 bool proc_ns_file(const struct file *file);
 
+struct user_namespace *userns_from_dentry(struct dentry *dentry);
+
 #endif /* _LINUX_PROC_FS_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 9c3be157397e0..7ff8420495308 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -13,6 +13,8 @@
 #include <linux/sysctl.h>
 #include <linux/err.h>
 
+struct fsnotify_mark_connector;
+
 #define UID_GID_MAP_MAX_BASE_EXTENTS 5
 #define UID_GID_MAP_MAX_EXTENTS 340
 
@@ -86,6 +88,10 @@ struct user_namespace {
 	/* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
 	 * in its effective capability set at the child ns creation time. */
 	bool			parent_could_setfcap;
+#ifdef CONFIG_FSNOTIFY
+	__u32 n_fsnotify_mask;
+	struct fsnotify_mark_connector __rcu *n_fsnotify_marks;
+#endif
 
 #ifdef CONFIG_KEYS
 	/* List of joinable keyrings in this namespace.  Modification access of
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index e710967c7c263..6b4f470ee7e01 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -67,6 +67,7 @@
 #define FAN_REPORT_TARGET_FID	0x00001000	/* Report dirent target id  */
 #define FAN_REPORT_FD_ERROR	0x00002000	/* event->fd can report error */
 #define FAN_REPORT_MNT		0x00004000	/* Report mount events */
+#define FAN_REPORT_NSID		0x00008000	/* Report namespace events */
 
 /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
 #define FAN_REPORT_DFID_NAME	(FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
@@ -98,6 +99,7 @@
 #define FAN_MARK_MOUNT		0x00000010
 #define FAN_MARK_FILESYSTEM	0x00000100
 #define FAN_MARK_MNTNS		0x00000110
+#define FAN_MARK_USERNS		0x00001000
 
 /*
  * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY
@@ -152,6 +154,7 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_ERROR	5
 #define FAN_EVENT_INFO_TYPE_RANGE	6
 #define FAN_EVENT_INFO_TYPE_MNT		7
+#define FAN_EVENT_INFO_TYPE_NS		8
 
 /* Special info types for FAN_RENAME */
 #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME	10
@@ -210,6 +213,12 @@ struct fanotify_event_info_mnt {
 	__u64 mnt_id;
 };
 
+struct fanotify_event_info_ns {
+	struct fanotify_event_info_header hdr;
+	__u64 self_nsid;	/* ns_id of the namespace */
+	__u64 owner_nsid;	/* ns_id of its owning user namespace */
+};
+
 /*
  * User space may need to record additional information about its decision.
  * The extra information type records what kind of information is included.
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index 3166c1fd844af..a6fdacb394ea7 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -6,6 +6,7 @@
 #include <linux/proc_ns.h>
 #include <linux/user_namespace.h>
 #include <linux/vfsdebug.h>
+#include <linux/fsnotify_backend.h>
 
 #ifdef CONFIG_DEBUG_VFS
 static void ns_debug(struct ns_common *ns, const struct proc_ns_operations *ops)
@@ -111,6 +112,43 @@ struct ns_common *__must_check ns_owner(struct ns_common *ns)
 	return to_ns_common(owner);
 }
 
+/*
+ * Return the owning user_namespace of @ns, including init_user_ns.
+ * Unlike ns_owner(), which returns NULL for namespaces owned by
+ * init_user_ns (to serve as a propagation terminator), this gives us
+ * the real owner for notification routing.
+ */
+static struct user_namespace *ns_direct_owner(struct ns_common *ns)
+{
+	if (unlikely(!ns->ops || !ns->ops->owner))
+		return NULL;
+	return ns->ops->owner(ns);
+}
+
+static void ns_common_notify(__u32 mask, struct ns_common *ns)
+{
+	struct user_namespace *owner_userns;
+
+	if (!IS_ENABLED(CONFIG_FSNOTIFY))
+		return;
+
+	owner_userns = ns_direct_owner(ns);
+	if (!owner_userns)
+		return;
+
+#ifdef CONFIG_FSNOTIFY
+	/*
+	 * READ_ONCE macro expansion does not understand that this code
+	 * is not reachable without CONFIG_FSNOTIFY.
+	 */
+	if (!READ_ONCE(owner_userns->n_fsnotify_marks))
+		return;
+#endif
+
+	fsnotify_ns(mask, owner_userns, ns->ns_id,
+		    to_ns_common(owner_userns)->ns_id);
+}
+
 /*
  * The active reference count works by having each namespace that gets
  * created take a single active reference on its owning user namespace.
@@ -172,6 +210,8 @@ void __ns_ref_active_put(struct ns_common *ns)
 		return;
 	}
 
+	ns_common_notify(FS_DELETE, ns);
+
 	VFS_WARN_ON_ONCE(is_ns_init_id(ns));
 	VFS_WARN_ON_ONCE(!__ns_ref_read(ns));
 
@@ -184,6 +224,8 @@ void __ns_ref_active_put(struct ns_common *ns)
 			VFS_WARN_ON_ONCE(__ns_ref_active_read(ns) < 0);
 			return;
 		}
+
+		ns_common_notify(FS_DELETE, ns);
 	}
 }
 
@@ -293,6 +335,8 @@ void __ns_ref_active_get(struct ns_common *ns)
 	if (likely(prev))
 		return;
 
+	ns_common_notify(FS_CREATE, ns);
+
 	/*
 	 * We did resurrect it. Walk the ownership hierarchy upwards
 	 * until we found an owning user namespace that is active.
@@ -307,6 +351,8 @@ void __ns_ref_active_get(struct ns_common *ns)
 		VFS_WARN_ON_ONCE(prev < 0);
 		if (likely(prev))
 			return;
+
+		ns_common_notify(FS_CREATE, ns);
 	}
 }
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 2/5] fanotify: use high bits for FAN_NS_CREATE/FAN_NS_DELETE
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree Amir Goldstein
@ 2026-03-07 11:05 ` Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 3/5] selftests/filesystems: create fanotify test dir Amir Goldstein
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Use the high 32 bits of the event mask for FAN_NS_CREATE/FAN_NS_DELETE
in the uapi, but keep using FS_CREATE/FS_DELETE internally, because we
do not mix inode events with ns listeners.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
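
Note for reviewers (not part of the commit message): the translation
between the uapi mask and the internal mask is a plain 32-bit shift in
each direction. Below is a minimal userspace sketch of that round trip.
The EX_* names and the two helpers are hypothetical and exist only for
illustration; the constant values are copied from the uapi header in
this patch, and the outgoing mask is simplified to the two NS events
rather than the full FANOTIFY_OUTGOING_EVENTS.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the uapi header in this patch; EX_ names are illustrative. */
#define EX_FAN_CREATE		0x00000100ULL
#define EX_FAN_DELETE		0x00000200ULL
#define EX_FAN_NS_CREATE	(EX_FAN_CREATE << 32)
#define EX_FAN_NS_DELETE	(EX_FAN_DELETE << 32)
#define EX_NS_EVENTS		(EX_FAN_NS_CREATE | EX_FAN_NS_DELETE)

/*
 * Hypothetical helper: uapi mask of a userns mark -> internal 32-bit mask.
 * Mirrors the "mask >>= 32" step in do_fanotify_mark().
 */
static uint32_t ex_ns_mask_to_internal(uint64_t uapi_mask)
{
	return (uint32_t)(uapi_mask >> 32);
}

/*
 * Hypothetical helper: internal event mask -> uapi mask.
 * Mirrors the "metadata.mask <<= 32" step in copy_event_to_user(),
 * filtered here by the NS events only (a simplification).
 */
static uint64_t ex_ns_mask_to_uapi(uint32_t internal_mask)
{
	return ((uint64_t)internal_mask << 32) & EX_NS_EVENTS;
}

/* Self-check: shifting down and back up preserves the NS event bits. */
static void ex_ns_mask_selfcheck(void)
{
	uint64_t uapi = EX_FAN_NS_CREATE | EX_FAN_NS_DELETE;

	assert(ex_ns_mask_to_uapi(ex_ns_mask_to_internal(uapi)) == uapi);
}
```

Keeping the internal bits equal to FS_CREATE/FS_DELETE is what lets the
BUILD_BUG_ON() checks compare upper_32_bits(FAN_NS_*) against FAN_CREATE
and FAN_DELETE directly.
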
 fs/notify/fanotify/fanotify.c      | 11 +++--
 fs/notify/fanotify/fanotify_user.c | 31 ++++++++++---
 fs/notify/fdinfo.c                 |  9 +++-
 include/linux/fanotify.h           |  5 ++-
 include/uapi/linux/fanotify.h      | 70 +++++++++++++++++-------------
 5 files changed, 81 insertions(+), 45 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3818b4d53dcad..4b9c89772404e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -304,9 +304,8 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
 				     const void *data, int data_type,
 				     struct inode *dir)
 {
-	__u32 marks_mask = 0, marks_ignore_mask = 0;
-	__u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS |
-				     FANOTIFY_EVENT_FLAGS;
+	__u32 test_mask, marks_mask = 0, marks_ignore_mask = 0;
+	__u64 user_mask = FANOTIFY_OUTGOING_EVENTS | FANOTIFY_EVENT_FLAGS;
 	const struct path *path = fsnotify_data_path(data, data_type);
 	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	struct fsnotify_mark *mark;
@@ -980,8 +979,12 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
 	BUILD_BUG_ON(FAN_RENAME != FS_RENAME);
 	BUILD_BUG_ON(FAN_PRE_ACCESS != FS_PRE_ACCESS);
+	/* NS events live in the upper 32 bits; verify the 32-bit shift round-trip */
+	BUILD_BUG_ON(upper_32_bits(FAN_NS_CREATE) != FAN_CREATE);
+	BUILD_BUG_ON(upper_32_bits(FAN_NS_DELETE) != FAN_DELETE);
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 24);
+	/* ALL_FANOTIFY_EVENT_BITS now spans 64 bits (NS events in upper 32) */
+	BUILD_BUG_ON(HWEIGHT64(ALL_FANOTIFY_EVENT_BITS) != 26);
 
 	mask = fanotify_group_event_mask(group, iter_info, &match_mask,
 					 mask, data, data_type, dir);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 126069101669a..3b23ec1ade8fc 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -883,7 +883,15 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.metadata_len = FAN_EVENT_METADATA_LEN;
 	metadata.vers = FANOTIFY_METADATA_VERSION;
 	metadata.reserved = 0;
-	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
+	metadata.mask = event->mask;
+	/*
+	 * NS events are stored internally as FS_CREATE/FS_DELETE (lower 32
+	 * bits) but reported to userspace as FAN_NS_CREATE/FAN_NS_DELETE
+	 * (upper 32 bits).  Shift them back up for the UAPI event mask.
+	 */
+	if (fanotify_is_ns_event(event))
+		metadata.mask <<= 32;
+	metadata.mask &= FANOTIFY_OUTGOING_EVENTS;
 	metadata.pid = pid_vnr(event->pid);
 	/*
 	 * For an unprivileged listener, event->pid can be used to identify the
@@ -1916,7 +1924,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	struct fan_fsid __fsid, *fsid = NULL;
 	struct user_namespace *user_ns = NULL;
 	struct mnt_namespace *mntns;
-	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
+	u64 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
 	unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS;
 	unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS;
@@ -1928,8 +1936,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	pr_debug("%s: fanotify_fd=%d flags=%x dfd=%d pathname=%p mask=%llx\n",
 		 __func__, fanotify_fd, flags, dfd, pathname, mask);
 
-	/* we only use the lower 32 bits as of right now. */
-	if (upper_32_bits(mask))
+	/*
+	 * NS events (FAN_NS_CREATE/FAN_NS_DELETE) live in the upper 32 bits
+	 * and are only valid for FAN_MARK_USERNS.  Reject any other upper bits.
+	 */
+	if (upper_32_bits(mask) && mark_type != FAN_MARK_USERNS)
 		return -EINVAL;
 
 	if (flags & ~FANOTIFY_MARK_FLAGS)
@@ -2056,7 +2067,8 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	 * point.
 	 */
 	fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
-	if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_MOUNT_EVENTS|FANOTIFY_EVENT_FLAGS) &&
+	if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_MOUNT_EVENTS|FANOTIFY_NS_EVENTS|
+		     FANOTIFY_EVENT_FLAGS) &&
 	    (!fid_mode || mark_type == FAN_MARK_MOUNT))
 		return -EINVAL;
 
@@ -2178,7 +2190,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 			mask |= FAN_EVENT_ON_CHILD;
 	}
 
-	/* create/update an inode mark */
+	/*
+	 * Translate upper-bit UAPI NS events to the internal FS_CREATE/
+	 * FS_DELETE bits used by fsnotify.
+	 */
+	if (obj_type == FSNOTIFY_OBJ_TYPE_USERNS)
+		mask >>= 32;
+
+	/* create/update an fsnotify mark */
 	switch (mark_cmd) {
 	case FAN_MARK_ADD:
 		ret = fanotify_add_mark(group, obj, obj_type, mask, flags,
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 946cffaf16e18..6106fad1dcf1b 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -135,8 +135,13 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	} else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_USERNS) {
 		struct user_namespace *userns = fsnotify_conn_userns(mark->connector);
 
-		seq_printf(m, "fanotify user_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
-			   userns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
+		/*
+		 * Userns marks store FS_CREATE/FS_DELETE internally but expose
+		 * FAN_NS_CREATE/FAN_NS_DELETE (upper 32 bits) to userspace.
+		 */
+		seq_printf(m, "fanotify user_ns_id:%llu mflags:%x mask:%llx ignored_mask:%llx\n",
+			   userns->ns.ns_id, mflags,
+			   (u64)mark->mask << 32, (u64)mark->ignore_mask << 32);
 	}
 }
 
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 279082ae40fe2..e85034063a115 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -115,13 +115,14 @@
 #define FANOTIFY_MOUNT_EVENTS	(FAN_MNT_ATTACH | FAN_MNT_DETACH)
 
 /* Events that can be reported with data type FSNOTIFY_EVENT_NS */
-#define FANOTIFY_NS_EVENTS	(FAN_CREATE | FAN_DELETE)
+#define FANOTIFY_NS_EVENTS	(FAN_NS_CREATE | FAN_NS_DELETE)
 
 /* Events that user can request to be notified on */
 #define FANOTIFY_EVENTS		(FANOTIFY_PATH_EVENTS | \
 				 FANOTIFY_INODE_EVENTS | \
 				 FANOTIFY_ERROR_EVENTS | \
-				 FANOTIFY_MOUNT_EVENTS)
+				 FANOTIFY_MOUNT_EVENTS | \
+				 FANOTIFY_NS_EVENTS)
 
 /* Extra flags that may be reported with event or control handling of events */
 #define FANOTIFY_EVENT_FLAGS	(FAN_EVENT_ON_CHILD | FAN_ONDIR)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 6b4f470ee7e01..45ad484ceb473 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -5,37 +5,45 @@
 #include <linux/types.h>
 
 /* the following events that user-space can register for */
-#define FAN_ACCESS		0x00000001	/* File was accessed */
-#define FAN_MODIFY		0x00000002	/* File was modified */
-#define FAN_ATTRIB		0x00000004	/* Metadata changed */
-#define FAN_CLOSE_WRITE		0x00000008	/* Writable file closed */
-#define FAN_CLOSE_NOWRITE	0x00000010	/* Unwritable file closed */
-#define FAN_OPEN		0x00000020	/* File was opened */
-#define FAN_MOVED_FROM		0x00000040	/* File was moved from X */
-#define FAN_MOVED_TO		0x00000080	/* File was moved to Y */
-#define FAN_CREATE		0x00000100	/* Subfile was created */
-#define FAN_DELETE		0x00000200	/* Subfile was deleted */
-#define FAN_DELETE_SELF		0x00000400	/* Self was deleted */
-#define FAN_MOVE_SELF		0x00000800	/* Self was moved */
-#define FAN_OPEN_EXEC		0x00001000	/* File was opened for exec */
-
-#define FAN_Q_OVERFLOW		0x00004000	/* Event queued overflowed */
-#define FAN_FS_ERROR		0x00008000	/* Filesystem error */
-
-#define FAN_OPEN_PERM		0x00010000	/* File open in perm check */
-#define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
-#define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
-/* #define FAN_DIR_MODIFY	0x00080000 */	/* Deprecated (reserved) */
-
-#define FAN_PRE_ACCESS		0x00100000	/* Pre-content access hook */
-#define FAN_MNT_ATTACH		0x01000000	/* Mount was attached */
-#define FAN_MNT_DETACH		0x02000000	/* Mount was detached */
-
-#define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
-
-#define FAN_RENAME		0x10000000	/* File was renamed */
-
-#define FAN_ONDIR		0x40000000	/* Event occurred against dir */
+#define FAN_ACCESS		0x00000001ULL	/* File was accessed */
+#define FAN_MODIFY		0x00000002ULL	/* File was modified */
+#define FAN_ATTRIB		0x00000004ULL	/* Metadata changed */
+#define FAN_CLOSE_WRITE		0x00000008ULL	/* Writable file closed */
+#define FAN_CLOSE_NOWRITE	0x00000010ULL	/* Unwritable file closed */
+#define FAN_OPEN		0x00000020ULL	/* File was opened */
+#define FAN_MOVED_FROM		0x00000040ULL	/* File was moved from X */
+#define FAN_MOVED_TO		0x00000080ULL	/* File was moved to Y */
+#define FAN_CREATE		0x00000100ULL	/* Subfile was created */
+#define FAN_DELETE		0x00000200ULL	/* Subfile was deleted */
+#define FAN_DELETE_SELF		0x00000400ULL	/* Self was deleted */
+#define FAN_MOVE_SELF		0x00000800ULL	/* Self was moved */
+#define FAN_OPEN_EXEC		0x00001000ULL	/* File was opened for exec */
+
+#define FAN_Q_OVERFLOW		0x00004000ULL	/* Event queued overflowed */
+#define FAN_FS_ERROR		0x00008000ULL	/* Filesystem error */
+
+#define FAN_OPEN_PERM		0x00010000ULL	/* File open in perm check */
+#define FAN_ACCESS_PERM		0x00020000ULL	/* File accessed in perm check */
+#define FAN_OPEN_EXEC_PERM	0x00040000ULL	/* File open/exec in perm check */
+/* #define FAN_DIR_MODIFY	0x00080000ULL */	/* Deprecated (reserved) */
+
+#define FAN_PRE_ACCESS		0x00100000ULL	/* Pre-content access hook */
+#define FAN_MNT_ATTACH		0x01000000ULL	/* Mount was attached */
+#define FAN_MNT_DETACH		0x02000000ULL	/* Mount was detached */
+
+#define FAN_EVENT_ON_CHILD	0x08000000ULL	/* Interested in child events */
+
+#define FAN_RENAME		0x10000000ULL	/* File was renamed */
+
+#define FAN_ONDIR		0x40000000ULL	/* Event occurred against dir */
+
+/*
+ * Namespace lifecycle events use the upper 32 bits of the 64-bit mask
+ * to avoid confusion with the inode-level FAN_CREATE/FAN_DELETE events.
+ * They are only valid with FAN_MARK_USERNS and FAN_REPORT_NSID.
+ */
+#define FAN_NS_CREATE		(FAN_CREATE << 32)	/* Namespace became active */
+#define FAN_NS_DELETE		(FAN_DELETE << 32)	/* Namespace became inactive */
 
 /* helper events */
 #define FAN_CLOSE		(FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE) /* close */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 3/5] selftests/filesystems: create fanotify test dir
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 2/5] fanotify: use high bits for FAN_NS_CREATE/FAN_NS_DELETE Amir Goldstein
@ 2026-03-07 11:05 ` Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 4/5] filesystems/statmount: update mount.h in tools include dir Amir Goldstein
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Rename the mount-notify directory, which holds the two fanotify mount
notification tests, to fanotify before adding more fanotify tests.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 tools/testing/selftests/Makefile                                | 2 +-
 .../selftests/filesystems/{mount-notify => fanotify}/.gitignore | 0
 .../selftests/filesystems/{mount-notify => fanotify}/Makefile   | 0
 .../filesystems/{mount-notify => fanotify}/mount-notify_test.c  | 0
 .../{mount-notify => fanotify}/mount-notify_test_ns.c           | 0
 5 files changed, 1 insertion(+), 1 deletion(-)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/.gitignore (100%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/Makefile (100%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test.c (100%)
 rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test_ns.c (100%)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 450f13ba4cca9..dd48b69c1b21d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -36,7 +36,7 @@ TARGETS += filesystems/epoll
 TARGETS += filesystems/fat
 TARGETS += filesystems/overlayfs
 TARGETS += filesystems/statmount
-TARGETS += filesystems/mount-notify
+TARGETS += filesystems/fanotify
 TARGETS += filesystems/fuse
 TARGETS += firmware
 TARGETS += fpu
diff --git a/tools/testing/selftests/filesystems/mount-notify/.gitignore b/tools/testing/selftests/filesystems/fanotify/.gitignore
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/.gitignore
rename to tools/testing/selftests/filesystems/fanotify/.gitignore
diff --git a/tools/testing/selftests/filesystems/mount-notify/Makefile b/tools/testing/selftests/filesystems/fanotify/Makefile
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/Makefile
rename to tools/testing/selftests/filesystems/fanotify/Makefile
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/fanotify/mount-notify_test.c
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
rename to tools/testing/selftests/filesystems/fanotify/mount-notify_test.c
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/fanotify/mount-notify_test_ns.c
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
rename to tools/testing/selftests/filesystems/fanotify/mount-notify_test_ns.c
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 4/5] filesystems/statmount: update mount.h in tools include dir
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
                   ` (2 preceding siblings ...)
  2026-03-07 11:05 ` [RFC][PATCH 3/5] selftests/filesystems: create fanotify test dir Amir Goldstein
@ 2026-03-07 11:05 ` Amir Goldstein
  2026-03-07 11:05 ` [RFC][PATCH 5/5] selftests/filesystems: add fanotify namespace notifications test Amir Goldstein
  2026-03-09 12:33 ` [RFC][PATCH 0/5] fanotify namespace monitoring Christian Brauner
  5 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Update the tools copy of mount.h to fix the test build without
installing kernel headers.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 tools/include/uapi/linux/mount.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/mount.h b/tools/include/uapi/linux/mount.h
index 7fa67c2031a5d..d9d86598d100c 100644
--- a/tools/include/uapi/linux/mount.h
+++ b/tools/include/uapi/linux/mount.h
@@ -61,7 +61,8 @@
 /*
  * open_tree() flags.
  */
-#define OPEN_TREE_CLONE		1		/* Clone the target tree and attach the clone */
+#define OPEN_TREE_CLONE		(1 << 0)	/* Clone the target tree and attach the clone */
+#define OPEN_TREE_NAMESPACE	(1 << 1)	/* Clone the target tree into a new mount namespace */
 #define OPEN_TREE_CLOEXEC	O_CLOEXEC	/* Close the file on execve() */
 
 /*
@@ -197,7 +198,10 @@ struct statmount {
  */
 struct mnt_id_req {
 	__u32 size;
-	__u32 spare;
+	union {
+		__u32 mnt_ns_fd;
+		__u32 mnt_fd;
+	};
 	__u64 mnt_id;
 	__u64 param;
 	__u64 mnt_ns_id;
@@ -232,4 +236,9 @@ struct mnt_id_req {
 #define LSMT_ROOT		0xffffffffffffffff	/* root mount */
 #define LISTMOUNT_REVERSE	(1 << 0) /* List later mounts first */
 
+/*
+ * @flag bits for statmount(2)
+ */
+#define STATMOUNT_BY_FD		0x00000001U	/* want mountinfo for given fd */
+
 #endif /* _UAPI_LINUX_MOUNT_H */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 5/5] selftests/filesystems: add fanotify namespace notifications test
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
                   ` (3 preceding siblings ...)
  2026-03-07 11:05 ` [RFC][PATCH 4/5] filesystems/statmount: update mount.h in tools include dir Amir Goldstein
@ 2026-03-07 11:05 ` Amir Goldstein
  2026-03-09 12:33 ` [RFC][PATCH 0/5] fanotify namespace monitoring Christian Brauner
  5 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-07 11:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

Test create and delete events for nsfs:
- For init userns and child userns
- Verify the delete event is delivered even if the nsfs inode was never opened
- Verify required ns capabilities

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
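
Note for reviewers (not part of the commit message): for anyone who
wants to consume these events outside the selftest, here is a minimal
sketch of walking the info records that follow the event metadata and
picking out the namespace record. The ex_* structs and the helper are
hypothetical names used only for illustration; their layout mirrors
struct fanotify_event_info_header and struct fanotify_event_info_ns as
defined in this series, and the info type value 8 is
FAN_EVENT_INFO_TYPE_NS from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_INFO_TYPE_NS	8	/* FAN_EVENT_INFO_TYPE_NS in this series */

/* Mirrors struct fanotify_event_info_header. */
struct ex_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

/* Mirrors struct fanotify_event_info_ns from this series. */
struct ex_info_ns {
	struct ex_info_header hdr;
	uint64_t self_nsid;	/* ns_id of the namespace */
	uint64_t owner_nsid;	/* ns_id of its owning user namespace */
};

/*
 * Hypothetical helper: scan the info records in [info, info + info_len)
 * for a namespace record.  Returns 0 and fills the outputs on success,
 * -1 if no well-formed namespace record is found.
 */
static int ex_find_ns_info(const void *info, size_t info_len,
			   uint64_t *self_nsid, uint64_t *owner_nsid)
{
	const unsigned char *p = info;

	assert(self_nsid && owner_nsid);

	while (info_len >= sizeof(struct ex_info_header)) {
		struct ex_info_header hdr;

		/* memcpy avoids unaligned access into the read buffer. */
		memcpy(&hdr, p, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > info_len)
			return -1;	/* truncated or corrupt record */

		if (hdr.info_type == EX_INFO_TYPE_NS &&
		    hdr.len >= sizeof(struct ex_info_ns)) {
			struct ex_info_ns ns;

			memcpy(&ns, p, sizeof(ns));
			*self_nsid = ns.self_nsid;
			*owner_nsid = ns.owner_nsid;
			return 0;
		}

		/* Skip records of other types. */
		p += hdr.len;
		info_len -= hdr.len;
	}
	return -1;
}
```

A caller would pass the bytes after the event metadata, i.e. a pointer
to (meta + 1) and a length of meta->event_len - meta->metadata_len, the
same region the selftest casts directly to the ns info struct.
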
 tools/include/uapi/linux/fanotify.h           |  79 +++--
 .../selftests/filesystems/fanotify/Makefile   |   3 +-
 .../filesystems/fanotify/ns-notify_test.c     | 330 ++++++++++++++++++
 3 files changed, 380 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/fanotify/ns-notify_test.c

diff --git a/tools/include/uapi/linux/fanotify.h b/tools/include/uapi/linux/fanotify.h
index e710967c7c263..45ad484ceb473 100644
--- a/tools/include/uapi/linux/fanotify.h
+++ b/tools/include/uapi/linux/fanotify.h
@@ -5,37 +5,45 @@
 #include <linux/types.h>
 
 /* the following events that user-space can register for */
-#define FAN_ACCESS		0x00000001	/* File was accessed */
-#define FAN_MODIFY		0x00000002	/* File was modified */
-#define FAN_ATTRIB		0x00000004	/* Metadata changed */
-#define FAN_CLOSE_WRITE		0x00000008	/* Writable file closed */
-#define FAN_CLOSE_NOWRITE	0x00000010	/* Unwritable file closed */
-#define FAN_OPEN		0x00000020	/* File was opened */
-#define FAN_MOVED_FROM		0x00000040	/* File was moved from X */
-#define FAN_MOVED_TO		0x00000080	/* File was moved to Y */
-#define FAN_CREATE		0x00000100	/* Subfile was created */
-#define FAN_DELETE		0x00000200	/* Subfile was deleted */
-#define FAN_DELETE_SELF		0x00000400	/* Self was deleted */
-#define FAN_MOVE_SELF		0x00000800	/* Self was moved */
-#define FAN_OPEN_EXEC		0x00001000	/* File was opened for exec */
-
-#define FAN_Q_OVERFLOW		0x00004000	/* Event queued overflowed */
-#define FAN_FS_ERROR		0x00008000	/* Filesystem error */
-
-#define FAN_OPEN_PERM		0x00010000	/* File open in perm check */
-#define FAN_ACCESS_PERM		0x00020000	/* File accessed in perm check */
-#define FAN_OPEN_EXEC_PERM	0x00040000	/* File open/exec in perm check */
-/* #define FAN_DIR_MODIFY	0x00080000 */	/* Deprecated (reserved) */
-
-#define FAN_PRE_ACCESS		0x00100000	/* Pre-content access hook */
-#define FAN_MNT_ATTACH		0x01000000	/* Mount was attached */
-#define FAN_MNT_DETACH		0x02000000	/* Mount was detached */
-
-#define FAN_EVENT_ON_CHILD	0x08000000	/* Interested in child events */
-
-#define FAN_RENAME		0x10000000	/* File was renamed */
-
-#define FAN_ONDIR		0x40000000	/* Event occurred against dir */
+#define FAN_ACCESS		0x00000001ULL	/* File was accessed */
+#define FAN_MODIFY		0x00000002ULL	/* File was modified */
+#define FAN_ATTRIB		0x00000004ULL	/* Metadata changed */
+#define FAN_CLOSE_WRITE		0x00000008ULL	/* Writable file closed */
+#define FAN_CLOSE_NOWRITE	0x00000010ULL	/* Unwritable file closed */
+#define FAN_OPEN		0x00000020ULL	/* File was opened */
+#define FAN_MOVED_FROM		0x00000040ULL	/* File was moved from X */
+#define FAN_MOVED_TO		0x00000080ULL	/* File was moved to Y */
+#define FAN_CREATE		0x00000100ULL	/* Subfile was created */
+#define FAN_DELETE		0x00000200ULL	/* Subfile was deleted */
+#define FAN_DELETE_SELF		0x00000400ULL	/* Self was deleted */
+#define FAN_MOVE_SELF		0x00000800ULL	/* Self was moved */
+#define FAN_OPEN_EXEC		0x00001000ULL	/* File was opened for exec */
+
+#define FAN_Q_OVERFLOW		0x00004000ULL	/* Event queued overflowed */
+#define FAN_FS_ERROR		0x00008000ULL	/* Filesystem error */
+
+#define FAN_OPEN_PERM		0x00010000ULL	/* File open in perm check */
+#define FAN_ACCESS_PERM		0x00020000ULL	/* File accessed in perm check */
+#define FAN_OPEN_EXEC_PERM	0x00040000ULL	/* File open/exec in perm check */
+/* #define FAN_DIR_MODIFY	0x00080000ULL */	/* Deprecated (reserved) */
+
+#define FAN_PRE_ACCESS		0x00100000ULL	/* Pre-content access hook */
+#define FAN_MNT_ATTACH		0x01000000ULL	/* Mount was attached */
+#define FAN_MNT_DETACH		0x02000000ULL	/* Mount was detached */
+
+#define FAN_EVENT_ON_CHILD	0x08000000ULL	/* Interested in child events */
+
+#define FAN_RENAME		0x10000000ULL	/* File was renamed */
+
+#define FAN_ONDIR		0x40000000ULL	/* Event occurred against dir */
+
+/*
+ * Namespace lifecycle events use the upper 32 bits of the 64-bit mask
+ * to avoid confusion with the inode-level FAN_CREATE/FAN_DELETE events.
+ * They are only valid with FAN_MARK_USERNS and FAN_REPORT_NSID.
+ */
+#define FAN_NS_CREATE		(FAN_CREATE << 32)	/* Namespace became active */
+#define FAN_NS_DELETE		(FAN_DELETE << 32)	/* Namespace became inactive */
 
 /* helper events */
 #define FAN_CLOSE		(FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE) /* close */
@@ -67,6 +75,7 @@
 #define FAN_REPORT_TARGET_FID	0x00001000	/* Report dirent target id  */
 #define FAN_REPORT_FD_ERROR	0x00002000	/* event->fd can report error */
 #define FAN_REPORT_MNT		0x00004000	/* Report mount events */
+#define FAN_REPORT_NSID		0x00008000	/* Report namespace events */
 
 /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
 #define FAN_REPORT_DFID_NAME	(FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
@@ -98,6 +107,7 @@
 #define FAN_MARK_MOUNT		0x00000010
 #define FAN_MARK_FILESYSTEM	0x00000100
 #define FAN_MARK_MNTNS		0x00000110
+#define FAN_MARK_USERNS		0x00001000
 
 /*
  * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY
@@ -152,6 +162,7 @@ struct fanotify_event_metadata {
 #define FAN_EVENT_INFO_TYPE_ERROR	5
 #define FAN_EVENT_INFO_TYPE_RANGE	6
 #define FAN_EVENT_INFO_TYPE_MNT		7
+#define FAN_EVENT_INFO_TYPE_NS		8
 
 /* Special info types for FAN_RENAME */
 #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME	10
@@ -210,6 +221,12 @@ struct fanotify_event_info_mnt {
 	__u64 mnt_id;
 };
 
+struct fanotify_event_info_ns {
+	struct fanotify_event_info_header hdr;
+	__u64 self_nsid;	/* ns_id of the namespace */
+	__u64 owner_nsid;	/* ns_id of its owning user namespace */
+};
+
 /*
  * User space may need to record additional information about its decision.
  * The extra information type records what kind of information is included.
diff --git a/tools/testing/selftests/filesystems/fanotify/Makefile b/tools/testing/selftests/filesystems/fanotify/Makefile
index 836a4eb7be062..d251249630985 100644
--- a/tools/testing/selftests/filesystems/fanotify/Makefile
+++ b/tools/testing/selftests/filesystems/fanotify/Makefile
@@ -3,9 +3,10 @@
 CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
 LDLIBS += -lcap
 
-TEST_GEN_PROGS := mount-notify_test mount-notify_test_ns
+TEST_GEN_PROGS := mount-notify_test mount-notify_test_ns ns-notify_test
 
 include ../../lib.mk
 
 $(OUTPUT)/mount-notify_test: ../utils.c
 $(OUTPUT)/mount-notify_test_ns: ../utils.c
+$(OUTPUT)/ns-notify_test: ../utils.c
diff --git a/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c b/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
new file mode 100644
index 0000000000000..012a62c92ee4a
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2025
+
+#define _GNU_SOURCE
+
+// Needed for linux/fanotify.h
+typedef struct {
+	int	val[2];
+} __kernel_fsid_t;
+#define __kernel_fsid_t __kernel_fsid_t
+
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "kselftest_harness.h"
+#include "../utils.h"
+
+#include <linux/fanotify.h>
+
+/*
+ * Retrieve the ns_id of a namespace fd via name_to_handle_at().
+ * nsfs encodes { ns_id(u64), ns_type(u32), ns_inum(u32) } in f_handle.
+ */
+static uint64_t get_ns_id(int fd)
+{
+	struct {
+		struct file_handle fh;
+		uint64_t ns_id;
+		uint32_t ns_type;
+		uint32_t ns_inum;
+	} h = { .fh.handle_bytes = sizeof(uint64_t) + sizeof(uint32_t) * 2 };
+	int mnt_id;
+
+	if (name_to_handle_at(fd, "", &h.fh, &mnt_id, AT_EMPTY_PATH))
+		return 0;
+	return h.ns_id;
+}
+
+static void read_ns_event_fd(struct __test_metadata *const _metadata,
+			     int fd, char *buf, size_t buf_size,
+			     uint64_t expect_mask,
+			     uint64_t *self_nsid_out, uint64_t *owner_nsid_out)
+{
+	struct fanotify_event_metadata *meta;
+	struct fanotify_event_info_ns *info;
+	ssize_t len;
+
+	len = read(fd, buf, buf_size);
+	ASSERT_GT(len, 0);
+
+	meta = (struct fanotify_event_metadata *)buf;
+	ASSERT_TRUE(FAN_EVENT_OK(meta, len));
+	ASSERT_EQ(meta->mask, expect_mask);
+	ASSERT_EQ(meta->fd, FAN_NOFD);
+	ASSERT_EQ(meta->event_len,
+		  sizeof(*meta) + sizeof(struct fanotify_event_info_ns));
+
+	info = (struct fanotify_event_info_ns *)(meta + 1);
+	ASSERT_EQ(info->hdr.info_type, FAN_EVENT_INFO_TYPE_NS);
+	ASSERT_EQ(info->hdr.len, sizeof(*info));
+
+	*self_nsid_out  = info->self_nsid;
+	*owner_nsid_out = info->owner_nsid;
+}
+
+/* =========================================================================
+ * Outer tests: watch init_user_ns from root context (no setup_userns).
+ * ========================================================================= */
+
+/*
+ * Root-only: watch init_user_ns, fork a child that creates a user namespace
+ * owned by init_user_ns, verify FAN_NS_CREATE, let the child exit, verify
+ * FAN_NS_DELETE.  The watched namespace is created and destroyed entirely
+ * within the test body so both events are observable.
+ */
+TEST(outer_create_delete_userns)
+{
+	int fan_fd, ns_fd;
+	int pipefd[2];
+	pid_t pid;
+	uint64_t ns_nsid, create_self, create_owner;
+	uint64_t delete_self, delete_owner;
+	char buf[256];
+	char c;
+
+	if (geteuid() != 0)
+		SKIP(return, "requires root");
+
+	ns_fd = open("/proc/self/ns/user", O_RDONLY);
+	ASSERT_GE(ns_fd, 0);
+
+	ns_nsid = get_ns_id(ns_fd);
+	ASSERT_NE(ns_nsid, 0);
+
+	fan_fd = fanotify_init(FAN_REPORT_NSID, 0);
+	ASSERT_GE(fan_fd, 0);
+
+	errno = 0;
+	ASSERT_EQ(fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+				FAN_NS_CREATE | FAN_NS_DELETE, ns_fd, NULL), 0)
+		TH_LOG("fanotify_mark errno=%d (%s)", errno, strerror(errno));
+
+	ASSERT_EQ(pipe(pipefd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(pipefd[0]);
+		if (unshare(CLONE_NEWUSER))
+			_exit(1);
+		if (write(pipefd[1], "r", 1) < 0)
+			_exit(1);
+		close(pipefd[1]);
+		pause();
+		_exit(0);
+	}
+
+	close(pipefd[1]);
+	ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+	close(pipefd[0]);
+
+	/* --- FAN_NS_CREATE: new user namespace owned by init_user_ns --- */
+	read_ns_event_fd(_metadata, fan_fd, buf, sizeof(buf),
+			 FAN_NS_CREATE, &create_self, &create_owner);
+	ASSERT_NE(create_self, 0);
+	ASSERT_EQ(create_owner, ns_nsid);
+
+	/* Let child exit, deactivating its user namespace */
+	kill(pid, SIGTERM);
+	waitpid(pid, NULL, 0);
+
+	/* --- FAN_NS_DELETE --- */
+	read_ns_event_fd(_metadata, fan_fd, buf, sizeof(buf),
+			 FAN_NS_DELETE, &delete_self, &delete_owner);
+	ASSERT_EQ(delete_self, create_self);
+	ASSERT_EQ(delete_owner, ns_nsid);
+
+	close(fan_fd);
+	close(ns_fd);
+}
+
+/* =========================================================================
+ * Inner tests: watch a child userns from within it (via setup_userns).
+ * ========================================================================= */
+
+FIXTURE(userns_notify) {
+	int fan_fd;
+	int userns_fd;
+	int outer_ns_fd;	/* init_user_ns fd, captured before setup_userns() */
+	uint64_t userns_nsid;
+	char buf[256];
+};
+
+FIXTURE_SETUP(userns_notify)
+{
+	int ret;
+
+	/* Capture the outer user namespace fd before setup_userns() */
+	self->outer_ns_fd = open("/proc/self/ns/user", O_RDONLY);
+	ASSERT_GE(self->outer_ns_fd, 0);
+
+	ret = setup_userns();
+	ASSERT_EQ(ret, 0);
+
+	self->userns_fd = open("/proc/self/ns/user", O_RDONLY);
+	ASSERT_GE(self->userns_fd, 0);
+
+	self->userns_nsid = get_ns_id(self->userns_fd);
+	ASSERT_NE(self->userns_nsid, 0);
+
+	self->fan_fd = fanotify_init(FAN_REPORT_NSID, 0);
+	ASSERT_GE(self->fan_fd, 0);
+
+	errno = 0;
+	ret = fanotify_mark(self->fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+			    FAN_NS_CREATE | FAN_NS_DELETE,
+			    self->userns_fd, NULL);
+	ASSERT_EQ(ret, 0)
+		TH_LOG("fanotify_mark errno=%d (%s)", errno, strerror(errno));
+}
+
+FIXTURE_TEARDOWN(userns_notify)
+{
+	close(self->fan_fd);
+	close(self->userns_fd);
+	close(self->outer_ns_fd);
+}
+
+static void read_ns_event(struct __test_metadata *const _metadata,
+			  FIXTURE_DATA(userns_notify) *self,
+			  uint64_t expect_mask,
+			  uint64_t *self_nsid_out, uint64_t *owner_nsid_out)
+{
+	read_ns_event_fd(_metadata, self->fan_fd, self->buf, sizeof(self->buf),
+			 expect_mask, self_nsid_out, owner_nsid_out);
+}
+
+/*
+ * Create a UTS namespace inside the watched user namespace, verify
+ * FAN_NS_CREATE, then let the child exit and verify FAN_NS_DELETE.
+ * Cross-check self_nsid against the actual ns_id obtained via
+ * name_to_handle_at() on the child's /proc/pid/ns/uts.
+ */
+TEST_F(userns_notify, inner_create_delete_uts)
+{
+	int pipefd[2];
+	pid_t pid;
+	uint64_t create_self, create_owner;
+	uint64_t delete_self, delete_owner;
+	char c;
+
+	ASSERT_EQ(pipe(pipefd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(pipefd[0]);
+		if (unshare(CLONE_NEWUTS))
+			_exit(1);
+		if (write(pipefd[1], "r", 1) < 0)
+			_exit(1);
+		close(pipefd[1]);
+		pause();
+		_exit(0);
+	}
+
+	close(pipefd[1]);
+	ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+	close(pipefd[0]);
+
+	/* --- FAN_NS_CREATE --- */
+	read_ns_event(_metadata, self, FAN_NS_CREATE, &create_self, &create_owner);
+	ASSERT_NE(create_self, 0);
+	ASSERT_EQ(create_owner, self->userns_nsid);
+
+	/* Cross-check self_nsid against the child's actual UTS ns_id */
+	char path[64];
+	int ns_fd;
+	uint64_t uts_nsid;
+
+	snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid);
+	ns_fd = open(path, O_RDONLY);
+	ASSERT_GE(ns_fd, 0);
+	uts_nsid = get_ns_id(ns_fd);
+	close(ns_fd);
+	ASSERT_EQ(uts_nsid, create_self);
+
+	kill(pid, SIGTERM);
+	waitpid(pid, NULL, 0);
+
+	/* --- FAN_NS_DELETE --- */
+	read_ns_event(_metadata, self, FAN_NS_DELETE, &delete_self, &delete_owner);
+	ASSERT_EQ(delete_self, create_self);
+	ASSERT_EQ(delete_owner, self->userns_nsid);
+}
+
+/*
+ * Same as inner_create_delete_uts but the namespace fd is never opened, so
+ * the stashed nsfs dentry/inode is never populated.  Verifies FAN_NS_CREATE
+ * and FAN_NS_DELETE are still delivered and carry a consistent self_nsid.
+ */
+TEST_F(userns_notify, inner_create_delete_uts_no_open)
+{
+	int pipefd[2];
+	pid_t pid;
+	uint64_t create_self, create_owner;
+	uint64_t delete_self, delete_owner;
+	char c;
+
+	ASSERT_EQ(pipe(pipefd), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(pipefd[0]);
+		if (unshare(CLONE_NEWUTS))
+			_exit(1);
+		if (write(pipefd[1], "r", 1) < 0)
+			_exit(1);
+		close(pipefd[1]);
+		pause();
+		_exit(0);
+	}
+
+	close(pipefd[1]);
+	ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+	close(pipefd[0]);
+
+	/* --- FAN_NS_CREATE (no open of /proc/pid/ns/uts) --- */
+	read_ns_event(_metadata, self, FAN_NS_CREATE, &create_self, &create_owner);
+	ASSERT_NE(create_self, 0);
+	ASSERT_EQ(create_owner, self->userns_nsid);
+
+	kill(pid, SIGTERM);
+	waitpid(pid, NULL, 0);
+
+	/* --- FAN_NS_DELETE --- */
+	read_ns_event(_metadata, self, FAN_NS_DELETE, &delete_self, &delete_owner);
+	ASSERT_EQ(delete_self, create_self);
+	ASSERT_EQ(delete_owner, self->userns_nsid);
+}
+
+/*
+ * Attempt to set a FAN_MARK_USERNS watch on the initial user namespace.
+ * Requires CAP_SYS_ADMIN in init_user_ns.  Since FIXTURE_SETUP calls
+ * setup_userns(), the process lives in a child user namespace and cannot
+ * hold capabilities in init_user_ns, so the call must fail with EPERM
+ * regardless of the outer uid.
+ */
+TEST_F(userns_notify, inner_mark_init_userns_eperm)
+{
+	int ret;
+
+	ret = fanotify_mark(self->fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+			    FAN_NS_CREATE | FAN_NS_DELETE,
+			    self->outer_ns_fd, NULL);
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno, EPERM);
+}
+
+TEST_HARNESS_MAIN
-- 
2.53.0



* Re: [RFC][PATCH 0/5] fanotify namespace monitoring
  2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
                   ` (4 preceding siblings ...)
  2026-03-07 11:05 ` [RFC][PATCH 5/5] selftests/filesystems: add fanotify namespace notifications test Amir Goldstein
@ 2026-03-09 12:33 ` Christian Brauner
  2026-03-09 15:47   ` Amir Goldstein
  5 siblings, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2026-03-09 12:33 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

On Sat, Mar 07, 2026 at 12:05:45PM +0100, Amir Goldstein wrote:
> Jan,
> 
> Similar to mount notifications and listmount(), this is the complementary
> part of listns().
> 
> The discussion about FAN_DELETE_SELF events for kernfs [1] for cgroup
> tree monitoring got me thinking that this sort of monitoring should not be
> tied to vfs inodes.
> 
> Monitoring the cgroups tree has some semantic nuances, but I am told by
> Christian that a similar requirement exists for monitoring the namespace tree,
> where the semantics w.r.t userns are more clear.
> 
> I prepared this RFC to see if it meets the requirements of userspace
> and think if that works, the solution could be extended to monitoring
> cgroup trees.
> 
> IMO monitoring namespace trees and monitoring filesystem objects do not
> need to be mixed in the same fanotify group, so I wanted to try using
> the high 32bits for event flags rather than wasting more event flags
> in low 32bit. I remember that I wanted to do that for mount monitoring
> events, but did not insist, so too bad.
> 
> However, the code for using the high 32bit in uapi is quite ugly and
> hackish ATM, so I kept it as a separate patch, that we can either throw
> away or improve later.
> 
> Christian/Lennart,
> 
> I had considered if doing "recursive watches" to get all events from
> descendant namespaces is worthwhile and decided that it was
> not.
> 
> Please let me know if this UAPI meets your requirements.

I think this looks great overall and is very useful as it allows
monitoring namespace events outside of bpf lsms. I agree with the
non-recursive design. You could generalize this approach by deriving the
watch from the namespace file descriptor? Then you can get notifications
for all types of namespaces.
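
As a rough illustration of keying off the namespace fd: userspace can already
ask nsfs for the type of the namespace behind an fd with the NS_GET_NSTYPE
ioctl, which is the same distinction userns_from_dentry() makes on the kernel
side via ns_type. The helper name ns_type_of() below is made up for this
sketch, and nothing here touches the proposed fanotify UAPI.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nsfs.h>		/* NS_GET_NSTYPE */
#include <linux/sched.h>	/* CLONE_NEW* constants */

/*
 * Hypothetical helper: return the CLONE_NEW* type of the namespace that
 * @path (e.g. /proc/self/ns/uts) refers to, or -1 if the file cannot be
 * opened or the ioctl is not supported.
 */
static int ns_type_of(const char *path)
{
	int fd = open(path, O_RDONLY);
	int type;

	if (fd < 0)
		return -1;
	/* NS_GET_NSTYPE takes no argument and returns the CLONE_NEW* value. */
	type = ioctl(fd, NS_GET_NSTYPE);
	close(fd);
	return type;
}
```

A generic mark could then accept any nsfs fd and pick the object type from
the result instead of requiring a userns fd.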

If we ever want recursive watches, then we just need to add a separate
flag. This is only applicable to userns and pidns anyway.

I want to put another - crazier idea - in your head: Since pidfds are
file descriptors and now have the ability to persist information past
pidfd closure via struct pid->attr it is possible to allow fanotify
watches on pidfds.

I think that this opens up a crazy amount of possibilities that will be
tremendously useful - also would mean fsnotify outside of fs/ proper.
Just thinking on the spot: if you allow marking a pidfd it's super easy
to plumb exec notifications via fanotify on top of it. It's also easy to
monitor _all_ namespace events for a specific process via pidfds.

This obviously needs some thinking wrt security etc but I just want to
put the thought out there that the integration of pidfds and fanotify is
possible.


* Re: [RFC][PATCH 0/5] fanotify namespace monitoring
  2026-03-09 12:33 ` [RFC][PATCH 0/5] fanotify namespace monitoring Christian Brauner
@ 2026-03-09 15:47   ` Amir Goldstein
  2026-03-10 10:31     ` Christian Brauner
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2026-03-09 15:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

On Mon, Mar 9, 2026 at 1:33 PM Christian Brauner <brauner@kernel.org> wrote:
>
> On Sat, Mar 07, 2026 at 12:05:45PM +0100, Amir Goldstein wrote:
> > Jan,
> >
> > Similar to mount notifications and listmount(), this is the complementary
> > part of listns().
> >
> > The discussion about FAN_DELETE_SELF events for kernfs [1] for cgroup
> > tree monitoring got me thinking that this sort of monitoring should not be
> > tied to vfs inodes.
> >
> > Monitoring the cgroups tree has some semantic nuances, but I am told by
> > Christian that a similar requirement exists for monitoring the namespace tree,
> > where the semantics w.r.t userns are more clear.
> >
> > I prepared this RFC to see if it meets the requirements of userspace
> > and think if that works, the solution could be extended to monitoring
> > cgroup trees.
> >
> > IMO monitoring namespace trees and monitoring filesystem objects do not
> > need to be mixed in the same fanotify group, so I wanted to try using
> > the high 32bits for event flags rather than wasting more event flags
> > in low 32bit. I remember that I wanted to do that for mount monitoring
> > events, but did not insist, so too bad.
> >
> > However, the code for using the high 32bit in uapi is quite ugly and
> > hackish ATM, so I kept it as a separate patch, that we can either throw
> > away or improve later.
> >
> > Christian/Lennart,
> >
> > I had considered if doing "recursive watches" to get all events from
> > descendant namespaces is worthwhile and decided that it was
> > not.
> >
> > Please let me know if this UAPI meets your requirements.
>
> I think this looks great overall and is very useful as it allows
> monitoring namespace events outside of bpf lsms. I agree with the
> non-recursive design. You could generalize this approach by deriving the
> watch from the namespace file descriptor? Then you can get notifications
> for all types of namespaces.

Not sure what you mean?
Which type of notifications?
This RFC generates notifications for all types of namespaces created/deleted
under the watched userns.

Which notifications did you intend to watch for other types of ns?
DELETE_SELF?

That would be easy to add.
Would just need to move the n_fsnotify_marks/mask to struct ns_common
(also from mnt_namespace).
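
For reference, a rough userspace sketch of how a listener could pick the two
ids out of the info records that follow the event metadata. The info_ns
struct is only inferred from copy_ns_info_to_user() in patch 1, and
EXAMPLE_INFO_TYPE_NS and find_ns_info() are made-up names for illustration;
the real type value and struct come from the patched uapi header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct fanotify_event_info_header in linux/fanotify.h. */
struct info_header {
	uint8_t  info_type;
	uint8_t  pad;
	uint16_t len;
};

/* Assumed mirror of struct fanotify_event_info_ns (not the real uapi). */
struct info_ns {
	struct info_header hdr;
	uint64_t self_nsid;
	uint64_t owner_nsid;
};

/* Hypothetical placeholder for FAN_EVENT_INFO_TYPE_NS. */
#define EXAMPLE_INFO_TYPE_NS 0x7f

/*
 * Walk the info records in @buf (the bytes after the event metadata) and
 * copy out the ids of the first NS record.  Returns 0 on success, -1 if
 * no well-formed NS record is found.
 */
static int find_ns_info(const unsigned char *buf, size_t len,
			uint64_t *self_nsid, uint64_t *owner_nsid)
{
	size_t off = 0;

	while (len - off >= sizeof(struct info_header)) {
		struct info_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		/* Reject records that are truncated or claim a bogus length. */
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;

		if (hdr.info_type == EXAMPLE_INFO_TYPE_NS) {
			struct info_ns info;

			if (hdr.len < sizeof(info))
				return -1;
			memcpy(&info, buf + off, sizeof(info));
			*self_nsid = info.self_nsid;
			*owner_nsid = info.owner_nsid;
			return 0;
		}
		/* Skip records of other types (e.g. pidfd info). */
		off += hdr.len;
	}
	return -1;
}
```

The listener would compare owner_nsid against the ns_id of the marked userns,
as the selftest does.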

>
> If we ever want recursive watches, then we just need to add a separate
> flag. This is only applicable to userns and pidns anyway.

Yes, if we wanted to.

>
> I want to put another - crazier idea - in your head: Since pidfds are
> file descriptors and now have the ability to persist information past
> pidfd closure via struct pid->attr it is possible to allow fanotify
> watches on pidfds.
>
> I think that this opens up a crazy amount of possibilities that will be
> tremendously useful - also would mean fsnotify outside of fs/ proper.
> Just thinking on the spot: if you allow marking a pidfd it's super easy
> to plumb exec notifications via fanotify on top of it. It's also easy to
> monitor _all_ namespace events for a specific process via pidfds.

Anything's possible, but we need to make sure it's worth it.
Aren't there already enough ways to monitor a process via ptrace/landlock?

>
> This obviously needs some thinking wrt security etc but I just want to
> put the thought out there that the integration of pidfds and fanotify is
> possible.

I was thinking more about watching the entire pidfs space, but sure.

Thanks,
Amir.


* Re: [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree
  2026-03-07 11:05 ` [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree Amir Goldstein
@ 2026-03-09 18:07   ` Amir Goldstein
  0 siblings, 0 replies; 12+ messages in thread
From: Amir Goldstein @ 2026-03-09 18:07 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

On Sat, Mar 7, 2026 at 12:05 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Introduce FAN_MARK_USERNS type to mark a user namespace object
> from nsfs path.
>
> Support two events FAN_CREATE and FAN_DELETE to report creation
> and tear down of namespaces owned by the marked userns.
>
> Introduce FAN_REPORT_NSID to report the self and owner nsid of
> the created or torn down namespace.
>
> At this time, an fanotify group initialized with flags
> FAN_REPORT_MNT|FAN_REPORT_NSID, may add marks on both userns
> and mntns objects to mix mount and namespace events, but the same
> group cannot also request filesystem events with file handles
> (e.g. FAN_REPORT_FID).
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.c      | 32 ++++++++++++++
>  fs/notify/fanotify/fanotify.h      | 19 ++++++++
>  fs/notify/fanotify/fanotify_user.c | 71 +++++++++++++++++++++++++-----
>  fs/notify/fdinfo.c                 |  9 +++-
>  fs/notify/fsnotify.c               | 28 +++++++++++-
>  fs/notify/fsnotify.h               |  7 +++
>  fs/notify/mark.c                   |  7 +++
>  fs/nsfs.c                          | 21 +++++++++
>  include/linux/fanotify.h           | 14 ++++--
>  include/linux/fsnotify_backend.h   | 22 +++++++++
>  include/linux/proc_fs.h            |  2 +
>  include/linux/user_namespace.h     |  6 +++
>  include/uapi/linux/fanotify.h      |  9 ++++
>  kernel/nscommon.c                  | 46 +++++++++++++++++++
>  14 files changed, 276 insertions(+), 17 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index bfe884d624e7b..3818b4d53dcad 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -168,6 +168,8 @@ static bool fanotify_should_merge(struct fanotify_event *old,
>                                                   FANOTIFY_EE(new));
>         case FANOTIFY_EVENT_TYPE_MNT:
>                 return false;
> +       case FANOTIFY_EVENT_TYPE_NS:
> +               return false;
>         default:
>                 WARN_ON_ONCE(1);
>         }
> @@ -317,6 +319,9 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
>         if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
>                 if (data_type != FSNOTIFY_EVENT_MNT)
>                         return 0;
> +       } else if (FAN_GROUP_FLAG(group, FAN_REPORT_NSID)) {
> +               if (data_type != FSNOTIFY_EVENT_NS)
> +                       return 0;
>         } else if (!fid_mode) {
>                 /* Do we have path to open a file descriptor? */
>                 if (!path)
> @@ -582,6 +587,22 @@ static struct fanotify_event *fanotify_alloc_mnt_event(u64 mnt_id, gfp_t gfp)
>         return &pevent->fae;
>  }
>
> +static struct fanotify_event *fanotify_alloc_ns_event(const struct fsnotify_ns *ns_data,
> +                                                     gfp_t gfp)
> +{
> +       struct fanotify_ns_event *pevent;
> +
> +       pevent = kmem_cache_alloc(fanotify_ns_event_cachep, gfp);
> +       if (!pevent)
> +               return NULL;
> +
> +       pevent->fae.type = FANOTIFY_EVENT_TYPE_NS;
> +       pevent->self_nsid = ns_data->self_nsid;
> +       pevent->owner_nsid = ns_data->owner_nsid;
> +
> +       return &pevent->fae;
> +}
> +
>  static struct fanotify_event *fanotify_alloc_perm_event(const void *data,
>                                                         int data_type,
>                                                         gfp_t gfp)
> @@ -755,6 +776,7 @@ static struct fanotify_event *fanotify_alloc_event(
>         struct inode *id = fanotify_fid_inode(mask, data, data_type, dir,
>                                               fid_mode);
>         struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
> +       const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
>         const struct path *path = fsnotify_data_path(data, data_type);
>         u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
>         struct mem_cgroup *old_memcg;
> @@ -856,6 +878,8 @@ static struct fanotify_event *fanotify_alloc_event(
>                 event = fanotify_alloc_path_event(path, &hash, gfp);
>         } else if (mnt_id) {
>                 event = fanotify_alloc_mnt_event(mnt_id, gfp);
> +       } else if (ns_data) {
> +               event = fanotify_alloc_ns_event(ns_data, gfp);
>         } else {
>                 WARN_ON_ONCE(1);
>         }
> @@ -1064,6 +1088,11 @@ static void fanotify_free_mnt_event(struct fanotify_event *event)
>         kmem_cache_free(fanotify_mnt_event_cachep, FANOTIFY_ME(event));
>  }
>
> +static void fanotify_free_ns_event(struct fanotify_event *event)
> +{
> +       kmem_cache_free(fanotify_ns_event_cachep, FANOTIFY_NSE(event));
> +}
> +
>  static void fanotify_free_event(struct fsnotify_group *group,
>                                 struct fsnotify_event *fsn_event)
>  {
> @@ -1093,6 +1122,9 @@ static void fanotify_free_event(struct fsnotify_group *group,
>         case FANOTIFY_EVENT_TYPE_MNT:
>                 fanotify_free_mnt_event(event);
>                 break;
> +       case FANOTIFY_EVENT_TYPE_NS:
> +               fanotify_free_ns_event(event);
> +               break;
>         default:
>                 WARN_ON_ONCE(1);
>         }
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 39e60218df7ce..2eaac302ccac0 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -10,6 +10,7 @@ extern struct kmem_cache *fanotify_fid_event_cachep;
>  extern struct kmem_cache *fanotify_path_event_cachep;
>  extern struct kmem_cache *fanotify_perm_event_cachep;
>  extern struct kmem_cache *fanotify_mnt_event_cachep;
> +extern struct kmem_cache *fanotify_ns_event_cachep;
>
>  /* Possible states of the permission event */
>  enum {
> @@ -245,6 +246,7 @@ enum fanotify_event_type {
>         FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
>         FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
>         FANOTIFY_EVENT_TYPE_MNT,
> +       FANOTIFY_EVENT_TYPE_NS,
>         __FANOTIFY_EVENT_TYPE_NUM
>  };
>
> @@ -415,6 +417,12 @@ struct fanotify_mnt_event {
>         u64 mnt_id;
>  };
>
> +struct fanotify_ns_event {
> +       struct fanotify_event fae;
> +       u64 self_nsid;
> +       u64 owner_nsid;
> +};
> +
>  static inline struct fanotify_path_event *
>  FANOTIFY_PE(struct fanotify_event *event)
>  {
> @@ -427,6 +435,12 @@ FANOTIFY_ME(struct fanotify_event *event)
>         return container_of(event, struct fanotify_mnt_event, fae);
>  }
>
> +static inline struct fanotify_ns_event *
> +FANOTIFY_NSE(struct fanotify_event *event)
> +{
> +       return container_of(event, struct fanotify_ns_event, fae);
> +}
> +
>  /*
>   * Structure for permission fanotify events. It gets allocated and freed in
>   * fanotify_handle_event() since we wait there for user response. When the
> @@ -485,6 +499,11 @@ static inline bool fanotify_is_mnt_event(u32 mask)
>         return mask & (FAN_MNT_ATTACH | FAN_MNT_DETACH);
>  }
>
> +static inline bool fanotify_is_ns_event(const struct fanotify_event *event)
> +{
> +       return event->type == FANOTIFY_EVENT_TYPE_NS;
> +}
> +
>  static inline const struct path *fanotify_event_path(struct fanotify_event *event)
>  {
>         if (event->type == FANOTIFY_EVENT_TYPE_PATH)
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index ae904451dfc09..126069101669a 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -19,6 +19,7 @@
>  #include <linux/memcontrol.h>
>  #include <linux/statfs.h>
>  #include <linux/exportfs.h>
> +#include <linux/proc_fs.h>
>
>  #include <asm/ioctls.h>
>
> @@ -208,6 +209,7 @@ struct kmem_cache *fanotify_fid_event_cachep __ro_after_init;
>  struct kmem_cache *fanotify_path_event_cachep __ro_after_init;
>  struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
>  struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
> +struct kmem_cache *fanotify_ns_event_cachep __ro_after_init;
>
>  #define FANOTIFY_EVENT_ALIGN 4
>  #define FANOTIFY_FID_INFO_HDR_LEN \
> @@ -220,6 +222,8 @@ struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
>         (sizeof(struct fanotify_event_info_range))
>  #define FANOTIFY_MNT_INFO_LEN \
>         (sizeof(struct fanotify_event_info_mnt))
> +#define FANOTIFY_NS_INFO_LEN \
> +       (sizeof(struct fanotify_event_info_ns))
>
>  static int fanotify_fid_info_len(int fh_len, int name_len)
>  {
> @@ -277,6 +281,8 @@ static size_t fanotify_event_len(unsigned int info_mode,
>         }
>         if (fanotify_is_mnt_event(event->mask))
>                 event_len += FANOTIFY_MNT_INFO_LEN;
> +       if (fanotify_is_ns_event(event))
> +               event_len += FANOTIFY_NS_INFO_LEN;
>
>         if (info_mode & FAN_REPORT_PIDFD)
>                 event_len += FANOTIFY_PIDFD_INFO_LEN;
> @@ -523,6 +529,26 @@ static size_t copy_mnt_info_to_user(struct fanotify_event *event,
>         return info.hdr.len;
>  }
>
> +static size_t copy_ns_info_to_user(struct fanotify_event *event,
> +                                  char __user *buf, int count)
> +{
> +       struct fanotify_event_info_ns info = { };
> +
> +       info.hdr.info_type = FAN_EVENT_INFO_TYPE_NS;
> +       info.hdr.len = sizeof(info);
> +
> +       if (WARN_ON(count < info.hdr.len))
> +               return -EFAULT;
> +
> +       info.self_nsid  = FANOTIFY_NSE(event)->self_nsid;
> +       info.owner_nsid = FANOTIFY_NSE(event)->owner_nsid;
> +
> +       if (copy_to_user(buf, &info, sizeof(info)))
> +               return -EFAULT;
> +
> +       return info.hdr.len;
> +}
> +
>  static size_t copy_error_info_to_user(struct fanotify_event *event,
>                                       char __user *buf, int count)
>  {
> @@ -827,6 +853,15 @@ static int copy_info_records_to_user(struct fanotify_event *event,
>                 total_bytes += ret;
>         }
>
> +       if (fanotify_is_ns_event(event)) {
> +               ret = copy_ns_info_to_user(event, buf, count);
> +               if (ret < 0)
> +                       return ret;
> +               buf += ret;
> +               count -= ret;
> +               total_bytes += ret;
> +       }
> +
>         return total_bytes;
>  }
>
> @@ -1604,11 +1639,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>         /*
>          * An unprivileged user can setup an fanotify group with limited
>          * functionality - an unprivileged group is limited to notification
> -        * events with file handles or mount ids and it cannot use unlimited
> +        * events with file handles or mount/ns ids and it cannot use unlimited
>          * queue/marks.
>          */
>         if (((flags & FANOTIFY_ADMIN_INIT_FLAGS) ||
> -            !(flags & (FANOTIFY_FID_BITS | FAN_REPORT_MNT))) &&
> +            !(flags & (FANOTIFY_FID_BITS | FAN_REPORT_MNT | FAN_REPORT_NSID))) &&
>             !capable(CAP_SYS_ADMIN))
>                 return -EPERM;
>
> @@ -1636,8 +1671,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>         if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
>                 return -EINVAL;
>
> -       /* Don't allow mixing mnt events with inode events for now */
> -       if (flags & FAN_REPORT_MNT) {
> +       /* Don't allow mixing mnt/ns events with inode events for now */
> +       if (flags & (FAN_REPORT_MNT | FAN_REPORT_NSID)) {
>                 if (class != FAN_CLASS_NOTIF)
>                         return -EINVAL;
>                 if (flags & (FANOTIFY_FID_BITS | FAN_REPORT_FD_ERROR))
> @@ -1913,6 +1948,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>         case FAN_MARK_MNTNS:
>                 obj_type = FSNOTIFY_OBJ_TYPE_MNTNS;
>                 break;
> +       case FAN_MARK_USERNS:
> +               obj_type = FSNOTIFY_OBJ_TYPE_USERNS;
> +               break;
>         default:
>                 return -EINVAL;
>         }
> @@ -1960,16 +1998,22 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>                 return -EINVAL;
>         group = fd_file(f)->private_data;
>
> -       /* Only report mount events on mnt namespace */
> -       if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
> +       /* Only report mount events on mnt namespace mark */
> +       if (mark_type == FAN_MARK_MNTNS) {
>                 if (mask & ~FANOTIFY_MOUNT_EVENTS)
>                         return -EINVAL;
> -               if (mark_type != FAN_MARK_MNTNS)
> +               if (!FAN_GROUP_FLAG(group, FAN_REPORT_MNT))
>                         return -EINVAL;
>         } else {
>                 if (mask & FANOTIFY_MOUNT_EVENTS)
>                         return -EINVAL;
> -               if (mark_type == FAN_MARK_MNTNS)
> +       }
> +
> +       /* Only report namespace events on user namespace mark */
> +       if (mark_type == FAN_MARK_USERNS) {
> +               if (mask & ~FANOTIFY_NS_EVENTS)
> +                       return -EINVAL;
> +               if (!FAN_GROUP_FLAG(group, FAN_REPORT_NSID))
>                         return -EINVAL;
>         }
>
> @@ -2087,6 +2131,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>                         goto path_put_and_out;
>                 user_ns = mntns->user_ns;
>                 obj = mntns;
> +       } else if (obj_type == FSNOTIFY_OBJ_TYPE_USERNS) {
> +               ret = -EINVAL;
> +               user_ns = userns_from_dentry(path.dentry);
> +               if (!user_ns)
> +                       goto path_put_and_out;
> +               obj = user_ns;
>         }
>
>         ret = -EPERM;
> @@ -2190,8 +2240,8 @@ static int __init fanotify_user_setup(void)
>                                      FANOTIFY_DEFAULT_MAX_USER_MARKS);
>
>         BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 14);
> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 15);
> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 12);
>
>         fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
>                                          SLAB_PANIC|SLAB_ACCOUNT);
> @@ -2204,6 +2254,7 @@ static int __init fanotify_user_setup(void)
>                         KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
>         }
>         fanotify_mnt_event_cachep = KMEM_CACHE(fanotify_mnt_event, SLAB_PANIC);
> +       fanotify_ns_event_cachep = KMEM_CACHE(fanotify_ns_event, SLAB_PANIC);
>
>         fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
>         init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> index 9cc7eb8636437..946cffaf16e18 100644
> --- a/fs/notify/fdinfo.c
> +++ b/fs/notify/fdinfo.c
> @@ -130,8 +130,13 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
>         } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
>                 struct mnt_namespace *mnt_ns = fsnotify_conn_mntns(mark->connector);
>
> -               seq_printf(m, "fanotify mnt_ns:%u mflags:%x mask:%x ignored_mask:%x\n",
> -                          mnt_ns->ns.inum, mflags, mark->mask, mark->ignore_mask);
> +               seq_printf(m, "fanotify mnt_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
> +                          mnt_ns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
> +       } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_USERNS) {
> +               struct user_namespace *userns = fsnotify_conn_userns(mark->connector);
> +
> +               seq_printf(m, "fanotify user_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
> +                          userns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
>         }
>  }
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 9995de1710e59..638136c0d6cb9 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -495,6 +495,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>         const struct path *path = fsnotify_data_path(data, data_type);
>         struct super_block *sb = fsnotify_data_sb(data, data_type);
>         const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
> +       const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
>         struct fsnotify_sb_info *sbinfo = sb ? fsnotify_sb_info(sb) : NULL;
>         struct fsnotify_iter_info iter_info = {};
>         struct mount *mnt = NULL;
> @@ -536,7 +537,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>             (!mnt || !mnt->mnt_fsnotify_marks) &&
>             (!inode || !inode->i_fsnotify_marks) &&
>             (!inode2 || !inode2->i_fsnotify_marks) &&
> -           (!mnt_data || !mnt_data->ns->n_fsnotify_marks))
> +           (!mnt_data || !mnt_data->ns->n_fsnotify_marks) &&
> +           (!ns_data || !ns_data->userns->n_fsnotify_marks))
>                 return 0;
>
>         if (sb)
> @@ -549,6 +551,8 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>                 marks_mask |= READ_ONCE(inode2->i_fsnotify_mask);
>         if (mnt_data)
>                 marks_mask |= READ_ONCE(mnt_data->ns->n_fsnotify_mask);
> +       if (ns_data)
> +               marks_mask |= READ_ONCE(ns_data->userns->n_fsnotify_mask);
>
>         /*
>          * If this is a modify event we may need to clear some ignore masks.
> @@ -582,6 +586,10 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
>                 iter_info.marks[FSNOTIFY_ITER_TYPE_MNTNS] =
>                         fsnotify_first_mark(&mnt_data->ns->n_fsnotify_marks);
>         }
> +       if (ns_data) {
> +               iter_info.marks[FSNOTIFY_ITER_TYPE_USERNS] =
> +                       fsnotify_first_mark(&ns_data->userns->n_fsnotify_marks);
> +       }
>
>         /*
>          * We need to merge inode/vfsmount/sb mark lists so that e.g. inode mark
> @@ -711,6 +719,24 @@ void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
>         fsnotify(mask, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
>  }
>
> +void fsnotify_ns(__u32 mask, struct user_namespace *userns,
> +                u64 self_nsid, u64 owner_nsid)
> +{
> +       struct fsnotify_ns data = {
> +               .userns = userns,
> +               .self_nsid = self_nsid,
> +               .owner_nsid = owner_nsid,
> +       };
> +
> +       if (WARN_ON_ONCE(!userns))
> +               return;
> +
> +       if (!READ_ONCE(userns->n_fsnotify_marks))
> +               return;
> +
> +       fsnotify(mask, &data, FSNOTIFY_EVENT_NS, NULL, NULL, NULL, 0);
> +}
> +
>  static __init int fsnotify_init(void)
>  {
>         int ret;
> diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
> index 58c7bb25e5718..f58c69de7f067 100644
> --- a/fs/notify/fsnotify.h
> +++ b/fs/notify/fsnotify.h
> @@ -6,6 +6,7 @@
>  #include <linux/fsnotify.h>
>  #include <linux/srcu.h>
>  #include <linux/types.h>
> +#include <linux/user_namespace.h>
>
>  #include "../mount.h"
>
> @@ -39,6 +40,12 @@ static inline struct mnt_namespace *fsnotify_conn_mntns(
>         return conn->obj;
>  }
>
> +static inline struct user_namespace *fsnotify_conn_userns(
> +                               struct fsnotify_mark_connector *conn)
> +{
> +       return conn->obj;
> +}
> +
>  static inline struct super_block *fsnotify_object_sb(void *obj,
>                         enum fsnotify_obj_type obj_type)
>  {
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index c2ed5b11b0fe6..4086b37637cbe 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -74,6 +74,7 @@
>  #include <linux/atomic.h>
>
>  #include <linux/fsnotify_backend.h>
> +#include <linux/user_namespace.h>
>  #include "fsnotify.h"
>
>  #define FSNOTIFY_REAPER_DELAY  (1)     /* 1 jiffy */
> @@ -110,6 +111,8 @@ static fsnotify_connp_t *fsnotify_object_connp(void *obj,
>                 return fsnotify_sb_marks(obj);
>         case FSNOTIFY_OBJ_TYPE_MNTNS:
>                 return &((struct mnt_namespace *)obj)->n_fsnotify_marks;
> +       case FSNOTIFY_OBJ_TYPE_USERNS:
> +               return &((struct user_namespace *)obj)->n_fsnotify_marks;
>         default:
>                 return NULL;
>         }
> @@ -125,6 +128,8 @@ static __u32 *fsnotify_conn_mask_p(struct fsnotify_mark_connector *conn)
>                 return &fsnotify_conn_sb(conn)->s_fsnotify_mask;
>         else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS)
>                 return &fsnotify_conn_mntns(conn)->n_fsnotify_mask;
> +       else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS)
> +               return &fsnotify_conn_userns(conn)->n_fsnotify_mask;
>         return NULL;
>  }
>
> @@ -356,6 +361,8 @@ static void *fsnotify_detach_connector_from_object(
>                 fsnotify_conn_sb(conn)->s_fsnotify_mask = 0;
>         } else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
>                 fsnotify_conn_mntns(conn)->n_fsnotify_mask = 0;
> +       } else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS) {
> +               fsnotify_conn_userns(conn)->n_fsnotify_mask = 0;
>         }
>
>         rcu_assign_pointer(*connp, NULL);
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index c215878d55e87..ace17de243f45 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -387,6 +387,27 @@ bool proc_ns_file(const struct file *file)
>         return file->f_op == &ns_file_operations;
>  }
>
> +/**
> + * userns_from_dentry() - Return the user_namespace referenced by an nsfs dentry.
> + * @dentry: dentry of an open nsfs file
> + *
> + * Returns the user_namespace if @dentry is an nsfs file for a user namespace,
> + * NULL otherwise.  The caller is responsible for ensuring the returned pointer
> + * remains valid (e.g. by holding a reference to the dentry).
> + */
> +struct user_namespace *userns_from_dentry(struct dentry *dentry)
> +{
> +       struct inode *inode = d_inode(dentry);
> +       struct ns_common *ns;
> +
> +       if (!inode || inode->i_sb->s_magic != NSFS_MAGIC)
> +               return NULL;
> +       ns = get_proc_ns(inode);
> +       if (!ns || ns->ns_type != CLONE_NEWUSER)
> +               return NULL;
> +       return to_user_ns(ns);
> +}
> +
>  /**
>   * ns_match() - Returns true if current namespace matches dev/ino provided.
>   * @ns: current namespace
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 879cff5eccd4e..279082ae40fe2 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -25,7 +25,8 @@
>
>  #define FANOTIFY_FID_BITS      (FAN_REPORT_DFID_NAME_TARGET)
>
> -#define FANOTIFY_INFO_MODES    (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD | FAN_REPORT_MNT)
> +#define FANOTIFY_INFO_MODES    (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD | FAN_REPORT_MNT | \
> +                                FAN_REPORT_NSID)
>
>  /*
>   * fanotify_init() flags that require CAP_SYS_ADMIN.
> @@ -47,8 +48,9 @@
>   * so one of the flags for reporting file handles is required.
>   */
>  #define FANOTIFY_USER_INIT_FLAGS       (FAN_CLASS_NOTIF | \
> -                                        FANOTIFY_FID_BITS | FAN_REPORT_MNT | \
> -                                        FAN_CLOEXEC | FAN_NONBLOCK)
> +                                FANOTIFY_FID_BITS | FAN_REPORT_MNT | \
> +                                FAN_REPORT_NSID | \
> +                                FAN_CLOEXEC | FAN_NONBLOCK)
>
>  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_ADMIN_INIT_FLAGS | \
>                                  FANOTIFY_USER_INIT_FLAGS)
> @@ -58,7 +60,8 @@
>  #define FANOTIFY_INTERNAL_GROUP_FLAGS  (FANOTIFY_UNPRIV)
>
>  #define FANOTIFY_MARK_TYPE_BITS        (FAN_MARK_INODE | FAN_MARK_MOUNT | \
> -                                FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS)
> +                                FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS | \
> +                                FAN_MARK_USERNS)
>
>  #define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \
>                                  FAN_MARK_FLUSH)
> @@ -111,6 +114,9 @@
>
>  #define FANOTIFY_MOUNT_EVENTS  (FAN_MNT_ATTACH | FAN_MNT_DETACH)
>
> +/* Events that can be reported with data type FSNOTIFY_EVENT_NS */
> +#define FANOTIFY_NS_EVENTS     (FAN_CREATE | FAN_DELETE)
> +
>  /* Events that user can request to be notified on */
>  #define FANOTIFY_EVENTS                (FANOTIFY_PATH_EVENTS | \
>                                  FANOTIFY_INODE_EVENTS | \
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 95985400d3d8e..2145d2f4262db 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -310,6 +310,7 @@ enum fsnotify_data_type {
>         FSNOTIFY_EVENT_INODE,
>         FSNOTIFY_EVENT_DENTRY,
>         FSNOTIFY_EVENT_MNT,
> +       FSNOTIFY_EVENT_NS,
>         FSNOTIFY_EVENT_ERROR,
>  };
>
> @@ -335,6 +336,12 @@ struct fsnotify_mnt {
>         u64 mnt_id;
>  };
>
> +struct fsnotify_ns {
> +       const struct user_namespace *userns;
> +       u64 self_nsid;
> +       u64 owner_nsid;
> +};
> +
>  static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
>  {
>         switch (data_type) {
> @@ -411,6 +418,17 @@ static inline const struct fsnotify_mnt *fsnotify_data_mnt(const void *data,
>         }
>  }
>
> +static inline const struct fsnotify_ns *fsnotify_data_ns(const void *data,
> +                                                        int data_type)
> +{
> +       switch (data_type) {
> +       case FSNOTIFY_EVENT_NS:
> +               return data;
> +       default:
> +               return NULL;
> +       }
> +}
> +
>  static inline u64 fsnotify_data_mnt_id(const void *data, int data_type)
>  {
>         const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
> @@ -456,6 +474,7 @@ enum fsnotify_iter_type {
>         FSNOTIFY_ITER_TYPE_PARENT,
>         FSNOTIFY_ITER_TYPE_INODE2,
>         FSNOTIFY_ITER_TYPE_MNTNS,
> +       FSNOTIFY_ITER_TYPE_USERNS,
>         FSNOTIFY_ITER_TYPE_COUNT
>  };
>
> @@ -466,6 +485,7 @@ enum fsnotify_obj_type {
>         FSNOTIFY_OBJ_TYPE_VFSMOUNT,
>         FSNOTIFY_OBJ_TYPE_SB,
>         FSNOTIFY_OBJ_TYPE_MNTNS,
> +       FSNOTIFY_OBJ_TYPE_USERNS,
>         FSNOTIFY_OBJ_TYPE_COUNT,
>         FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT
>  };
> @@ -657,6 +677,8 @@ extern void __fsnotify_mntns_delete(struct mnt_namespace *mntns);
>  extern void fsnotify_sb_free(struct super_block *sb);
>  extern u32 fsnotify_get_cookie(void);
>  extern void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt);
> +extern void fsnotify_ns(__u32 mask, struct user_namespace *userns,
> +                       u64 self_nsid, u64 owner_nsid);
>
>  static inline __u32 fsnotify_parent_needed_mask(__u32 mask)
>  {
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index 19d1c5e5f3350..3b7d2bc88ae6c 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -248,4 +248,6 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
>
>  bool proc_ns_file(const struct file *file);
>
> +struct user_namespace *userns_from_dentry(struct dentry *dentry);
> +
>  #endif /* _LINUX_PROC_FS_H */
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 9c3be157397e0..7ff8420495308 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -13,6 +13,8 @@
>  #include <linux/sysctl.h>
>  #include <linux/err.h>
>
> +struct fsnotify_mark_connector;
> +
>  #define UID_GID_MAP_MAX_BASE_EXTENTS 5
>  #define UID_GID_MAP_MAX_EXTENTS 340
>
> @@ -86,6 +88,10 @@ struct user_namespace {
>         /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
>          * in its effective capability set at the child ns creation time. */
>         bool                    parent_could_setfcap;
> +#ifdef CONFIG_FSNOTIFY
> +       __u32 n_fsnotify_mask;
> +       struct fsnotify_mark_connector __rcu *n_fsnotify_marks;
> +#endif
>
>  #ifdef CONFIG_KEYS
>         /* List of joinable keyrings in this namespace.  Modification access of
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index e710967c7c263..6b4f470ee7e01 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -67,6 +67,7 @@
>  #define FAN_REPORT_TARGET_FID  0x00001000      /* Report dirent target id  */
>  #define FAN_REPORT_FD_ERROR    0x00002000      /* event->fd can report error */
>  #define FAN_REPORT_MNT         0x00004000      /* Report mount events */
> +#define FAN_REPORT_NSID                0x00008000      /* Report namespace events */
>
>  /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
>  #define FAN_REPORT_DFID_NAME   (FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
> @@ -98,6 +99,7 @@
>  #define FAN_MARK_MOUNT         0x00000010
>  #define FAN_MARK_FILESYSTEM    0x00000100
>  #define FAN_MARK_MNTNS         0x00000110
> +#define FAN_MARK_USERNS                0x00001000
>
>  /*
>   * Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY
> @@ -152,6 +154,7 @@ struct fanotify_event_metadata {
>  #define FAN_EVENT_INFO_TYPE_ERROR      5
>  #define FAN_EVENT_INFO_TYPE_RANGE      6
>  #define FAN_EVENT_INFO_TYPE_MNT                7
> +#define FAN_EVENT_INFO_TYPE_NS         8
>
>  /* Special info types for FAN_RENAME */
>  #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME      10
> @@ -210,6 +213,12 @@ struct fanotify_event_info_mnt {
>         __u64 mnt_id;
>  };
>
> +struct fanotify_event_info_ns {
> +       struct fanotify_event_info_header hdr;
> +       __u64 self_nsid;        /* ns_id of the namespace */
> +       __u64 owner_nsid;       /* ns_id of its owning user namespace */
> +};
> +
>  /*
>   * User space may need to record additional information about its decision.
>   * The extra information type records what kind of information is included.
> diff --git a/kernel/nscommon.c b/kernel/nscommon.c
> index 3166c1fd844af..a6fdacb394ea7 100644
> --- a/kernel/nscommon.c
> +++ b/kernel/nscommon.c
> @@ -6,6 +6,7 @@
>  #include <linux/proc_ns.h>
>  #include <linux/user_namespace.h>
>  #include <linux/vfsdebug.h>
> +#include <linux/fsnotify_backend.h>
>
>  #ifdef CONFIG_DEBUG_VFS
>  static void ns_debug(struct ns_common *ns, const struct proc_ns_operations *ops)
> @@ -111,6 +112,43 @@ struct ns_common *__must_check ns_owner(struct ns_common *ns)
>         return to_ns_common(owner);
>  }
>
> +/*
> + * Return the owning user_namespace of @ns, including init_user_ns.
> + * Unlike ns_owner(), which returns NULL for namespaces owned by
> + * init_user_ns (to serve as a propagation terminator), this gives us
> + * the real owner for notification routing.
> + */
> +static struct user_namespace *ns_direct_owner(struct ns_common *ns)
> +{
> +       if (unlikely(!ns->ops || !ns->ops->owner))
> +               return NULL;
> +       return ns->ops->owner(ns);
> +}
> +
> +static void ns_common_notify(__u32 mask, struct ns_common *ns)
> +{
> +       struct user_namespace *owner_userns;
> +
> +       if (!IS_ENABLED(CONFIG_FSNOTIFY))
> +               return;
> +
> +       owner_userns = ns_direct_owner(ns);
> +       if (!owner_userns)
> +               return;
> +
> +#ifdef CONFIG_FSNOTIFY
> +       /*
> +        * READ_ONCE macro expansion does not understand that this code
> +        * is not reachable without CONFIG_FSNOTIFY.
> +        */
> +       if (!READ_ONCE(owner_userns->n_fsnotify_marks))
> +               return;
> +#endif
> +
> +       fsnotify_ns(mask, owner_userns, ns->ns_id,
> +                   to_ns_common(owner_userns)->ns_id);
> +}
> +
>  /*
>   * The active reference count works by having each namespace that gets
>   * created take a single active reference on its owning user namespace.
> @@ -172,6 +210,8 @@ void __ns_ref_active_put(struct ns_common *ns)
>                 return;
>         }
>
> +       ns_common_notify(FS_DELETE, ns);
> +
>         VFS_WARN_ON_ONCE(is_ns_init_id(ns));
>         VFS_WARN_ON_ONCE(!__ns_ref_read(ns));
>
> @@ -184,6 +224,8 @@ void __ns_ref_active_put(struct ns_common *ns)
>                         VFS_WARN_ON_ONCE(__ns_ref_active_read(ns) < 0);
>                         return;
>                 }
> +
> +               ns_common_notify(FS_DELETE, ns);
>         }
>  }
>
> @@ -293,6 +335,8 @@ void __ns_ref_active_get(struct ns_common *ns)
>         if (likely(prev))
>                 return;
>
> +       ns_common_notify(FS_CREATE, ns);
> +
>         /*
>          * We did resurrect it. Walk the ownership hierarchy upwards
>          * until we found an owning user namespace that is active.
> @@ -307,6 +351,8 @@ void __ns_ref_active_get(struct ns_common *ns)
>                 VFS_WARN_ON_ONCE(prev < 0);
>                 if (likely(prev))
>                         return;
> +
> +               ns_common_notify(FS_CREATE, ns);
>         }
>  }
>
> --
> 2.53.0
>

FYI, this patch is missing a call to
fsnotify_destroy_marks(&userns->n_fsnotify_marks) in free_user_ns().

Thanks,
Amir.
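
For readers following the uapi part of the quoted patch, here is a minimal
userspace sketch of how a listener might pick the proposed namespace info
record out of the info area that follows struct fanotify_event_metadata
(the event_len - metadata_len bytes). The record layout and the value 8 are
copied from this RFC and are not in any released header, so they are
redefined locally under hypothetical names (info_header, info_ns,
INFO_TYPE_NS, find_ns_info). Treat it as an illustration under those
assumptions, not as the final API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct fanotify_event_info_header. */
struct info_header {
	uint8_t  info_type;
	uint8_t  pad;
	uint16_t len;		/* length of this record, header included */
};

/* Assumption: value of FAN_EVENT_INFO_TYPE_NS as proposed in the RFC. */
#define INFO_TYPE_NS 8

/* Local mirror of the proposed struct fanotify_event_info_ns. */
struct info_ns {
	struct info_header hdr;
	uint64_t self_nsid;	/* ns_id of the namespace */
	uint64_t owner_nsid;	/* ns_id of its owning user namespace */
};

/*
 * Walk the variable-length info records in buf[0..len) and copy the first
 * namespace record into *out.
 * Returns 1 if found, 0 if absent, -1 if a record is malformed or truncated.
 */
static inline int find_ns_info(const unsigned char *buf, size_t len,
			       struct info_ns *out)
{
	size_t off = 0;

	while (len - off >= sizeof(struct info_header)) {
		struct info_header hdr;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return -1;
		if (hdr.info_type == INFO_TYPE_NS && hdr.len >= sizeof(*out)) {
			memcpy(out, buf + off, sizeof(*out));
			return 1;
		}
		off += hdr.len;	/* skip record types we do not handle */
	}
	return 0;
}
```

Skipping unknown record types by hdr.len keeps such a listener working if
further info records are added to the same event later.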

* Re: [RFC][PATCH 0/5] fanotify namespace monitoring
  2026-03-09 15:47   ` Amir Goldstein
@ 2026-03-10 10:31     ` Christian Brauner
  2026-03-10 11:14       ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Brauner @ 2026-03-10 10:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

On Mon, Mar 09, 2026 at 04:47:20PM +0100, Amir Goldstein wrote:
> On Mon, Mar 9, 2026 at 1:33 PM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Sat, Mar 07, 2026 at 12:05:45PM +0100, Amir Goldstein wrote:
> > > Jan,
> > >
> > > Similar to mount notifications and listmount(), this is the complementary
> > > part of listns().
> > >
> > > The discussion about FAN_DELETE_SELF events for kernfs [1] for cgroup
> > > tree monitoring got me thinking that this sort of monitoring should not be
> > > tied to vfs inodes.
> > >
> > > Monitoring the cgroups tree has some semantic nuances, but I am told by
> > > Christian, that similar requirement exists for monitoring namespace tree,
> > > where the semantics w.r.t userns are more clear.
> > >
> > > I prepared this RFC to see if it meets the requirements of userspace
> > > and think if that works, the solution could be extended to monitoring
> > > cgroup trees.
> > >
> > > IMO monitoring namespace trees and monitoring filesystem objects do not
> > > need to be mixed in the same fanotify group, so I wanted to try using
> > > the high 32bits for event flags rather than wasting more event flags
> > > in low 32bit. I remember that I wanted to do that for mount monitoring
> > > events, but did not insist, so too bad.
> > >
> > > However, the code for using the high 32bit in uapi is quite ugly and
> > > hackish ATM, so I kept it as a separate patch, that we can either throw
> > > away or improve later.
> > >
> > > Christian/Lennart,
> > >
> > > I had considered if doing "recursive watches" to get all events from
> > > descendant namespaces is worthwhile and decided with myself that it was
> > > not.
> > >
> > > Please let me know if this UAPI meets your requirements.
> >
> > I think this looks great overall and is very useful as it allows to
> > monitor namespace events outside of bpf lsms. I agree with the
> > non-recursive design. You could generalize this approach by deriving the
> > watch from the namespace file descriptor? Then you can get notifications
> > for all types of namespaces.
> 
> Not sure what you mean?
> Which type of notifications?
> This RFC generates notifications for all types of namespaces created/deleted
> under the watched userns.

I misunderstood that part. Yes, that's really useful as is.

> 
> Which notifications did you intend to watch for other types of ns?
> DELETE_SELF?
> 
> That would be easy to add.
> Would just need to move the n_fsnotify_marks/mask to struct ns_common
> (also from mnt_namespace).

Yes, I was thinking about a way to monitor namespace destruction for an
arbitrary namespace.

> 
> >
> > If we ever want recursive watches, then we just need to add a separate
> > flag. This is only applicable to userns and pidns anyway.
> 
> Yes, if we wanted to.
> 
> >
> > I want to put another - crazier idea - in your head: Since pidfds are
> > file descriptors and now have the ability to persist information past
> > pidfd closure via struct pid->attr it is possible to allow fanotify
> > watches on pidfds.
> >
> > I think that this opens up a crazy amount of possibilities that will be
> > tremendously useful - also would mean fsnotify outside of fs/ proper.
> > Just thinking on the spot: if you allow marking a pidfd it's super easy
> > to plumb exec notifications via fanotify on top of it. It's also easy to
> > monitor _all_ namespace events for a specific process via pidfds.
> 
> Anything's possible, but we need to make sure it's worth it.
> Aren't there already enough ways to monitor a process via ptrace/landlock?

Oh god, no. ;)
ptrace() is way too heavy handed for usages outside of debugging tools
such as strace and also has the downside that only one ptracer can be
attached at any time. I have no idea about landlock but it is an LSM and
I don't know why an LSM should have such notification abilities.

fanotify on the other hand has no odd limitations and it naturally ties in
with filesystem objects and with pidfs we've made processes filesystem
objects. So I think this is a very natural fit.

So you could just register a pidfd with fanotify and then get notified
about exec events for that pidfd with arbitrary metadata attached to it
if needed. I've wanted this for a long time and one way I thought about
going about this is by somehow plumbing this into poll for pidfd but
that's not going to scale as we grow more notification events for
pidfds.

I think fanotify is the perfect fit for this with its extensible model.

> 
> >
> > This obviously needs some thinking wrt security etc but I just want to
> > put the thought out there that the integration of pidfds and fanotify is
> > possible.
> 
> I was thinking more about watching the entire pidfs space, but sure.

That's certainly possible too. Initially it would have to be scoped to
global admin privileges but ultimately you could probably filter by
credentials - if needed.

* Re: [RFC][PATCH 0/5] fanotify namespace monitoring
  2026-03-10 10:31     ` Christian Brauner
@ 2026-03-10 11:14       ` Amir Goldstein
  2026-03-16 10:05         ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2026-03-10 11:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Lennart Poettering, Tejun Heo, T . J . Mercier,
	linux-fsdevel

On Tue, Mar 10, 2026 at 11:31 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Mon, Mar 09, 2026 at 04:47:20PM +0100, Amir Goldstein wrote:
> > On Mon, Mar 9, 2026 at 1:33 PM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Sat, Mar 07, 2026 at 12:05:45PM +0100, Amir Goldstein wrote:
> > > > Jan,
> > > >
> > > > Similar to mount notifications and listmount(), this is the complementary
> > > > part of listns().
> > > >
> > > > The discussion about FAN_DELETE_SELF events for kernfs [1] for cgroup
> > > > tree monitoring got me thinking that this sort of monitoring should not be
> > > > tied to vfs inodes.
> > > >
> > > > Monitoring the cgroups tree has some semantic nuances, but I am told by
> > > > Christian, that similar requirement exists for monitoring namespace tree,
> > > > where the semantics w.r.t userns are more clear.
> > > >
> > > > I prepared this RFC to see if it meets the requirements of userspace
> > > > and think if that works, the solution could be extended to monitoring
> > > > cgroup trees.
> > > >
> > > > IMO monitoring namespace trees and monitoring filesystem objects do not
> > > > need to be mixed in the same fanotify group, so I wanted to try using
> > > > the high 32bits for event flags rather than wasting more event flags
> > > > in low 32bit. I remember that I wanted to do that for mount monitoring
> > > > events, but did not insist, so too bad.
> > > >
> > > > However, the code for using the high 32bit in uapi is quite ugly and
> > > > hackish ATM, so I kept it as a separate patch, that we can either throw
> > > > away or improve later.
> > > >
> > > > Christian/Lennart,
> > > >
> > > > I had considered if doing "recursive watches" to get all events from
> > > > descendant namespaces is worthwhile and decided with myself that it was
> > > > not.
> > > >
> > > > Please let me know if this UAPI meets your requirements.
> > >
> > > I think this looks great overall and is very useful as it allows to
> > > monitor namespace events outside of bpf lsms. I agree with the
> > > non-recursive design. You could generalize this approach by deriving the
> > > watch from the namespace file descriptor? Then you can get notifications
> > > for all types of namespaces.
> >
> > Not sure what you mean?
> > Which type of notifications?
> > This RFC generates notifications for all types of namespaces created/deleted
> > under the watched userns.
>
> I misunderstood that part. Yes, that's really useful as is.
>
> >
> > Which notifications did you intend to watch for other types of ns?
> > DELETE_SELF?
> >
> > That would be easy to add.
> > Would just need to move the n_fsnotify_marks/mask to struct ns_common
> > (also from mnt_namespace).
>
> Yes, I was thinking about a way to monitor namespace destruction for an
> arbitrary namespace.
>
> >
> > >
> > > If we ever want recursive watches, then we just need to add a separate
> > > flag. This is only applicable to userns and pidns anyway.
> >
> > Yes, if we wanted to.
> >
> > >
> > > I want to put another - crazier idea - in your head: Since pidfds are
> > > file descriptors and now have the ability to persist information past
> > > pidfd closure via struct pid->attr it is possible to allow fanotify
> > > watches on pidfds.
> > >
> > > I think that this opens up a crazy amount of possibilities that will be
> > > tremendously useful - also would mean fsnotify outside of fs/ proper.
> > > Just thinking on the spot: if you allow marking a pidfd it's super easy
> > > to plumb exec notifications via fanotify on top of it. It's also easy to
> > > monitor _all_ namespace events for a specific process via pidfds.
> >
> > Anything's possible, but we need to make sure it's worth it.
> > Aren't there already enough ways to monitor a process via ptrace/landlock?
>
> Oh god, no. ;)
> ptrace() is way too heavy handed for usages outside of debugging tools
> such as strace and also has the downside that only one ptracer can be
> attached at any time. I have no idea about landlock but it is an LSM and
> I don't know why an LSM should have such notification abilities.
>
> fanotify on the other hand has no odd limitations and it naturally ties in
> with filesystem objects and with pidfs we've made processes filesystem
> objects. So I think this is a very natural fit.
>
> So you could just register a pidfd with fanotify and then get notified
> about exec events for that pidfd with arbitrary metadata attached to it
> if needed. I've wanted this for a long time and one way I thought about
> going about this is by somehow plumbing this into poll for pidfd but
> that's not going to scale as we grow more notification events for
> pidfds.
>
> I think fanotify is the perfect fit for this with its extensible model.
>

I have no objection to utilizing fanotify beyond fs boundaries,
but if we do that, I think we need to carefully partition the event flags
namespace.

As I wrote in the cover letter, I see no reason for a FAN_MARK_FILESYSTEM
watch to be in the same group (fanotifyfd) as FAN_MARK_USERNS or
FAN_MARK_MNTNS for that matter.

I would rather use something like a FAN_CLASS_PROC classification
to explicitly say "this fanotifyfd is only for process/namespace notifications"
and in this context the user would use completely different event constants.

OTOH, maybe it will be less confusing to add new syscalls
pnotify_init()/pnotify_mark() as a frontend to this separate class of events.
If for no other reason, then because fanotify_mark(2) man page is
getting out of control ;)

> >
> > >
> > > This obviously needs some thinking wrt security etc but I just want to
> > > put the thought out there that the integration of pidfds and fanotify is
> > > possible.
> >
> > I was thinking more about watching the entire pidfs space, but sure.
>
> That's certainly possible too. Initially it would have to be scoped to
> global admin privileges but ultimately you could probably filter by
> credentials - if needed.

Or watch all processes in a pidns.

The guiding rule is that if the object (e.g. pidns) is accessible in
the context of the event, then we could set up a watch on the object.

Thanks,
Amir.

* Re: [RFC][PATCH 0/5] fanotify namespace monitoring
  2026-03-10 11:14       ` Amir Goldstein
@ 2026-03-16 10:05         ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2026-03-16 10:05 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Christian Brauner, Jan Kara, Lennart Poettering, Tejun Heo,
	T . J . Mercier, linux-fsdevel

Sorry for delayed reply, I was on vacation most of last week.

On Tue 10-03-26 12:14:45, Amir Goldstein wrote:
> On Tue, Mar 10, 2026 at 11:31 AM Christian Brauner <brauner@kernel.org> wrote:
> > On Mon, Mar 09, 2026 at 04:47:20PM +0100, Amir Goldstein wrote:
> > > On Mon, Mar 9, 2026 at 1:33 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > If we ever want recursive watches, then we just need to add a separate
> > > > flag. This is only applicable to userns and pidns anyway.
> > >
> > > Yes, if we wanted to.
> > >
> > > >
> > > > I want to put another - crazier idea - in your head: Since pidfds are
> > > > file descriptors and now have the ability to persist information past
> > > > pidfd closure via struct pid->attr it is possible to allow fanotify
> > > > watches on pidfds.
> > > >
> > > > I think that this opens up a crazy amount of possibilities that will be
> > > > tremendously useful - also would mean fsnotify outside of fs/ proper.
> > > > Just thinking on the spot: if you allow marking a pidfd it's super easy
> > > > to plumb exec notifications via fanotify on top of it. It's also easy to
> > > > monitor _all_ namespace events for a specific process via pidfds.
> > >
> > > Anything's possible, but we need to make sure it's worth it.
> > > Aren't there already enough ways to monitor a process via ptrace/landlock?
> >
> > Oh god, no. ;)
> > ptrace() is way too heavy handed for usages outside of debugging tools
> > such as strace and also has the downside that only one ptracer can be
> > attached at any time. I have no idea about landlock but it is an LSM and
> > I don't know why an LSM should have such notification abilities.
> >
> > fanotify on the other hand has no odd limitations and it naturally ties in
> > with filesystem objects and with pidfs we've made processes filesystem
> > objects. So I think this is a very natural fit.
> >
> > So you could just register a pidfd with fanotify and then get notified
> > about exec events for that pidfd with arbitrary metadata attached to it
> > if needed. I've wanted this for a long time and one way I thought about
> > going about this is by somehow plumbing this into poll for pidfd but
> > that's not going to scale as we grow more notification events for
> > pidfds.
> >
> > I think fanotify is the perfect fit for this with its extensible model.
> >
> 
> I have no objection to utilizing fanotify beyond fs boundaries,
> but if we do that, I think we need to carefully partition the event flags
> namespace.
> 
> As I wrote in the cover letter, I see no reason for FAN_MARK_FILESYSTEM
> watch to be in the same group (fanotifyfd) as FAN_MARK_USERNS or
> FAN_MARK_MNTNS for that matter.
> 
> I would rather use something like FAN_CLASS_PROC classification
> to explicitly say "this fanotifyfd is only for process/namespace notifications"
> and in this context user would use completely different event constants.
> 
> OTOH, maybe it will be less confusing to add new syscalls
> pnotify_init()/pnotify_mark() as a frontend to this separate class of events.
> If for no other reason, then because fanotify_mark(2) man page is
> getting out of control ;)

I agree fanotify may be a reasonable framework for generating these
kinds of events in the kernel and delivering them to userspace but I also
think we should have a clear separation in the API between standard
filesystem notification events and watching of these special filesystems
for special events. The semantics of standard filesystem events do not
necessarily match the semantics we need from these events for special
filesystems, and with special events such as exec notification
we are clearly deviating from standard filesystem events even further.

We already kind of started on this path with mount event notifications.
There we've chosen FAN_REPORT_MNT for separating groups that receive mount
event notifications - so a group can either get mount notifications or
standard filesystem events. So with namespace notification events it would
be kind of natural to continue this tradition with other FAN_REPORT_*
flags (we could possibly use several bits of this flag space as an enum to
select type of events to receive).
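
To make the "several bits as an enum" idea concrete, here is a small
sketch of how a group type could be packed into and extracted from the
init flags. Everything in it is hypothetical: the names (GROUP_TYPE_SHIFT,
GROUP_TYPE_MASK, enum group_type and the two helpers), the field width of
three bits and the choice of shift are illustration only. The shift of 14
is picked so that the values 1 and 2 land on 0x4000 and 0x8000, the bit
positions used by FAN_REPORT_MNT and by FAN_REPORT_NSID in this RFC.

```c
#include <stdint.h>

/* Hypothetical: three init-flag bits treated as one enum-valued field. */
#define GROUP_TYPE_SHIFT 14
#define GROUP_TYPE_MASK  (0x7u << GROUP_TYPE_SHIFT)

enum group_type {
	GROUP_TYPE_FS = 0,	/* standard filesystem events */
	GROUP_TYPE_MNT,		/* mount events */
	GROUP_TYPE_NS,		/* namespace events */
	GROUP_TYPE_PROC,	/* process events */
	GROUP_TYPE_COUNT
};

/* Encode a group type into its position within the flags word. */
static inline uint32_t group_type_to_flags(enum group_type type)
{
	return (uint32_t)type << GROUP_TYPE_SHIFT;
}

/*
 * Decode the group type from flags.
 * Returns 0 on success, -1 if the field holds an unassigned value.
 */
static inline int flags_to_group_type(uint32_t flags, enum group_type *type)
{
	uint32_t val = (flags & GROUP_TYPE_MASK) >> GROUP_TYPE_SHIFT;

	if (val >= GROUP_TYPE_COUNT)
		return -1;
	*type = (enum group_type)val;
	return 0;
}
```

Unlike independent flag bits, an enum field holds exactly one value, so a
group cannot ask for two types at once and unassigned values can be
rejected up front.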

I think using a new FAN_CLASS would also be possible but it doesn't quite fit
the current semantic of classes in my opinion (which determine group
priority when delivering events and are kind of linearly ordered where
a higher class allows you to do more). So IMHO it would be somewhat
confusing if we introduced a new class that would be completely separate.

Regarding hiding the new notification group types behind new syscalls
(instead of flags bits) - I'm not strictly opposed to that but my current
thinking is it would be a bit of an overkill. Adding new syscalls is
somewhat annoying as far as I remember and (somewhat more importantly) we'd
have to explain to users a big part of the mechanics of the fanotify API
anyway so the amount of information a user has to digest to be able to use
the API will not be significantly smaller I'm afraid. So I think we might
just introduce something like "notification group type" - filesystem,
mount, process, ... - and clearly separate which group type can do what.
That could make things more comprehensible and easier to find in the
manpage.

Regarding events, since we are kind of short on event bits (7 bits left if
I'm counting right), I agree we could make the event spaces for these
"special type" groups separate and overlapping - i.e., the meaning of the
event bit would depend on the type of the group. That means 'mask' isn't
enough to determine which group should receive the event anymore but I
think fsnotify() could determine the type of group for which the event is
destined based on data_type.
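
A toy userspace model of that routing rule, in case it helps the
discussion: the same mask bit is requested by two groups of different
type, and the data type of the event decides which one receives it. The
enums, struct group and both helpers are hypothetical names made up for
this sketch. They only loosely mirror enum fsnotify_data_type and are not
the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for enum fsnotify_data_type. */
enum data_type {
	DATA_NONE,
	DATA_PATH,
	DATA_INODE,
	DATA_DENTRY,
	DATA_MNT,
	DATA_NS,
};

/* Hypothetical notification group types. */
enum group_type {
	GROUP_FS,
	GROUP_MNT,
	GROUP_NS,
};

struct group {
	enum group_type type;
	uint32_t mask;		/* event bits, interpreted per group type */
};

/* Map the event payload type to the only group type that may see it. */
static inline enum group_type group_type_for_data(enum data_type dt)
{
	switch (dt) {
	case DATA_MNT:
		return GROUP_MNT;
	case DATA_NS:
		return GROUP_NS;
	default:
		return GROUP_FS;
	}
}

/* Deliver only if the group type matches the data type and the bit is set. */
static inline bool group_wants_event(const struct group *grp, uint32_t mask,
				     enum data_type dt)
{
	return grp->type == group_type_for_data(dt) && (grp->mask & mask) != 0;
}
```

With an fs group and an ns group both holding mask bit 0x1, an event with
that bit and DATA_NS goes to the ns group only, and the same bit with
DATA_INODE goes to the fs group only.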

This way I think fsnotify+fanotify could be extended for other uses.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

end of thread, other threads:[~2026-03-16 16:38 UTC | newest]

Thread overview: 12+ messages
2026-03-07 11:05 [RFC][PATCH 0/5] fanotify namespace monitoring Amir Goldstein
2026-03-07 11:05 ` [RFC][PATCH 1/5] fanotify: add support for watching the namespaces tree Amir Goldstein
2026-03-09 18:07   ` Amir Goldstein
2026-03-07 11:05 ` [RFC][PATCH 2/5] fanotify: use high bits for FAN_NS_CREATE/FAN_NS_DELETE Amir Goldstein
2026-03-07 11:05 ` [RFC][PATCH 3/5] selftests/filesystems: create fanotify test dir Amir Goldstein
2026-03-07 11:05 ` [RFC][PATCH 4/5] filesystems/statmount: update mount.h in tools include dir Amir Goldstein
2026-03-07 11:05 ` [RFC][PATCH 5/5] selftests/filesystems: add fanotify namespace notifications test Amir Goldstein
2026-03-09 12:33 ` [RFC][PATCH 0/5] fanotify namespace monitoring Christian Brauner
2026-03-09 15:47   ` Amir Goldstein
2026-03-10 10:31     ` Christian Brauner
2026-03-10 11:14       ` Amir Goldstein
2026-03-16 10:05         ` Jan Kara
