* [PATCH v2 00/10] fanotify namespace monitoring
@ 2026-04-24 17:04 Amir Goldstein
From: Amir Goldstein @ 2026-04-24 17:04 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Jan,
Following your feedback in the v1 review [1], I've made the changes
to clear the way for reusing the fs watcher event bits as ns watcher
event bits.
The terminology of "ns watcher" vs. "ns events" is a little confusing:
- An "ns watcher" group can place marks on ns objects with
  mntns/userns marks
  - The events that can be requested by an ns watcher are
    mount (tree monitoring) and ns (tree monitoring) events
  - We could imagine requesting all mount events of all mntns owned
    by a specific userns, but this was not implemented
- An "fs watcher" group can place marks on fs objects with
  inode/mnt/sb marks
  - The events that can be requested by an fs watcher are
    fs (monitoring, permission and pre-content) events
To simplify the implementation, the event flags (ON_CHILD, ISDIR)
live in a shared space that cannot be overloaded by either group type.
This is not because ISDIR makes sense for an ns watcher, but just to reduce
the number of gates in common code. The ON_CHILD flag might be usable for
ns watchers, but I am not sure.
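To illustrate the split between overloadable event bits and the shared flag
space, here is a minimal user-space sketch; it is not code from the series.
All DEMO_* names are hypothetical. The flag and FAN_CREATE/FAN_DELETE values
mirror the uapi header, and the sketch assumes that FAN_NS_CREATE and
FAN_NS_DELETE take exactly the FAN_CREATE and FAN_DELETE bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; values mirror include/uapi/linux/fanotify.h. */
#define DEMO_FAN_CREATE		0x00000100u
#define DEMO_FAN_DELETE		0x00000200u
#define DEMO_FAN_EVENT_ON_CHILD	0x08000000u
#define DEMO_FAN_ONDIR		0x40000000u

/* Assumption: the ns events reuse the FAN_CREATE/FAN_DELETE bit values. */
#define DEMO_FAN_NS_CREATE	DEMO_FAN_CREATE
#define DEMO_FAN_NS_DELETE	DEMO_FAN_DELETE

#define DEMO_EVENT_FLAGS	(DEMO_FAN_EVENT_ON_CHILD | DEMO_FAN_ONDIR)
#define DEMO_EVENTS_ON_FS	(DEMO_FAN_CREATE | DEMO_FAN_DELETE)
#define DEMO_EVENTS_ON_NS	(DEMO_FAN_NS_CREATE | DEMO_FAN_NS_DELETE)

/* Event bits may collide across group types; flag bits never may. */
_Static_assert(!(DEMO_EVENTS_ON_FS & DEMO_EVENT_FLAGS), "fs event overlaps a flag");
_Static_assert(!(DEMO_EVENTS_ON_NS & DEMO_EVENT_FLAGS), "ns event overlaps a flag");

enum demo_watcher { DEMO_WATCH_FS, DEMO_WATCH_NS };

/* Name the single event bit in @mask as seen by a watcher of type @type. */
static const char *demo_event_name(enum demo_watcher type, uint32_t mask)
{
	uint32_t event = mask & ~DEMO_EVENT_FLAGS;	/* strip the shared flags */

	if (event == DEMO_FAN_CREATE)
		return type == DEMO_WATCH_NS ? "ns create" : "fs create";
	if (event == DEMO_FAN_DELETE)
		return type == DEMO_WATCH_NS ? "ns delete" : "fs delete";
	return "unknown";
}

/* Usage: the same bit decodes differently depending on the watcher type. */
static void demo_usage(void)
{
	assert(demo_event_name(DEMO_WATCH_FS, DEMO_FAN_CREATE | DEMO_FAN_ONDIR)[0] == 'f');
	assert(demo_event_name(DEMO_WATCH_NS, DEMO_FAN_CREATE)[0] == 'n');
}
```

Because the flags are masked off before the event bit is interpreted, common
code in this sketch never needs to know the watcher type to handle them.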
Thanks,
Amir.
Changes since v1:
- Introduce group type and gates
- FAN_NS_CREATE/FAN_NS_DELETE overload FAN_CREATE/FAN_DELETE in uapi
instead of using high 32bit
[1] https://lore.kernel.org/linux-fsdevel/20260307110550.373762-1-amir73il@gmail.com/
Amir Goldstein (10):
fsnotify: rename fsnotify group flag macros
fsnotify: introduce fsnotify group types
fsnotify: separate the events bitmask macros by group type
fanotify: test event->type instead of event mask when possible
fsnotify: do not report mount events with fsnotify()
fanotify: gate fs event classification by group type
fanotify: gate fs events checks in fanotify_mark() by group type
fanotify: add support for watching the namespaces tree
selftests/filesystems: create fanotify test dir
selftests/filesystems: add fanotify namespace notifications test
fs/notify/fanotify/fanotify.c | 141 ++++++--
fs/notify/fanotify/fanotify.h | 62 +++-
fs/notify/fanotify/fanotify_user.c | 218 +++++++++---
fs/notify/fdinfo.c | 9 +-
fs/notify/fsnotify.c | 123 +++++--
fs/notify/fsnotify.h | 12 +
fs/notify/group.c | 14 +-
fs/notify/inotify/inotify_user.c | 2 +-
fs/notify/mark.c | 9 +-
fs/nsfs.c | 21 ++
include/linux/fanotify.h | 40 ++-
include/linux/fsnotify.h | 5 +
include/linux/fsnotify_backend.h | 108 ++++--
include/linux/proc_fs.h | 2 +
include/linux/user_namespace.h | 6 +
include/uapi/linux/fanotify.h | 37 +-
kernel/audit_fsnotify.c | 2 +-
kernel/nscommon.c | 47 +++
kernel/user_namespace.c | 2 +
tools/include/uapi/linux/fanotify.h | 37 +-
tools/testing/selftests/Makefile | 2 +-
.../{mount-notify => fanotify}/.gitignore | 0
.../{mount-notify => fanotify}/Makefile | 3 +-
.../mount-notify_test.c | 0
.../mount-notify_test_ns.c | 0
.../filesystems/fanotify/ns-notify_test.c | 330 ++++++++++++++++++
26 files changed, 1045 insertions(+), 187 deletions(-)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/.gitignore (100%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/Makefile (67%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test.c (100%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test_ns.c (100%)
create mode 100644 tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
--
2.54.0
* [PATCH v2 01/10] fsnotify: rename fsnotify group flag macros
From: Amir Goldstein @ 2026-04-24 17:04 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Use more explicit FSNOTIFY_GROUP_FLAG_ prefix.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify_user.c | 2 +-
fs/notify/group.c | 4 ++--
fs/notify/inotify/inotify_user.c | 2 +-
fs/notify/mark.c | 2 +-
include/linux/fsnotify_backend.h | 4 ++--
kernel/audit_fsnotify.c | 2 +-
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index ae904451dfc09..49531a4fe71de 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1682,7 +1682,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
f_flags |= O_NONBLOCK;
CLASS(fsnotify_group, group)(&fanotify_fsnotify_ops,
- FSNOTIFY_GROUP_USER);
+ FSNOTIFY_GROUP_FLAG_USER);
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
if (IS_ERR(group))
return PTR_ERR(group);
diff --git a/fs/notify/group.c b/fs/notify/group.c
index b56d1c1d9644a..cf445e270ba6a 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -145,8 +145,8 @@ static struct fsnotify_group *__fsnotify_alloc_group(
struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
int flags)
{
- gfp_t gfp = (flags & FSNOTIFY_GROUP_USER) ? GFP_KERNEL_ACCOUNT :
- GFP_KERNEL;
+ gfp_t gfp = (flags & FSNOTIFY_GROUP_FLAG_USER) ?
+ GFP_KERNEL_ACCOUNT : GFP_KERNEL;
return __fsnotify_alloc_group(ops, flags, gfp);
}
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index ed37491c16189..3b59340284922 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -657,7 +657,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
struct inotify_event_info *oevent;
group = fsnotify_alloc_group(&inotify_fsnotify_ops,
- FSNOTIFY_GROUP_USER);
+ FSNOTIFY_GROUP_FLAG_USER);
if (IS_ERR(group))
return group;
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index e256b420100dc..961475090f088 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -908,7 +908,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark, void *obj,
if ((lmark->group == mark->group) &&
(lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) &&
- !(mark->group->flags & FSNOTIFY_GROUP_DUPS)) {
+ !(mark->group->flags & FSNOTIFY_GROUP_FLAG_DUPS)) {
err = -EEXIST;
goto out_err;
}
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index e5cde39d6e85d..87524706792e7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -232,8 +232,8 @@ struct fsnotify_group {
enum fsnotify_group_prio priority; /* priority for sending events */
bool shutdown; /* group is being shut down, don't queue more events */
-#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */
-#define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */
+#define FSNOTIFY_GROUP_FLAG_USER 0x01 /* user allocated group */
+#define FSNOTIFY_GROUP_FLAG_DUPS 0x02 /* allow multiple marks per object */
int flags;
unsigned int owner_flags; /* stored flags of mark_mutex owner */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 711454f9f7242..82216887a19b2 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -184,7 +184,7 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = {
static int __init audit_fsnotify_init(void)
{
audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops,
- FSNOTIFY_GROUP_DUPS);
+ FSNOTIFY_GROUP_FLAG_DUPS);
if (IS_ERR(audit_fsnotify_group)) {
audit_fsnotify_group = NULL;
audit_panic("cannot create audit fsnotify group");
--
2.54.0
* [PATCH v2 02/10] fsnotify: introduce fsnotify group types
From: Amir Goldstein @ 2026-04-24 17:04 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Currently an fanotify group can subscribe to events FAN_MNT_ATTACH/DETACH
only on mark type FAN_MARK_MNTNS and IFF the group was initialized with
FAN_REPORT_MNT.
Hence, mount events cannot be mixed on the same group with filesystem
events and mntns marks cannot be mixed with filesystem type marks.
Define a new property for an fsnotify group called fsnotify_group_type
which describes the category of events and mark types and use a distinct
category FSNOTIFY_GROUP_TYPE_NS for the mount event groups.
Create helpers fsnotify_is_{fs,ns}_watcher() and the macro
FANOTIFY_NS_INIT_FLAGS and use them to generalize some code that was
checking the FAN_REPORT_MNT flag.
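The group type selection and the mark type gate described above can be
sketched in user space as follows. This is a simplified model under stated
assumptions, not the kernel code: all SKETCH_* names are hypothetical, and
SKETCH_REPORT_MNT is only a placeholder bit standing in for FAN_REPORT_MNT.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum sketch_group_type { SKETCH_GROUP_TYPE_FS = 0, SKETCH_GROUP_TYPE_NS };

enum sketch_mark_type {
	SKETCH_MARK_INODE,
	SKETCH_MARK_MOUNT,
	SKETCH_MARK_FILESYSTEM,
	SKETCH_MARK_MNTNS,
};

/* Placeholder bit standing in for FAN_REPORT_MNT. */
#define SKETCH_REPORT_MNT	0x00004000u
#define SKETCH_NS_INIT_FLAGS	(SKETCH_REPORT_MNT)

/* The group type is decided once, from the init flags. */
static enum sketch_group_type sketch_type_from_flags(uint32_t init_flags)
{
	return (init_flags & SKETCH_NS_INIT_FLAGS) ? SKETCH_GROUP_TYPE_NS :
						     SKETCH_GROUP_TYPE_FS;
}

/* Each mark type belongs to exactly one group type; reject a mismatch. */
static int sketch_check_mark(enum sketch_group_type group,
			     enum sketch_mark_type mark)
{
	enum sketch_group_type wanted;

	switch (mark) {
	case SKETCH_MARK_INODE:
	case SKETCH_MARK_MOUNT:
	case SKETCH_MARK_FILESYSTEM:
		wanted = SKETCH_GROUP_TYPE_FS;
		break;
	case SKETCH_MARK_MNTNS:
		wanted = SKETCH_GROUP_TYPE_NS;
		break;
	default:
		return -EINVAL;
	}

	return group == wanted ? 0 : -EINVAL;
}

/* Usage: an ns watcher accepts mntns marks and rejects inode marks. */
static void sketch_usage(void)
{
	enum sketch_group_type ns = sketch_type_from_flags(SKETCH_REPORT_MNT);

	assert(sketch_check_mark(ns, SKETCH_MARK_MNTNS) == 0);
	assert(sketch_check_mark(ns, SKETCH_MARK_INODE) == -EINVAL);
}
```

Deriving the required group type from the mark type keeps the check to a
single comparison, instead of one flag test per mark type.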
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.c | 4 ++-
fs/notify/fanotify/fanotify_user.c | 40 ++++++++++++++++++++++--------
fs/notify/group.c | 10 +++++---
include/linux/fanotify.h | 8 ++++--
include/linux/fsnotify_backend.h | 24 ++++++++++++++++++
5 files changed, 68 insertions(+), 18 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 38290b9c07f7b..e768c25262e27 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -314,9 +314,11 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
pr_debug("%s: report_mask=%x mask=%x data=%p data_type=%d\n",
__func__, iter_info->report_mask, event_mask, data, data_type);
- if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
+ if (fsnotify_is_ns_watcher(group)) {
if (data_type != FSNOTIFY_EVENT_MNT)
return 0;
+ } else if (WARN_ON_ONCE(!fsnotify_is_fs_watcher(group))) {
+ return 0;
} else if (!fid_mode) {
/* Do we have path to open a file descriptor? */
if (!path)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 49531a4fe71de..95e7c24c70249 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1586,8 +1586,11 @@ static struct hlist_head *fanotify_alloc_merge_hash(void)
DEFINE_CLASS(fsnotify_group,
struct fsnotify_group *,
if (!IS_ERR_OR_NULL(_T)) fsnotify_destroy_group(_T),
- fsnotify_alloc_group(ops, flags),
- const struct fsnotify_ops *ops, int flags)
+ __fsnotify_alloc_group(ops, type,
+ FSNOTIFY_GROUP_FLAG_USER,
+ GFP_KERNEL_ACCOUNT),
+ const struct fsnotify_ops *ops,
+ enum fsnotify_group_type type)
/* fanotify syscalls */
SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
@@ -1597,18 +1600,22 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
unsigned int fid_mode = flags & FANOTIFY_FID_BITS;
unsigned int class = flags & FANOTIFY_CLASS_BITS;
unsigned int internal_flags = 0;
+ /* A group is either for watching filesystems or namespaces */
+ enum fsnotify_group_type type = (flags & FANOTIFY_NS_INIT_FLAGS) ?
+ FSNOTIFY_GROUP_TYPE_NS :
+ FSNOTIFY_GROUP_TYPE_FS;
pr_debug("%s: flags=%x event_f_flags=%x\n",
__func__, flags, event_f_flags);
/*
* An unprivileged user can setup an fanotify group with limited
- * functionality - an unprivileged group is limited to notification
- * events with file handles or mount ids and it cannot use unlimited
+ * functionality - an unprivileged group cannot receive filesystem
+ * notification events with file descriptors and cannot use unlimited
* queue/marks.
*/
if (((flags & FANOTIFY_ADMIN_INIT_FLAGS) ||
- !(flags & (FANOTIFY_FID_BITS | FAN_REPORT_MNT))) &&
+ !(flags & (FANOTIFY_FID_BITS | FANOTIFY_NS_INIT_FLAGS))) &&
!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -1636,8 +1643,11 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
return -EINVAL;
- /* Don't allow mixing mnt events with inode events for now */
- if (flags & FAN_REPORT_MNT) {
+ /*
+ * Namespace watchers do not support priority classes and do not
+ * support reporting file info event extensions.
+ */
+ if (type == FSNOTIFY_GROUP_TYPE_NS) {
if (class != FAN_CLASS_NOTIF)
return -EINVAL;
if (flags & (FANOTIFY_FID_BITS | FAN_REPORT_FD_ERROR))
@@ -1681,8 +1691,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
if (flags & FAN_NONBLOCK)
f_flags |= O_NONBLOCK;
- CLASS(fsnotify_group, group)(&fanotify_fsnotify_ops,
- FSNOTIFY_GROUP_FLAG_USER);
+ CLASS(fsnotify_group, group)(&fanotify_fsnotify_ops, type);
/* fsnotify_alloc_group takes a ref. Dropped in fanotify_release */
if (IS_ERR(group))
return PTR_ERR(group);
@@ -1885,6 +1894,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS;
unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS;
+ enum fsnotify_group_type group_type;
unsigned int obj_type, fid_mode;
void *obj = NULL;
u32 umask = 0;
@@ -1903,15 +1913,19 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
switch (mark_type) {
case FAN_MARK_INODE:
obj_type = FSNOTIFY_OBJ_TYPE_INODE;
+ group_type = FSNOTIFY_GROUP_TYPE_FS;
break;
case FAN_MARK_MOUNT:
obj_type = FSNOTIFY_OBJ_TYPE_VFSMOUNT;
+ group_type = FSNOTIFY_GROUP_TYPE_FS;
break;
case FAN_MARK_FILESYSTEM:
obj_type = FSNOTIFY_OBJ_TYPE_SB;
+ group_type = FSNOTIFY_GROUP_TYPE_FS;
break;
case FAN_MARK_MNTNS:
obj_type = FSNOTIFY_OBJ_TYPE_MNTNS;
+ group_type = FSNOTIFY_GROUP_TYPE_NS;
break;
default:
return -EINVAL;
@@ -1960,6 +1974,9 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
return -EINVAL;
group = fd_file(f)->private_data;
+ if (group->type != group_type)
+ return -EINVAL;
+
/* Only report mount events on mnt namespace */
if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
if (mask & ~FANOTIFY_MOUNT_EVENTS)
@@ -2005,14 +2022,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
return -EINVAL;
/*
- * Events that do not carry enough information to report
+ * Filesystem events that do not carry enough information to report
* event->fd require a group that supports reporting fid. Those
* events are not supported on a mount mark, because they do not
* carry enough information (i.e. path) to be filtered by mount
* point.
*/
fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
- if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_MOUNT_EVENTS|FANOTIFY_EVENT_FLAGS) &&
+ if (fsnotify_is_fs_watcher(group) &&
+ mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) &&
(!fid_mode || mark_type == FAN_MARK_MOUNT))
return -EINVAL;
diff --git a/fs/notify/group.c b/fs/notify/group.c
index cf445e270ba6a..754d87e3a0189 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -111,9 +111,9 @@ void fsnotify_put_group(struct fsnotify_group *group)
}
EXPORT_SYMBOL_GPL(fsnotify_put_group);
-static struct fsnotify_group *__fsnotify_alloc_group(
- const struct fsnotify_ops *ops,
- int flags, gfp_t gfp)
+struct fsnotify_group *__fsnotify_alloc_group(const struct fsnotify_ops *ops,
+ enum fsnotify_group_type type,
+ int flags, gfp_t gfp)
{
struct fsnotify_group *group;
@@ -135,9 +135,11 @@ static struct fsnotify_group *__fsnotify_alloc_group(
group->ops = ops;
group->flags = flags;
+ group->type = type;
return group;
}
+EXPORT_SYMBOL_GPL(__fsnotify_alloc_group);
/*
* Create a new fsnotify_group and hold a reference for the group returned.
@@ -148,7 +150,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
gfp_t gfp = (flags & FSNOTIFY_GROUP_FLAG_USER) ?
GFP_KERNEL_ACCOUNT : GFP_KERNEL;
- return __fsnotify_alloc_group(ops, flags, gfp);
+ return __fsnotify_alloc_group(ops, FSNOTIFY_GROUP_TYPE_FS, flags, gfp);
}
EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 879cff5eccd4e..1a09400b843b5 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -25,7 +25,10 @@
#define FANOTIFY_FID_BITS (FAN_REPORT_DFID_NAME_TARGET)
-#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD | FAN_REPORT_MNT)
+#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD)
+
+/* fanotify_init() flags to create a namespace event watcher */
+#define FANOTIFY_NS_INIT_FLAGS (FAN_REPORT_MNT)
/*
* fanotify_init() flags that require CAP_SYS_ADMIN.
@@ -47,7 +50,8 @@
* so one of the flags for reporting file handles is required.
*/
#define FANOTIFY_USER_INIT_FLAGS (FAN_CLASS_NOTIF | \
- FANOTIFY_FID_BITS | FAN_REPORT_MNT | \
+ FANOTIFY_FID_BITS | \
+ FANOTIFY_NS_INIT_FLAGS | \
FAN_CLOEXEC | FAN_NONBLOCK)
#define FANOTIFY_INIT_FLAGS (FANOTIFY_ADMIN_INIT_FLAGS | \
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 87524706792e7..698fc75b0b6d4 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -204,6 +204,15 @@ enum fsnotify_group_prio {
__FSNOTIFY_PRIO_NUM
};
+/*
+ * Category of kernel objects watched by this group.
+ * Every category has its own mark types and event types.
+ */
+enum fsnotify_group_type {
+ FSNOTIFY_GROUP_TYPE_FS = 0,
+ FSNOTIFY_GROUP_TYPE_NS,
+};
+
/*
* A group is a "thing" that wants to receive notification about filesystem
* events. The mask holds the subset of event types this group cares about.
@@ -230,6 +239,7 @@ struct fsnotify_group {
unsigned int q_len; /* events on the queue */
unsigned int max_events; /* maximum events allowed on the list */
enum fsnotify_group_prio priority; /* priority for sending events */
+ enum fsnotify_group_type type; /* category of watched objects */
bool shutdown; /* group is being shut down, don't queue more events */
#define FSNOTIFY_GROUP_FLAG_USER 0x01 /* user allocated group */
@@ -280,6 +290,16 @@ struct fsnotify_group {
};
};
+static inline bool fsnotify_is_fs_watcher(const struct fsnotify_group *group)
+{
+ return group->type == FSNOTIFY_GROUP_TYPE_FS;
+}
+
+static inline bool fsnotify_is_ns_watcher(const struct fsnotify_group *group)
+{
+ return group->type == FSNOTIFY_GROUP_TYPE_NS;
+}
+
/*
* These helpers are used to prevent deadlock when reclaiming inodes with
* evictable marks of the same group that is allocating a new mark.
@@ -707,6 +727,10 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
/* called from fsnotify listeners, such as fanotify or dnotify */
/* create a new group */
+extern struct fsnotify_group *__fsnotify_alloc_group(
+ const struct fsnotify_ops *ops,
+ enum fsnotify_group_type type,
+ int flags, gfp_t gfp);
extern struct fsnotify_group *fsnotify_alloc_group(
const struct fsnotify_ops *ops,
int flags);
--
2.54.0
* [PATCH v2 03/10] fsnotify: separate the events bitmask macros by group type
From: Amir Goldstein @ 2026-04-24 17:04 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Fork the macros ALL_FSNOTIFY_EVENTS and FANOTIFY_EVENTS to
*_EVENTS_ON_{FS,NS} and use them to determine the valid mask for
fanotify_mark() and the outgoing mask by the group type.
Use the fanotify_is_valid_mask() helper to simplify the check for
reporting mount events IFF the group was initialized with FAN_REPORT_MNT
and IFF watching an mntns object.
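As a rough user-space model of the per-group-type mask validation described
above (a sketch, not the kernel helper itself): all MODEL_* names are
hypothetical, the fs event set is abbreviated to two events, and the check
for the FAN_REPORT_MNT init flag is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; values mirror include/uapi/linux/fanotify.h. */
#define MODEL_FAN_ACCESS		0x00000001u
#define MODEL_FAN_MODIFY		0x00000002u
#define MODEL_FAN_MNT_ATTACH		0x01000000u
#define MODEL_FAN_MNT_DETACH		0x02000000u
#define MODEL_FAN_EVENT_ON_CHILD	0x08000000u
#define MODEL_FAN_ONDIR			0x40000000u

/* Abbreviated: the real fs event set is much larger. */
#define MODEL_EVENTS_ON_FS	(MODEL_FAN_ACCESS | MODEL_FAN_MODIFY)
#define MODEL_EVENTS_ON_NS	(MODEL_FAN_MNT_ATTACH | MODEL_FAN_MNT_DETACH)
#define MODEL_EVENT_FLAGS	(MODEL_FAN_EVENT_ON_CHILD | MODEL_FAN_ONDIR)

enum model_group_type { MODEL_GROUP_FS, MODEL_GROUP_NS };

/* Return true if @mask contains only bits that this group may request. */
static bool model_is_valid_mask(enum model_group_type type, bool mntns_mark,
				uint64_t mask)
{
	uint64_t valid = 0;

	switch (type) {
	case MODEL_GROUP_FS:
		valid = MODEL_EVENTS_ON_FS | MODEL_EVENT_FLAGS;
		break;
	case MODEL_GROUP_NS:
		/* Mount events are only accepted on an mntns mark. */
		if (mntns_mark)
			valid = MODEL_EVENTS_ON_NS;
		break;
	}

	return !(mask & ~valid);
}

/* Usage: each group type accepts its own events and rejects the other's. */
static void model_usage(void)
{
	assert(model_is_valid_mask(MODEL_GROUP_FS, false,
				   MODEL_FAN_ACCESS | MODEL_FAN_ONDIR));
	assert(!model_is_valid_mask(MODEL_GROUP_FS, false, MODEL_FAN_MNT_ATTACH));
	assert(model_is_valid_mask(MODEL_GROUP_NS, true, MODEL_FAN_MNT_ATTACH));
	assert(!model_is_valid_mask(MODEL_GROUP_NS, true, MODEL_FAN_ACCESS));
}
```

Starting from an empty valid mask means that any combination not explicitly
allowed for the group type is rejected by default.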
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.c | 3 +-
fs/notify/fanotify/fanotify_user.c | 68 ++++++++++++++++++++----------
fs/notify/fsnotify.c | 6 +--
include/linux/fanotify.h | 26 ++++++++----
include/linux/fsnotify_backend.h | 42 ++++++++++++------
include/uapi/linux/fanotify.h | 25 ++++++++---
6 files changed, 113 insertions(+), 57 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e768c25262e27..494bba49634e1 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -303,7 +303,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
struct inode *dir)
{
__u32 marks_mask = 0, marks_ignore_mask = 0;
- __u32 test_mask, user_mask = FANOTIFY_OUTGOING_EVENTS |
+ __u32 test_mask, user_mask = FANOTIFY_OUTGOING_FS_EVENTS |
FANOTIFY_EVENT_FLAGS;
const struct path *path = fsnotify_data_path(data, data_type);
unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
@@ -315,6 +315,7 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
__func__, iter_info->report_mask, event_mask, data, data_type);
if (fsnotify_is_ns_watcher(group)) {
+ user_mask = FANOTIFY_OUTGOING_NS_EVENTS;
if (data_type != FSNOTIFY_EVENT_MNT)
return 0;
} else if (WARN_ON_ONCE(!fsnotify_is_fs_watcher(group))) {
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 95e7c24c70249..7afb017d40f50 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -830,6 +830,19 @@ static int copy_info_records_to_user(struct fanotify_event *event,
return total_bytes;
}
+static __u64 fanotify_event_mask(struct fsnotify_group *group,
+ struct fanotify_event *event)
+{
+ switch (group->type) {
+ case FSNOTIFY_GROUP_TYPE_FS:
+ return event->mask & FANOTIFY_OUTGOING_FS_EVENTS;
+ case FSNOTIFY_GROUP_TYPE_NS:
+ return event->mask & FANOTIFY_OUTGOING_NS_EVENTS;
+ }
+
+ return 0;
+}
+
static ssize_t copy_event_to_user(struct fsnotify_group *group,
struct fanotify_event *event,
char __user *buf, size_t count)
@@ -848,7 +861,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
metadata.metadata_len = FAN_EVENT_METADATA_LEN;
metadata.vers = FANOTIFY_METADATA_VERSION;
metadata.reserved = 0;
- metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
+ metadata.mask = fanotify_event_mask(group, event);
metadata.pid = pid_vnr(event->pid);
/*
* For an unprivileged listener, event->pid can be used to identify the
@@ -1881,6 +1894,35 @@ static int fanotify_events_supported(struct fsnotify_group *group,
return 0;
}
+static bool fanotify_is_valid_mask(struct fsnotify_group *group, int mark_type,
+ __u64 mask)
+{
+ u32 valid_mask = 0;
+
+ /*
+ * Event bits for different group type may be overloaded but event
+ * flags are common to all group types.
+ */
+ BUILD_BUG_ON(FANOTIFY_EVENTS_ON_FS & FANOTIFY_EVENT_FLAGS);
+ BUILD_BUG_ON(FANOTIFY_EVENTS_ON_NS & FANOTIFY_EVENT_FLAGS);
+
+ switch (group->type) {
+ case FSNOTIFY_GROUP_TYPE_FS:
+ valid_mask = FANOTIFY_EVENTS_ON_FS | FANOTIFY_EVENT_FLAGS;
+ if (!IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS))
+ valid_mask &= ~FANOTIFY_PERM_EVENTS;
+ break;
+ case FSNOTIFY_GROUP_TYPE_NS:
+ /* Only report mount events on mntns mark */
+ if (mark_type == FAN_MARK_MNTNS &&
+ FAN_GROUP_FLAG(group, FAN_REPORT_MNT))
+ valid_mask = FANOTIFY_MOUNT_EVENTS;
+ break;
+ }
+
+ return !(mask & ~valid_mask);
+}
+
static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
int dfd, const char __user *pathname)
{
@@ -1890,7 +1932,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
struct fan_fsid __fsid, *fsid = NULL;
struct user_namespace *user_ns = NULL;
struct mnt_namespace *mntns;
- u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS;
unsigned int ignore = flags & FANOTIFY_MARK_IGNORE_BITS;
@@ -1945,13 +1986,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
return -EINVAL;
}
- if (IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS))
- valid_mask |= FANOTIFY_PERM_EVENTS;
-
- if (mask & ~valid_mask)
- return -EINVAL;
-
-
/* We don't allow FAN_MARK_IGNORE & FAN_MARK_IGNORED_MASK together */
if (ignore == (FAN_MARK_IGNORE | FAN_MARK_IGNORED_MASK))
return -EINVAL;
@@ -1974,22 +2008,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
return -EINVAL;
group = fd_file(f)->private_data;
- if (group->type != group_type)
+ if (group->type != group_type ||
+ !fanotify_is_valid_mask(group, mark_type, mask))
return -EINVAL;
- /* Only report mount events on mnt namespace */
- if (FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
- if (mask & ~FANOTIFY_MOUNT_EVENTS)
- return -EINVAL;
- if (mark_type != FAN_MARK_MNTNS)
- return -EINVAL;
- } else {
- if (mask & FANOTIFY_MOUNT_EVENTS)
- return -EINVAL;
- if (mark_type == FAN_MARK_MNTNS)
- return -EINVAL;
- }
-
/*
* A user is allowed to setup sb/mount/mntns marks only if it is
* capable in the user ns where the group was created.
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index b646a861a84c6..429ce614233ce 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -149,7 +149,7 @@ static inline __u32 fsnotify_object_watched(struct inode *inode, __u32 mnt_mask,
__u32 marks_mask = READ_ONCE(inode->i_fsnotify_mask) | mnt_mask |
READ_ONCE(inode->i_sb->s_fsnotify_mask);
- return mask & marks_mask & ALL_FSNOTIFY_EVENTS;
+ return mask & marks_mask & FSNOTIFY_EVENTS_ON_FS;
}
/* Report pre-content event with optional range info */
@@ -219,7 +219,7 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
* events can provide an undesirable side-channel for information
* exfiltration.
*/
- parent_interested = mask & p_mask & ALL_FSNOTIFY_EVENTS &&
+ parent_interested = mask & p_mask & FSNOTIFY_EVENTS_ON_FS &&
!(data_type == FSNOTIFY_EVENT_PATH &&
d_is_special(dentry) &&
(mask & (FS_ACCESS | FS_MODIFY)));
@@ -266,7 +266,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group,
return 0;
/* Check interest of this mark in case event was sent with two marks */
- if (!(mask & inode_mark->mask & ALL_FSNOTIFY_EVENTS))
+ if (!(mask & inode_mark->mask & FSNOTIFY_EVENTS_ON_FS))
return 0;
return ops->handle_inode_event(inode_mark, mask, inode, dir, name, cookie);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 1a09400b843b5..224303a0c31e1 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -113,27 +113,35 @@
/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */
#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
+/* Events that user can request to be notified on filesystem watchers */
+#define FANOTIFY_EVENTS_ON_FS (FANOTIFY_PATH_EVENTS | \
+ FANOTIFY_PERM_EVENTS | \
+ FANOTIFY_INODE_EVENTS | \
+ FANOTIFY_ERROR_EVENTS)
+
+/* Mount tree monitoring events */
#define FANOTIFY_MOUNT_EVENTS (FAN_MNT_ATTACH | FAN_MNT_DETACH)
-/* Events that user can request to be notified on */
-#define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
- FANOTIFY_INODE_EVENTS | \
- FANOTIFY_ERROR_EVENTS | \
- FANOTIFY_MOUNT_EVENTS)
+/* Events that user can request to be notified on namespace watchers */
+#define FANOTIFY_EVENTS_ON_NS (FANOTIFY_MOUNT_EVENTS)
/* Extra flags that may be reported with event or control handling of events */
#define FANOTIFY_EVENT_FLAGS (FAN_EVENT_ON_CHILD | FAN_ONDIR)
-/* Events that may be reported to user */
-#define FANOTIFY_OUTGOING_EVENTS (FANOTIFY_EVENTS | \
- FANOTIFY_PERM_EVENTS | \
+/* Events that may be reported to user on filesystem watchers */
+#define FANOTIFY_OUTGOING_FS_EVENTS (FANOTIFY_EVENTS_ON_FS | \
FAN_Q_OVERFLOW | FAN_ONDIR)
+/* Events that may be reported to user on namespace watchers */
+#define FANOTIFY_OUTGOING_NS_EVENTS (FANOTIFY_EVENTS_ON_NS | \
+ FAN_Q_OVERFLOW)
+
/* Events and flags relevant only for directories */
#define FANOTIFY_DIRONLY_EVENT_BITS (FANOTIFY_DIRENT_EVENTS | \
FAN_EVENT_ON_CHILD | FAN_ONDIR)
-#define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \
+#define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_FS_EVENTS | \
+ FANOTIFY_OUTGOING_NS_EVENTS | \
FANOTIFY_EVENT_FLAGS)
/* These masks check for invalid bits in permission responses. */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 698fc75b0b6d4..8f2821735f068 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -59,22 +59,30 @@
#define FS_PRE_ACCESS 0x00100000 /* Pre-content access hook */
-#define FS_MNT_ATTACH 0x01000000 /* Mount was attached */
-#define FS_MNT_DETACH 0x02000000 /* Mount was detached */
-#define FS_MNT_MOVE (FS_MNT_ATTACH | FS_MNT_DETACH)
+#define FS_RENAME 0x10000000 /* File was renamed */
+
+#define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO)
/*
- * Set on inode mark that cares about things that happen to its children.
- * Always set for dnotify and inotify.
+ * Filter flags for watching filesystems
+ *
+ * NOTE: The ON_CHILD flag is set on inode mark that cares about things that
+ * happen to its children. Always set for dnotify and inotify.
* Set on inode/sb/mount marks that care about parent/name info.
*/
#define FS_EVENT_ON_CHILD 0x08000000
-
-#define FS_RENAME 0x10000000 /* File was renamed */
#define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */
#define FS_ISDIR 0x40000000 /* event occurred against dir */
-#define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO)
+/*
+ * Events that user-space can request when watching namespaces
+ *
+ * NOTE: These values may overload filesystem events, but not event flags
+ */
+#define FS_MNT_ATTACH 0x01000000 /* Mount was attached */
+#define FS_MNT_DETACH 0x02000000 /* Mount was detached */
+#define FS_MNT_MOVE (FS_MNT_ATTACH | FS_MNT_DETACH)
+
/*
* Directory entry modification events - reported only to directory
@@ -84,9 +92,6 @@
*/
#define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
-/* Mount namespace events */
-#define FSNOTIFY_MNT_EVENTS (FS_MNT_ATTACH | FS_MNT_DETACH)
-
/* Content events can be used to inspect file content */
#define FSNOTIFY_CONTENT_PERM_EVENTS (FS_OPEN_PERM | FS_OPEN_EXEC_PERM | \
FS_ACCESS_PERM)
@@ -113,14 +118,23 @@
*/
#define FS_EVENTS_POSS_TO_PARENT (FS_EVENTS_POSS_ON_CHILD)
-/* Events that can be reported to backends */
-#define ALL_FSNOTIFY_EVENTS (ALL_FSNOTIFY_DIRENT_EVENTS | \
- FSNOTIFY_MNT_EVENTS | \
+/* Events that can be reported to backends on filesystem watchers */
+#define FSNOTIFY_EVENTS_ON_FS (ALL_FSNOTIFY_DIRENT_EVENTS | \
FS_EVENTS_POSS_ON_CHILD | \
FS_DELETE_SELF | FS_MOVE_SELF | \
FS_UNMOUNT | FS_Q_OVERFLOW | FS_IN_IGNORED | \
FS_ERROR)
+/* Mount tree monitoring events */
+#define FSNOTIFY_MNT_EVENTS (FS_MNT_ATTACH | FS_MNT_DETACH)
+
+/* Events that can be reported to backends on namespace watchers */
+#define FSNOTIFY_EVENTS_ON_NS (FSNOTIFY_MNT_EVENTS | \
+ FS_Q_OVERFLOW)
+
+/* Events that can be reported to backends */
+#define ALL_FSNOTIFY_EVENTS (FSNOTIFY_EVENTS_ON_FS | FSNOTIFY_EVENTS_ON_NS)
+
/* Extra flags that may be reported with event or control handling of events */
#define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index e710967c7c263..cfcd193aee3e2 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -4,7 +4,9 @@
#include <linux/types.h>
-/* the following events that user-space can register for */
+/*
+ * Events that user-space can request when watching filesystems
+ */
#define FAN_ACCESS 0x00000001 /* File was accessed */
#define FAN_MODIFY 0x00000002 /* File was modified */
#define FAN_ATTRIB 0x00000004 /* Metadata changed */
@@ -28,19 +30,28 @@
/* #define FAN_DIR_MODIFY 0x00080000 */ /* Deprecated (reserved) */
#define FAN_PRE_ACCESS 0x00100000 /* Pre-content access hook */
-#define FAN_MNT_ATTACH 0x01000000 /* Mount was attached */
-#define FAN_MNT_DETACH 0x02000000 /* Mount was detached */
-
-#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
#define FAN_RENAME 0x10000000 /* File was renamed */
-#define FAN_ONDIR 0x40000000 /* Event occurred against dir */
-
/* helper events */
#define FAN_CLOSE (FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE) /* close */
#define FAN_MOVE (FAN_MOVED_FROM | FAN_MOVED_TO) /* moves */
+/*
+ * Filter flags for watching filesystems
+ */
+#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
+#define FAN_ONDIR 0x40000000 /* Event occurred against dir */
+
+/*
+ * Events that user-space can request when watching namespaces
+ *
+ * NOTE: These values may overload filesystem events, but not event flags
+ */
+#define FAN_MNT_ATTACH 0x01000000 /* Mount was attached */
+#define FAN_MNT_DETACH 0x02000000 /* Mount was detached */
+
+
/* flags used for fanotify_init() */
#define FAN_CLOEXEC 0x00000001
#define FAN_NONBLOCK 0x00000002
--
2.54.0
* [PATCH v2 04/10] fanotify: test event->type instead of event mask when possible
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (2 preceding siblings ...)
2026-04-24 17:04 ` [PATCH v2 03/10] fsnotify: separate the events bitmask macros by group type Amir Goldstein
@ 2026-04-24 17:04 ` Amir Goldstein
2026-04-24 17:04 ` [PATCH v2 05/10] fsnotify: do not report mount events with fsnotify() Amir Goldstein
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:04 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
The fanotify_event already has a type field, so tests like
fanotify_is_mnt_event() should check the type and not the mask.
In fanotify_alloc_event(), determine the mnt and fs_error event
types by data_type, which uniquely identifies those events.
The helper for classifying an fs permission event is renamed to
fanotify_is_fs_perm_event() and takes the group argument to verify
that this is a filesystem group type.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
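Note (not part of the commit message): a minimal userspace sketch of why
a type test is preferable once event bits may be shared between group
types. All demo_* names and bit values below are hypothetical stand-ins,
not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's event types and mask bits. */
enum demo_event_type { DEMO_TYPE_FID, DEMO_TYPE_MNT, DEMO_TYPE_NS };

#define DEMO_FS_CREATE 0x00000100u /* meaningful to an fs watcher */
#define DEMO_NS_CREATE 0x00000100u /* same bit, reused for an ns watcher */

struct demo_event {
	enum demo_event_type type;
	uint32_t mask;
};

/* Mask test: ambiguous as soon as two group types share a bit. */
static bool demo_is_ns_event_by_mask(const struct demo_event *e)
{
	return (e->mask & DEMO_NS_CREATE) != 0;
}

/* Type test: the type is fixed at allocation time, so it stays unambiguous. */
static bool demo_is_ns_event_by_type(const struct demo_event *e)
{
	return e->type == DEMO_TYPE_NS;
}

/* Returns true when the two tests disagree for a filesystem create event. */
static bool demo_mask_test_misclassifies(void)
{
	struct demo_event fs_create = { DEMO_TYPE_FID, DEMO_FS_CREATE };

	return demo_is_ns_event_by_mask(&fs_create) &&
	       !demo_is_ns_event_by_type(&fs_create);
}
```

In this sketch the mask test reports a filesystem create event as a
namespace event, while the type test does not.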
fs/notify/fanotify/fanotify.c | 30 +++++++++++++++---------------
fs/notify/fanotify/fanotify.h | 29 +++++++++++++++++++++--------
fs/notify/fanotify/fanotify_user.c | 18 +++++++++---------
include/linux/fsnotify_backend.h | 5 -----
4 files changed, 45 insertions(+), 37 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 494bba49634e1..80116026d3ae0 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -195,7 +195,7 @@ static int fanotify_merge(struct fsnotify_group *group,
* the event structure we have created in fanotify_handle_event() is the
* one we should check for permission response.
*/
- if (fanotify_is_perm_event(new->mask))
+ if (fanotify_is_perm_event(new))
return 0;
hlist_for_each_entry(old, hlist, merge_list) {
@@ -204,7 +204,7 @@ static int fanotify_merge(struct fsnotify_group *group,
if (fanotify_should_merge(old, new)) {
old->mask |= new->mask;
- if (fanotify_is_error_event(old->mask))
+ if (fanotify_is_error_event(old))
FANOTIFY_EE(old)->err_count++;
return 1;
@@ -711,11 +711,9 @@ static struct fanotify_event *fanotify_alloc_name_event(struct inode *dir,
static struct fanotify_event *fanotify_alloc_error_event(
struct fsnotify_group *group,
__kernel_fsid_t *fsid,
- const void *data, int data_type,
+ struct fs_error_report *report,
unsigned int *hash)
{
- struct fs_error_report *report =
- fsnotify_data_error_report(data, data_type);
struct inode *inode;
struct fanotify_error_event *fee;
int fh_len;
@@ -759,6 +757,8 @@ static struct fanotify_event *fanotify_alloc_event(
fid_mode);
struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir);
const struct path *path = fsnotify_data_path(data, data_type);
+ struct fs_error_report *fs_error =
+ fsnotify_data_error_report(data, data_type);
u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
struct mem_cgroup *old_memcg;
struct dentry *moved = NULL;
@@ -845,11 +845,10 @@ static struct fanotify_event *fanotify_alloc_event(
/* Whoever is interested in the event, pays for the allocation. */
old_memcg = set_active_memcg(group->memcg);
- if (fanotify_is_perm_event(mask)) {
+ if (fanotify_is_fs_perm_event(group, mask)) {
event = fanotify_alloc_perm_event(data, data_type, gfp);
- } else if (fanotify_is_error_event(mask)) {
- event = fanotify_alloc_error_event(group, fsid, data,
- data_type, &hash);
+ } else if (fs_error) {
+ event = fanotify_alloc_error_event(group, fsid, fs_error, &hash);
} else if (name_event && (file_name || moved || child)) {
event = fanotify_alloc_name_event(dirid, fsid, file_name, child,
moved, &hash, gfp);
@@ -916,7 +915,7 @@ static void fanotify_insert_event(struct fsnotify_group *group,
assert_spin_locked(&group->notification_lock);
- if (!fanotify_is_hashed_event(event->mask))
+ if (!fanotify_is_hashed_event(event))
return;
pr_debug("%s: group=%p event=%p bucket=%u\n", __func__,
@@ -970,7 +969,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
group, mask, match_mask);
- if (fanotify_is_perm_event(mask)) {
+ bool is_perm = fanotify_is_fs_perm_event(group, mask);
+ if (is_perm) {
/*
* fsnotify_prepare_user_wait() fails if we race with mark
* deletion. Just let the operation pass in that case.
@@ -990,7 +990,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
* We don't queue overflow events for permission events as
* there the access is denied and so no event is in fact lost.
*/
- if (!fanotify_is_perm_event(mask))
+ if (!is_perm)
fsnotify_queue_overflow(group);
goto finish;
}
@@ -1000,17 +1000,17 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
fanotify_insert_event);
if (ret) {
/* Permission events shouldn't be merged */
- BUG_ON(ret == 1 && mask & FANOTIFY_PERM_EVENTS);
+ WARN_ON(ret == 1 && is_perm);
/* Our event wasn't used in the end. Free it. */
fsnotify_destroy_event(group, fsn_event);
ret = 0;
- } else if (fanotify_is_perm_event(mask)) {
+ } else if (is_perm) {
ret = fanotify_get_response(group, FANOTIFY_PERM(event),
iter_info);
}
finish:
- if (fanotify_is_perm_event(mask))
+ if (is_perm)
fsnotify_finish_user_wait(iter_info);
return ret;
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index a0619e7694d57..13e3787ddd558 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -265,6 +265,7 @@ struct fanotify_event {
struct pid *pid;
};
+
static inline void fanotify_init_event(struct fanotify_event *event,
unsigned int hash, u32 mask)
{
@@ -457,12 +458,24 @@ FANOTIFY_PERM(struct fanotify_event *event)
return container_of(event, struct fanotify_perm_event, fae);
}
-static inline bool fanotify_is_perm_event(u32 mask)
+static inline bool fanotify_is_fs_perm_event(struct fsnotify_group *group,
+ u32 mask)
{
return IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS) &&
+ fsnotify_is_fs_watcher(group) &&
mask & FANOTIFY_PERM_EVENTS;
}
+static inline bool fanotify_is_perm_event(struct fanotify_event *event)
+{
+ return event->type == FANOTIFY_EVENT_TYPE_PATH_PERM;
+}
+
+static inline bool fsnotify_is_overflow_event(struct fanotify_event *event)
+{
+ return event->type == FANOTIFY_EVENT_TYPE_OVERFLOW;
+}
+
static inline bool fanotify_event_has_access_range(struct fanotify_event *event)
{
if (!(event->mask & FANOTIFY_PRE_CONTENT_EVENTS))
@@ -476,14 +489,14 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
return container_of(fse, struct fanotify_event, fse);
}
-static inline bool fanotify_is_error_event(u32 mask)
+static inline bool fanotify_is_error_event(struct fanotify_event *event)
{
- return mask & FAN_FS_ERROR;
+ return event->type == FANOTIFY_EVENT_TYPE_FS_ERROR;
}
-static inline bool fanotify_is_mnt_event(u32 mask)
+static inline bool fanotify_is_mnt_event(struct fanotify_event *event)
{
- return mask & (FAN_MNT_ATTACH | FAN_MNT_DETACH);
+ return event->type == FANOTIFY_EVENT_TYPE_MNT;
}
static inline const struct path *fanotify_event_path(struct fanotify_event *event)
@@ -506,10 +519,10 @@ static inline const struct path *fanotify_event_path(struct fanotify_event *even
/*
* Permission events and overflow event do not get merged - don't hash them.
*/
-static inline bool fanotify_is_hashed_event(u32 mask)
+static inline bool fanotify_is_hashed_event(struct fanotify_event *event)
{
- return !(fanotify_is_perm_event(mask) ||
- fsnotify_is_overflow_event(mask));
+ return !(fanotify_is_perm_event(event) ||
+ fsnotify_is_overflow_event(event));
}
static inline unsigned int fanotify_event_hash_bucket(
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 7afb017d40f50..c41c83d86518a 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -257,7 +257,7 @@ static size_t fanotify_event_len(unsigned int info_mode,
int fh_len;
int dot_len = 0;
- if (fanotify_is_error_event(event->mask))
+ if (fanotify_is_error_event(event))
event_len += FANOTIFY_ERROR_INFO_LEN;
if (fanotify_event_has_any_dir_fh(event)) {
@@ -275,7 +275,7 @@ static size_t fanotify_event_len(unsigned int info_mode,
fh_len = fanotify_event_object_fh_len(event);
event_len += fanotify_fid_info_len(fh_len, dot_len);
}
- if (fanotify_is_mnt_event(event->mask))
+ if (fanotify_is_mnt_event(event))
event_len += FANOTIFY_MNT_INFO_LEN;
if (info_mode & FAN_REPORT_PIDFD)
@@ -338,9 +338,9 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
* same event we peeked above.
*/
fsnotify_remove_first_event(group);
- if (fanotify_is_perm_event(event->mask))
+ if (fanotify_is_perm_event(event))
FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
- if (fanotify_is_hashed_event(event->mask))
+ if (fanotify_is_hashed_event(event))
fanotify_unhash_event(group, event);
out:
spin_unlock(&group->notification_lock);
@@ -800,7 +800,7 @@ static int copy_info_records_to_user(struct fanotify_event *event,
total_bytes += ret;
}
- if (fanotify_is_error_event(event->mask)) {
+ if (fanotify_is_error_event(event)) {
ret = copy_error_info_to_user(event, buf, count);
if (ret < 0)
return ret;
@@ -818,7 +818,7 @@ static int copy_info_records_to_user(struct fanotify_event *event,
total_bytes += ret;
}
- if (fanotify_is_mnt_event(event->mask)) {
+ if (fanotify_is_mnt_event(event)) {
ret = copy_mnt_info_to_user(event, buf, count);
if (ret < 0)
return ret;
@@ -965,7 +965,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
if (pidfd_file)
fd_install(pidfd, pidfd_file);
- if (fanotify_is_perm_event(event->mask))
+ if (fanotify_is_perm_event(event))
FANOTIFY_PERM(event)->fd = fd;
return metadata.event_len;
@@ -1048,7 +1048,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
* Permission events get queued to wait for response. Other
* events can be destroyed now.
*/
- if (!fanotify_is_perm_event(event->mask)) {
+ if (!fanotify_is_perm_event(event)) {
fsnotify_destroy_event(group, &event->fse);
} else {
if (ret <= 0 || FANOTIFY_PERM(event)->fd < 0) {
@@ -1145,7 +1145,7 @@ static int fanotify_release(struct inode *ignored, struct file *file)
while ((fsn_event = fsnotify_remove_first_event(group))) {
struct fanotify_event *event = FANOTIFY_E(fsn_event);
- if (!(event->mask & FANOTIFY_PERM_EVENTS)) {
+ if (!fanotify_is_perm_event(event)) {
spin_unlock(&group->notification_lock);
fsnotify_destroy_event(group, fsn_event);
} else {
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 8f2821735f068..9ce08d03d041d 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -783,11 +783,6 @@ static inline void fsnotify_queue_overflow(struct fsnotify_group *group)
fsnotify_add_event(group, group->overflow_event, NULL);
}
-static inline bool fsnotify_is_overflow_event(u32 mask)
-{
- return mask & FS_Q_OVERFLOW;
-}
-
static inline bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group)
{
assert_spin_locked(&group->notification_lock);
--
2.54.0
* [PATCH v2 05/10] fsnotify: do not report mount events with fsnotify()
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (3 preceding siblings ...)
2026-04-24 17:04 ` [PATCH v2 04/10] fanotify: test event->type instead of event mask when possible Amir Goldstein
@ 2026-04-24 17:04 ` Amir Goldstein
2026-04-24 17:04 ` [PATCH v2 06/10] fanotify: gate fs event classification by group type Amir Goldstein
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:04 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
fsnotify() is now used to report events to both filesystem watchers and
(mnt) namespace watchers, but those two distinct event categories can
never be sent to the same group.
Split the reporting into two paths: fsnotify() now looks only for
interested filesystem object marks, and the new send_to_ns_groups() looks
only for interested mntns object marks. Let fsnotify_mnt() call
send_to_ns_groups().
The intention is that send_to_ns_groups() will also be used to report
namespace events to watchers of userns object.
Gate the common code that checks fs events behind a filesystem
group type check.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
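Note (not part of the commit message): a simplified userspace sketch of
the shared walk described above. It only illustrates the early-return
rule (a non-zero verdict stops the walk only for permission events on
filesystem groups). The demo_* names, the flat array of groups and the
bit value are assumptions made for illustration; the real code iterates
merged mark lists.

```c
#include <stddef.h>
#include <stdint.h>

enum demo_group_type { DEMO_GROUP_FS, DEMO_GROUP_NS };

#define DEMO_PERM_EVENTS 0x00010000u /* hypothetical permission event bit */

struct demo_group {
	enum demo_group_type type;
	int verdict; /* value this group's handler would return */
};

/*
 * Walk the groups of one type. A non-zero verdict ends the walk only for
 * permission events on filesystem groups; otherwise every group is visited.
 */
static int demo_send_to_groups(const struct demo_group *groups, size_t n,
			       uint32_t mask, enum demo_group_type type,
			       size_t *visited)
{
	*visited = 0;
	for (size_t i = 0; i < n; i++) {
		if (groups[i].type != type)
			continue; /* simplification: the patch warns here */
		(*visited)++;
		if (groups[i].verdict && type == DEMO_GROUP_FS &&
		    (mask & DEMO_PERM_EVENTS))
			return groups[i].verdict;
	}
	return 0;
}

/* Two fs groups (the first one denies) and two ns groups. */
static const struct demo_group demo_groups[] = {
	{ DEMO_GROUP_FS, -1 }, { DEMO_GROUP_FS, 0 },
	{ DEMO_GROUP_NS, -1 }, { DEMO_GROUP_NS, 0 },
};
```

With these sample groups, a permission event on the fs walk stops at the
first group, while the ns walk always visits both ns groups and returns 0.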
fs/notify/fsnotify.c | 87 +++++++++++++++++++++++++++++++-------------
1 file changed, 62 insertions(+), 25 deletions(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 429ce614233ce..db79f51d8109c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -330,7 +330,8 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask,
static int send_to_group(__u32 mask, const void *data, int data_type,
struct inode *dir, const struct qstr *file_name,
- u32 cookie, struct fsnotify_iter_info *iter_info)
+ u32 cookie, struct fsnotify_iter_info *iter_info,
+ enum fsnotify_group_type group_type)
{
struct fsnotify_group *group = NULL;
__u32 test_mask = (mask & ALL_FSNOTIFY_EVENTS);
@@ -344,7 +345,7 @@ static int send_to_group(__u32 mask, const void *data, int data_type,
return 0;
/* clear ignored on inode modification */
- if (mask & FS_MODIFY) {
+ if (group_type == FSNOTIFY_GROUP_TYPE_FS && mask & FS_MODIFY) {
fsnotify_foreach_iter_mark_type(iter_info, mark, type) {
if (!(mark->flags &
FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY))
@@ -360,10 +361,13 @@ static int send_to_group(__u32 mask, const void *data, int data_type,
fsnotify_effective_ignore_mask(mark, is_dir, type);
}
- pr_debug("%s: group=%p mask=%x marks_mask=%x marks_ignore_mask=%x data=%p data_type=%d dir=%p cookie=%d\n",
- __func__, group, mask, marks_mask, marks_ignore_mask,
+ pr_debug("%s: group=%p type=%d mask=%x marks_mask=%x marks_ignore_mask=%x data=%p data_type=%d dir=%p cookie=%d\n",
+ __func__, group, group_type, mask, marks_mask, marks_ignore_mask,
data, data_type, dir, cookie);
+ if (WARN_ON_ONCE(group->type != group_type))
+ return 0;
+
if (!(test_mask & marks_mask & ~marks_ignore_mask))
return 0;
@@ -469,6 +473,26 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
}
}
+static int send_to_groups(__u32 mask, const void *data, int data_type,
+ struct inode *dir, const struct qstr *file_name,
+ u32 cookie, struct fsnotify_iter_info *iter_info,
+ enum fsnotify_group_type group_type)
+{
+ int ret;
+
+ while (fsnotify_iter_select_report_types(iter_info)) {
+ ret = send_to_group(mask, data, data_type, dir, file_name,
+ cookie, iter_info, group_type);
+
+ if (ret && group_type == FSNOTIFY_GROUP_TYPE_FS &&
+ (mask & ALL_FSNOTIFY_PERM_EVENTS))
+ return ret;
+
+ fsnotify_iter_next(iter_info);
+ }
+ return 0;
+}
+
/*
* fsnotify - This is the main call to fsnotify.
*
@@ -494,7 +518,6 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
{
const struct path *path = fsnotify_data_path(data, data_type);
struct super_block *sb = fsnotify_data_sb(data, data_type);
- const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
struct fsnotify_sb_info *sbinfo = sb ? fsnotify_sb_info(sb) : NULL;
struct fsnotify_iter_info iter_info = {};
struct mount *mnt = NULL;
@@ -535,8 +558,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
if ((!sbinfo || !sbinfo->sb_marks) &&
(!mnt || !mnt->mnt_fsnotify_marks) &&
(!inode || !inode->i_fsnotify_marks) &&
- (!inode2 || !inode2->i_fsnotify_marks) &&
- (!mnt_data || !mnt_data->ns->n_fsnotify_marks))
+ (!inode2 || !inode2->i_fsnotify_marks))
return 0;
if (sb)
@@ -547,8 +569,6 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
marks_mask |= READ_ONCE(inode->i_fsnotify_mask);
if (inode2)
marks_mask |= READ_ONCE(inode2->i_fsnotify_mask);
- if (mnt_data)
- marks_mask |= READ_ONCE(mnt_data->ns->n_fsnotify_mask);
/*
* If this is a modify event we may need to clear some ignore masks.
@@ -556,7 +576,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
* event in its mask.
* Otherwise, return if none of the marks care about this type of event.
*/
- test_mask = (mask & ALL_FSNOTIFY_EVENTS);
+ test_mask = (mask & FSNOTIFY_EVENTS_ON_FS);
if (!(test_mask & marks_mask))
return 0;
@@ -578,27 +598,15 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
iter_info.marks[inode2_type] =
fsnotify_first_mark(&inode2->i_fsnotify_marks);
}
- if (mnt_data) {
- iter_info.marks[FSNOTIFY_ITER_TYPE_MNTNS] =
- fsnotify_first_mark(&mnt_data->ns->n_fsnotify_marks);
- }
/*
* We need to merge inode/vfsmount/sb mark lists so that e.g. inode mark
* ignore masks are properly reflected for mount/sb mark notifications.
* That's why this traversal is so complicated...
*/
- while (fsnotify_iter_select_report_types(&iter_info)) {
- ret = send_to_group(mask, data, data_type, dir, file_name,
- cookie, &iter_info);
+ ret = send_to_groups(mask, data, data_type, dir, file_name,
+ cookie, &iter_info, FSNOTIFY_GROUP_TYPE_FS);
- if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
- goto out;
-
- fsnotify_iter_next(&iter_info);
- }
- ret = 0;
-out:
srcu_read_unlock(&fsnotify_mark_srcu, iter_info.srcu_idx);
return ret;
@@ -691,6 +699,35 @@ int fsnotify_open_perm_and_set_mode(struct file *file)
}
#endif
+static int send_to_ns_groups(__u32 mask, const void *data, int data_type)
+{
+ const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
+ struct fsnotify_iter_info iter_info = {};
+ __u32 test_mask, marks_mask = 0;
+ int ret;
+
+ if (mnt_data)
+ marks_mask |= READ_ONCE(mnt_data->ns->n_fsnotify_mask);
+
+ test_mask = mask & FSNOTIFY_EVENTS_ON_NS;
+ if (!(test_mask & marks_mask))
+ return 0;
+
+ iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
+
+ if (mnt_data) {
+ iter_info.marks[FSNOTIFY_ITER_TYPE_MNTNS] =
+ fsnotify_first_mark(&mnt_data->ns->n_fsnotify_marks);
+ }
+
+ ret = send_to_groups(mask, data, data_type, NULL, NULL, 0, &iter_info,
+ FSNOTIFY_GROUP_TYPE_NS);
+
+ srcu_read_unlock(&fsnotify_mark_srcu, iter_info.srcu_idx);
+
+ return ret;
+}
+
void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
{
struct fsnotify_mnt data = {
@@ -708,7 +745,7 @@ void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
if (!ns->n_fsnotify_marks)
return;
- fsnotify(mask, &data, FSNOTIFY_EVENT_MNT, NULL, NULL, NULL, 0);
+ send_to_ns_groups(mask, &data, FSNOTIFY_EVENT_MNT);
}
static __init int fsnotify_init(void)
--
2.54.0
* [PATCH v2 06/10] fanotify: gate fs event classification by group type
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (4 preceding siblings ...)
2026-04-24 17:04 ` [PATCH v2 05/10] fsnotify: do not report mount events with fsnotify() Amir Goldstein
@ 2026-04-24 17:04 ` Amir Goldstein
2026-04-24 17:05 ` [PATCH v2 07/10] fanotify: gate fs events checks in fanotify_mark() " Amir Goldstein
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:04 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
fanotify decides how to allocate events based on the event mask.
The event mask bits that correspond to filesystem watchers are only
meaningful in the context of the filesystem group type.
Hence, before checking whether an event is a specific fs event
(e.g. FS_RENAME), we first need to check that the group type is filesystem.
Separate fanotify_alloc_event() into an allocator per group type, so
that all the fs event checks inside fanotify_alloc_fs_watcher_event()
are already properly gated.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
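Note (not part of the commit message): a small userspace sketch of the
"dispatch on group type first" structure. The demo_* names, the
allocation kinds and the bit value are hypothetical; the sketch only
shows that fs mask bits are never interpreted on the ns watcher path.

```c
#include <stdint.h>

enum demo_group_type { DEMO_GROUP_FS, DEMO_GROUP_NS };
enum demo_alloc_kind {
	DEMO_ALLOC_NONE,
	DEMO_ALLOC_PERM,
	DEMO_ALLOC_PATH,
	DEMO_ALLOC_MNT,
};

#define DEMO_FS_PERM 0x00010000u /* hypothetical fs permission bit */

/* fs watcher path: the only place where fs mask bits are interpreted. */
static enum demo_alloc_kind demo_alloc_fs(uint32_t mask)
{
	return (mask & DEMO_FS_PERM) ? DEMO_ALLOC_PERM : DEMO_ALLOC_PATH;
}

/* ns watcher path: decided by the payload, never by fs mask bits. */
static enum demo_alloc_kind demo_alloc_ns(uint64_t mnt_id)
{
	return mnt_id ? DEMO_ALLOC_MNT : DEMO_ALLOC_NONE;
}

/* Dispatcher: pick the allocator by group type before looking at the mask. */
static enum demo_alloc_kind demo_alloc(enum demo_group_type type,
				       uint32_t mask, uint64_t mnt_id)
{
	if (type == DEMO_GROUP_NS)
		return demo_alloc_ns(mnt_id);
	return demo_alloc_fs(mask);
}
```

Even if an ns watcher event carries a bit that collides with
DEMO_FS_PERM, the dispatcher in this sketch still selects the mount
allocation.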
fs/notify/fanotify/fanotify.c | 73 ++++++++++++++++++++++++++++-------
1 file changed, 58 insertions(+), 15 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 80116026d3ae0..987092c38789b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -744,11 +744,12 @@ static struct fanotify_event *fanotify_alloc_error_event(
return &fee->fae;
}
-static struct fanotify_event *fanotify_alloc_event(
+static struct fanotify_event *fanotify_alloc_fs_watcher_event(
struct fsnotify_group *group,
u32 mask, const void *data, int data_type,
struct inode *dir, const struct qstr *file_name,
- __kernel_fsid_t *fsid, u32 match_mask)
+ __kernel_fsid_t *fsid, u32 match_mask,
+ unsigned int *hash)
{
struct fanotify_event *event = NULL;
gfp_t gfp = GFP_KERNEL_ACCOUNT;
@@ -759,14 +760,11 @@ static struct fanotify_event *fanotify_alloc_event(
const struct path *path = fsnotify_data_path(data, data_type);
struct fs_error_report *fs_error =
fsnotify_data_error_report(data, data_type);
- u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
struct mem_cgroup *old_memcg;
struct dentry *moved = NULL;
struct inode *child = NULL;
bool name_event = false;
- unsigned int hash = 0;
bool ondir = mask & FAN_ONDIR;
- struct pid *pid;
if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) {
/*
@@ -848,22 +846,69 @@ static struct fanotify_event *fanotify_alloc_event(
if (fanotify_is_fs_perm_event(group, mask)) {
event = fanotify_alloc_perm_event(data, data_type, gfp);
} else if (fs_error) {
- event = fanotify_alloc_error_event(group, fsid, fs_error, &hash);
+ event = fanotify_alloc_error_event(group, fsid, fs_error, hash);
} else if (name_event && (file_name || moved || child)) {
event = fanotify_alloc_name_event(dirid, fsid, file_name, child,
- moved, &hash, gfp);
+ moved, hash, gfp);
} else if (fid_mode) {
- event = fanotify_alloc_fid_event(id, fsid, &hash, gfp);
+ event = fanotify_alloc_fid_event(id, fsid, hash, gfp);
} else if (path) {
- event = fanotify_alloc_path_event(path, &hash, gfp);
- } else if (mnt_id) {
+ event = fanotify_alloc_path_event(path, hash, gfp);
+ } else {
+ WARN_ON_ONCE(1);
+ }
+
+ set_active_memcg(old_memcg);
+ return event;
+}
+
+static struct fanotify_event *fanotify_alloc_ns_watcher_event(
+ struct fsnotify_group *group, u64 mask,
+ const void *data, int data_type)
+{
+ u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
+ struct mem_cgroup *old_memcg;
+ struct fanotify_event *event = NULL;
+ gfp_t gfp = GFP_KERNEL_ACCOUNT;
+
+ if (group->max_events == UINT_MAX)
+ gfp |= __GFP_NOFAIL;
+ else
+ gfp |= __GFP_RETRY_MAYFAIL;
+
+ old_memcg = set_active_memcg(group->memcg);
+
+ if (mnt_id) {
event = fanotify_alloc_mnt_event(mnt_id, gfp);
} else {
WARN_ON_ONCE(1);
}
+ set_active_memcg(old_memcg);
+ return event;
+}
+
+static struct fanotify_event *fanotify_alloc_event(
+ struct fsnotify_group *group, u64 mask,
+ const void *data, int data_type,
+ struct inode *dir, const struct qstr *name,
+ __kernel_fsid_t *fsid, u32 match_mask)
+{
+ struct fanotify_event *event;
+ unsigned int hash = 0;
+ bool ondir = mask & FAN_ONDIR;
+ struct pid *pid;
+
+ if (fsnotify_is_ns_watcher(group)) {
+ event = fanotify_alloc_ns_watcher_event(group, mask, data,
+ data_type);
+ } else {
+ event = fanotify_alloc_fs_watcher_event(group, mask, data,
+ data_type, dir, name, fsid,
+ match_mask, &hash);
+ }
if (!event)
- goto out;
+ return NULL;
if (FAN_GROUP_FLAG(group, FAN_REPORT_TID))
pid = get_pid(task_pid(current));
@@ -875,8 +920,6 @@ static struct fanotify_event *fanotify_alloc_event(
fanotify_init_event(event, hash, mask);
event->pid = pid;
-out:
- set_active_memcg(old_memcg);
return event;
}
@@ -966,8 +1009,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
if (!mask)
return 0;
- pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
- group, mask, match_mask);
+ pr_debug("%s: group=%p type=%d mask=%x report_mask=%x\n", __func__,
+ group, group->type, mask, match_mask);
bool is_perm = fanotify_is_fs_perm_event(group, mask);
if (is_perm) {
--
2.54.0
* [PATCH v2 07/10] fanotify: gate fs events checks in fanotify_mark() by group type
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (5 preceding siblings ...)
2026-04-24 17:04 ` [PATCH v2 06/10] fanotify: gate fs event classification by group type Amir Goldstein
@ 2026-04-24 17:05 ` Amir Goldstein
2026-04-24 17:05 ` [PATCH v2 08/10] fanotify: add support for watching the namespaces tree Amir Goldstein
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:05 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
fanotify_mark() has plenty of checks on the event mask.
The event mask bits that correspond to filesystem watchers are only
meaningful in the context of filesystem group type.
Hence, before checking whether the mask contains a specific event
(e.g. FAN_FS_ERROR), we need to check the group type as well
(e.g. filesystem).
Add helpers fanotify_test_{fs,ns}_watcher_event() and use them instead
of checking the event mask directly.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
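Note (not part of the commit message): a userspace sketch of how a gated
helper changes a validation rule. It is modelled loosely on the
FAN_FS_ERROR check. The demo_* names, mark types and the bit value are
hypothetical and only mirror the shape of the helpers in this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_group_type { DEMO_GROUP_FS, DEMO_GROUP_NS };
enum demo_mark_type { DEMO_MARK_INODE, DEMO_MARK_FILESYSTEM, DEMO_MARK_NS };

#define DEMO_FS_ERROR 0x00008000u /* hypothetical fs-only event bit */

/* True only if the group is an fs watcher and the mask has a tested bit. */
static bool demo_test_fs_watcher_event(enum demo_group_type type,
				       uint32_t mask, uint32_t test_mask)
{
	return type == DEMO_GROUP_FS && (mask & test_mask) != 0;
}

/* The fs error rule applies only when the group is an fs watcher. */
static int demo_check_mark(enum demo_group_type type,
			   enum demo_mark_type mark_type, uint32_t mask)
{
	if (demo_test_fs_watcher_event(type, mask, DEMO_FS_ERROR) &&
	    mark_type != DEMO_MARK_FILESYSTEM)
		return -EINVAL;
	return 0;
}
```

In this sketch an ns watcher that happens to set the colliding bit is not
rejected by an fs-only rule, while an fs watcher still is.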
fs/notify/fanotify/fanotify.h | 16 ++++++++++--
fs/notify/fanotify/fanotify_user.c | 39 ++++++++++++++++++------------
2 files changed, 38 insertions(+), 17 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 13e3787ddd558..56bbee15b7ee3 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -458,12 +458,24 @@ FANOTIFY_PERM(struct fanotify_event *event)
return container_of(event, struct fanotify_perm_event, fae);
}
+static inline bool fanotify_test_fs_watcher_event(struct fsnotify_group *group,
+ u32 mask, u32 test_mask)
+{
+ return fsnotify_is_fs_watcher(group) && (mask & test_mask);
+}
+
+static inline bool fanotify_test_ns_watcher_event(struct fsnotify_group *group,
+ u32 mask, u32 test_mask)
+{
+ return fsnotify_is_ns_watcher(group) && (mask & test_mask);
+}
+
static inline bool fanotify_is_fs_perm_event(struct fsnotify_group *group,
u32 mask)
{
return IS_ENABLED(CONFIG_FANOTIFY_ACCESS_PERMISSIONS) &&
- fsnotify_is_fs_watcher(group) &&
- mask & FANOTIFY_PERM_EVENTS;
+ fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_PERM_EVENTS);
}
static inline bool fanotify_is_perm_event(struct fanotify_event *event)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index c41c83d86518a..4c1767b3c1a06 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1509,7 +1509,9 @@ static int fanotify_may_update_existing_mark(struct fsnotify_mark *fsn_mark,
/* For now pre-content events are not generated for directories */
mask |= fsn_mark->mask;
- if (mask & FANOTIFY_PRE_CONTENT_EVENTS && mask & FAN_ONDIR)
+ if (mask & FAN_ONDIR &&
+ fanotify_test_fs_watcher_event(fsn_mark->group, mask,
+ FANOTIFY_PRE_CONTENT_EVENTS))
return -EEXIST;
return 0;
@@ -1546,8 +1548,8 @@ static int fanotify_add_mark(struct fsnotify_group *group,
* Error events are pre-allocated per group, only if strictly
* needed (i.e. FAN_FS_ERROR was requested).
*/
- if (!(fan_flags & FANOTIFY_MARK_IGNORE_BITS) &&
- (mask & FAN_FS_ERROR)) {
+ if (fanotify_test_fs_watcher_event(group, mask, FAN_FS_ERROR) &&
+ !(fan_flags & FANOTIFY_MARK_IGNORE_BITS)) {
ret = fanotify_group_init_error_pool(group);
if (ret)
goto out;
@@ -1562,7 +1564,8 @@ static int fanotify_add_mark(struct fsnotify_group *group,
fsnotify_put_mark(fsn_mark);
- if (!ret && (mask & FANOTIFY_PERM_EVENTS))
+ if (!ret && fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_PERM_EVENTS))
fanotify_perm_watchdog_group_add(group);
return ret;
@@ -1842,14 +1845,15 @@ static int fanotify_events_supported(struct fsnotify_group *group,
bool is_dir = d_is_dir(path->dentry);
/* Strict validation of events in non-dir inode mask with v5.17+ APIs */
bool strict_dir_events = FAN_GROUP_FLAG(group, FAN_REPORT_TARGET_FID) ||
- (mask & FAN_RENAME) ||
- (flags & FAN_MARK_IGNORE);
+ fanotify_test_fs_watcher_event(group, mask, FAN_RENAME) ||
+ (flags & FAN_MARK_IGNORE);
/*
* Filesystems need to opt-into pre-content evnets (a.k.a HSM)
* and they are only supported on regular files and directories.
*/
- if (mask & FANOTIFY_PRE_CONTENT_EVENTS) {
+ if (fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_PRE_CONTENT_EVENTS)) {
if (!(path->mnt->mnt_sb->s_iflags & SB_I_ALLOW_HSM))
return -EOPNOTSUPP;
if (!is_dir && !d_is_reg(path->dentry))
@@ -1864,7 +1868,7 @@ static int fanotify_events_supported(struct fsnotify_group *group,
* waits for fanotify permission event to be answered. Just disallow
* permission events for such filesystems.
*/
- if (mask & FANOTIFY_PERM_EVENTS &&
+ if (fanotify_test_fs_watcher_event(group, mask, FANOTIFY_PERM_EVENTS) &&
path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
return -EINVAL;
@@ -1887,8 +1891,9 @@ static int fanotify_events_supported(struct fsnotify_group *group,
* flags FAN_ONDIR and FAN_EVENT_ON_CHILD in mask of non-dir inode,
* but because we always allowed it, error only when using new APIs.
*/
- if (strict_dir_events && mark_type == FAN_MARK_INODE &&
- !is_dir && (mask & FANOTIFY_DIRONLY_EVENT_BITS))
+ if (strict_dir_events && mark_type == FAN_MARK_INODE && !is_dir &&
+ fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_DIRONLY_EVENT_BITS))
return -ENOTDIR;
return 0;
@@ -2024,14 +2029,15 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
* Permission events are not allowed for FAN_CLASS_NOTIF.
* Pre-content permission events are not allowed for FAN_CLASS_CONTENT.
*/
- if (mask & FANOTIFY_PERM_EVENTS &&
+ if (fanotify_test_fs_watcher_event(group, mask, FANOTIFY_PERM_EVENTS) &&
group->priority == FSNOTIFY_PRIO_NORMAL)
return -EINVAL;
- else if (mask & FANOTIFY_PRE_CONTENT_EVENTS &&
+ else if (fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_PRE_CONTENT_EVENTS) &&
group->priority == FSNOTIFY_PRIO_CONTENT)
return -EINVAL;
- if (mask & FAN_FS_ERROR &&
+ if (fanotify_test_fs_watcher_event(group, mask, FAN_FS_ERROR) &&
mark_type != FAN_MARK_FILESYSTEM)
return -EINVAL;
@@ -2061,11 +2067,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
* new parent+name. Reporting only old and new parent id is less
* useful and was not implemented.
*/
- if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME))
+ if (fanotify_test_fs_watcher_event(group, mask, FAN_RENAME) &&
+ !(fid_mode & FAN_REPORT_NAME))
return -EINVAL;
/* Pre-content events are not currently generated for directories. */
- if (mask & FANOTIFY_PRE_CONTENT_EVENTS && mask & FAN_ONDIR)
+ if (mask & FAN_ONDIR &&
+ fanotify_test_fs_watcher_event(group, mask,
+ FANOTIFY_PRE_CONTENT_EVENTS))
return -EINVAL;
if (mark_cmd == FAN_MARK_FLUSH) {
--
2.54.0
* [PATCH v2 08/10] fanotify: add support for watching the namespaces tree
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (6 preceding siblings ...)
2026-04-24 17:05 ` [PATCH v2 07/10] fanotify: gate fs events checks in fanotify_mark() " Amir Goldstein
@ 2026-04-24 17:05 ` Amir Goldstein
2026-04-24 17:05 ` [PATCH v2 09/10] selftests/filesystems: create fanotify test dir Amir Goldstein
2026-04-24 17:05 ` [PATCH v2 10/10] selftests/filesystems: add fanotify namespace notifications test Amir Goldstein
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:05 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Introduce the FAN_MARK_USERNS mark type to mark a user namespace object
given by an nsfs path.
Support two events, FAN_NS_CREATE and FAN_NS_DELETE, to report creation
and teardown of namespaces owned by the marked userns.
Introduce FAN_REPORT_NSID to report the self and owner nsid of
the created or torn down namespace.
An fanotify group initialized with flags FAN_REPORT_MNT and
FAN_REPORT_NSID may add marks on both userns and mntns objects
to mix mount and namespace events, but the same group cannot also
request filesystem events.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/notify/fanotify/fanotify.c | 33 +++++++++++++++-
fs/notify/fanotify/fanotify.h | 19 ++++++++++
fs/notify/fanotify/fanotify_user.c | 61 ++++++++++++++++++++++++++++--
fs/notify/fdinfo.c | 9 ++++-
fs/notify/fsnotify.c | 30 +++++++++++++++
fs/notify/fsnotify.h | 12 ++++++
fs/notify/mark.c | 7 ++++
fs/nsfs.c | 21 ++++++++++
include/linux/fanotify.h | 10 +++--
include/linux/fsnotify.h | 5 +++
include/linux/fsnotify_backend.h | 33 ++++++++++++++++
include/linux/proc_fs.h | 2 +
include/linux/user_namespace.h | 6 +++
include/uapi/linux/fanotify.h | 12 ++++++
kernel/nscommon.c | 47 +++++++++++++++++++++++
kernel/user_namespace.c | 2 +
16 files changed, 299 insertions(+), 10 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 987092c38789b..b3add9ccea4cf 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -168,6 +168,8 @@ static bool fanotify_should_merge(struct fanotify_event *old,
FANOTIFY_EE(new));
case FANOTIFY_EVENT_TYPE_MNT:
return false;
+ case FANOTIFY_EVENT_TYPE_NS:
+ return false;
default:
WARN_ON_ONCE(1);
}
@@ -316,7 +318,8 @@ static u32 fanotify_group_event_mask(struct fsnotify_group *group,
if (fsnotify_is_ns_watcher(group)) {
user_mask = FANOTIFY_OUTGOING_NS_EVENTS;
- if (data_type != FSNOTIFY_EVENT_MNT)
+ if (data_type != FSNOTIFY_EVENT_MNT &&
+ data_type != FSNOTIFY_EVENT_NS)
return 0;
} else if (WARN_ON_ONCE(!fsnotify_is_fs_watcher(group))) {
return 0;
@@ -585,6 +588,23 @@ static struct fanotify_event *fanotify_alloc_mnt_event(u64 mnt_id, gfp_t gfp)
return &pevent->fae;
}
+static struct fanotify_event *fanotify_alloc_userns_event(
+ const struct fsnotify_ns *ns_data,
+ gfp_t gfp)
+{
+ struct fanotify_ns_event *pevent;
+
+ pevent = kmem_cache_alloc(fanotify_ns_event_cachep, gfp);
+ if (!pevent)
+ return NULL;
+
+ pevent->fae.type = FANOTIFY_EVENT_TYPE_NS;
+ pevent->self_nsid = ns_data->self_nsid;
+ pevent->owner_nsid = ns_data->owner_nsid;
+
+ return &pevent->fae;
+}
+
static struct fanotify_event *fanotify_alloc_perm_event(const void *data,
int data_type,
gfp_t gfp)
@@ -866,6 +886,7 @@ static struct fanotify_event *fanotify_alloc_ns_watcher_event(
struct fsnotify_group *group, u64 mask,
const void *data, int data_type)
{
+ const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
u64 mnt_id = fsnotify_data_mnt_id(data, data_type);
struct mem_cgroup *old_memcg;
struct fanotify_event *event = NULL;
@@ -880,6 +901,8 @@ static struct fanotify_event *fanotify_alloc_ns_watcher_event(
if (mnt_id) {
event = fanotify_alloc_mnt_event(mnt_id, gfp);
+ } else if (ns_data) {
+ event = fanotify_alloc_userns_event(ns_data, gfp);
} else {
WARN_ON_ONCE(1);
}
@@ -1110,6 +1133,11 @@ static void fanotify_free_mnt_event(struct fanotify_event *event)
kmem_cache_free(fanotify_mnt_event_cachep, FANOTIFY_ME(event));
}
+static void fanotify_free_ns_event(struct fanotify_event *event)
+{
+ kmem_cache_free(fanotify_ns_event_cachep, FANOTIFY_NSE(event));
+}
+
static void fanotify_free_event(struct fsnotify_group *group,
struct fsnotify_event *fsn_event)
{
@@ -1139,6 +1167,9 @@ static void fanotify_free_event(struct fsnotify_group *group,
case FANOTIFY_EVENT_TYPE_MNT:
fanotify_free_mnt_event(event);
break;
+ case FANOTIFY_EVENT_TYPE_NS:
+ fanotify_free_ns_event(event);
+ break;
default:
WARN_ON_ONCE(1);
}
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 56bbee15b7ee3..c6c5145101908 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -11,6 +11,7 @@ extern struct kmem_cache *fanotify_fid_event_cachep;
extern struct kmem_cache *fanotify_path_event_cachep;
extern struct kmem_cache *fanotify_perm_event_cachep;
extern struct kmem_cache *fanotify_mnt_event_cachep;
+extern struct kmem_cache *fanotify_ns_event_cachep;
/* Possible states of the permission event */
enum {
@@ -246,6 +247,7 @@ enum fanotify_event_type {
FANOTIFY_EVENT_TYPE_OVERFLOW, /* struct fanotify_event */
FANOTIFY_EVENT_TYPE_FS_ERROR, /* struct fanotify_error_event */
FANOTIFY_EVENT_TYPE_MNT,
+ FANOTIFY_EVENT_TYPE_NS,
__FANOTIFY_EVENT_TYPE_NUM
};
@@ -417,6 +419,12 @@ struct fanotify_mnt_event {
u64 mnt_id;
};
+struct fanotify_ns_event {
+ struct fanotify_event fae;
+ u64 self_nsid;
+ u64 owner_nsid;
+};
+
static inline struct fanotify_path_event *
FANOTIFY_PE(struct fanotify_event *event)
{
@@ -429,6 +437,12 @@ FANOTIFY_ME(struct fanotify_event *event)
return container_of(event, struct fanotify_mnt_event, fae);
}
+static inline struct fanotify_ns_event *
+FANOTIFY_NSE(struct fanotify_event *event)
+{
+ return container_of(event, struct fanotify_ns_event, fae);
+}
+
/*
* Structure for permission fanotify events. It gets allocated and freed in
* fanotify_handle_event() since we wait there for user response. When the
@@ -511,6 +525,11 @@ static inline bool fanotify_is_mnt_event(struct fanotify_event *event)
return event->type == FANOTIFY_EVENT_TYPE_MNT;
}
+static inline bool fanotify_is_ns_event(const struct fanotify_event *event)
+{
+ return event->type == FANOTIFY_EVENT_TYPE_NS;
+}
+
static inline const struct path *fanotify_event_path(struct fanotify_event *event)
{
if (event->type == FANOTIFY_EVENT_TYPE_PATH)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 4c1767b3c1a06..b3f75aaed74ce 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -19,6 +19,7 @@
#include <linux/memcontrol.h>
#include <linux/statfs.h>
#include <linux/exportfs.h>
+#include <linux/proc_fs.h>
#include <asm/ioctls.h>
@@ -208,6 +209,7 @@ struct kmem_cache *fanotify_fid_event_cachep __ro_after_init;
struct kmem_cache *fanotify_path_event_cachep __ro_after_init;
struct kmem_cache *fanotify_perm_event_cachep __ro_after_init;
struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
+struct kmem_cache *fanotify_ns_event_cachep __ro_after_init;
#define FANOTIFY_EVENT_ALIGN 4
#define FANOTIFY_FID_INFO_HDR_LEN \
@@ -220,6 +222,8 @@ struct kmem_cache *fanotify_mnt_event_cachep __ro_after_init;
(sizeof(struct fanotify_event_info_range))
#define FANOTIFY_MNT_INFO_LEN \
(sizeof(struct fanotify_event_info_mnt))
+#define FANOTIFY_NS_INFO_LEN \
+ (sizeof(struct fanotify_event_info_ns))
static int fanotify_fid_info_len(int fh_len, int name_len)
{
@@ -277,6 +281,8 @@ static size_t fanotify_event_len(unsigned int info_mode,
}
if (fanotify_is_mnt_event(event))
event_len += FANOTIFY_MNT_INFO_LEN;
+ if (fanotify_is_ns_event(event))
+ event_len += FANOTIFY_NS_INFO_LEN;
if (info_mode & FAN_REPORT_PIDFD)
event_len += FANOTIFY_PIDFD_INFO_LEN;
@@ -523,6 +529,26 @@ static size_t copy_mnt_info_to_user(struct fanotify_event *event,
return info.hdr.len;
}
+static size_t copy_ns_info_to_user(struct fanotify_event *event,
+ char __user *buf, int count)
+{
+ struct fanotify_event_info_ns info = { };
+
+ info.hdr.info_type = FAN_EVENT_INFO_TYPE_NS;
+ info.hdr.len = sizeof(info);
+
+ if (WARN_ON(count < info.hdr.len))
+ return -EFAULT;
+
+ info.self_nsid = FANOTIFY_NSE(event)->self_nsid;
+ info.owner_nsid = FANOTIFY_NSE(event)->owner_nsid;
+
+ if (copy_to_user(buf, &info, sizeof(info)))
+ return -EFAULT;
+
+ return info.hdr.len;
+}
+
static size_t copy_error_info_to_user(struct fanotify_event *event,
char __user *buf, int count)
{
@@ -827,6 +853,15 @@ static int copy_info_records_to_user(struct fanotify_event *event,
total_bytes += ret;
}
+ if (fanotify_is_ns_event(event)) {
+ ret = copy_ns_info_to_user(event, buf, count);
+ if (ret < 0)
+ return ret;
+ buf += ret;
+ count -= ret;
+ total_bytes += ret;
+ }
+
return total_bytes;
}
@@ -1918,10 +1953,17 @@ static bool fanotify_is_valid_mask(struct fsnotify_group *group, int mark_type,
valid_mask &= ~FANOTIFY_PERM_EVENTS;
break;
case FSNOTIFY_GROUP_TYPE_NS:
- /* Only report mount events on mntns mark */
+ /*
+ * Only report mount events on mntns mark
+ * Only report ns events on userns mark
+ */
if (mark_type == FAN_MARK_MNTNS &&
- FAN_GROUP_FLAG(group, FAN_REPORT_MNT))
+ FAN_GROUP_FLAG(group, FAN_REPORT_MNT)) {
valid_mask = FANOTIFY_MOUNT_EVENTS;
+ } else if (mark_type == FAN_MARK_USERNS &&
+ FAN_GROUP_FLAG(group, FAN_REPORT_NSID)) {
+ valid_mask = FANOTIFY_NS_EVENTS;
+ }
break;
}
@@ -1973,6 +2015,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
obj_type = FSNOTIFY_OBJ_TYPE_MNTNS;
group_type = FSNOTIFY_GROUP_TYPE_NS;
break;
+ case FAN_MARK_USERNS:
+ obj_type = FSNOTIFY_OBJ_TYPE_USERNS;
+ group_type = FSNOTIFY_GROUP_TYPE_NS;
+ break;
default:
return -EINVAL;
}
@@ -2136,6 +2182,12 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
goto path_put_and_out;
user_ns = mntns->user_ns;
obj = mntns;
+ } else if (obj_type == FSNOTIFY_OBJ_TYPE_USERNS) {
+ ret = -EINVAL;
+ user_ns = userns_from_dentry(path.dentry);
+ if (!user_ns)
+ goto path_put_and_out;
+ obj = user_ns;
}
ret = -EPERM;
@@ -2239,8 +2291,8 @@ static int __init fanotify_user_setup(void)
FANOTIFY_DEFAULT_MAX_USER_MARKS);
BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
- BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 14);
- BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
+ BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 15);
+ BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 12);
fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
SLAB_PANIC|SLAB_ACCOUNT);
@@ -2253,6 +2305,7 @@ static int __init fanotify_user_setup(void)
KMEM_CACHE(fanotify_perm_event, SLAB_PANIC);
}
fanotify_mnt_event_cachep = KMEM_CACHE(fanotify_mnt_event, SLAB_PANIC);
+ fanotify_ns_event_cachep = KMEM_CACHE(fanotify_ns_event, SLAB_PANIC);
fanotify_max_queued_events = FANOTIFY_DEFAULT_MAX_EVENTS;
init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] =
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 0f731eddeb8be..fa05253f19e19 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -130,8 +130,13 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
} else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
struct mnt_namespace *mnt_ns = fsnotify_conn_mntns(mark->connector);
- seq_printf(m, "fanotify mnt_ns:%u mflags:%x mask:%x ignored_mask:%x\n",
- mnt_ns->ns.inum, mflags, mark->mask, mark->ignore_mask);
+ seq_printf(m, "fanotify mnt_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
+ mnt_ns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
+ } else if (mark->connector->type == FSNOTIFY_OBJ_TYPE_USERNS) {
+ struct user_namespace *userns = fsnotify_conn_userns(mark->connector);
+
+ seq_printf(m, "fanotify user_ns_id:%llu mflags:%x mask:%x ignored_mask:%x\n",
+ userns->ns.ns_id, mflags, mark->mask, mark->ignore_mask);
}
}
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index db79f51d8109c..9ffa96e6e7f4d 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -33,6 +33,11 @@ void __fsnotify_mntns_delete(struct mnt_namespace *mntns)
fsnotify_clear_marks_by_mntns(mntns);
}
+void __fsnotify_userns_delete(struct user_namespace *userns)
+{
+ fsnotify_clear_marks_by_userns(userns);
+}
+
void fsnotify_sb_delete(struct super_block *sb)
{
struct fsnotify_sb_info *sbinfo = fsnotify_sb_info(sb);
@@ -702,12 +707,15 @@ int fsnotify_open_perm_and_set_mode(struct file *file)
static int send_to_ns_groups(__u32 mask, const void *data, int data_type)
{
const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
+ const struct fsnotify_ns *ns_data = fsnotify_data_ns(data, data_type);
struct fsnotify_iter_info iter_info = {};
__u32 test_mask, marks_mask = 0;
int ret;
if (mnt_data)
marks_mask |= READ_ONCE(mnt_data->ns->n_fsnotify_mask);
+ if (ns_data)
+ marks_mask |= READ_ONCE(ns_data->userns->n_fsnotify_mask);
test_mask = mask & FSNOTIFY_EVENTS_ON_NS;
if (!(test_mask & marks_mask))
@@ -719,6 +727,10 @@ static int send_to_ns_groups(__u32 mask, const void *data, int data_type)
iter_info.marks[FSNOTIFY_ITER_TYPE_MNTNS] =
fsnotify_first_mark(&mnt_data->ns->n_fsnotify_marks);
}
+ if (ns_data) {
+ iter_info.marks[FSNOTIFY_ITER_TYPE_USERNS] =
+ fsnotify_first_mark(&ns_data->userns->n_fsnotify_marks);
+ }
ret = send_to_groups(mask, data, data_type, NULL, NULL, 0, &iter_info,
FSNOTIFY_GROUP_TYPE_NS);
@@ -748,6 +760,24 @@ void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt)
send_to_ns_groups(mask, &data, FSNOTIFY_EVENT_MNT);
}
+void fsnotify_ns(__u32 mask, struct user_namespace *userns,
+ u64 self_nsid, u64 owner_nsid)
+{
+ struct fsnotify_ns data = {
+ .userns = userns,
+ .self_nsid = self_nsid,
+ .owner_nsid = owner_nsid,
+ };
+
+ if (WARN_ON_ONCE(!userns))
+ return;
+
+ if (!READ_ONCE(userns->n_fsnotify_marks))
+ return;
+
+ send_to_ns_groups(mask, &data, FSNOTIFY_EVENT_NS);
+}
+
static __init int fsnotify_init(void)
{
int ret;
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 58c7bb25e5718..557a5734a6841 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -6,6 +6,7 @@
#include <linux/fsnotify.h>
#include <linux/srcu.h>
#include <linux/types.h>
+#include <linux/user_namespace.h>
#include "../mount.h"
@@ -39,6 +40,12 @@ static inline struct mnt_namespace *fsnotify_conn_mntns(
return conn->obj;
}
+static inline struct user_namespace *fsnotify_conn_userns(
+ struct fsnotify_mark_connector *conn)
+{
+ return conn->obj;
+}
+
static inline struct super_block *fsnotify_object_sb(void *obj,
enum fsnotify_obj_type obj_type)
{
@@ -103,6 +110,11 @@ static inline void fsnotify_clear_marks_by_mntns(struct mnt_namespace *mntns)
fsnotify_destroy_marks(&mntns->n_fsnotify_marks);
}
+static inline void fsnotify_clear_marks_by_userns(struct user_namespace *userns)
+{
+ fsnotify_destroy_marks(&userns->n_fsnotify_marks);
+}
+
/*
* update the dentry->d_flags of all of inode's children to indicate if inode cares
* about events that happen to its children.
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 961475090f088..76b01dba7b727 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -74,6 +74,7 @@
#include <linux/atomic.h>
#include <linux/fsnotify_backend.h>
+#include <linux/user_namespace.h>
#include "fsnotify.h"
#define FSNOTIFY_REAPER_DELAY (1) /* 1 jiffy */
@@ -110,6 +111,8 @@ static fsnotify_connp_t *fsnotify_object_connp(void *obj,
return fsnotify_sb_marks(obj);
case FSNOTIFY_OBJ_TYPE_MNTNS:
return &((struct mnt_namespace *)obj)->n_fsnotify_marks;
+ case FSNOTIFY_OBJ_TYPE_USERNS:
+ return &((struct user_namespace *)obj)->n_fsnotify_marks;
default:
return NULL;
}
@@ -125,6 +128,8 @@ static __u32 *fsnotify_conn_mask_p(struct fsnotify_mark_connector *conn)
return &fsnotify_conn_sb(conn)->s_fsnotify_mask;
else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS)
return &fsnotify_conn_mntns(conn)->n_fsnotify_mask;
+ else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS)
+ return &fsnotify_conn_userns(conn)->n_fsnotify_mask;
return NULL;
}
@@ -389,6 +394,8 @@ static void *fsnotify_detach_connector_from_object(
fsnotify_conn_sb(conn)->s_fsnotify_mask = 0;
} else if (conn->type == FSNOTIFY_OBJ_TYPE_MNTNS) {
fsnotify_conn_mntns(conn)->n_fsnotify_mask = 0;
+ } else if (conn->type == FSNOTIFY_OBJ_TYPE_USERNS) {
+ fsnotify_conn_userns(conn)->n_fsnotify_mask = 0;
}
rcu_assign_pointer(*connp, NULL);
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 51e8c9430477b..b0c3ffe528b31 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -387,6 +387,27 @@ bool proc_ns_file(const struct file *file)
return file->f_op == &ns_file_operations;
}
+/**
+ * userns_from_dentry() - Return the user_namespace referenced by an nsfs dentry.
+ * @dentry: dentry of an open nsfs file
+ *
+ * Returns the user_namespace if @dentry is an nsfs file for a user namespace,
+ * NULL otherwise. The caller is responsible for ensuring the returned pointer
+ * remains valid (e.g. by holding a reference to the dentry).
+ */
+struct user_namespace *userns_from_dentry(struct dentry *dentry)
+{
+ struct inode *inode = d_inode(dentry);
+ struct ns_common *ns;
+
+ if (!inode || inode->i_sb->s_magic != NSFS_MAGIC)
+ return NULL;
+ ns = get_proc_ns(inode);
+ if (!ns || ns->ns_type != CLONE_NEWUSER)
+ return NULL;
+ return to_user_ns(ns);
+}
+
/**
* ns_match() - Returns true if current namespace matches dev/ino provided.
* @ns: current namespace
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 224303a0c31e1..b1aa1e432e92a 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -28,7 +28,7 @@
#define FANOTIFY_INFO_MODES (FANOTIFY_FID_BITS | FAN_REPORT_PIDFD)
/* fanotify_init() flags to create a namepsace event watcher */
-#define FANOTIFY_NS_INIT_FLAGS (FAN_REPORT_MNT)
+#define FANOTIFY_NS_INIT_FLAGS (FAN_REPORT_MNT | FAN_REPORT_NSID)
/*
* fanotify_init() flags that require CAP_SYS_ADMIN.
@@ -62,7 +62,8 @@
#define FANOTIFY_INTERNAL_GROUP_FLAGS (FANOTIFY_UNPRIV)
#define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \
- FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS)
+ FAN_MARK_FILESYSTEM | FAN_MARK_MNTNS | \
+ FAN_MARK_USERNS)
#define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \
FAN_MARK_FLUSH)
@@ -122,8 +123,11 @@
/* Mount tree monitoring events */
#define FANOTIFY_MOUNT_EVENTS (FAN_MNT_ATTACH | FAN_MNT_DETACH)
+/* Namespace tree monitoring events */
+#define FANOTIFY_NS_EVENTS (FAN_NS_CREATE | FAN_NS_DELETE)
+
/* Events that user can request to be notified on namepsace watchers */
-#define FANOTIFY_EVENTS_ON_NS (FANOTIFY_MOUNT_EVENTS)
+#define FANOTIFY_EVENTS_ON_NS (FANOTIFY_MOUNT_EVENTS | FANOTIFY_NS_EVENTS)
/* Extra flags that may be reported with event or control handling of events */
#define FANOTIFY_EVENT_FLAGS (FAN_EVENT_ON_CHILD | FAN_ONDIR)
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 079c18bcdbde6..ddb13cd960214 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -300,6 +300,11 @@ static inline void fsnotify_mntns_delete(struct mnt_namespace *mntns)
__fsnotify_mntns_delete(mntns);
}
+static inline void fsnotify_userns_delete(struct user_namespace *userns)
+{
+ __fsnotify_userns_delete(userns);
+}
+
/*
* fsnotify_inoderemove - an inode is going away
*/
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 9ce08d03d041d..019807844ca9c 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -79,6 +79,9 @@
*
* NOTE: These values may overload filesystem events, but not event flags
*/
+#define FS_NS_CREATE 0x00000100 /* Sub namespace was created */
+#define FS_NS_DELETE 0x00000200 /* Sub namespace was deleted */
+
#define FS_MNT_ATTACH 0x01000000 /* Mount was attached */
#define FS_MNT_DETACH 0x02000000 /* Mount was detached */
#define FS_MNT_MOVE (FS_MNT_ATTACH | FS_MNT_DETACH)
@@ -128,8 +131,12 @@
/* Mount tree monitoring events */
#define FSNOTIFY_MNT_EVENTS (FS_MNT_ATTACH | FS_MNT_DETACH)
+/* Namespace tree monitoring events */
+#define FSNOTIFY_NS_EVENTS (FS_NS_CREATE | FS_NS_DELETE)
+
/* Events that can be reported to backends on namepsace watchers */
#define FSNOTIFY_EVENTS_ON_NS (FSNOTIFY_MNT_EVENTS | \
+ FSNOTIFY_NS_EVENTS | \
FS_Q_OVERFLOW)
/* Events that can be reported to backends */
@@ -344,6 +351,7 @@ enum fsnotify_data_type {
FSNOTIFY_EVENT_INODE,
FSNOTIFY_EVENT_DENTRY,
FSNOTIFY_EVENT_MNT,
+ FSNOTIFY_EVENT_NS,
FSNOTIFY_EVENT_ERROR,
};
@@ -369,6 +377,12 @@ struct fsnotify_mnt {
u64 mnt_id;
};
+struct fsnotify_ns {
+ const struct user_namespace *userns;
+ u64 self_nsid;
+ u64 owner_nsid;
+};
+
static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
{
switch (data_type) {
@@ -445,6 +459,17 @@ static inline const struct fsnotify_mnt *fsnotify_data_mnt(const void *data,
}
}
+static inline const struct fsnotify_ns *fsnotify_data_ns(const void *data,
+ int data_type)
+{
+ switch (data_type) {
+ case FSNOTIFY_EVENT_NS:
+ return data;
+ default:
+ return NULL;
+ }
+}
+
static inline u64 fsnotify_data_mnt_id(const void *data, int data_type)
{
const struct fsnotify_mnt *mnt_data = fsnotify_data_mnt(data, data_type);
@@ -490,6 +515,7 @@ enum fsnotify_iter_type {
FSNOTIFY_ITER_TYPE_PARENT,
FSNOTIFY_ITER_TYPE_INODE2,
FSNOTIFY_ITER_TYPE_MNTNS,
+ FSNOTIFY_ITER_TYPE_USERNS,
FSNOTIFY_ITER_TYPE_COUNT
};
@@ -500,6 +526,7 @@ enum fsnotify_obj_type {
FSNOTIFY_OBJ_TYPE_VFSMOUNT,
FSNOTIFY_OBJ_TYPE_SB,
FSNOTIFY_OBJ_TYPE_MNTNS,
+ FSNOTIFY_OBJ_TYPE_USERNS,
FSNOTIFY_OBJ_TYPE_COUNT,
FSNOTIFY_OBJ_TYPE_DETACHED = FSNOTIFY_OBJ_TYPE_COUNT
};
@@ -688,9 +715,12 @@ extern void __fsnotify_inode_delete(struct inode *inode);
extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
extern void fsnotify_sb_delete(struct super_block *sb);
extern void __fsnotify_mntns_delete(struct mnt_namespace *mntns);
+extern void __fsnotify_userns_delete(struct user_namespace *userns);
extern void fsnotify_sb_free(struct super_block *sb);
extern u32 fsnotify_get_cookie(void);
extern void fsnotify_mnt(__u32 mask, struct mnt_namespace *ns, struct vfsmount *mnt);
+extern void fsnotify_ns(__u32 mask, struct user_namespace *userns,
+ u64 self_nsid, u64 owner_nsid);
static inline __u32 fsnotify_parent_needed_mask(__u32 mask)
{
@@ -992,6 +1022,9 @@ static inline void fsnotify_sb_delete(struct super_block *sb)
static inline void __fsnotify_mntns_delete(struct mnt_namespace *mntns)
{}
+static inline void __fsnotify_userns_delete(struct user_namespace *userns)
+{}
+
static inline void fsnotify_sb_free(struct super_block *sb)
{}
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 19d1c5e5f3350..3b7d2bc88ae6c 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -248,4 +248,6 @@ static inline struct pid_namespace *proc_pid_ns(struct super_block *sb)
bool proc_ns_file(const struct file *file);
+struct user_namespace *userns_from_dentry(struct dentry *dentry);
+
#endif /* _LINUX_PROC_FS_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 9c3be157397e0..7ff8420495308 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -13,6 +13,8 @@
#include <linux/sysctl.h>
#include <linux/err.h>
+struct fsnotify_mark_connector;
+
#define UID_GID_MAP_MAX_BASE_EXTENTS 5
#define UID_GID_MAP_MAX_EXTENTS 340
@@ -86,6 +88,10 @@ struct user_namespace {
/* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP
* in its effective capability set at the child ns creation time. */
bool parent_could_setfcap;
+#ifdef CONFIG_FSNOTIFY
+ __u32 n_fsnotify_mask;
+ struct fsnotify_mark_connector __rcu *n_fsnotify_marks;
+#endif
#ifdef CONFIG_KEYS
/* List of joinable keyrings in this namespace. Modification access of
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index cfcd193aee3e2..8a12db80f9d80 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -48,6 +48,9 @@
*
* NOTE: These values may overload filesystem events, but not event flags
*/
+#define FAN_NS_CREATE 0x00000100 /* Sub namespace was created */
+#define FAN_NS_DELETE 0x00000200 /* Sub namespace was deleted */
+
#define FAN_MNT_ATTACH 0x01000000 /* Mount was attached */
#define FAN_MNT_DETACH 0x02000000 /* Mount was detached */
@@ -78,6 +81,7 @@
#define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */
#define FAN_REPORT_FD_ERROR 0x00002000 /* event->fd can report error */
#define FAN_REPORT_MNT 0x00004000 /* Report mount events */
+#define FAN_REPORT_NSID 0x00008000 /* Report namespace events */
/* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
#define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
@@ -109,6 +113,7 @@
#define FAN_MARK_MOUNT 0x00000010
#define FAN_MARK_FILESYSTEM 0x00000100
#define FAN_MARK_MNTNS 0x00000110
+#define FAN_MARK_USERNS 0x00001000
/*
* Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY
@@ -163,6 +168,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_ERROR 5
#define FAN_EVENT_INFO_TYPE_RANGE 6
#define FAN_EVENT_INFO_TYPE_MNT 7
+#define FAN_EVENT_INFO_TYPE_NS 8
/* Special info types for FAN_RENAME */
#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
@@ -221,6 +227,12 @@ struct fanotify_event_info_mnt {
__u64 mnt_id;
};
+struct fanotify_event_info_ns {
+ struct fanotify_event_info_header hdr;
+ __u64 self_nsid; /* ns_id of the namespace */
+ __u64 owner_nsid; /* ns_id of its owning user namespace */
+};
+
/*
* User space may need to record additional information about its decision.
* The extra information type records what kind of information is included.
diff --git a/kernel/nscommon.c b/kernel/nscommon.c
index 3166c1fd844af..6317d1e830c58 100644
--- a/kernel/nscommon.c
+++ b/kernel/nscommon.c
@@ -6,6 +6,7 @@
#include <linux/proc_ns.h>
#include <linux/user_namespace.h>
#include <linux/vfsdebug.h>
+#include <linux/fsnotify_backend.h>
#ifdef CONFIG_DEBUG_VFS
static void ns_debug(struct ns_common *ns, const struct proc_ns_operations *ops)
@@ -111,6 +112,44 @@ struct ns_common *__must_check ns_owner(struct ns_common *ns)
return to_ns_common(owner);
}
+/*
+ * Return the owning user_namespace of @ns, including init_user_ns.
+ * Unlike ns_owner(), which returns NULL for namespaces owned by
+ * init_user_ns (to serve as a propagation terminator), this gives us
+ * the real owner for notification routing.
+ */
+static struct user_namespace *ns_direct_owner(struct ns_common *ns)
+{
+ if (unlikely(!ns->ops || !ns->ops->owner))
+ return NULL;
+ return ns->ops->owner(ns);
+}
+
+static void ns_common_notify(__u32 mask, struct ns_common *ns)
+{
+ struct user_namespace *owner_userns;
+
+ if (!IS_ENABLED(CONFIG_FSNOTIFY))
+ return;
+
+ owner_userns = ns_direct_owner(ns);
+ if (!owner_userns)
+ return;
+
+#ifdef CONFIG_FSNOTIFY
+ /*
+ * READ_ONCE macro expansion does not understand that this code
+ * is not reachable without CONFIG_FSNOTIFY.
+ */
+ if (!READ_ONCE(owner_userns->n_fsnotify_marks))
+ return;
+#endif
+
+ /* Report child namespace events to owner userns watchers */
+ fsnotify_ns(mask, owner_userns, ns->ns_id,
+ to_ns_common(owner_userns)->ns_id);
+}
+
/*
* The active reference count works by having each namespace that gets
* created take a single active reference on its owning user namespace.
@@ -172,6 +211,8 @@ void __ns_ref_active_put(struct ns_common *ns)
return;
}
+ ns_common_notify(FS_NS_DELETE, ns);
+
VFS_WARN_ON_ONCE(is_ns_init_id(ns));
VFS_WARN_ON_ONCE(!__ns_ref_read(ns));
@@ -184,6 +225,8 @@ void __ns_ref_active_put(struct ns_common *ns)
VFS_WARN_ON_ONCE(__ns_ref_active_read(ns) < 0);
return;
}
+
+ ns_common_notify(FS_NS_DELETE, ns);
}
}
@@ -293,6 +336,8 @@ void __ns_ref_active_get(struct ns_common *ns)
if (likely(prev))
return;
+ ns_common_notify(FS_NS_CREATE, ns);
+
/*
* We did resurrect it. Walk the ownership hierarchy upwards
* until we found an owning user namespace that is active.
@@ -307,6 +352,8 @@ void __ns_ref_active_get(struct ns_common *ns)
VFS_WARN_ON_ONCE(prev < 0);
if (likely(prev))
return;
+
+ ns_common_notify(FS_NS_CREATE, ns);
}
}
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 0bed462e9b2a2..a7e8d1c33bfd5 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -22,6 +22,7 @@
#include <linux/bsearch.h>
#include <linux/sort.h>
#include <linux/nstree.h>
+#include <linux/fsnotify.h>
static struct kmem_cache *user_ns_cachep __ro_after_init;
static DEFINE_MUTEX(userns_state_mutex);
@@ -221,6 +222,7 @@ static void free_user_ns(struct work_struct *work)
retire_userns_sysctls(ns);
key_free_user_ns(ns);
ns_common_free(ns);
+ fsnotify_userns_delete(ns);
/* Concurrent nstree traversal depends on a grace period. */
kfree_rcu(ns, ns.ns_rcu);
dec_user_namespaces(ucounts);
--
2.54.0
* [PATCH v2 09/10] selftests/filesystems: create fanotify test dir
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (7 preceding siblings ...)
2026-04-24 17:05 ` [PATCH v2 08/10] fanotify: add support for watching the namespaces tree Amir Goldstein
@ 2026-04-24 17:05 ` Amir Goldstein
2026-04-24 17:05 ` [PATCH v2 10/10] selftests/filesystems: add fanotify namespace notifications test Amir Goldstein
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:05 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Rename the mount-notify directory, which holds the two fanotify mount
notification tests, to fanotify before adding more fanotify tests.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
tools/testing/selftests/Makefile | 2 +-
.../selftests/filesystems/{mount-notify => fanotify}/.gitignore | 0
.../selftests/filesystems/{mount-notify => fanotify}/Makefile | 0
.../filesystems/{mount-notify => fanotify}/mount-notify_test.c | 0
.../{mount-notify => fanotify}/mount-notify_test_ns.c | 0
5 files changed, 1 insertion(+), 1 deletion(-)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/.gitignore (100%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/Makefile (100%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test.c (100%)
rename tools/testing/selftests/filesystems/{mount-notify => fanotify}/mount-notify_test_ns.c (100%)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 984abb6d42ab9..29be6601249f1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -36,7 +36,7 @@ TARGETS += filesystems/epoll
TARGETS += filesystems/fat
TARGETS += filesystems/overlayfs
TARGETS += filesystems/statmount
-TARGETS += filesystems/mount-notify
+TARGETS += filesystems/fanotify
TARGETS += filesystems/fuse
TARGETS += filesystems/move_mount
TARGETS += filesystems/empty_mntns
diff --git a/tools/testing/selftests/filesystems/mount-notify/.gitignore b/tools/testing/selftests/filesystems/fanotify/.gitignore
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/.gitignore
rename to tools/testing/selftests/filesystems/fanotify/.gitignore
diff --git a/tools/testing/selftests/filesystems/mount-notify/Makefile b/tools/testing/selftests/filesystems/fanotify/Makefile
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/Makefile
rename to tools/testing/selftests/filesystems/fanotify/Makefile
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/fanotify/mount-notify_test.c
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
rename to tools/testing/selftests/filesystems/fanotify/mount-notify_test.c
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/fanotify/mount-notify_test_ns.c
similarity index 100%
rename from tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
rename to tools/testing/selftests/filesystems/fanotify/mount-notify_test_ns.c
--
2.54.0
* [PATCH v2 10/10] selftests/filesystems: add fanotify namespace notifications test
2026-04-24 17:04 [PATCH v2 00/10] fanotify namespace monitoring Amir Goldstein
` (8 preceding siblings ...)
2026-04-24 17:05 ` [PATCH v2 09/10] selftests/filesystems: create fanotify test dir Amir Goldstein
@ 2026-04-24 17:05 ` Amir Goldstein
9 siblings, 0 replies; 11+ messages in thread
From: Amir Goldstein @ 2026-04-24 17:05 UTC (permalink / raw)
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Test namespace create and delete events reported for nsfs:
- For init userns and child userns
- Verify the delete event is delivered even if the nsfs inode was never
  opened
- Verify the capabilities required to watch a user namespace
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
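Note for readers (not part of the commit): the test below expects each
event to be a struct fanotify_event_metadata followed by a single
struct fanotify_event_info_ns record. The following is a minimal sketch
of how a listener might locate that record in a read buffer, assuming
the layout proposed in this series. The sketch_* names are hypothetical
local mirrors of the uapi structs so that the snippet builds without the
patched headers; a real listener would use <linux/fanotify.h> directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirrors of the uapi layout proposed in this patch. */
#define SKETCH_INFO_TYPE_NS 8	/* mirrors FAN_EVENT_INFO_TYPE_NS */

struct sketch_event_metadata {	/* mirrors struct fanotify_event_metadata */
	uint32_t event_len;
	uint8_t vers;
	uint8_t reserved;
	uint16_t metadata_len;
	uint64_t mask;
	int32_t fd;
	int32_t pid;
};

struct sketch_info_header {	/* mirrors struct fanotify_event_info_header */
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;
};

struct sketch_info_ns {		/* mirrors struct fanotify_event_info_ns */
	struct sketch_info_header hdr;
	uint64_t self_nsid;	/* ns_id of the namespace */
	uint64_t owner_nsid;	/* ns_id of its owning user namespace */
};

/*
 * Locate the namespace info record of the first event in buf.
 * Returns 0 and fills *out on success, -1 if the event is truncated,
 * malformed, or carries no namespace info record.
 */
static int sketch_find_ns_info(const char *buf, size_t len,
			       struct sketch_info_ns *out)
{
	struct sketch_event_metadata meta;
	struct sketch_info_header hdr;
	size_t off;

	if (len < sizeof(meta))
		return -1;
	memcpy(&meta, buf, sizeof(meta));
	if (meta.event_len < sizeof(meta) || meta.event_len > len)
		return -1;
	if (meta.metadata_len < sizeof(meta) ||
	    meta.metadata_len > meta.event_len)
		return -1;

	/* Info records follow the fixed metadata, each prefixed by a header. */
	off = meta.metadata_len;
	while (off + sizeof(hdr) <= meta.event_len) {
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > meta.event_len - off)
			return -1;
		if (hdr.info_type == SKETCH_INFO_TYPE_NS &&
		    hdr.len >= sizeof(*out)) {
			memcpy(out, buf + off, sizeof(*out));
			return 0;
		}
		off += hdr.len;
	}
	return -1;
}
```

The selftest asserts an exact event_len instead of walking the records,
which is stricter and appropriate for a test. Walking by hdr.len is the
more tolerant choice for a listener, should further info records ever be
appended to these events.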
tools/include/uapi/linux/fanotify.h | 37 +-
.../selftests/filesystems/fanotify/Makefile | 3 +-
.../filesystems/fanotify/ns-notify_test.c | 330 ++++++++++++++++++
3 files changed, 362 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
diff --git a/tools/include/uapi/linux/fanotify.h b/tools/include/uapi/linux/fanotify.h
index e710967c7c263..8a12db80f9d80 100644
--- a/tools/include/uapi/linux/fanotify.h
+++ b/tools/include/uapi/linux/fanotify.h
@@ -4,7 +4,9 @@
#include <linux/types.h>
-/* the following events that user-space can register for */
+/*
+ * Events that user-space can request when watching filesystems
+ */
#define FAN_ACCESS 0x00000001 /* File was accessed */
#define FAN_MODIFY 0x00000002 /* File was modified */
#define FAN_ATTRIB 0x00000004 /* Metadata changed */
@@ -28,19 +30,31 @@
/* #define FAN_DIR_MODIFY 0x00080000 */ /* Deprecated (reserved) */
#define FAN_PRE_ACCESS 0x00100000 /* Pre-content access hook */
-#define FAN_MNT_ATTACH 0x01000000 /* Mount was attached */
-#define FAN_MNT_DETACH 0x02000000 /* Mount was detached */
-
-#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
#define FAN_RENAME 0x10000000 /* File was renamed */
-#define FAN_ONDIR 0x40000000 /* Event occurred against dir */
-
/* helper events */
#define FAN_CLOSE (FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE) /* close */
#define FAN_MOVE (FAN_MOVED_FROM | FAN_MOVED_TO) /* moves */
+/*
+ * Filter flags for watching filesystems
+ */
+#define FAN_EVENT_ON_CHILD 0x08000000 /* Interested in child events */
+#define FAN_ONDIR 0x40000000 /* Event occurred against dir */
+
+/*
+ * Events that user-space can request when watching namespaces
+ *
+ * NOTE: These values may overload filesystem events, but not event flags
+ */
+#define FAN_NS_CREATE 0x00000100 /* Sub namespace was created */
+#define FAN_NS_DELETE 0x00000200 /* Sub namespace was deleted */
+
+#define FAN_MNT_ATTACH 0x01000000 /* Mount was attached */
+#define FAN_MNT_DETACH 0x02000000 /* Mount was detached */
+
+
/* flags used for fanotify_init() */
#define FAN_CLOEXEC 0x00000001
#define FAN_NONBLOCK 0x00000002
@@ -67,6 +81,7 @@
#define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */
#define FAN_REPORT_FD_ERROR 0x00002000 /* event->fd can report error */
#define FAN_REPORT_MNT 0x00004000 /* Report mount events */
+#define FAN_REPORT_NSID 0x00008000 /* Report namespace events */
/* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
#define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
@@ -98,6 +113,7 @@
#define FAN_MARK_MOUNT 0x00000010
#define FAN_MARK_FILESYSTEM 0x00000100
#define FAN_MARK_MNTNS 0x00000110
+#define FAN_MARK_USERNS 0x00001000
/*
* Convenience macro - FAN_MARK_IGNORE requires FAN_MARK_IGNORED_SURV_MODIFY
@@ -152,6 +168,7 @@ struct fanotify_event_metadata {
#define FAN_EVENT_INFO_TYPE_ERROR 5
#define FAN_EVENT_INFO_TYPE_RANGE 6
#define FAN_EVENT_INFO_TYPE_MNT 7
+#define FAN_EVENT_INFO_TYPE_NS 8
/* Special info types for FAN_RENAME */
#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
@@ -210,6 +227,12 @@ struct fanotify_event_info_mnt {
__u64 mnt_id;
};
+struct fanotify_event_info_ns {
+ struct fanotify_event_info_header hdr;
+ __u64 self_nsid; /* ns_id of the namespace */
+ __u64 owner_nsid; /* ns_id of its owning user namespace */
+};
+
/*
* User space may need to record additional information about its decision.
* The extra information type records what kind of information is included.
diff --git a/tools/testing/selftests/filesystems/fanotify/Makefile b/tools/testing/selftests/filesystems/fanotify/Makefile
index 836a4eb7be062..d251249630985 100644
--- a/tools/testing/selftests/filesystems/fanotify/Makefile
+++ b/tools/testing/selftests/filesystems/fanotify/Makefile
@@ -3,9 +3,10 @@
CFLAGS += -Wall -O2 -g $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS += -lcap
-TEST_GEN_PROGS := mount-notify_test mount-notify_test_ns
+TEST_GEN_PROGS := mount-notify_test mount-notify_test_ns ns-notify_test
include ../../lib.mk
$(OUTPUT)/mount-notify_test: ../utils.c
$(OUTPUT)/mount-notify_test_ns: ../utils.c
+$(OUTPUT)/ns-notify_test: ../utils.c
diff --git a/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c b/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
new file mode 100644
index 0000000000000..012a62c92ee4a
--- /dev/null
+++ b/tools/testing/selftests/filesystems/fanotify/ns-notify_test.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2025
+
+#define _GNU_SOURCE
+
+// Needed for linux/fanotify.h
+typedef struct {
+ int val[2];
+} __kernel_fsid_t;
+#define __kernel_fsid_t __kernel_fsid_t
+
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "kselftest_harness.h"
+#include "../utils.h"
+
+#include <linux/fanotify.h>
+
+/*
+ * Retrieve the ns_id of a namespace fd via name_to_handle_at().
+ * nsfs encodes { ns_id(u64), ns_type(u32), ns_inum(u32) } in f_handle.
+ */
+static uint64_t get_ns_id(int fd)
+{
+ struct {
+ struct file_handle fh;
+ uint64_t ns_id;
+ uint32_t ns_type;
+ uint32_t ns_inum;
+ } h = { .fh.handle_bytes = sizeof(uint64_t) + sizeof(uint32_t) * 2 };
+ int mnt_id;
+
+ if (name_to_handle_at(fd, "", &h.fh, &mnt_id, AT_EMPTY_PATH))
+ return 0;
+ return h.ns_id;
+}
+
+static void read_ns_event_fd(struct __test_metadata *const _metadata,
+ int fd, char *buf, size_t buf_size,
+ uint64_t expect_mask,
+ uint64_t *self_nsid_out, uint64_t *owner_nsid_out)
+{
+ struct fanotify_event_metadata *meta;
+ struct fanotify_event_info_ns *info;
+ ssize_t len;
+
+ len = read(fd, buf, buf_size);
+ ASSERT_GT(len, 0);
+
+ meta = (struct fanotify_event_metadata *)buf;
+ ASSERT_TRUE(FAN_EVENT_OK(meta, len));
+ ASSERT_EQ(meta->mask, expect_mask);
+ ASSERT_EQ(meta->fd, FAN_NOFD);
+ ASSERT_EQ(meta->event_len,
+ sizeof(*meta) + sizeof(struct fanotify_event_info_ns));
+
+ info = (struct fanotify_event_info_ns *)(meta + 1);
+ ASSERT_EQ(info->hdr.info_type, FAN_EVENT_INFO_TYPE_NS);
+ ASSERT_EQ(info->hdr.len, sizeof(*info));
+
+ *self_nsid_out = info->self_nsid;
+ *owner_nsid_out = info->owner_nsid;
+}
+
+/* =========================================================================
+ * Outer tests: watch init_user_ns from root context (no setup_userns).
+ * ========================================================================= */
+
+/*
+ * Root-only: watch init_user_ns, fork a child that creates a user namespace
+ * owned by init_user_ns, verify FAN_NS_CREATE, kill the child, verify
+ * FAN_NS_DELETE. The child namespace is created and destroyed entirely within
+ * the test body so both events are observable.
+ */
+TEST(outer_create_delete_userns)
+{
+ int fan_fd, ns_fd;
+ int pipefd[2];
+ pid_t pid;
+ uint64_t ns_nsid, create_self, create_owner;
+ uint64_t delete_self, delete_owner;
+ char buf[256];
+ char c;
+
+ if (geteuid() != 0)
+ SKIP(return, "requires root");
+
+ ns_fd = open("/proc/self/ns/user", O_RDONLY);
+ ASSERT_GE(ns_fd, 0);
+
+ ns_nsid = get_ns_id(ns_fd);
+ ASSERT_NE(ns_nsid, 0);
+
+ fan_fd = fanotify_init(FAN_REPORT_NSID, 0);
+ ASSERT_GE(fan_fd, 0);
+
+ errno = 0;
+ ASSERT_EQ(fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+ FAN_NS_CREATE | FAN_NS_DELETE, ns_fd, NULL), 0)
+ TH_LOG("fanotify_mark errno=%d (%s)", errno, strerror(errno));
+
+ ASSERT_EQ(pipe(pipefd), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ close(pipefd[0]);
+ if (unshare(CLONE_NEWUSER))
+ _exit(1);
+ if (write(pipefd[1], "r", 1) < 0)
+ _exit(1);
+ close(pipefd[1]);
+ pause();
+ _exit(0);
+ }
+
+ close(pipefd[1]);
+ ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+ close(pipefd[0]);
+
+ /* --- FAN_NS_CREATE: new user namespace owned by init_user_ns --- */
+ read_ns_event_fd(_metadata, fan_fd, buf, sizeof(buf),
+ FAN_NS_CREATE, &create_self, &create_owner);
+ ASSERT_NE(create_self, 0);
+ ASSERT_EQ(create_owner, ns_nsid);
+
+ /* Kill the child, deactivating its user namespace */
+ kill(pid, SIGTERM);
+ waitpid(pid, NULL, 0);
+
+ /* --- FAN_NS_DELETE --- */
+ read_ns_event_fd(_metadata, fan_fd, buf, sizeof(buf),
+ FAN_NS_DELETE, &delete_self, &delete_owner);
+ ASSERT_EQ(delete_self, create_self);
+ ASSERT_EQ(delete_owner, ns_nsid);
+
+ close(fan_fd);
+ close(ns_fd);
+}
+
+/* =========================================================================
+ * Inner tests: watch a child userns from within it (via setup_userns).
+ * ========================================================================= */
+
+FIXTURE(userns_notify) {
+ int fan_fd;
+ int userns_fd;
+ int outer_ns_fd; /* init_user_ns fd, captured before setup_userns() */
+ uint64_t userns_nsid;
+ char buf[256];
+};
+
+FIXTURE_SETUP(userns_notify)
+{
+ int ret;
+
+ /* Capture the outer user namespace fd before setup_userns() */
+ self->outer_ns_fd = open("/proc/self/ns/user", O_RDONLY);
+ ASSERT_GE(self->outer_ns_fd, 0);
+
+ ret = setup_userns();
+ ASSERT_EQ(ret, 0);
+
+ self->userns_fd = open("/proc/self/ns/user", O_RDONLY);
+ ASSERT_GE(self->userns_fd, 0);
+
+ self->userns_nsid = get_ns_id(self->userns_fd);
+ ASSERT_NE(self->userns_nsid, 0);
+
+ self->fan_fd = fanotify_init(FAN_REPORT_NSID, 0);
+ ASSERT_GE(self->fan_fd, 0);
+
+ errno = 0;
+ ret = fanotify_mark(self->fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+ FAN_NS_CREATE | FAN_NS_DELETE,
+ self->userns_fd, NULL);
+ ASSERT_EQ(ret, 0)
+ TH_LOG("fanotify_mark errno=%d (%s)", errno, strerror(errno));
+}
+
+FIXTURE_TEARDOWN(userns_notify)
+{
+ close(self->fan_fd);
+ close(self->userns_fd);
+ close(self->outer_ns_fd);
+}
+
+static void read_ns_event(struct __test_metadata *const _metadata,
+ FIXTURE_DATA(userns_notify) *self,
+ uint64_t expect_mask,
+ uint64_t *self_nsid_out, uint64_t *owner_nsid_out)
+{
+ read_ns_event_fd(_metadata, self->fan_fd, self->buf, sizeof(self->buf),
+ expect_mask, self_nsid_out, owner_nsid_out);
+}
+
+/*
+ * Create a UTS namespace inside the watched user namespace, verify
+ * FAN_NS_CREATE, then kill the child and verify FAN_NS_DELETE.
+ * Cross-check self_nsid against the actual ns_id obtained via
+ * name_to_handle_at() on the child's /proc/pid/ns/uts.
+ */
+TEST_F(userns_notify, inner_create_delete_uts)
+{
+ int pipefd[2];
+ pid_t pid;
+ uint64_t create_self, create_owner;
+ uint64_t delete_self, delete_owner;
+ char c;
+
+ ASSERT_EQ(pipe(pipefd), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ close(pipefd[0]);
+ if (unshare(CLONE_NEWUTS))
+ _exit(1);
+ if (write(pipefd[1], "r", 1) < 0)
+ _exit(1);
+ close(pipefd[1]);
+ pause();
+ _exit(0);
+ }
+
+ close(pipefd[1]);
+ ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+ close(pipefd[0]);
+
+ /* --- FAN_NS_CREATE --- */
+ read_ns_event(_metadata, self, FAN_NS_CREATE, &create_self, &create_owner);
+ ASSERT_NE(create_self, 0);
+ ASSERT_EQ(create_owner, self->userns_nsid);
+
+ /* Cross-check self_nsid against the child's actual UTS ns_id */
+ char path[64];
+ int ns_fd;
+ uint64_t uts_nsid;
+
+ snprintf(path, sizeof(path), "/proc/%d/ns/uts", pid);
+ ns_fd = open(path, O_RDONLY);
+ ASSERT_GE(ns_fd, 0);
+ uts_nsid = get_ns_id(ns_fd);
+ close(ns_fd);
+ ASSERT_EQ(uts_nsid, create_self);
+
+ kill(pid, SIGTERM);
+ waitpid(pid, NULL, 0);
+
+ /* --- FAN_NS_DELETE --- */
+ read_ns_event(_metadata, self, FAN_NS_DELETE, &delete_self, &delete_owner);
+ ASSERT_EQ(delete_self, create_self);
+ ASSERT_EQ(delete_owner, self->userns_nsid);
+}
+
+/*
+ * Same as inner_create_delete_uts but the namespace fd is never opened, so the
+ * stashed nsfs dentry/inode is never populated. Verifies that FAN_NS_CREATE
+ * and FAN_NS_DELETE are still delivered and carry a consistent self_nsid.
+ */
+TEST_F(userns_notify, inner_create_delete_uts_no_open)
+{
+ int pipefd[2];
+ pid_t pid;
+ uint64_t create_self, create_owner;
+ uint64_t delete_self, delete_owner;
+ char c;
+
+ ASSERT_EQ(pipe(pipefd), 0);
+
+ pid = fork();
+ ASSERT_GE(pid, 0);
+
+ if (pid == 0) {
+ close(pipefd[0]);
+ if (unshare(CLONE_NEWUTS))
+ _exit(1);
+ if (write(pipefd[1], "r", 1) < 0)
+ _exit(1);
+ close(pipefd[1]);
+ pause();
+ _exit(0);
+ }
+
+ close(pipefd[1]);
+ ASSERT_EQ(read(pipefd[0], &c, 1), 1);
+ close(pipefd[0]);
+
+ /* --- FAN_NS_CREATE (no open of /proc/pid/ns/uts) --- */
+ read_ns_event(_metadata, self, FAN_NS_CREATE, &create_self, &create_owner);
+ ASSERT_NE(create_self, 0);
+ ASSERT_EQ(create_owner, self->userns_nsid);
+
+ kill(pid, SIGTERM);
+ waitpid(pid, NULL, 0);
+
+ /* --- FAN_NS_DELETE --- */
+ read_ns_event(_metadata, self, FAN_NS_DELETE, &delete_self, &delete_owner);
+ ASSERT_EQ(delete_self, create_self);
+ ASSERT_EQ(delete_owner, self->userns_nsid);
+}
+
+/*
+ * Attempt to set a FAN_MARK_USERNS watch on the initial user namespace.
+ * Requires CAP_SYS_ADMIN in init_user_ns. Since FIXTURE_SETUP calls
+ * setup_userns(), the process lives in a child user namespace and cannot
+ * hold capabilities in init_user_ns, so the call must fail with EPERM
+ * regardless of the outer uid.
+ */
+TEST_F(userns_notify, inner_mark_init_userns_eperm)
+{
+ int ret;
+
+ ret = fanotify_mark(self->fan_fd, FAN_MARK_ADD | FAN_MARK_USERNS,
+ FAN_NS_CREATE | FAN_NS_DELETE,
+ self->outer_ns_fd, NULL);
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno, EPERM);
+}
+
+TEST_HARNESS_MAIN
--
2.54.0