* [PATCH v2 1/3] kernfs: allow passing fsnotify event types
2026-02-12 21:58 [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files T.J. Mercier
@ 2026-02-12 21:58 ` T.J. Mercier
2026-02-16 16:27 ` Amir Goldstein
2026-02-12 21:58 ` [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion T.J. Mercier
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: T.J. Mercier @ 2026-02-12 21:58 UTC (permalink / raw)
To: gregkh, tj, driver-core, linux-kernel, cgroups, shuah,
linux-kselftest
Cc: T.J. Mercier
The kernfs_notify function is hardcoded to only issue FS_MODIFY events
since that is the only current use case. Allow for supporting other
events by adding a notify_event field to kernfs_elem_attr. The
limitation of only one queued event per kernfs_node continues to exist
as a consequence of the design of the kernfs_notify_list. The new
notify_event field is protected by the same kernfs_notify_lock as the
existing notify_next field.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
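Note for readers of the archive: the queueing rule described in the
commit message (one pending event per node, recorded when the node is
first queued and cleared when the worker pops it) can be modelled in
userspace roughly as below. This is a minimal sketch and not kernel
code. The names struct node, queue_event, pop_event and EV_MODIFY are
hypothetical stand-ins, and locking is left out; in the patch both
fields are accessed under kernfs_notify_lock.

```c
/* Userspace model of a single-slot pending list; not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EV_MODIFY 0x2u /* stand-in for FS_MODIFY, value is illustrative */

struct node {
	struct node *notify_next; /* NULL means "not queued" */
	uint32_t notify_event;    /* event to deliver when popped */
};

/*
 * The list ends in a sentinel instead of NULL, so that the last queued
 * node still has a non-NULL notify_next and counts as "queued".
 */
struct node list_eol;
struct node *pending = &list_eol;

/* Returns 1 if the node was queued, 0 if it was already pending. */
int queue_event(struct node *n, uint32_t event)
{
	if (n->notify_next)
		return 0; /* only one queued event per node */
	n->notify_next = pending;
	n->notify_event = event;
	pending = n;
	return 1;
}

/* Pops one node and reports its event; returns NULL on an empty list. */
struct node *pop_event(uint32_t *event)
{
	struct node *n = pending;

	assert(event != NULL);
	if (n == &list_eol)
		return NULL;
	pending = n->notify_next;
	n->notify_next = NULL;
	*event = n->notify_event;
	n->notify_event = 0; /* slot is free for the next notification */
	return n;
}
```

In this model a second queue_event() call on a node that is still
pending is dropped, which mirrors the limitation the commit message
says continues to exist.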
fs/kernfs/file.c | 8 ++++++--
include/linux/kernfs.h | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 9adf36e6364b..e978284ff983 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -914,6 +914,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
struct kernfs_node *kn;
struct kernfs_super_info *info;
struct kernfs_root *root;
+ u32 notify_event;
repeat:
/* pop one off the notify_list */
spin_lock_irq(&kernfs_notify_lock);
@@ -924,6 +925,8 @@ static void kernfs_notify_workfn(struct work_struct *work)
}
kernfs_notify_list = kn->attr.notify_next;
kn->attr.notify_next = NULL;
+ notify_event = kn->attr.notify_event;
+ kn->attr.notify_event = 0;
spin_unlock_irq(&kernfs_notify_lock);
root = kernfs_root(kn);
@@ -954,7 +957,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
if (parent) {
p_inode = ilookup(info->sb, kernfs_ino(parent));
if (p_inode) {
- fsnotify(FS_MODIFY | FS_EVENT_ON_CHILD,
+ fsnotify(notify_event | FS_EVENT_ON_CHILD,
inode, FSNOTIFY_EVENT_INODE,
p_inode, &name, inode, 0);
iput(p_inode);
@@ -964,7 +967,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
}
if (!p_inode)
- fsnotify_inode(inode, FS_MODIFY);
+ fsnotify_inode(inode, notify_event);
iput(inode);
}
@@ -1005,6 +1008,7 @@ void kernfs_notify(struct kernfs_node *kn)
if (!kn->attr.notify_next) {
kernfs_get(kn);
kn->attr.notify_next = kernfs_notify_list;
+ kn->attr.notify_event = FS_MODIFY;
kernfs_notify_list = kn;
schedule_work(&kernfs_notify_work);
}
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index b5a5f32fdfd1..1762b32c1a8e 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -181,6 +181,7 @@ struct kernfs_elem_attr {
struct kernfs_open_node __rcu *open;
loff_t size;
struct kernfs_node *notify_next; /* for kernfs_notify() */
+ u32 notify_event; /* for kernfs_notify() */
};
/*
--
2.53.0.273.g2a3d683680-goog
* Re: [PATCH v2 1/3] kernfs: allow passing fsnotify event types
2026-02-12 21:58 ` [PATCH v2 1/3] kernfs: allow passing fsnotify event types T.J. Mercier
@ 2026-02-16 16:27 ` Amir Goldstein
2026-02-17 19:27 ` T.J. Mercier
0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2026-02-16 16:27 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
shuah, linux-kselftest, jack
On Thu, Feb 12, 2026 at 01:58:12PM -0800, T.J. Mercier wrote:
> The kernfs_notify function is hardcoded to only issue FS_MODIFY events
> since that is the only current use case. Allow for supporting other
> events by adding a notify_event field to kernfs_elem_attr. The
> limitation of only one queued event per kernfs_node continues to exist
> as a consequence of the design of the kernfs_notify_list. The new
> notify_event field is protected by the same kernfs_notify_lock as the
> existing notify_next field.
>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
Looks fine
Feel free to add
Acked-by: Amir Goldstein <amir73il@gmail.com>
* Re: [PATCH v2 1/3] kernfs: allow passing fsnotify event types
2026-02-16 16:27 ` Amir Goldstein
@ 2026-02-17 19:27 ` T.J. Mercier
0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2026-02-17 19:27 UTC (permalink / raw)
To: Amir Goldstein
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
shuah, linux-kselftest, jack
On Mon, Feb 16, 2026 at 8:27 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 12, 2026 at 01:58:12PM -0800, T.J. Mercier wrote:
> > The kernfs_notify function is hardcoded to only issue FS_MODIFY events
> > since that is the only current use case. Allow for supporting other
> > events by adding a notify_event field to kernfs_elem_attr. The
> > limitation of only one queued event per kernfs_node continues to exist
> > as a consequence of the design of the kernfs_notify_list. The new
> > notify_event field is protected by the same kernfs_notify_lock as the
> > existing notify_next field.
> >
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
>
> Looks fine
> Feel free to add
> Acked-by: Amir Goldstein <amir73il@gmail.com>
Thanks Amir.
* [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-12 21:58 [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files T.J. Mercier
2026-02-12 21:58 ` [PATCH v2 1/3] kernfs: allow passing fsnotify event types T.J. Mercier
@ 2026-02-12 21:58 ` T.J. Mercier
2026-02-17 10:18 ` Amir Goldstein
2026-02-12 21:58 ` [PATCH v2 3/3] selftests: memcg: Add tests IN_DELETE_SELF and IN_IGNORED on memory.events T.J. Mercier
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: T.J. Mercier @ 2026-02-12 21:58 UTC (permalink / raw)
To: gregkh, tj, driver-core, linux-kernel, cgroups, shuah,
linux-kselftest
Cc: T.J. Mercier
Currently some kernfs files (e.g. cgroup.events, memory.events) support
inotify watches for IN_MODIFY, but unlike with regular filesystems, they
do not receive IN_DELETE_SELF or IN_IGNORED events when they are
removed.
This creates a problem for processes monitoring cgroups. For example, a
service monitoring memory.events for memory.high breaches needs to know
when a cgroup is removed to clean up its state. Where it's known that a
cgroup is removed when all processes die, without IN_DELETE_SELF the
service must resort to inefficient workarounds such as:
1. Periodically scanning procfs to detect process death (wastes CPU and
is susceptible to PID reuse).
2. Placing an additional IN_DELETE watch on the parent directory
(wastes resources managing double the watches).
3. Holding a pidfd for every monitored cgroup (can exhaust file
descriptors).
This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
This allows applications to rely on a single existing watch on the file
of interest (e.g. memory.events) to receive notifications for both
modifications and the eventual removal of the file, as well as automatic
watch descriptor cleanup, simplifying userspace logic and improving
resource efficiency.
Implementation details:
The kernfs notification worker is updated to handle file deletion.
fsnotify handles sending MODIFY events to both a watched file and its
parent, but it does not handle sending a DELETE event to the parent and
a DELETE_SELF event to the watched file in a single call. Therefore,
separate fsnotify calls are made: one for the parent (DELETE) and one
for the child (DELETE_SELF), while retaining the optimized single call
for MODIFY events.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
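Note for readers of the archive: the single-watch pattern the commit
message describes can be sketched in userspace as below, run here
against an ordinary temporary file, where removal is already reported.
This is a hedged illustration and not part of the patch. The helper
names watch_file, drain_events and demo_regular_file and the /tmp path
are assumptions made for the example; only the inotify(7) calls
themselves are real interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* One watch covers both modification and removal of the file. */
int watch_file(int ifd, const char *path)
{
	return inotify_add_watch(ifd, path, IN_MODIFY | IN_DELETE_SELF);
}

/*
 * Read events until IN_IGNORED arrives for wd or a poll times out.
 * Returns the OR of all event masks seen for wd.
 */
uint32_t drain_events(int ifd, int wd, int timeout_ms)
{
	char buf[4096]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	struct pollfd pfd = { .fd = ifd, .events = POLLIN };
	uint32_t seen = 0;

	assert(ifd >= 0 && wd >= 0);
	while (!(seen & IN_IGNORED) && poll(&pfd, 1, timeout_ms) > 0) {
		ssize_t len = read(ifd, buf, sizeof(buf));

		if (len <= 0)
			break;
		for (char *p = buf; p < buf + len; ) {
			const struct inotify_event *ev =
				(const struct inotify_event *)p;

			if (ev->wd == wd)
				seen |= ev->mask;
			p += sizeof(*ev) + ev->len;
		}
	}
	return seen;
}

/* Watch a scratch file, remove it, and return the event mask seen. */
uint32_t demo_regular_file(void)
{
	char path[] = "/tmp/inotify-demo-XXXXXX";
	uint32_t seen = 0;
	int fd = mkstemp(path);

	if (fd < 0)
		return 0;
	close(fd); /* an open descriptor can defer the removal event */

	int ifd = inotify_init1(IN_CLOEXEC);
	if (ifd < 0) {
		unlink(path);
		return 0;
	}
	int wd = watch_file(ifd, path);
	unlink(path);
	if (wd >= 0)
		seen = drain_events(ifd, wd, 1000);
	close(ifd);
	return seen;
}
```

A monitoring service would treat IN_DELETE_SELF as the signal to drop
its per-file state and IN_IGNORED as confirmation that the watch
descriptor is gone. Pointing watch_file() at a file such as
memory.events is the case this series is meant to make behave the same
way.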
fs/kernfs/dir.c | 21 +++++++++++++++++++++
fs/kernfs/file.c | 16 ++++++++++------
fs/kernfs/kernfs-internal.h | 3 +++
3 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 29baeeb97871..e5bda829fcb8 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -9,6 +9,7 @@
#include <linux/sched.h>
#include <linux/fs.h>
+#include <linux/fsnotify_backend.h>
#include <linux/namei.h>
#include <linux/idr.h>
#include <linux/slab.h>
@@ -1471,6 +1472,23 @@ void kernfs_show(struct kernfs_node *kn, bool show)
up_write(&root->kernfs_rwsem);
}
+static void kernfs_notify_file_deleted(struct kernfs_node *kn)
+{
+ static DECLARE_WORK(kernfs_notify_deleted_work,
+ kernfs_notify_workfn);
+
+ guard(spinlock_irqsave)(&kernfs_notify_lock);
+ /* may overwrite already pending FS_MODIFY events */
+ kn->attr.notify_event = FS_DELETE;
+
+ if (!kn->attr.notify_next) {
+ kernfs_get(kn);
+ kn->attr.notify_next = kernfs_notify_list;
+ kernfs_notify_list = kn;
+ schedule_work(&kernfs_notify_deleted_work);
+ }
+}
+
static void __kernfs_remove(struct kernfs_node *kn)
{
struct kernfs_node *pos, *parent;
@@ -1520,6 +1538,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
struct kernfs_iattrs *ps_iattr =
parent ? parent->iattr : NULL;
+ if (kernfs_type(pos) == KERNFS_FILE)
+ kernfs_notify_file_deleted(pos);
+
/* update timestamps on the parent */
down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e978284ff983..2d21af3cfcad 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -37,8 +37,8 @@ struct kernfs_open_node {
*/
#define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list)
-static DEFINE_SPINLOCK(kernfs_notify_lock);
-static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
+DEFINE_SPINLOCK(kernfs_notify_lock);
+struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
{
@@ -909,7 +909,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
return ret;
}
-static void kernfs_notify_workfn(struct work_struct *work)
+void kernfs_notify_workfn(struct work_struct *work)
{
struct kernfs_node *kn;
struct kernfs_super_info *info;
@@ -959,15 +959,19 @@ static void kernfs_notify_workfn(struct work_struct *work)
if (p_inode) {
fsnotify(notify_event | FS_EVENT_ON_CHILD,
inode, FSNOTIFY_EVENT_INODE,
- p_inode, &name, inode, 0);
+ p_inode, &name,
+ (notify_event == FS_MODIFY) ?
+ inode : NULL, 0);
iput(p_inode);
}
kernfs_put(parent);
}
- if (!p_inode)
- fsnotify_inode(inode, notify_event);
+ if (notify_event == FS_DELETE)
+ fsnotify_inoderemove(inode);
+ else if (!p_inode)
+ fsnotify_inode(inode, FS_MODIFY);
iput(inode);
}
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 6061b6f70d2a..cf4b21f4f3b6 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -199,6 +199,8 @@ struct kernfs_node *kernfs_new_node(struct kernfs_node *parent,
* file.c
*/
extern const struct file_operations kernfs_file_fops;
+extern struct kernfs_node *kernfs_notify_list;
+extern void kernfs_notify_workfn(struct work_struct *work);
bool kernfs_should_drain_open_files(struct kernfs_node *kn);
void kernfs_drain_open_files(struct kernfs_node *kn);
@@ -212,4 +214,5 @@ extern const struct inode_operations kernfs_symlink_iops;
* kernfs locks
*/
extern struct kernfs_global_locks *kernfs_locks;
+extern spinlock_t kernfs_notify_lock;
#endif /* __KERNFS_INTERNAL_H */
--
2.53.0.273.g2a3d683680-goog
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-12 21:58 ` [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion T.J. Mercier
@ 2026-02-17 10:18 ` Amir Goldstein
2026-02-17 19:25 ` T.J. Mercier
0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2026-02-17 10:18 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
jack, shuah, linux-kselftest
On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> Currently some kernfs files (e.g. cgroup.events, memory.events) support
> inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> removed.
>
> This creates a problem for processes monitoring cgroups. For example, a
> service monitoring memory.events for memory.high breaches needs to know
> when a cgroup is removed to clean up its state. Where it's known that a
> cgroup is removed when all processes die, without IN_DELETE_SELF the
> service must resort to inefficient workarounds such as:
> 1. Periodically scanning procfs to detect process death (wastes CPU and
> is susceptible to PID reuse).
> 2. Placing an additional IN_DELETE watch on the parent directory
> (wastes resources managing double the watches).
> 3. Holding a pidfd for every monitored cgroup (can exhaust file
> descriptors).
>
> This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> This allows applications to rely on a single existing watch on the file
> of interest (e.g. memory.events) to receive notifications for both
> modifications and the eventual removal of the file, as well as automatic
> watch descriptor cleanup, simplifying userspace logic and improving
> resource efficiency.
This looks very useful,
But,
How will the application know that it can rely on IN_DELETE_SELF
from cgroups if this is not an opt-in feature?
Essentially, this is similar to the discussions on adding "remote"
fs notification support (e.g. for smb) and in those discussions
I insist that "remote" notification should be opt-in (which is
easy to do with an fanotify init flag) and I claim that mixing
"remote" events with "local" events on the same group is undesired.
However, IN_IGNORED is created when an inotify watch is removed
and IN_DELETE_SELF is called when a vfs inode is destroyed.
When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
has to be a vfs inode with inotify mark attached, so why are those
events not created already? What am I missing?
Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
while watching the parent? Because this is not how the API works.
I think it should be possible to set a super block fanotify watch
on cgroupfs and get all the FAN_DELETE_SELF events, but maybe we
do not allow this right now, I did not check - just wanted to give
you another direction to follow.
>
> Implementation details:
> The kernfs notification worker is updated to handle file deletion.
> fsnotify handles sending MODIFY events to both a watched file and its
> parent, but it does not handle sending a DELETE event to the parent and
> a DELETE_SELF event to the watched file in a single call. Therefore,
> separate fsnotify calls are made: one for the parent (DELETE) and one
> for the child (DELETE_SELF), while retaining the optimized single call
IN_DELETE_SELF and IN_IGNORED are special and I don't really mind adding
them to kernfs seeing that they are very useful, but adding IN_DELETE
without adding IN_CREATE, that is very arbitrary and I don't like it as
much.
> for MODIFY events.
>
> Signed-off-by: T.J. Mercier <tjmercier@google.com>
> ---
> fs/kernfs/dir.c | 21 +++++++++++++++++++++
> fs/kernfs/file.c | 16 ++++++++++------
> fs/kernfs/kernfs-internal.h | 3 +++
> 3 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 29baeeb97871..e5bda829fcb8 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -9,6 +9,7 @@
>
> #include <linux/sched.h>
> #include <linux/fs.h>
> +#include <linux/fsnotify_backend.h>
> #include <linux/namei.h>
> #include <linux/idr.h>
> #include <linux/slab.h>
> @@ -1471,6 +1472,23 @@ void kernfs_show(struct kernfs_node *kn, bool show)
> up_write(&root->kernfs_rwsem);
> }
>
> +static void kernfs_notify_file_deleted(struct kernfs_node *kn)
> +{
> + static DECLARE_WORK(kernfs_notify_deleted_work,
> + kernfs_notify_workfn);
> +
> + guard(spinlock_irqsave)(&kernfs_notify_lock);
> + /* may overwrite already pending FS_MODIFY events */
> + kn->attr.notify_event = FS_DELETE;
> +
> + if (!kn->attr.notify_next) {
> + kernfs_get(kn);
> + kn->attr.notify_next = kernfs_notify_list;
> + kernfs_notify_list = kn;
> + schedule_work(&kernfs_notify_deleted_work);
> + }
> +}
> +
> static void __kernfs_remove(struct kernfs_node *kn)
> {
> struct kernfs_node *pos, *parent;
> @@ -1520,6 +1538,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
> struct kernfs_iattrs *ps_iattr =
> parent ? parent->iattr : NULL;
>
> + if (kernfs_type(pos) == KERNFS_FILE)
> + kernfs_notify_file_deleted(pos);
> +
> /* update timestamps on the parent */
> down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index e978284ff983..2d21af3cfcad 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -37,8 +37,8 @@ struct kernfs_open_node {
> */
> #define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list)
>
> -static DEFINE_SPINLOCK(kernfs_notify_lock);
> -static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> +DEFINE_SPINLOCK(kernfs_notify_lock);
> +struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
>
> static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
> {
> @@ -909,7 +909,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
> return ret;
> }
>
> -static void kernfs_notify_workfn(struct work_struct *work)
> +void kernfs_notify_workfn(struct work_struct *work)
> {
> struct kernfs_node *kn;
> struct kernfs_super_info *info;
> @@ -959,15 +959,19 @@ static void kernfs_notify_workfn(struct work_struct *work)
> if (p_inode) {
> fsnotify(notify_event | FS_EVENT_ON_CHILD,
> inode, FSNOTIFY_EVENT_INODE,
> - p_inode, &name, inode, 0);
> + p_inode, &name,
> + (notify_event == FS_MODIFY) ?
> + inode : NULL, 0);
> iput(p_inode);
> }
>
> kernfs_put(parent);
> }
>
> - if (!p_inode)
> - fsnotify_inode(inode, notify_event);
> + if (notify_event == FS_DELETE)
> + fsnotify_inoderemove(inode);
> + else if (!p_inode)
> + fsnotify_inode(inode, FS_MODIFY);
Didn't you mean notify_event?
Thanks,
Amir.
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-17 10:18 ` Amir Goldstein
@ 2026-02-17 19:25 ` T.J. Mercier
2026-02-17 21:25 ` Amir Goldstein
0 siblings, 1 reply; 16+ messages in thread
From: T.J. Mercier @ 2026-02-17 19:25 UTC (permalink / raw)
To: Amir Goldstein
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
jack, shuah, linux-kselftest
On Tue, Feb 17, 2026 at 2:19 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > removed.
> >
> > This creates a problem for processes monitoring cgroups. For example, a
> > service monitoring memory.events for memory.high breaches needs to know
> > when a cgroup is removed to clean up its state. Where it's known that a
> > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > service must resort to inefficient workarounds such as:
> > 1. Periodically scanning procfs to detect process death (wastes CPU and
> > is susceptible to PID reuse).
> > 2. Placing an additional IN_DELETE watch on the parent directory
> > (wastes resources managing double the watches).
> > 3. Holding a pidfd for every monitored cgroup (can exhaust file
> > descriptors).
> >
> > This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> > This allows applications to rely on a single existing watch on the file
> > of interest (e.g. memory.events) to receive notifications for both
> > modifications and the eventual removal of the file, as well as automatic
> > watch descriptor cleanup, simplifying userspace logic and improving
> > resource efficiency.
>
> This looks very useful,
> But,
> How will the application know that it can rely on IN_DELETE_SELF
> from cgroups if this is not an opt-in feature?
>
> Essentially, this is similar to the discussions on adding "remote"
> fs notification support (e.g. for smb) and in those discussions
> I insist that "remote" notification should be opt-in (which is
> easy to do with an fanotify init flag) and I claim that mixing
> "remote" events with "local" events on the same group is undesired.
I think this situation is a bit different because this isn't adding
new features to fsnotify. This is filling a gap that you'd expect to
work if you only read the cgroups or inotify documentation without
realizing that kernfs is simply wired up differently for notification
support than most other filesystems, and only partially supports the
existing notification events. It's opt-in in the sense that an
application registers for IN_DELETE_SELF, but other than a runtime
test like what I added in the selftests I'm not sure if there's a good
way to detect the kernel will actually send the event. Practically
speaking though, if merged upstream I will backport these patches to
all the kernels we use so a runtime check shouldn't be necessary for
our applications.
> However, IN_IGNORED is created when an inotify watch is removed
> and IN_DELETE_SELF is called when a vfs inode is destroyed.
> When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
> has to be a vfs inode with inotify mark attached, so why are those
> events not created already? What am I missing?
The difference is vfs isn't involved when kernfs files are unlinked.
When a cgroup removal occurs, we get to kernfs_remove via kernfs'
inode_operations without calling vfs_unlink. (You can't rm cgroup
files directly.)
> Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> while watching the parent? Because this is not how the API works.
No, only on the file being watched. The parent should only get
IN_DELETE, but I read your feedback below and I'm fine with removing
that part and just sending the DELETE_SELF and IN_IGNORED events.
> I think it should be possible to set a super block fanotify watch
> on cgroupfs and get all the FAN_DELETE_SELF events, but maybe we
> do not allow this right now, I did not check - just wanted to give
> you another direction to follow.
>
> >
> > Implementation details:
> > The kernfs notification worker is updated to handle file deletion.
> > fsnotify handles sending MODIFY events to both a watched file and its
> > parent, but it does not handle sending a DELETE event to the parent and
> > a DELETE_SELF event to the watched file in a single call. Therefore,
> > separate fsnotify calls are made: one for the parent (DELETE) and one
> > for the child (DELETE_SELF), while retaining the optimized single call
>
> IN_DELETE_SELF and IN_IGNORED are special and I don't really mind adding
> them to kernfs seeing that they are very useful, but adding IN_DELETE
> without adding IN_CREATE, that is very arbitrary and I don't like it as
> much.
That's fair, and the IN_DELETE isn't actually needed for my use case,
but I figured I would add the parent notification for file deletions
since it is already there for MODIFY events, and I was modifying that
area of the code anyway. I'll remove the parent notification for
DELETE and just send DELETE_SELF and IGNORED with
fsnotify_inoderemove() in V3.
> > for MODIFY events.
> >
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > ---
> > fs/kernfs/dir.c | 21 +++++++++++++++++++++
> > fs/kernfs/file.c | 16 ++++++++++------
> > fs/kernfs/kernfs-internal.h | 3 +++
> > 3 files changed, 34 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 29baeeb97871..e5bda829fcb8 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -9,6 +9,7 @@
> >
> > #include <linux/sched.h>
> > #include <linux/fs.h>
> > +#include <linux/fsnotify_backend.h>
> > #include <linux/namei.h>
> > #include <linux/idr.h>
> > #include <linux/slab.h>
> > @@ -1471,6 +1472,23 @@ void kernfs_show(struct kernfs_node *kn, bool show)
> > up_write(&root->kernfs_rwsem);
> > }
> >
> > +static void kernfs_notify_file_deleted(struct kernfs_node *kn)
> > +{
> > + static DECLARE_WORK(kernfs_notify_deleted_work,
> > + kernfs_notify_workfn);
> > +
> > + guard(spinlock_irqsave)(&kernfs_notify_lock);
> > + /* may overwrite already pending FS_MODIFY events */
> > + kn->attr.notify_event = FS_DELETE;
> > +
> > + if (!kn->attr.notify_next) {
> > + kernfs_get(kn);
> > + kn->attr.notify_next = kernfs_notify_list;
> > + kernfs_notify_list = kn;
> > + schedule_work(&kernfs_notify_deleted_work);
> > + }
> > +}
> > +
> > static void __kernfs_remove(struct kernfs_node *kn)
> > {
> > struct kernfs_node *pos, *parent;
> > @@ -1520,6 +1538,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
> > struct kernfs_iattrs *ps_iattr =
> > parent ? parent->iattr : NULL;
> >
> > + if (kernfs_type(pos) == KERNFS_FILE)
> > + kernfs_notify_file_deleted(pos);
> > +
> > /* update timestamps on the parent */
> > down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
> >
> > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > index e978284ff983..2d21af3cfcad 100644
> > --- a/fs/kernfs/file.c
> > +++ b/fs/kernfs/file.c
> > @@ -37,8 +37,8 @@ struct kernfs_open_node {
> > */
> > #define KERNFS_NOTIFY_EOL ((void *)&kernfs_notify_list)
> >
> > -static DEFINE_SPINLOCK(kernfs_notify_lock);
> > -static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> > +DEFINE_SPINLOCK(kernfs_notify_lock);
> > +struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> >
> > static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
> > {
> > @@ -909,7 +909,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
> > return ret;
> > }
> >
> > -static void kernfs_notify_workfn(struct work_struct *work)
> > +void kernfs_notify_workfn(struct work_struct *work)
> > {
> > struct kernfs_node *kn;
> > struct kernfs_super_info *info;
> > @@ -959,15 +959,19 @@ static void kernfs_notify_workfn(struct work_struct *work)
> > if (p_inode) {
> > fsnotify(notify_event | FS_EVENT_ON_CHILD,
> > inode, FSNOTIFY_EVENT_INODE,
> > - p_inode, &name, inode, 0);
> > + p_inode, &name,
> > + (notify_event == FS_MODIFY) ?
> > + inode : NULL, 0);
> > iput(p_inode);
> > }
> >
> > kernfs_put(parent);
> > }
> >
> > - if (!p_inode)
> > - fsnotify_inode(inode, notify_event);
> > + if (notify_event == FS_DELETE)
> > + fsnotify_inoderemove(inode);
> > + else if (!p_inode)
> > + fsnotify_inode(inode, FS_MODIFY);
>
> Didn't you mean notify_event?
Yes, that would be better.
> Thanks,
> Amir.
Thanks for looking at my patches Amir,
T.J.
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-17 19:25 ` T.J. Mercier
@ 2026-02-17 21:25 ` Amir Goldstein
2026-02-17 22:32 ` T.J. Mercier
0 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2026-02-17 21:25 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
jack, shuah, linux-kselftest
On Tue, Feb 17, 2026 at 9:26 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Tue, Feb 17, 2026 at 2:19 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> > > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > > removed.
> > >
> > > This creates a problem for processes monitoring cgroups. For example, a
> > > service monitoring memory.events for memory.high breaches needs to know
> > > when a cgroup is removed to clean up its state. Where it's known that a
> > > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > > service must resort to inefficient workarounds such as:
> > > 1. Periodically scanning procfs to detect process death (wastes CPU and
> > > is susceptible to PID reuse).
> > > 2. Placing an additional IN_DELETE watch on the parent directory
> > > (wastes resources managing double the watches).
> > > 3. Holding a pidfd for every monitored cgroup (can exhaust file
> > > descriptors).
> > >
> > > This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> > > This allows applications to rely on a single existing watch on the file
> > > of interest (e.g. memory.events) to receive notifications for both
> > > modifications and the eventual removal of the file, as well as automatic
> > > watch descriptor cleanup, simplifying userspace logic and improving
> > > resource efficiency.
> >
> > This looks very useful,
> > But,
> > How will the application know that it can rely on IN_DELETE_SELF
> > from cgroups if this is not an opt-in feature?
> >
> > Essentially, this is similar to the discussions on adding "remote"
> > fs notification support (e.g. for smb) and in those discussions
> > I insist that "remote" notification should be opt-in (which is
> > easy to do with an fanotify init flag) and I claim that mixing
> > "remote" events with "local" events on the same group is undesired.
>
> I think this situation is a bit different because this isn't adding
> new features to fsnotify. This is filling a gap that you'd expect to
> work if you only read the cgroups or inotify documentation without
> realizing that kernfs is simply wired up differently for notification
> support than most other filesystems, and only partially supports the
> existing notification events. It's opt-in in the sense that an
> application registers for IN_DELETE_SELF, but other than a runtime
> test like what I added in the selftests I'm not sure if there's a good
> way to detect the kernel will actually send the event. Practically
> speaking though, if merged upstream I will backport these patches to
> all the kernels we use so a runtime check shouldn't be necessary for
> our applications.
>
That's beside the point.
An application does not know if it is running on a kernel with the backported
patch or not, so an application needs to either rely on getting the event
or it has to poll. How will the application know if it needs to poll or not?
> > However, IN_IGNORED is created when an inotify watch is removed
> > and IN_DELETE_SELF is called when a vfs inode is destroyed.
> > When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
> > has to be a vfs inode with inotify mark attached, so why are those
> > events not created already? What am I missing?
>
> The difference is vfs isn't involved when kernfs files are unlinked.
No, but the vfs is involved when the last reference on the kernfs inode
is dropped.
> When a cgroup removal occurs, we get to kernfs_remove via kernfs'
> inode_operations without calling vfs_unlink. (You can't rm cgroup
> files directly.)
>
Yes and if there was a vfs inode for this kernfs object, the vfs inode needs to
be dropped.
> > Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> > while watching the parent? Because this is not how the API works.
>
> No, only on the file being watched. The parent should only get
> IN_DELETE, but I read your feedback below and I'm fine with removing
> that part and just sending the DELETE_SELF and IN_IGNORED events.
>
So if the file was being watched, some application needed to call
inotify_add_watch() with the user path to the cgroupfs inode
and inotify watch keeps a live reference to this vfs inode.
When the cgroup is being destroyed something needs to drop
this vfs inode and call __destroy_inode() -> fsnotify_inode_delete()
which should remove the inotify watch and result in IN_IGNORED.
IN_DELETE_SELF is a different story, because the inode does not
have zero i_nlink.
I did not try to follow the code path of cgroupfs destroy when an
inotify watch on a cgroup file exists, but this is what I expect.
Please explain - what am I missing?
> > I think it should be possible to set a super block fanotify watch
> > on cgroupfs and get all the FAN_DELETE_SELF events, but maybe we
> > do not allow this right now, I did not check - just wanted to give
> > you another direction to follow.
> >
> > >
> > > Implementation details:
> > > The kernfs notification worker is updated to handle file deletion.
> > > fsnotify handles sending MODIFY events to both a watched file and its
> > > parent, but it does not handle sending a DELETE event to the parent and
> > > a DELETE_SELF event to the watched file in a single call. Therefore,
> > > separate fsnotify calls are made: one for the parent (DELETE) and one
> > > for the child (DELETE_SELF), while retaining the optimized single call
> >
> > IN_DELETE_SELF and IN_IGNORED are special and I don't really mind adding
> > them to kernfs seeing that they are very useful, but adding IN_DELETE
> > without adding IN_CREATE, that is very arbitrary and I don't like it as
> > much.
>
> That's fair, and the IN_DELETE isn't actually needed for my use case,
> but I figured I would add the parent notification for file deletions
> since it is already there for MODIFY events, and I was modifying that
> area of the code anyway. I'll remove the parent notification for
> DELETE and just send DELETE_SELF and IGNORED with
> fsnotify_inoderemove() in V3.
I do not object to adding explicit IN_DELETE_SELF, especially
because that would be usable also in fanotify, but I'd like to
understand what's the story with IN_IGNORED.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-17 21:25 ` Amir Goldstein
@ 2026-02-17 22:32 ` T.J. Mercier
2026-02-17 23:13 ` Amir Goldstein
2026-02-18 11:23 ` Jan Kara
0 siblings, 2 replies; 16+ messages in thread
From: T.J. Mercier @ 2026-02-17 22:32 UTC (permalink / raw)
To: Amir Goldstein
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
jack, shuah, linux-kselftest
On Tue, Feb 17, 2026 at 1:25 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Feb 17, 2026 at 9:26 PM T.J. Mercier <tjmercier@google.com> wrote:
> >
> > On Tue, Feb 17, 2026 at 2:19 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> > > > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > > > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > > > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > > > removed.
> > > >
> > > > This creates a problem for processes monitoring cgroups. For example, a
> > > > service monitoring memory.events for memory.high breaches needs to know
> > > > when a cgroup is removed to clean up its state. Where it's known that a
> > > > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > > > service must resort to inefficient workarounds such as:
> > > > 1. Periodically scanning procfs to detect process death (wastes CPU and
> > > > is susceptible to PID reuse).
> > > > 2. Placing an additional IN_DELETE watch on the parent directory
> > > > (wastes resources managing double the watches).
> > > > 3. Holding a pidfd for every monitored cgroup (can exhaust file
> > > > descriptors).
> > > >
> > > > This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> > > > This allows applications to rely on a single existing watch on the file
> > > > of interest (e.g. memory.events) to receive notifications for both
> > > > modifications and the eventual removal of the file, as well as automatic
> > > > watch descriptor cleanup, simplifying userspace logic and improving
> > > > resource efficiency.
> > >
> > > This looks very useful,
> > > But,
> > > How will the application know that it can rely on IN_DELETE_SELF
> > > from cgroups if this is not an opt-in feature?
> > >
> > > Essentially, this is similar to the discussions on adding "remote"
> > > fs notification support (e.g. for smb) and in those discussions
> > > I insist that "remote" notification should be opt-in (which is
> > > easy to do with an fanotify init flag) and I claim that mixing
> > > "remote" events with "local" events on the same group is undesired.
> >
> > I think this situation is a bit different because this isn't adding
> > new features to fsnotify. This is filling a gap that you'd expect to
> > work if you only read the cgroups or inotify documentation without
> > realizing that kernfs is simply wired up differently for notification
> > support than most other filesystems, and only partially supports the
> > existing notification events. It's opt-in in the sense that an
> > application registers for IN_DELETE_SELF, but other than a runtime
> > test like what I added in the selftests I'm not sure if there's a good
> > way to detect the kernel will actually send the event. Practically
> > speaking though, if merged upstream I will backport these patches to
> > all the kernels we use so a runtime check shouldn't be necessary for
> > our applications.
> >
>
> That's beside the point.
> An application does not know if it is running on a kernel with the backported
> patch or not, so an application needs to either rely on getting the event
> or it has to poll. How will the application know if it needs to poll or not?
Either by testing for the behavior at runtime like I mentioned, or by
depending on certification testing for the platform the application is
running on which would verify that the selftests I added pass. We do
the former to check for the presence of other features like swappiness
support with memory.reclaim, and also the latter for all devices.
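A minimal sketch of the kind of runtime check mentioned above, assuming the application can create and remove a scratch object of its own (for cgroups, a throwaway cgroup removed with rmdir(2)). The helper names wait_for_event(), probe_delete_self() and probe_selfcheck() are hypothetical and are not part of this series or its selftests; the self-check uses an ordinary temporary file under /tmp and unlink(2), where IN_DELETE_SELF is expected to arrive:

```c
#define _GNU_SOURCE	/* for mkstemp() under strict -std= modes */
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Wait for an event whose mask contains `want`. The timeout applies to
 * each wait for new data. Returns 1 if seen, 0 on timeout, -1 on error.
 */
static int wait_for_event(int ifd, uint32_t want, int timeout_ms)
{
	char buf[4096];
	struct pollfd pfd = { .fd = ifd, .events = POLLIN };

	for (;;) {
		int n = poll(&pfd, 1, timeout_ms);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return 0;	/* nothing arrived in time */

		ssize_t len = read(ifd, buf, sizeof(buf));
		if (len <= 0)
			return -1;

		/* Walk the variable-length events in the buffer. */
		for (char *p = buf; p < buf + len; ) {
			struct inotify_event ev;

			memcpy(&ev, p, sizeof(ev));
			if (ev.mask & want)
				return 1;
			p += sizeof(ev) + ev.len;
		}
	}
}

/*
 * Watch `watch_path`, remove `remove_path` with `remove_fn` (unlink or
 * rmdir), and report whether IN_DELETE_SELF shows up on the watch.
 */
static int probe_delete_self(const char *watch_path, const char *remove_path,
			     int (*remove_fn)(const char *), int timeout_ms)
{
	int ret = -1;
	int ifd = inotify_init1(IN_CLOEXEC);

	if (ifd < 0)
		return -1;
	if (inotify_add_watch(ifd, watch_path, IN_DELETE_SELF) >= 0 &&
	    remove_fn(remove_path) == 0)
		ret = wait_for_event(ifd, IN_DELETE_SELF, timeout_ms);
	close(ifd);
	return ret;
}

/* Self-check on an ordinary file: unlink() should yield IN_DELETE_SELF. */
static int probe_selfcheck(void)
{
	char path[] = "/tmp/inotify_probe_XXXXXX";
	int fd = mkstemp(path);
	int ret;

	if (fd < 0)
		return -1;
	close(fd);	/* an open fd would delay the event */
	ret = probe_delete_self(path, path, unlink, 1000);
	if (ret != 1)
		unlink(path);	/* harmless if the file is already gone */
	return ret;
}
```

For the cgroup case an application would presumably pass the memory.events file of a scratch cgroup as watch_path, the cgroup directory as remove_path, and rmdir as remove_fn. A return value of 0 (timeout) would then suggest the kernel does not send the event and that polling is still needed.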
> > > However, IN_IGNORED is created when an inotify watch is removed
> > > and IN_DELETE_SELF is called when a vfs inode is destroyed.
> > > When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
> > > has to be a vfs inode with inotify mark attached, so why are those
> > > events not created already? What am I missing?
> >
> > The difference is vfs isn't involved when kernfs files are unlinked.
>
> No, but the vfs is involved when the last reference on the kernfs inode
> is dropped.
>
> > When a cgroup removal occurs, we get to kernfs_remove via kernfs'
> > inode_operations without calling vfs_unlink. (You can't rm cgroup
> > files directly.)
> >
>
> Yes and if there was a vfs inode for this kernfs object, the vfs inode needs to
> be dropped.
It should be, but it isn't right now.
> > > Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> > > while watching the parent? Because this is not how the API works.
> >
> > No, only on the file being watched. The parent should only get
> > IN_DELETE, but I read your feedback below and I'm fine with removing
> > that part and just sending the DELETE_SELF and IN_IGNORED events.
> >
>
> So if the file was being watched, some application needed to call
> inotify_add_watch() with the user path to the cgroupfs inode
> and inotify watch keeps a live reference to this vfs inode.
>
> When the cgroup is being destroyed something needs to drop
> this vfs inode and call __destroy_inode() -> fsnotify_inode_delete()
> which should remove the inotify watch and result in IN_IGNORED.
Nothing like this exists before this patch.
> IN_DELETE_SELF is a different story, because the inode does not
> have zero i_nlink.
>
> I did not try to follow the code path of cgroupfs destroy when an
> inotify watch on a cgroup file exists, but this is what I expect.
> Please explain - what am I missing?
Yes that's the problem here. The inode isn't dropped unless the watch
is removed, and the watch isn't removed because kernfs doesn't go
through vfs to notify about file removal. There is nothing to trigger
dropping the watch and the associated inode reference except this
patch calling into fsnotify_inoderemove which both sends
IN_DELETE_SELF and calls __fsnotify_inode_delete for the IN_IGNORED
and inode cleanup.
Without this, the watch and inode persist after file deletion until
the process exits and file descriptors are cleaned up, or until
inotify_rm_watch gets called manually.
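For reference, the manual path mentioned above can be sketched in userspace: removing a watch by hand with inotify_rm_watch(2) is what queues IN_IGNORED for that watch descriptor. The function name rm_watch_yields_ignored() is hypothetical and exists only for illustration:

```c
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Add a watch on `path`, remove it again by hand, and check that the
 * first event read back is IN_IGNORED for that watch descriptor.
 * Returns 1 if so, 0 if some other event came first, -1 on error.
 */
static int rm_watch_yields_ignored(const char *path)
{
	struct inotify_event ev;
	int ret = -1;
	int wd;
	int ifd = inotify_init1(IN_CLOEXEC);

	if (ifd < 0)
		return -1;

	wd = inotify_add_watch(ifd, path, IN_DELETE_SELF);
	/* IN_IGNORED carries no name, so one bare event struct is enough. */
	if (wd >= 0 && inotify_rm_watch(ifd, wd) == 0 &&
	    read(ifd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev))
		ret = (ev.wd == wd && (ev.mask & IN_IGNORED)) ? 1 : 0;

	close(ifd);
	return ret;
}
```

An application relying on this today has to learn about the removal through some other channel first, which is the gap described in this thread.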
> > > I think it should be possible to set a super block fanotify watch
> > > on cgroupfs and get all the FAN_DELETE_SELF events, but maybe we
> > > do not allow this right now, I did not check - just wanted to give
> > > you another direction to follow.
> > >
> > > >
> > > > Implementation details:
> > > > The kernfs notification worker is updated to handle file deletion.
> > > > fsnotify handles sending MODIFY events to both a watched file and its
> > > > parent, but it does not handle sending a DELETE event to the parent and
> > > > a DELETE_SELF event to the watched file in a single call. Therefore,
> > > > separate fsnotify calls are made: one for the parent (DELETE) and one
> > > > for the child (DELETE_SELF), while retaining the optimized single call
> > >
> > > IN_DELETE_SELF and IN_IGNORED are special and I don't really mind adding
> > > them to kernfs seeing that they are very useful, but adding IN_DELETE
> > > without adding IN_CREATE, that is very arbitrary and I don't like it as
> > > much.
> >
> > That's fair, and the IN_DELETE isn't actually needed for my use case,
> > but I figured I would add the parent notification for file deletions
> > since it is already there for MODIFY events, and I was modifying that
> > area of the code anyway. I'll remove the parent notification for
> > DELETE and just send DELETE_SELF and IGNORED with
> > fsnotify_inoderemove() in V3.
>
> I do not object to adding explicit IN_DELETE_SELF, especially
> because that would be usable also in fanotify, but I'd like to
> understand what's the story with IN_IGNORED.
>
> Thanks,
> Amir.
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-17 22:32 ` T.J. Mercier
@ 2026-02-17 23:13 ` Amir Goldstein
2026-02-18 11:23 ` Jan Kara
1 sibling, 0 replies; 16+ messages in thread
From: Amir Goldstein @ 2026-02-17 23:13 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
jack, shuah, linux-kselftest
On Wed, Feb 18, 2026 at 12:32 AM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Tue, Feb 17, 2026 at 1:25 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Feb 17, 2026 at 9:26 PM T.J. Mercier <tjmercier@google.com> wrote:
> > >
> > > On Tue, Feb 17, 2026 at 2:19 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Thu, Feb 12, 2026 at 01:58:13PM -0800, T.J. Mercier wrote:
> > > > > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > > > > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > > > > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > > > > removed.
> > > > >
> > > > > This creates a problem for processes monitoring cgroups. For example, a
> > > > > service monitoring memory.events for memory.high breaches needs to know
> > > > > when a cgroup is removed to clean up its state. Where it's known that a
> > > > > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > > > > service must resort to inefficient workarounds such as:
> > > > > 1. Periodically scanning procfs to detect process death (wastes CPU and
> > > > > is susceptible to PID reuse).
> > > > > 2. Placing an additional IN_DELETE watch on the parent directory
> > > > > (wastes resources managing double the watches).
> > > > > 3. Holding a pidfd for every monitored cgroup (can exhaust file
> > > > > descriptors).
> > > > >
> > > > > This patch enables kernfs to send IN_DELETE_SELF and IN_IGNORED events.
> > > > > This allows applications to rely on a single existing watch on the file
> > > > > of interest (e.g. memory.events) to receive notifications for both
> > > > > modifications and the eventual removal of the file, as well as automatic
> > > > > watch descriptor cleanup, simplifying userspace logic and improving
> > > > > resource efficiency.
> > > >
> > > > This looks very useful,
> > > > But,
> > > > How will the application know that it can rely on IN_DELETE_SELF
> > > > from cgroups if this is not an opt-in feature?
> > > >
> > > > Essentially, this is similar to the discussions on adding "remote"
> > > > fs notification support (e.g. for smb) and in those discussions
> > > > I insist that "remote" notification should be opt-in (which is
> > > > easy to do with an fanotify init flag) and I claim that mixing
> > > > "remote" events with "local" events on the same group is undesired.
> > >
> > > I think this situation is a bit different because this isn't adding
> > > new features to fsnotify. This is filling a gap that you'd expect to
> > > work if you only read the cgroups or inotify documentation without
> > > realizing that kernfs is simply wired up differently for notification
> > > support than most other filesystems, and only partially supports the
> > > existing notification events. It's opt-in in the sense that an
> > > application registers for IN_DELETE_SELF, but other than a runtime
> > > test like what I added in the selftests I'm not sure if there's a good
> > > way to detect the kernel will actually send the event. Practically
> > > speaking though, if merged upstream I will backport these patches to
> > > all the kernels we use so a runtime check shouldn't be necessary for
> > > our applications.
> > >
> >
> > That's beside the point.
> > An application does not know if it is running on a kernel with the backported
> > patch or not, so an application needs to either rely on getting the event
> > or it has to poll. How will the application know if it needs to poll or not?
>
> Either by testing for the behavior at runtime like I mentioned, or by
> depending on certification testing for the platform the application is
> running on which would verify that the selftests I added pass. We do
> the former to check for the presence of other features like swappiness
> support with memory.reclaim, and also the latter for all devices.
>
> > > > However, IN_IGNORED is created when an inotify watch is removed
> > > > and IN_DELETE_SELF is called when a vfs inode is destroyed.
> > > > When setting an inotify watch for IN_IGNORED|IN_DELETE_SELF there
> > > > has to be a vfs inode with inotify mark attached, so why are those
> > > > events not created already? What am I missing?
> > >
> > > The difference is vfs isn't involved when kernfs files are unlinked.
> >
> > No, but the vfs is involved when the last reference on the kernfs inode
> > is dropped.
> >
> > > When a cgroup removal occurs, we get to kernfs_remove via kernfs'
> > > inode_operations without calling vfs_unlink. (You can't rm cgroup
> > > files directly.)
> > >
> >
> > Yes and if there was a vfs inode for this kernfs object, the vfs inode needs to
> > be dropped.
>
> It should be, but it isn't right now.
>
> > > > Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> > > > while watching the parent? Because this is not how the API works.
> > >
> > > No, only on the file being watched. The parent should only get
> > > IN_DELETE, but I read your feedback below and I'm fine with removing
> > > that part and just sending the DELETE_SELF and IN_IGNORED events.
> > >
> >
> > So if the file was being watched, some application needed to call
> > inotify_add_watch() with the user path to the cgroupfs inode
> > and inotify watch keeps a live reference to this vfs inode.
> >
> > When the cgroup is being destroyed something needs to drop
> > this vfs inode and call __destroy_inode() -> fsnotify_inode_delete()
> > which should remove the inotify watch and result in IN_IGNORED.
>
> Nothing like this exists before this patch.
>
> > IN_DELETE_SELF is a different story, because the inode does not
> > have zero i_nlink.
> >
> > I did not try to follow the code path of cgroupfs destroy when an
> > inotify watch on a cgroup file exists, but this is what I expect.
> > Please explain - what am I missing?
>
> Yes that's the problem here. The inode isn't dropped unless the watch
> is removed, and the watch isn't removed because kernfs doesn't go
> through vfs to notify about file removal. There is nothing to trigger
> dropping the watch and the associated inode reference except this
> patch calling into fsnotify_inoderemove which both sends
> IN_DELETE_SELF and calls __fsnotify_inode_delete for the IN_IGNORED
> and inode cleanup.
>
> Without this, the watch and inode persist after file deletion until
> the process exits and file descriptors are cleaned up, or until
> inotify_rm_watch gets called manually.
>
Yeah, that's not good.
Will be happy to see that fixed.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
2026-02-17 22:32 ` T.J. Mercier
2026-02-17 23:13 ` Amir Goldstein
@ 2026-02-18 11:23 ` Jan Kara
1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2026-02-18 11:23 UTC (permalink / raw)
To: T.J. Mercier
Cc: Amir Goldstein, gregkh, tj, driver-core, linux-kernel, cgroups,
linux-fsdevel, jack, shuah, linux-kselftest
On Tue 17-02-26 14:32:25, T.J. Mercier wrote:
> On Tue, Feb 17, 2026 at 1:25 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > Are you expecting to get IN_IGNORED|IN_DELETE_SELF on an entry
> > > > while watching the parent? Because this is not how the API works.
> > >
> > > No, only on the file being watched. The parent should only get
> > > IN_DELETE, but I read your feedback below and I'm fine with removing
> > > that part and just sending the DELETE_SELF and IN_IGNORED events.
> > >
> >
> > So if the file was being watched, some application needed to call
> > inotify_add_watch() with the user path to the cgroupfs inode
> > and inotify watch keeps a live reference to this vfs inode.
> >
> > When the cgroup is being destroyed something needs to drop
> > this vfs inode and call __destroy_inode() -> fsnotify_inode_delete()
> > which should remove the inotify watch and result in IN_IGNORED.
>
> Nothing like this exists before this patch.
>
> > IN_DELETE_SELF is a different story, because the inode does not
> > have zero i_nlink.
> >
> > I did not try to follow the code path of cgroupfs destroy when an
> > inotify watch on a cgroup file exists, but this is what I expect.
> > Please explain - what am I missing?
>
> Yes that's the problem here. The inode isn't dropped unless the watch
> is removed, and the watch isn't removed because kernfs doesn't go
> through vfs to notify about file removal. There is nothing to trigger
> dropping the watch and the associated inode reference except this
> patch calling into fsnotify_inoderemove which both sends
> IN_DELETE_SELF and calls __fsnotify_inode_delete for the IN_IGNORED
> and inode cleanup.
>
> Without this, the watch and inode persist after file deletion until
> the process exits and file descriptors are cleaned up, or until
> inotify_rm_watch gets called manually.
Hrm. For a while I was scratching my head over how it is possible that VFS
isn't involved. So let me share what I found:
Normally fsnotify_inoderemove() is called from dentry_unlink_inode() which
is called from d_delete() (name unlinked) and __dentry_kill() (last dput()).
Now it is true that kernfs doesn't bother with pruning child dentries from
its rmdir implementation. It just marks all corresponding kernfs_nodes
(inodes) as dead and that's it so d_delete() isn't called. But vfs_rmdir()
makes up for this by calling shrink_dcache_parent() on the removed
directory so the child dentries end up going through __dentry_kill(). *But*
kernfs also doesn't bother to set i_nlink for these child dentries to 0
when marking them as dead and so __dentry_kill() doesn't call
fsnotify_inoderemove(). So at this point it seems more like a kernfs bug
that children inodes aren't properly cleaned up by setting i_nlink to 0 and
I don't think we should paper over this by calling fsnotify_inoderemove()
explicitly.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
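The i_nlink dependence described above can be observed from userspace on an ordinary filesystem that supports hard links. The sketch below uses hypothetical helper names (drain_masks(), hardlink_demo()) and assumes /tmp is writable. It watches a file that has two names: removing the first name leaves i_nlink at 1 and should not produce IN_DELETE_SELF, while removing the last name should produce IN_DELETE_SELF followed by IN_IGNORED:

```c
#define _GNU_SOURCE	/* for mkstemp() under strict -std= modes */
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Drain queued events and return the OR of their masks. Stops once no
 * new event arrives within timeout_ms.
 */
static uint32_t drain_masks(int ifd, int timeout_ms)
{
	char buf[4096];
	struct pollfd pfd = { .fd = ifd, .events = POLLIN };
	uint32_t seen = 0;

	while (poll(&pfd, 1, timeout_ms) > 0) {
		ssize_t len = read(ifd, buf, sizeof(buf));

		if (len <= 0)
			break;
		for (char *p = buf; p < buf + len; ) {
			struct inotify_event ev;

			memcpy(&ev, p, sizeof(ev));
			seen |= ev.mask;
			p += sizeof(ev) + ev.len;
		}
	}
	return seen;
}

/*
 * Create a file with two hard links, watch the inode, then unlink both
 * names in turn. The masks seen after each unlink are stored in
 * *after_first and *after_last. Returns 0 on success, -1 on error.
 */
static int hardlink_demo(uint32_t *after_first, uint32_t *after_last)
{
	char a[] = "/tmp/nlink_demo_XXXXXX";
	char b[sizeof(a) + 4];
	int ret = -1;
	int ifd;
	int fd = mkstemp(a);

	if (fd < 0)
		return -1;
	close(fd);	/* an open fd would pin the dentry and delay events */
	strcpy(b, a);
	strcat(b, ".lnk");

	ifd = inotify_init1(IN_CLOEXEC);
	if (ifd >= 0 && link(a, b) == 0 &&
	    inotify_add_watch(ifd, a, IN_ATTRIB | IN_DELETE_SELF) >= 0) {
		unlink(a);	/* i_nlink drops from 2 to 1 */
		*after_first = drain_masks(ifd, 200);
		unlink(b);	/* i_nlink drops to 0, inode goes away */
		*after_last = drain_masks(ifd, 200);
		ret = 0;
	}

	unlink(a);	/* no-ops on the success path */
	unlink(b);
	if (ifd >= 0)
		close(ifd);
	return ret;
}
```

This only illustrates the generic VFS behaviour on a regular filesystem; it says nothing about what kernfs does today, which is the open question in this thread.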
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 3/3] selftests: memcg: Add tests IN_DELETE_SELF and IN_IGNORED on memory.events
2026-02-12 21:58 [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files T.J. Mercier
2026-02-12 21:58 ` [PATCH v2 1/3] kernfs: allow passing fsnotify event types T.J. Mercier
2026-02-12 21:58 ` [PATCH v2 2/3] kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion T.J. Mercier
@ 2026-02-12 21:58 ` T.J. Mercier
2026-02-16 16:21 ` [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files Amir Goldstein
2026-02-17 6:43 ` Tejun Heo
4 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2026-02-12 21:58 UTC (permalink / raw)
To: gregkh, tj, driver-core, linux-kernel, cgroups, shuah,
linux-kselftest
Cc: T.J. Mercier
Add two new tests that verify inotify events are sent when memcg files
are removed.
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
.../selftests/cgroup/test_memcontrol.c | 122 ++++++++++++++++++
1 file changed, 122 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 4e1647568c5b..be0e78809494 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -10,6 +10,7 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
+#include <sys/inotify.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <arpa/inet.h>
@@ -1625,6 +1626,125 @@ static int test_memcg_oom_group_score_events(const char *root)
return ret;
}
+static int read_event(int inotify_fd, int expected_event, int expected_wd)
+{
+ struct inotify_event event;
+ ssize_t len = 0;
+
+ len = read(inotify_fd, &event, sizeof(event));
+ if (len < (ssize_t)sizeof(event))
+ return -1;
+
+ if (event.mask != expected_event || event.wd != expected_wd) {
+ fprintf(stderr,
+ "event does not match expected values: mask %d (expected %d) wd %d (expected %d)\n",
+ event.mask, expected_event, event.wd, expected_wd);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int test_memcg_inotify_delete_file(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *memcg, *child_memcg = NULL;
+ int fd = -1, wd;
+
+ memcg = cg_name(root, "memcg_test_0");
+
+ if (!memcg)
+ goto cleanup;
+
+ if (cg_create(memcg))
+ goto cleanup;
+
+ if (cg_write(memcg, "cgroup.subtree_control", "+memory"))
+ goto cleanup;
+
+ child_memcg = cg_name(memcg, "child");
+ if (!child_memcg)
+ goto cleanup;
+
+ if (cg_create(child_memcg))
+ goto cleanup;
+
+ fd = inotify_init1(0);
+ if (fd == -1)
+ goto cleanup;
+
+ wd = inotify_add_watch(fd, cg_control(child_memcg, "memory.events"), IN_DELETE_SELF);
+ if (wd == -1)
+ goto cleanup;
+
+ cg_write(memcg, "cgroup.subtree_control", "-memory");
+
+ if (read_event(fd, IN_DELETE_SELF, wd))
+ goto cleanup;
+
+ if (read_event(fd, IN_IGNORED, wd))
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ if (fd >= 0)
+ close(fd);
+ if (child_memcg)
+ cg_destroy(child_memcg);
+ free(child_memcg);
+ if (memcg)
+ cg_destroy(memcg);
+ free(memcg);
+
+ return ret;
+}
+
+static int test_memcg_inotify_delete_rmdir(const char *root)
+{
+ int ret = KSFT_FAIL;
+ char *memcg;
+ int fd = -1, wd;
+
+ memcg = cg_name(root, "memcg_test_0");
+
+ if (!memcg)
+ goto cleanup;
+
+ if (cg_create(memcg))
+ goto cleanup;
+
+ fd = inotify_init1(0);
+ if (fd == -1)
+ goto cleanup;
+
+ wd = inotify_add_watch(fd, cg_control(memcg, "memory.events"), IN_DELETE_SELF);
+ if (wd == -1)
+ goto cleanup;
+
+ if (cg_destroy(memcg))
+ goto cleanup;
+ free(memcg);
+ memcg = NULL;
+
+ if (read_event(fd, IN_DELETE_SELF, wd))
+ goto cleanup;
+
+ if (read_event(fd, IN_IGNORED, wd))
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ if (fd >= 0)
+ close(fd);
+ if (memcg)
+ cg_destroy(memcg);
+ free(memcg);
+
+ return ret;
+}
+
#define T(x) { x, #x }
struct memcg_test {
int (*fn)(const char *root);
@@ -1644,6 +1764,8 @@ struct memcg_test {
T(test_memcg_oom_group_leaf_events),
T(test_memcg_oom_group_parent_events),
T(test_memcg_oom_group_score_events),
+ T(test_memcg_inotify_delete_file),
+ T(test_memcg_inotify_delete_rmdir),
};
#undef T
--
2.53.0.273.g2a3d683680-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files
2026-02-12 21:58 [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files T.J. Mercier
` (2 preceding siblings ...)
2026-02-12 21:58 ` [PATCH v2 3/3] selftests: memcg: Add tests IN_DELETE_SELF and IN_IGNORED on memory.events T.J. Mercier
@ 2026-02-16 16:21 ` Amir Goldstein
2026-02-17 19:25 ` T.J. Mercier
2026-02-17 6:43 ` Tejun Heo
4 siblings, 1 reply; 16+ messages in thread
From: Amir Goldstein @ 2026-02-16 16:21 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
Jan Kara, shuah, linux-kselftest
On Thu, Feb 12, 2026 at 01:58:11PM -0800, T.J. Mercier wrote:
> This series adds support for IN_DELETE_SELF and IN_IGNORED inotify
> events to kernfs files.
>
> Currently, kernfs (used by cgroup and others) supports IN_MODIFY events
> but fails to notify watchers when the file is removed (e.g. during
> cgroup destruction). This forces userspace monitors to maintain resource
> intensive side-channels like pidfds, procfs polling, or redundant
> directory watches to detect when a cgroup dies and a watched file is
> removed.
>
> By generating IN_DELETE_SELF events on destruction, we allow watchers to
> rely on a single watch descriptor for the entire lifecycle of the
> monitored file, reducing resource usage (file descriptors, CPU cycles)
> and complexity in userspace.
>
> The series is structured as follows:
> Patch 1 refactors kernfs_elem_attr to support arbitrary event types.
> Patch 2 implements the logic to generate DELETE_SELF and IGNORED events
> on file removal.
> Patch 3 adds selftests to verify the new behavior.
>
> ---
> Changes in v2:
> Remove unused variables from new selftests per kernel test robot
> Fix kernfs_type argument per Tejun
> Inline checks for FS_MODIFY, FS_DELETE in kernfs_notify_workfn per Tejun
>
> T.J. Mercier (3):
> kernfs: allow passing fsnotify event types
> kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
> selftests: memcg: Add tests IN_DELETE_SELF and IN_IGNORED on
> memory.events
>
> fs/kernfs/dir.c | 21 +++
> fs/kernfs/file.c | 20 ++-
> fs/kernfs/kernfs-internal.h | 3 +
> include/linux/kernfs.h | 1 +
> .../selftests/cgroup/test_memcontrol.c | 122 ++++++++++++++++++
> 5 files changed, 161 insertions(+), 6 deletions(-)
>
>
> base-commit: ba268514ea14b44570030e8ed2aef92a38679e85
> --
> 2.53.0.273.g2a3d683680-goog
>
In future posts, please CC inotify patches to fsdevel and inotify maintainers.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files
2026-02-16 16:21 ` [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files Amir Goldstein
@ 2026-02-17 19:25 ` T.J. Mercier
0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2026-02-17 19:25 UTC (permalink / raw)
To: Amir Goldstein
Cc: gregkh, tj, driver-core, linux-kernel, cgroups, linux-fsdevel,
Jan Kara, shuah, linux-kselftest
On Mon, Feb 16, 2026 at 8:21 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 12, 2026 at 01:58:11PM -0800, T.J. Mercier wrote:
> > This series adds support for IN_DELETE_SELF and IN_IGNORED inotify
> > events to kernfs files.
> >
> > Currently, kernfs (used by cgroup and others) supports IN_MODIFY events
> > but fails to notify watchers when the file is removed (e.g. during
> > cgroup destruction). This forces userspace monitors to maintain resource
> > intensive side-channels like pidfds, procfs polling, or redundant
> > directory watches to detect when a cgroup dies and a watched file is
> > removed.
> >
> > By generating IN_DELETE_SELF events on destruction, we allow watchers to
> > rely on a single watch descriptor for the entire lifecycle of the
> > monitored file, reducing resource usage (file descriptors, CPU cycles)
> > and complexity in userspace.
> >
> > The series is structured as follows:
> > Patch 1 refactors kernfs_elem_attr to support arbitrary event types.
> > Patch 2 implements the logic to generate DELETE_SELF and IGNORED events
> > on file removal.
> > Patch 3 adds selftests to verify the new behavior.
> >
> > ---
> > Changes in v2:
> > Remove unused variables from new selftests per kernel test robot
> > Fix kernfs_type argument per Tejun
> > Inline checks for FS_MODIFY, FS_DELETE in kernfs_notify_workfn per Tejun
> >
> > T.J. Mercier (3):
> > kernfs: allow passing fsnotify event types
> > kernfs: send IN_DELETE_SELF and IN_IGNORED on file deletion
> > selftests: memcg: Add tests IN_DELETE_SELF and IN_IGNORED on
> > memory.events
> >
> > fs/kernfs/dir.c | 21 +++
> > fs/kernfs/file.c | 20 ++-
> > fs/kernfs/kernfs-internal.h | 3 +
> > include/linux/kernfs.h | 1 +
> > .../selftests/cgroup/test_memcontrol.c | 122 ++++++++++++++++++
> > 5 files changed, 161 insertions(+), 6 deletions(-)
> >
> >
> > base-commit: ba268514ea14b44570030e8ed2aef92a38679e85
> > --
> > 2.53.0.273.g2a3d683680-goog
> >
>
> In future posts, please CC inotify patches to fsdevel and inotify maintainers.
Got it, will do.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files
2026-02-12 21:58 [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files T.J. Mercier
` (3 preceding siblings ...)
2026-02-16 16:21 ` [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files Amir Goldstein
@ 2026-02-17 6:43 ` Tejun Heo
2026-02-17 19:25 ` T.J. Mercier
4 siblings, 1 reply; 16+ messages in thread
From: Tejun Heo @ 2026-02-17 6:43 UTC (permalink / raw)
To: T.J. Mercier
Cc: gregkh, driver-core, linux-kernel, cgroups, shuah,
linux-kselftest
On Thu, Feb 12, 2026 at 01:58:11PM -0800, T.J. Mercier wrote:
> This series adds support for IN_DELETE_SELF and IN_IGNORED inotify
> events to kernfs files.
>
> Currently, kernfs (used by cgroup and others) supports IN_MODIFY events
> but fails to notify watchers when the file is removed (e.g. during
> cgroup destruction). This forces userspace monitors to maintain resource
> intensive side-channels like pidfds, procfs polling, or redundant
> directory watches to detect when a cgroup dies and a watched file is
> removed.
>
> By generating IN_DELETE_SELF events on destruction, we allow watchers to
> rely on a single watch descriptor for the entire lifecycle of the
> monitored file, reducing resource usage (file descriptors, CPU cycles)
> and complexity in userspace.
>
> The series is structured as follows:
> Patch 1 refactors kernfs_elem_attr to support arbitrary event types.
> Patch 2 implements the logic to generate DELETE_SELF and IGNORED events
> on file removal.
> Patch 3 adds selftests to verify the new behavior.
The patchset looks good to me.
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] kernfs: Add inotify IN_DELETE_SELF, IN_IGNORED support for files
2026-02-17 6:43 ` Tejun Heo
@ 2026-02-17 19:25 ` T.J. Mercier
0 siblings, 0 replies; 16+ messages in thread
From: T.J. Mercier @ 2026-02-17 19:25 UTC (permalink / raw)
To: Tejun Heo
Cc: gregkh, driver-core, linux-kernel, cgroups, shuah,
linux-kselftest
On Mon, Feb 16, 2026 at 10:43 PM Tejun Heo <tj@kernel.org> wrote:
>
> On Thu, Feb 12, 2026 at 01:58:11PM -0800, T.J. Mercier wrote:
> > This series adds support for IN_DELETE_SELF and IN_IGNORED inotify
> > events to kernfs files.
> >
> > Currently, kernfs (used by cgroup and others) supports IN_MODIFY events
> > but fails to notify watchers when the file is removed (e.g. during
> > cgroup destruction). This forces userspace monitors to maintain resource
> > intensive side-channels like pidfds, procfs polling, or redundant
> > directory watches to detect when a cgroup dies and a watched file is
> > removed.
> >
> > By generating IN_DELETE_SELF events on destruction, we allow watchers to
> > rely on a single watch descriptor for the entire lifecycle of the
> > monitored file, reducing resource usage (file descriptors, CPU cycles)
> > and complexity in userspace.
> >
> > The series is structured as follows:
> > Patch 1 refactors kernfs_elem_attr to support arbitrary event types.
> > Patch 2 implements the logic to generate DELETE_SELF and IGNORED events
> > on file removal.
> > Patch 3 adds selftests to verify the new behavior.
>
> The patchset looks good to me.
>
> Acked-by: Tejun Heo <tj@kernel.org>
>
> Thanks.
>
> --
> tejun
Thanks Tejun.
Amir would prefer I remove the new DELETE event support and keep only
the part for DELETE_SELF + IGNORED since adding only DELETE would
create an asymmetry with the missing CREATE support. So I plan to
do that in v3 of this series.
Thanks,
T.J.
^ permalink raw reply [flat|nested] 16+ messages in thread