* [PATCH v3 fanotify 0/2] Fanotify in kernel filter
@ 2024-11-22 22:59 Song Liu
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
2024-11-22 22:59 ` [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter Song Liu
0 siblings, 2 replies; 14+ messages in thread
From: Song Liu @ 2024-11-22 22:59 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
This is the first part of v2.
Changes v2 => v3:
1. v3 is the fanotify part of v2.
2. Rebrand as fanotify in kernel filter (instead of fastpath handler).
3. Fix logic and add sample for permission mmode.
[v2]: https://lore.kernel.org/linux-fsdevel/20241114084345.1564165-1-song@kernel.org/
Changes v1 => v2:
1. Add sysfs entries for fastpath handler.
2. Rewrite the sample and bpf selftest to handle subtree monitoring.
This requires quite some work from BPF side to properly handle
inode, dentry, etc.
3. Add CONFIG_FANOTIFY_FASTPATH.
4. Add more documents.
From v1 RFC:
This RFC set introduces in-kernel fastpath handler for fanotify. The
fastpath handler can be used to handle/filter some events without going
through userspace.
In LPC 2024, multiple talks covered use cases of monitoring a subtree in
the VFS (fanotify: [1], bpf/lsm: [2]). This work is inspired by these
discussions. Reliably monitoring of a subtree with low overhead is a hard
problem. We do not claim this set fully solves problem. But we think this
work can be a very useful building block of the solution to this problem.
The fastpath handler can be implemented with built-in logic, in a kernel
module, or a bpf program. The fastpath handler is attached to a fsnotify
group. With current implementation, the multiple fastpath handlers are
maintained in a global list. Only users with CAP_SYS_ADMIN can add
fastpath handlers to the list by loading a kernel module. User without
CAP_SYS_ADMIN can attach a loaded fastpath handler to fanotify instances.
During the attach operation, the fastpath handler can take an argument.
This enables non-CAP_SYSADMIN users to customize/configure the fastpath
handler, for example, with a specific allowlist/denylist.
As the patchset grows to 1000+ lines (including samples and tests), I
would like some feedback before pushing it further.
[1] https://lpc.events/event/18/contributions/1717/
[2] https://lpc.events/event/18/contributions/1940/
Song Liu (2):
fanotify: Introduce fanotify filter
samples/fanotify: Add a sample fanotify fiter
MAINTAINERS | 1 +
fs/notify/fanotify/Kconfig | 13 ++
fs/notify/fanotify/Makefile | 1 +
fs/notify/fanotify/fanotify.c | 44 +++-
fs/notify/fanotify/fanotify_filter.c | 289 +++++++++++++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 7 +
include/linux/fanotify.h | 128 ++++++++++++
include/linux/fsnotify_backend.h | 6 +-
include/uapi/linux/fanotify.h | 36 ++++
samples/Kconfig | 20 +-
samples/Makefile | 2 +-
samples/fanotify/.gitignore | 1 +
samples/fanotify/Makefile | 5 +-
samples/fanotify/filter-mod.c | 105 ++++++++++
samples/fanotify/filter-user.c | 131 ++++++++++++
samples/fanotify/filter.h | 19 ++
16 files changed, 801 insertions(+), 7 deletions(-)
create mode 100644 fs/notify/fanotify/fanotify_filter.c
create mode 100644 samples/fanotify/filter-mod.c
create mode 100644 samples/fanotify/filter-user.c
create mode 100644 samples/fanotify/filter.h
--
2.43.5
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-22 22:59 [PATCH v3 fanotify 0/2] Fanotify in kernel filter Song Liu
@ 2024-11-22 22:59 ` Song Liu
2024-11-24 6:25 ` Amir Goldstein
` (2 more replies)
2024-11-22 22:59 ` [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter Song Liu
1 sibling, 3 replies; 14+ messages in thread
From: Song Liu @ 2024-11-22 22:59 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
fanotify filter enables handling fanotify events within the kernel, and
thus saves a trip to the user space. fanotify filter can be useful in
many use cases. For example, if a user is only interested in events for
some files in side a directory, a filter can be used to filter out
irrelevant events.
fanotify filter is attached to fsnotify_group. At most one filter can
be attached to a fsnotify_group. The attach/detach of filter are
controlled by two new ioctls on the fanotify fds: FAN_IOC_ADD_FILTER
and FAN_IOC_DEL_FILTER.
fanotify filter is packaged in a kernel module. In the future, it is
also possible to package fanotify filter in a BPF program. Since loading
modules requires CAP_SYS_ADMIN, _loading_ fanotify filter in kernel
modules is limited to CAP_SYS_ADMIN. However, non-SYS_CAP_ADMIN users
can _attach_ filter loaded by sys admin to their fanotify fds. The owner
of the fanotify fitler can use flag FAN_FILTER_F_SYS_ADMIN_ONLY to
make a filter available only to users with CAP_SYS_ADMIN.
To make fanotify filters more flexible, a filter can take arguments at
attach time.
sysfs entry /sys/kernel/fanotify_filter is added to help users know
which fanotify filters are available. At the moment, these files are
added for each filter: flags, desc, and init_args.
Signed-off-by: Song Liu <song@kernel.org>
---
fs/notify/fanotify/Kconfig | 13 ++
fs/notify/fanotify/Makefile | 1 +
fs/notify/fanotify/fanotify.c | 44 +++-
fs/notify/fanotify/fanotify_filter.c | 289 +++++++++++++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 7 +
include/linux/fanotify.h | 128 ++++++++++++
include/linux/fsnotify_backend.h | 6 +-
include/uapi/linux/fanotify.h | 36 ++++
8 files changed, 520 insertions(+), 4 deletions(-)
create mode 100644 fs/notify/fanotify/fanotify_filter.c
diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
index 0e36aaf379b7..abfd59d95f49 100644
--- a/fs/notify/fanotify/Kconfig
+++ b/fs/notify/fanotify/Kconfig
@@ -24,3 +24,16 @@ config FANOTIFY_ACCESS_PERMISSIONS
hierarchical storage management systems.
If unsure, say N.
+
+config FANOTIFY_FILTER
+ bool "fanotify in kernel filter"
+ depends on FANOTIFY
+ default y
+ help
+ Say Y here if you want to use fanotify in kernel filter.
+ The filter can be implemented in a kernel module or a
+ BPF program. The filter can speed up fanotify in many
+ use cases. For example, when the listener is only interested in
+ a subset of events.
+
+ If unsure, say Y.
\ No newline at end of file
diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
index 25ef222915e5..d95ec0aeffb5 100644
--- a/fs/notify/fanotify/Makefile
+++ b/fs/notify/fanotify/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_FANOTIFY) += fanotify.o fanotify_user.o
+obj-$(CONFIG_FANOTIFY_FILTER) += fanotify_filter.o
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 224bccaab4cc..c70184cd2d45 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -18,6 +18,8 @@
#include "fanotify.h"
+extern struct srcu_struct fsnotify_mark_srcu;
+
static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
{
return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
@@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
struct fsnotify_event *fsn_event;
__kernel_fsid_t fsid = {};
u32 match_mask = 0;
+ struct fanotify_filter_hook *filter_hook __maybe_unused;
BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
@@ -921,6 +924,39 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
group, mask, match_mask);
+ if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
+ fsid = fanotify_get_fsid(iter_info);
+
+#ifdef CONFIG_FANOTIFY_FILTER
+ filter_hook = srcu_dereference(group->fanotify_data.filter_hook, &fsnotify_mark_srcu);
+ if (filter_hook) {
+ struct fanotify_filter_event filter_event = {
+ .mask = mask,
+ .data = data,
+ .data_type = data_type,
+ .dir = dir,
+ .file_name = file_name,
+ .fsid = &fsid,
+ .match_mask = match_mask,
+ };
+
+ ret = filter_hook->ops->filter(group, filter_hook, &filter_event);
+
+ /*
+ * The filter may return
+ * - FAN_FILTER_RET_SEND_TO_USERSPACE => continue the rest;
+ * - FAN_FILTER_RET_SKIP_EVENT => return 0 now;
+ * - < 0 error => return error now.
+ *
+ * For the latter two cases, we can just return ret.
+ */
+ BUILD_BUG_ON(FAN_FILTER_RET_SKIP_EVENT != 0);
+
+ if (ret != FAN_FILTER_RET_SEND_TO_USERSPACE)
+ return ret;
+ }
+#endif
+
if (fanotify_is_perm_event(mask)) {
/*
* fsnotify_prepare_user_wait() fails if we race with mark
@@ -930,9 +966,6 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
return 0;
}
- if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
- fsid = fanotify_get_fsid(iter_info);
-
event = fanotify_alloc_event(group, mask, data, data_type, dir,
file_name, &fsid, match_mask);
ret = -ENOMEM;
@@ -976,6 +1009,11 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
if (mempool_initialized(&group->fanotify_data.error_events_pool))
mempool_exit(&group->fanotify_data.error_events_pool);
+
+#ifdef CONFIG_FANOTIFY_FILTER
+ if (group->fanotify_data.filter_hook)
+ fanotify_filter_hook_free(group->fanotify_data.filter_hook);
+#endif
}
static void fanotify_free_path_event(struct fanotify_event *event)
diff --git a/fs/notify/fanotify/fanotify_filter.c b/fs/notify/fanotify/fanotify_filter.c
new file mode 100644
index 000000000000..9215612e2fcb
--- /dev/null
+++ b/fs/notify/fanotify/fanotify_filter.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fanotify.h>
+#include <linux/kobject.h>
+#include <linux/module.h>
+
+#include "fanotify.h"
+
+extern struct srcu_struct fsnotify_mark_srcu;
+
+static DEFINE_SPINLOCK(filter_list_lock);
+static LIST_HEAD(filter_list);
+
+static struct kobject *fan_filter_root_kobj;
+
+static struct {
+ enum fanotify_filter_flags flag;
+ const char *name;
+} fanotify_filter_flags_names[] = {
+ {
+ .flag = FAN_FILTER_F_SYS_ADMIN_ONLY,
+ .name = "SYS_ADMIN_ONLY",
+ }
+};
+
+static ssize_t flags_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_filter_ops *ops;
+ ssize_t len = 0;
+ int i;
+
+ ops = container_of(kobj, struct fanotify_filter_ops, kobj);
+ for (i = 0; i < ARRAY_SIZE(fanotify_filter_flags_names); i++) {
+ if (ops->flags & fanotify_filter_flags_names[i].flag) {
+ len += sysfs_emit_at(buf, len, "%s%s", len ? " " : "",
+ fanotify_filter_flags_names[i].name);
+ }
+ }
+ len += sysfs_emit_at(buf, len, "\n");
+ return len;
+}
+
+static ssize_t desc_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_filter_ops *ops;
+
+ ops = container_of(kobj, struct fanotify_filter_ops, kobj);
+
+ return sysfs_emit(buf, "%s\n", ops->desc ?: "N/A");
+}
+
+static ssize_t init_args_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_filter_ops *ops;
+
+ ops = container_of(kobj, struct fanotify_filter_ops, kobj);
+
+ return sysfs_emit(buf, "%s\n", ops->init_args ?: "N/A");
+}
+
+static struct kobj_attribute flags_kobj_attr = __ATTR_RO(flags);
+static struct kobj_attribute desc_kobj_attr = __ATTR_RO(desc);
+static struct kobj_attribute init_args_kobj_attr = __ATTR_RO(init_args);
+
+static struct attribute *fan_filter_attrs[] = {
+ &flags_kobj_attr.attr,
+ &desc_kobj_attr.attr,
+ &init_args_kobj_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(fan_filter);
+
+static void fan_filter_kobj_release(struct kobject *kobj)
+{
+}
+
+static const struct kobj_type fan_filter_ktype = {
+ .release = fan_filter_kobj_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = fan_filter_groups,
+};
+
+static struct fanotify_filter_ops *fanotify_filter_find(const char *name)
+{
+ struct fanotify_filter_ops *ops;
+
+ list_for_each_entry(ops, &filter_list, list) {
+ if (!strcmp(ops->name, name))
+ return ops;
+ }
+ return NULL;
+}
+
+static void __fanotify_filter_unregister(struct fanotify_filter_ops *ops)
+{
+ spin_lock(&filter_list_lock);
+ list_del_init(&ops->list);
+ spin_unlock(&filter_list_lock);
+}
+
+/*
+ * fanotify_filter_register - Register a new filter.
+ *
+ * Add a filter to the filter_list. These filter are
+ * available for all users in the system.
+ *
+ * @ops: pointer to fanotify_filter_ops to add.
+ *
+ * Returns:
+ * 0 - on success;
+ * -EEXIST - filter of the same name already exists.
+ * -ENODEV - fanotify filter was not properly initialized.
+ */
+int fanotify_filter_register(struct fanotify_filter_ops *ops)
+{
+ int ret;
+
+ if (!fan_filter_root_kobj)
+ return -ENODEV;
+
+ spin_lock(&filter_list_lock);
+ if (fanotify_filter_find(ops->name)) {
+ /* cannot register two filters with the same name */
+ spin_unlock(&filter_list_lock);
+ return -EEXIST;
+ }
+ list_add_tail(&ops->list, &filter_list);
+ spin_unlock(&filter_list_lock);
+
+
+ kobject_init(&ops->kobj, &fan_filter_ktype);
+ ret = kobject_add(&ops->kobj, fan_filter_root_kobj, "%s", ops->name);
+ if (ret) {
+ __fanotify_filter_unregister(ops);
+ return ret;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fanotify_filter_register);
+
+/*
+ * fanotify_filter_unregister - Unregister a new filter.
+ *
+ * Remove a filter from filter_list.
+ *
+ * @ops: pointer to fanotify_filter_ops to remove.
+ */
+void fanotify_filter_unregister(struct fanotify_filter_ops *ops)
+{
+ kobject_put(&ops->kobj);
+ __fanotify_filter_unregister(ops);
+}
+EXPORT_SYMBOL_GPL(fanotify_filter_unregister);
+
+/*
+ * fanotify_filter_add - Add a filter to fsnotify_group.
+ *
+ * Add a filter from filter_list to a fsnotify_group.
+ *
+ * @group: fsnotify_group that will have add
+ * @argp: fanotify_filter_args that specifies the filter
+ * and the init arguments of the filter.
+ *
+ * Returns:
+ * 0 - on success;
+ * -EEXIST - filter of the same name already exists.
+ */
+int fanotify_filter_add(struct fsnotify_group *group,
+ struct fanotify_filter_args __user *argp)
+{
+ struct fanotify_filter_hook *filter_hook;
+ struct fanotify_filter_ops *filter_ops;
+ struct fanotify_filter_args args;
+ void *init_args = NULL;
+ int ret = 0;
+
+ ret = copy_from_user(&args, argp, sizeof(args));
+ if (ret)
+ return -EFAULT;
+
+ if (args.init_args_size > FAN_FILTER_ARGS_MAX)
+ return -EINVAL;
+
+ args.name[FAN_FILTER_NAME_MAX - 1] = '\0';
+
+ fsnotify_group_lock(group);
+
+ if (rcu_access_pointer(group->fanotify_data.filter_hook)) {
+ fsnotify_group_unlock(group);
+ return -EBUSY;
+ }
+
+ filter_hook = kzalloc(sizeof(*filter_hook), GFP_KERNEL);
+ if (!filter_hook) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ spin_lock(&filter_list_lock);
+ filter_ops = fanotify_filter_find(args.name);
+ if (!filter_ops || !try_module_get(filter_ops->owner)) {
+ spin_unlock(&filter_list_lock);
+ ret = -ENOENT;
+ goto err_free_hook;
+ }
+ spin_unlock(&filter_list_lock);
+
+ if (!capable(CAP_SYS_ADMIN) && (filter_ops->flags & FAN_FILTER_F_SYS_ADMIN_ONLY)) {
+ ret = -EPERM;
+ goto err_module_put;
+ }
+
+ if (filter_ops->filter_init) {
+ if (args.init_args_size != filter_ops->init_args_size) {
+ ret = -EINVAL;
+ goto err_module_put;
+ }
+ if (args.init_args_size) {
+ init_args = kzalloc(args.init_args_size, GFP_KERNEL);
+ if (!init_args) {
+ ret = -ENOMEM;
+ goto err_module_put;
+ }
+ if (copy_from_user(init_args, (void __user *)args.init_args,
+ args.init_args_size)) {
+ ret = -EFAULT;
+ goto err_free_args;
+ }
+
+ }
+ ret = filter_ops->filter_init(group, filter_hook, init_args);
+ if (ret)
+ goto err_free_args;
+ kfree(init_args);
+ }
+ filter_hook->ops = filter_ops;
+ rcu_assign_pointer(group->fanotify_data.filter_hook, filter_hook);
+
+out:
+ fsnotify_group_unlock(group);
+ return ret;
+
+err_free_args:
+ kfree(init_args);
+err_module_put:
+ module_put(filter_ops->owner);
+err_free_hook:
+ kfree(filter_hook);
+ goto out;
+}
+
+void fanotify_filter_hook_free(struct fanotify_filter_hook *filter_hook)
+{
+ if (filter_hook->ops->filter_free)
+ filter_hook->ops->filter_free(filter_hook);
+
+ module_put(filter_hook->ops->owner);
+ kfree(filter_hook);
+}
+
+/*
+ * fanotify_filter_del - Delete a filter from fsnotify_group.
+ */
+void fanotify_filter_del(struct fsnotify_group *group)
+{
+ struct fanotify_filter_hook *filter_hook;
+
+ fsnotify_group_lock(group);
+ filter_hook = group->fanotify_data.filter_hook;
+ if (!filter_hook)
+ goto out;
+
+ rcu_assign_pointer(group->fanotify_data.filter_hook, NULL);
+ fanotify_filter_hook_free(filter_hook);
+
+out:
+ fsnotify_group_unlock(group);
+}
+
+static int __init fanotify_filter_init(void)
+{
+ fan_filter_root_kobj = kobject_create_and_add("fanotify_filter", kernel_kobj);
+ if (!fan_filter_root_kobj)
+ return -ENOMEM;
+ return 0;
+}
+device_initcall(fanotify_filter_init);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8e2d43fc6f7c..6445256541d2 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -987,6 +987,13 @@ static long fanotify_ioctl(struct file *file, unsigned int cmd, unsigned long ar
spin_unlock(&group->notification_lock);
ret = put_user(send_len, (int __user *) p);
break;
+ case FAN_IOC_ADD_FILTER:
+ ret = fanotify_filter_add(group, p);
+ break;
+ case FAN_IOC_DEL_FILTER:
+ fanotify_filter_del(group);
+ ret = 0;
+ break;
}
return ret;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 89ff45bd6f01..d3d9af81d2c5 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_FANOTIFY_H
#define _LINUX_FANOTIFY_H
+#include <linux/kobject.h>
#include <linux/sysctl.h>
#include <uapi/linux/fanotify.h>
@@ -136,4 +137,131 @@
#undef FAN_ALL_PERM_EVENTS
#undef FAN_ALL_OUTGOING_EVENTS
+struct fsnotify_group;
+struct qstr;
+struct inode;
+struct fanotify_filter_hook;
+
+/*
+ * Event passed to fanotify filter
+ *
+ * @mask: event type and flags
+ * @data: object that event happened on
+ * @data_type: type of object for fanotify_data_XXX() accessors
+ * @dir: optional directory associated with event -
+ * if @file_name is not NULL, this is the directory that
+ * @file_name is relative to
+ * @file_name: optional file name associated with event
+ * @match_mask: mark types of this group that matched the event
+ */
+struct fanotify_filter_event {
+ u32 mask;
+ const void *data;
+ int data_type;
+ struct inode *dir;
+ const struct qstr *file_name;
+ __kernel_fsid_t *fsid;
+ u32 match_mask;
+};
+
+/*
+ * fanotify filter should implement these ops.
+ *
+ * filter - Main call for the filter.
+ * @group: The group being notified
+ * @filter_hook: fanotify_filter_hook for the attach on @group.
+ * Returns: enum fanotify_filter_return.
+ *
+ * filter_init - Initialize the fanotify_filter_hook.
+ * @group: The group that getting the filter
+ * @hook: fanotify_filter_hook to be initialized
+ * @args: Arguments used to initialize @hook
+ *
+ * filter_free - Free the fanotify_filter_hook.
+ * @hook: fanotify_filter_hook to be freed.
+ *
+ * @name: Name of the fanotify_filter_ops. This need to be unique
+ * in the system
+ * @owner: Owner module of this fanotify_filter_ops
+ * @list: Attach to global list of fanotify_filter_ops
+ * @flags: Flags for the fanotify_filter_ops
+ * @init_args_size: expected size of @args of filter_init()
+ * @desc: Description of what this filter do (optional)
+ * @init_args: Description of the init_args in a string (optional)
+ */
+struct fanotify_filter_ops {
+ int (*filter)(struct fsnotify_group *group,
+ struct fanotify_filter_hook *filter_hook,
+ struct fanotify_filter_event *filter_event);
+ int (*filter_init)(struct fsnotify_group *group,
+ struct fanotify_filter_hook *hook,
+ void *args);
+ void (*filter_free)(struct fanotify_filter_hook *hook);
+
+ char name[FAN_FILTER_NAME_MAX];
+ struct module *owner;
+ struct list_head list;
+ u32 flags;
+ u32 init_args_size;
+ const char *desc;
+ const char *init_args;
+
+ /* internal */
+ struct kobject kobj;
+};
+
+/* Flags for fanotify_filter_ops->flags */
+enum fanotify_filter_flags {
+ /* CAP_SYS_ADMIN is required to use this filter */
+ FAN_FILTER_F_SYS_ADMIN_ONLY = BIT(0),
+
+ FAN_FILTER_F_ALL = FAN_FILTER_F_SYS_ADMIN_ONLY,
+};
+
+/*
+ * Hook that attaches fanotify_filter_ops to a group.
+ * @ops: the ops
+ * @data: per group data used by the ops
+ */
+struct fanotify_filter_hook {
+ struct fanotify_filter_ops *ops;
+ void *data;
+};
+
+#ifdef CONFIG_FANOTIFY_FILTER
+
+int fanotify_filter_register(struct fanotify_filter_ops *ops);
+void fanotify_filter_unregister(struct fanotify_filter_ops *ops);
+int fanotify_filter_add(struct fsnotify_group *group,
+ struct fanotify_filter_args __user *args);
+void fanotify_filter_del(struct fsnotify_group *group);
+void fanotify_filter_hook_free(struct fanotify_filter_hook *filter_hook);
+
+#else /* CONFIG_FANOTIFY_FILTER */
+
+static inline int fanotify_filter_register(struct fanotify_filter_ops *ops)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void fanotify_filter_unregister(struct fanotify_filter_ops *ops)
+{
+}
+
+static inline int fanotify_filter_add(struct fsnotify_group *group,
+ struct fanotify_filter_args __user *args)
+{
+ return -ENOENT;
+}
+
+static inline void fanotify_filter_del(struct fsnotify_group *group)
+{
+}
+
+static inline void fanotify_filter_hook_free(struct fanotify_filter_hook *filter_hook)
+{
+}
+
+#endif /* CONFIG_FANOTIFY_FILTER */
+
#endif /* _LINUX_FANOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 3ecf7768e577..8cc2d6f737a6 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -117,6 +117,7 @@ struct fsnotify_fname;
struct fsnotify_iter_info;
struct mem_cgroup;
+struct fanotify_filter_hook;
/*
* Each group much define these ops. The fsnotify infrastructure will call
@@ -255,8 +256,11 @@ struct fsnotify_group {
int f_flags; /* event_f_flags from fanotify_init() */
struct ucounts *ucounts;
mempool_t error_events_pool;
+#ifdef CONFIG_FANOTIFY_FILTER
+ struct fanotify_filter_hook __rcu *filter_hook;
+#endif /* CONFIG_FANOTIFY_FILTER */
} fanotify_data;
-#endif /* CONFIG_FANOTIFY */
+#endif
};
};
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 34f221d3a1b9..33fd493558c3 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -3,6 +3,7 @@
#define _UAPI_LINUX_FANOTIFY_H
#include <linux/types.h>
+#include <linux/ioctl.h>
/* the following events that user-space can register for */
#define FAN_ACCESS 0x00000001 /* File was accessed */
@@ -243,4 +244,39 @@ struct fanotify_response_info_audit_rule {
(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
(long)(meta)->event_len <= (long)(len))
+/*
+ * fanotify filter.
+ * Please check out include/linux/fanotify.h for more information about
+ * the kernel module API.
+ */
+
+/* Return value of filter */
+enum fanotify_filter_return {
+ /* The event should NOT be sent to user space */
+ FAN_FILTER_RET_SKIP_EVENT = 0,
+ /* The event should be sent to user space */
+ FAN_FILTER_RET_SEND_TO_USERSPACE = 1,
+};
+
+#define FAN_FILTER_NAME_MAX 64
+#define FAN_FILTER_ARGS_MAX 64
+
+/* This is the arguments used to add filter to a group. */
+struct fanotify_filter_args {
+ char name[FAN_FILTER_NAME_MAX];
+
+ /*
+ * user space pointer to the init args of filter,
+ * up to init_args_len (<= FAN_FILTER_ARGS_MAX).
+ */
+ __u64 init_args;
+ /* size of init_args */
+ __u32 init_args_size;
+} __attribute__((__packed__));
+
+#define FAN_IOC_MAGIC 'F'
+
+#define FAN_IOC_ADD_FILTER _IOW(FAN_IOC_MAGIC, 0, struct fanotify_filter_args)
+#define FAN_IOC_DEL_FILTER _IOW(FAN_IOC_MAGIC, 1, char[FAN_FILTER_NAME_MAX])
+
#endif /* _UAPI_LINUX_FANOTIFY_H */
--
2.43.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-22 22:59 [PATCH v3 fanotify 0/2] Fanotify in kernel filter Song Liu
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
@ 2024-11-22 22:59 ` Song Liu
2024-11-24 5:07 ` Amir Goldstein
1 sibling, 1 reply; 14+ messages in thread
From: Song Liu @ 2024-11-22 22:59 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
This sample can run in two different mode: filter mode and block mode.
In filter mode, the filter only sends events in a subtree to user space.
To use it:
[root] insmod ./filter-mod.ko
[root] mkdir -p /tmp/a/b/c/d
[root] ./filter-user /tmp/ /tmp/a/b &
Running in filter mode
[root] touch /tmp/xx # Doesn't generate event
[root]# touch /tmp/a/xxa # Doesn't generate event
[root]# touch /tmp/a/b/xxab # Generates an event
Accessing file xxab # this is the output from filter-user
[root@]# touch /tmp/a/b/c/xxabc # Generates an event
Accessing file xxabc # this is the output from filter-user
In block mode, the filter will block accesses to file in a subtree. To
use it:
[root] insmod ./filter-mod.ko
[root] mkdir -p /tmp/a/b/c/d
[root] ./filter-user /tmp/ /tmp/a/b block &
Running in block mode
[root]# dd if=/dev/zero of=/tmp/a/b/xx # this will fail
dd: failed to open '/tmp/a/b/xx': Operation not permitted
Signed-off-by: Song Liu <song@kernel.org>
---
MAINTAINERS | 1 +
samples/Kconfig | 20 ++++-
samples/Makefile | 2 +-
samples/fanotify/.gitignore | 1 +
samples/fanotify/Makefile | 5 +-
samples/fanotify/filter-mod.c | 105 ++++++++++++++++++++++++++
samples/fanotify/filter-user.c | 131 +++++++++++++++++++++++++++++++++
samples/fanotify/filter.h | 19 +++++
8 files changed, 281 insertions(+), 3 deletions(-)
create mode 100644 samples/fanotify/filter-mod.c
create mode 100644 samples/fanotify/filter-user.c
create mode 100644 samples/fanotify/filter.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 7ad507f49324..8939a48b2d99 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8658,6 +8658,7 @@ S: Maintained
F: fs/notify/fanotify/
F: include/linux/fanotify.h
F: include/uapi/linux/fanotify.h
+F: samples/fanotify/
FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER
M: Linus Walleij <linus.walleij@linaro.org>
diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..9cc0a5cdf604 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR
with it.
See also Documentation/driver-api/connector.rst
+config SAMPLE_FANOTIFY
+ bool "Build fanotify monitoring sample"
+ depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+ help
+ When enabled, this builds samples for fanotify.
+ There multiple samples for fanotify. Please see the
+ following configs for more details of these
+ samples.
+
config SAMPLE_FANOTIFY_ERROR
bool "Build fanotify error monitoring sample"
- depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+ depends on SAMPLE_FANOTIFY
help
When enabled, this builds an example code that uses the
FAN_FS_ERROR fanotify mechanism to monitor filesystem
errors.
See also Documentation/admin-guide/filesystem-monitoring.rst.
+config SAMPLE_FANOTIFY_FILTER
+ tristate "Build fanotify filter sample"
+ depends on SAMPLE_FANOTIFY && m
+ help
+ When enabled, this builds kernel module that contains a
+ fanotify filter handler.
+ The filter handler filters out certain filename
+ prefixes for the fanotify user.
+
config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..108360972626 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
-obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
+obj-$(CONFIG_SAMPLE_FANOTIFY) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore
index d74593e8b2de..df75eb5b8f95 100644
--- a/samples/fanotify/.gitignore
+++ b/samples/fanotify/.gitignore
@@ -1 +1,2 @@
fs-monitor
+filter-user
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
index e20db1bdde3b..c33e9460772e 100644
--- a/samples/fanotify/Makefile
+++ b/samples/fanotify/Makefile
@@ -1,5 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += fs-monitor
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor
userccflags += -I usr/include -Wall
+obj-$(CONFIG_SAMPLE_FANOTIFY_FILTER) += filter-mod.o
+
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FILTER) += filter-user
diff --git a/samples/fanotify/filter-mod.c b/samples/fanotify/filter-mod.c
new file mode 100644
index 000000000000..eafe55b1840a
--- /dev/null
+++ b/samples/fanotify/filter-mod.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include <linux/fsnotify.h>
+#include <linux/fanotify.h>
+#include <linux/module.h>
+#include <linux/path.h>
+#include <linux/file.h>
+#include "filter.h"
+
+struct fan_filter_sample_data {
+ struct path subtree_path;
+ enum fan_filter_sample_mode mode;
+};
+
+static int sample_filter(struct fsnotify_group *group,
+ struct fanotify_filter_hook *filter_hook,
+ struct fanotify_filter_event *filter_event)
+{
+ struct fan_filter_sample_data *data;
+ struct dentry *dentry;
+
+ dentry = fsnotify_data_dentry(filter_event->data, filter_event->data_type);
+ if (!dentry)
+ return FAN_FILTER_RET_SEND_TO_USERSPACE;
+
+ data = filter_hook->data;
+
+ if (is_subdir(dentry, data->subtree_path.dentry)) {
+ if (data->mode == FAN_FILTER_SAMPLE_MODE_BLOCK)
+ return -EPERM;
+ return FAN_FILTER_RET_SEND_TO_USERSPACE;
+ }
+ return FAN_FILTER_RET_SKIP_EVENT;
+}
+
+static int sample_filter_init(struct fsnotify_group *group,
+ struct fanotify_filter_hook *filter_hook,
+ void *argp)
+{
+ struct fan_filter_sample_args *args;
+ struct fan_filter_sample_data *data;
+ struct file *file;
+ int fd;
+
+ args = (struct fan_filter_sample_args *)argp;
+ fd = args->subtree_fd;
+
+ file = fget(fd);
+ if (!file)
+ return -EBADF;
+ data = kzalloc(sizeof(struct fan_filter_sample_data), GFP_KERNEL);
+ if (!data) {
+ fput(file);
+ return -ENOMEM;
+ }
+ path_get(&file->f_path);
+ data->subtree_path = file->f_path;
+ fput(file);
+ data->mode = args->mode;
+ filter_hook->data = data;
+ return 0;
+}
+
+static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
+{
+ struct fan_filter_sample_data *data = filter_hook->data;
+
+ path_put(&data->subtree_path);
+ kfree(data);
+}
+
+static struct fanotify_filter_ops fan_filter_sample_ops = {
+ .filter = sample_filter,
+ .filter_init = sample_filter_init,
+ .filter_free = sample_filter_free,
+ .name = "monitor-subtree",
+ .owner = THIS_MODULE,
+ .flags = FAN_FILTER_F_SYS_ADMIN_ONLY,
+ .init_args_size = sizeof(struct fan_filter_sample_args),
+ .desc =
+ "mode = 1: only emit events under a subtree\n"
+ "mode = 2: block accesses under a subtree",
+ .init_args =
+ "struct fan_filter_sample_args {\n"
+ " int subtree_fd;\n"
+ " enum fan_filter_sample_mode mode;\n"
+ "};",
+};
+
+static int __init fanotify_filter_sample_init(void)
+{
+ return fanotify_filter_register(&fan_filter_sample_ops);
+}
+static void __exit fanotify_filter_sample_exit(void)
+{
+ fanotify_filter_unregister(&fan_filter_sample_ops);
+}
+
+module_init(fanotify_filter_sample_init);
+module_exit(fanotify_filter_sample_exit);
+
+MODULE_AUTHOR("Song Liu");
+MODULE_DESCRIPTION("Example fanotify filter handler");
+MODULE_LICENSE("GPL");
diff --git a/samples/fanotify/filter-user.c b/samples/fanotify/filter-user.c
new file mode 100644
index 000000000000..8cdf8ff2bd6d
--- /dev/null
+++ b/samples/fanotify/filter-user.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include "filter.h"
+
+static int total_event_cnt;
+
+static void handle_notifications(char *buffer, int len)
+{
+ struct fanotify_event_metadata *event =
+ (struct fanotify_event_metadata *) buffer;
+ struct fanotify_event_info_header *info;
+ struct fanotify_event_info_fid *fid;
+ struct file_handle *handle;
+ char *name;
+ int off;
+
+ for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
+ for (off = sizeof(*event) ; off < event->event_len;
+ off += info->len) {
+ info = (struct fanotify_event_info_header *)
+ ((char *) event + off);
+ switch (info->info_type) {
+ case FAN_EVENT_INFO_TYPE_DFID_NAME:
+ fid = (struct fanotify_event_info_fid *) info;
+ handle = (struct file_handle *)&fid->handle;
+ name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
+
+ printf("Accessing file %s\n", name);
+ total_event_cnt++;
+ break;
+ default:
+ break;
+ }
+ }
+ }
+}
+
+int main(int argc, char **argv)
+{
+ struct fanotify_filter_args args = {
+ .name = "monitor-subtree",
+ };
+ struct fan_filter_sample_args init_args;
+ int fanotify_fd, subtree_fd;
+ char buffer[BUFSIZ];
+ const char *msg;
+ __u64 mask;
+ int flags;
+
+ if (argc < 3) {
+ printf("Usage:\n"
+ "\t %s <mount point> <subtree to monitor>\n",
+ argv[0]);
+ return 1;
+ }
+
+ subtree_fd = open(argv[2], O_RDONLY | O_CLOEXEC);
+
+ if (subtree_fd < 0)
+ errx(1, "open subtree_fd");
+
+ init_args.subtree_fd = subtree_fd;
+
+ if (argc == 4 && strcmp(argv[3], "block") == 0)
+ init_args.mode = FAN_FILTER_SAMPLE_MODE_BLOCK;
+ else
+ init_args.mode = FAN_FILTER_SAMPLE_MODE_FILTER;
+
+ args.init_args = (__u64)&init_args;
+ args.init_args_size = sizeof(init_args);
+
+ flags = FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID;
+ mask = FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD;
+
+ if (init_args.mode == FAN_FILTER_SAMPLE_MODE_BLOCK) {
+ flags = FAN_CLASS_CONTENT;
+ mask |= FAN_OPEN_PERM;
+ }
+
+ fanotify_fd = fanotify_init(flags, O_RDWR);
+ if (fanotify_fd < 0) {
+ close(subtree_fd);
+ errx(1, "fanotify_init");
+ }
+
+
+ if (fanotify_mark(fanotify_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
+ mask, AT_FDCWD, argv[1])) {
+ msg = "fanotify_mark";
+ goto err_out;
+ }
+
+ if (ioctl(fanotify_fd, FAN_IOC_ADD_FILTER, &args)) {
+ msg = "ioctl";
+ goto err_out;
+ }
+
+ printf("Running in %s mode\n",
+ init_args.mode == FAN_FILTER_SAMPLE_MODE_BLOCK ? "block" : "filter");
+
+ while (total_event_cnt < 10) {
+ int n = read(fanotify_fd, buffer, BUFSIZ);
+
+ if (n < 0) {
+ msg = "read";
+ goto err_out;
+ }
+
+ handle_notifications(buffer, n);
+ }
+
+ ioctl(fanotify_fd, FAN_IOC_DEL_FILTER);
+ close(fanotify_fd);
+ close(subtree_fd);
+ return 0;
+
+err_out:
+ close(fanotify_fd);
+ close(subtree_fd);
+ errx(1, msg);
+ return 0;
+}
diff --git a/samples/fanotify/filter.h b/samples/fanotify/filter.h
new file mode 100644
index 000000000000..cdb97499019a
--- /dev/null
+++ b/samples/fanotify/filter.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#ifndef _SAMPLES_FANOTIFY_FILTER_H
+#define _SAMPLES_FANOTIFY_FILTER_H
+
+enum fan_filter_sample_mode {
+ /* Only show event in the subtree */
+ FAN_FILTER_SAMPLE_MODE_FILTER = 1,
+ /* Block access to files in the subtree */
+ FAN_FILTER_SAMPLE_MODE_BLOCK = 2,
+};
+
+struct fan_filter_sample_args {
+ int subtree_fd;
+ enum fan_filter_sample_mode mode;
+};
+
+#endif /* _SAMPLES_FANOTIFY_FILTER_H */
--
2.43.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-22 22:59 ` [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter Song Liu
@ 2024-11-24 5:07 ` Amir Goldstein
2024-11-24 18:59 ` Song Liu
2024-11-27 0:50 ` Alexei Starovoitov
0 siblings, 2 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-11-24 5:07 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, repnop, jlayton, josef,
mic, gnoack
On Sat, Nov 23, 2024 at 12:00 AM Song Liu <song@kernel.org> wrote:
>
> This sample can run in two different mode: filter mode and block mode.
> In filter mode, the filter only sends events in a subtree to user space.
> To use it:
>
> [root] insmod ./filter-mod.ko
> [root] mkdir -p /tmp/a/b/c/d
> [root] ./filter-user /tmp/ /tmp/a/b &
> Running in filter mode
> [root] touch /tmp/xx # Doesn't generate event
> [root]# touch /tmp/a/xxa # Doesn't generate event
> [root]# touch /tmp/a/b/xxab # Generates an event
> Accessing file xxab # this is the output from filter-user
> [root@]# touch /tmp/a/b/c/xxabc # Generates an event
> Accessing file xxabc # this is the output from filter-user
>
> In block mode, the filter will block accesses to file in a subtree. To
> use it:
>
> [root] insmod ./filter-mod.ko
> [root] mkdir -p /tmp/a/b/c/d
> [root] ./filter-user /tmp/ /tmp/a/b block &
> Running in block mode
> [root]# dd if=/dev/zero of=/tmp/a/b/xx # this will fail
> dd: failed to open '/tmp/a/b/xx': Operation not permitted
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> MAINTAINERS | 1 +
> samples/Kconfig | 20 ++++-
> samples/Makefile | 2 +-
> samples/fanotify/.gitignore | 1 +
> samples/fanotify/Makefile | 5 +-
> samples/fanotify/filter-mod.c | 105 ++++++++++++++++++++++++++
> samples/fanotify/filter-user.c | 131 +++++++++++++++++++++++++++++++++
> samples/fanotify/filter.h | 19 +++++
> 8 files changed, 281 insertions(+), 3 deletions(-)
> create mode 100644 samples/fanotify/filter-mod.c
> create mode 100644 samples/fanotify/filter-user.c
> create mode 100644 samples/fanotify/filter.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7ad507f49324..8939a48b2d99 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8658,6 +8658,7 @@ S: Maintained
> F: fs/notify/fanotify/
> F: include/linux/fanotify.h
> F: include/uapi/linux/fanotify.h
> +F: samples/fanotify/
>
> FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER
> M: Linus Walleij <linus.walleij@linaro.org>
> diff --git a/samples/Kconfig b/samples/Kconfig
> index b288d9991d27..9cc0a5cdf604 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR
> with it.
> See also Documentation/driver-api/connector.rst
>
> +config SAMPLE_FANOTIFY
> + bool "Build fanotify monitoring sample"
> + depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
> + help
> + When enabled, this builds samples for fanotify.
> + There multiple samples for fanotify. Please see the
> + following configs for more details of these
> + samples.
> +
> config SAMPLE_FANOTIFY_ERROR
> bool "Build fanotify error monitoring sample"
> - depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
> + depends on SAMPLE_FANOTIFY
> help
> When enabled, this builds an example code that uses the
> FAN_FS_ERROR fanotify mechanism to monitor filesystem
> errors.
> See also Documentation/admin-guide/filesystem-monitoring.rst.
>
> +config SAMPLE_FANOTIFY_FILTER
> + tristate "Build fanotify filter sample"
> + depends on SAMPLE_FANOTIFY && m
> + help
> + When enabled, this builds kernel module that contains a
> + fanotify filter handler.
> + The filter handler filters out certain filename
> + prefixes for the fanotify user.
> +
> config SAMPLE_HIDRAW
> bool "hidraw sample"
> depends on CC_CAN_LINK && HEADERS_INSTALL
> diff --git a/samples/Makefile b/samples/Makefile
> index b85fa64390c5..108360972626 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
> subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
> obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
> obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
> -obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
> +obj-$(CONFIG_SAMPLE_FANOTIFY) += fanotify/
> subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
> obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
> obj-$(CONFIG_SAMPLE_KDB) += kdb/
> diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore
> index d74593e8b2de..df75eb5b8f95 100644
> --- a/samples/fanotify/.gitignore
> +++ b/samples/fanotify/.gitignore
> @@ -1 +1,2 @@
> fs-monitor
> +filter-user
> diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
> index e20db1bdde3b..c33e9460772e 100644
> --- a/samples/fanotify/Makefile
> +++ b/samples/fanotify/Makefile
> @@ -1,5 +1,8 @@
> # SPDX-License-Identifier: GPL-2.0-only
> -userprogs-always-y += fs-monitor
> +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor
>
> userccflags += -I usr/include -Wall
>
> +obj-$(CONFIG_SAMPLE_FANOTIFY_FILTER) += filter-mod.o
> +
> +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FILTER) += filter-user
> diff --git a/samples/fanotify/filter-mod.c b/samples/fanotify/filter-mod.c
> new file mode 100644
> index 000000000000..eafe55b1840a
> --- /dev/null
> +++ b/samples/fanotify/filter-mod.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +
> +#include <linux/fsnotify.h>
> +#include <linux/fanotify.h>
> +#include <linux/module.h>
> +#include <linux/path.h>
> +#include <linux/file.h>
> +#include "filter.h"
> +
> +struct fan_filter_sample_data {
> + struct path subtree_path;
> + enum fan_filter_sample_mode mode;
> +};
> +
> +static int sample_filter(struct fsnotify_group *group,
> + struct fanotify_filter_hook *filter_hook,
> + struct fanotify_filter_event *filter_event)
> +{
> + struct fan_filter_sample_data *data;
> + struct dentry *dentry;
> +
> + dentry = fsnotify_data_dentry(filter_event->data, filter_event->data_type);
> + if (!dentry)
> + return FAN_FILTER_RET_SEND_TO_USERSPACE;
> +
> + data = filter_hook->data;
> +
> + if (is_subdir(dentry, data->subtree_path.dentry)) {
> + if (data->mode == FAN_FILTER_SAMPLE_MODE_BLOCK)
> + return -EPERM;
> + return FAN_FILTER_RET_SEND_TO_USERSPACE;
> + }
> + return FAN_FILTER_RET_SKIP_EVENT;
> +}
> +
> +static int sample_filter_init(struct fsnotify_group *group,
> + struct fanotify_filter_hook *filter_hook,
> + void *argp)
> +{
> + struct fan_filter_sample_args *args;
> + struct fan_filter_sample_data *data;
> + struct file *file;
> + int fd;
> +
> + args = (struct fan_filter_sample_args *)argp;
> + fd = args->subtree_fd;
> +
> + file = fget(fd);
> + if (!file)
> + return -EBADF;
> + data = kzalloc(sizeof(struct fan_filter_sample_data), GFP_KERNEL);
> + if (!data) {
> + fput(file);
> + return -ENOMEM;
> + }
> + path_get(&file->f_path);
> + data->subtree_path = file->f_path;
> + fput(file);
> + data->mode = args->mode;
> + filter_hook->data = data;
> + return 0;
> +}
> +
> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
> +{
> + struct fan_filter_sample_data *data = filter_hook->data;
> +
> + path_put(&data->subtree_path);
> + kfree(data);
> +}
> +
Hi Song,
This example looks fine but it raises a question.
This filter will keep the mount of subtree_path busy until the group is closed
or the filter is detached.
This is probably fine for many services that keep the mount busy anyway.
But what if this wasn't the intention?
What if an Anti-malware engine that watches all mounts wanted to use that
for configuring some ignore/block subtree filters?
One way would be to use a is_subtree() variant that looks for a
subtree root inode
number and then verifies it with a subtree root fid.
A production subtree filter will need to use a variant of is_subtree()
anyway that
looks for a set of subtree root inodes, because doing a loop of is_subtree() for
multiple paths is a no go.
Don't need to change anything in the example, unless other people
think that we do need to set a better example to begin with...
Thanks,
Amir.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
@ 2024-11-24 6:25 ` Amir Goldstein
2024-11-24 19:08 ` Song Liu
2024-11-28 16:37 ` Jan Kara
2024-11-25 11:01 ` kernel test robot
2024-11-26 0:06 ` kernel test robot
2 siblings, 2 replies; 14+ messages in thread
From: Amir Goldstein @ 2024-11-24 6:25 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, repnop, jlayton, josef,
mic, gnoack
On Sat, Nov 23, 2024 at 12:00 AM Song Liu <song@kernel.org> wrote:
>
> fanotify filter enables handling fanotify events within the kernel, and
> thus saves a trip to the user space. fanotify filter can be useful in
> many use cases. For example, if a user is only interested in events for
> some files in side a directory, a filter can be used to filter out
> irrelevant events.
>
> fanotify filter is attached to fsnotify_group. At most one filter can
> be attached to a fsnotify_group. The attach/detach of filter are
> controlled by two new ioctls on the fanotify fds: FAN_IOC_ADD_FILTER
> and FAN_IOC_DEL_FILTER.
>
> fanotify filter is packaged in a kernel module. In the future, it is
> also possible to package fanotify filter in a BPF program. Since loading
> modules requires CAP_SYS_ADMIN, _loading_ fanotify filter in kernel
> modules is limited to CAP_SYS_ADMIN. However, non-SYS_CAP_ADMIN users
> can _attach_ filter loaded by sys admin to their fanotify fds. The owner
> of the fanotify fitler can use flag FAN_FILTER_F_SYS_ADMIN_ONLY to
> make a filter available only to users with CAP_SYS_ADMIN.
>
> To make fanotify filters more flexible, a filter can take arguments at
> attach time.
>
> sysfs entry /sys/kernel/fanotify_filter is added to help users know
> which fanotify filters are available. At the moment, these files are
> added for each filter: flags, desc, and init_args.
It's a shame that we have fanotify knobs at /proc/sys/fs/fanotify/ and
in sysfs, but understand we don't want to make more use of proc for this.
Still I would add the filter files under a new /sys/fs/fanotify/ dir and not
directly under /sys/kernel/
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> fs/notify/fanotify/Kconfig | 13 ++
> fs/notify/fanotify/Makefile | 1 +
> fs/notify/fanotify/fanotify.c | 44 +++-
> fs/notify/fanotify/fanotify_filter.c | 289 +++++++++++++++++++++++++++
> fs/notify/fanotify/fanotify_user.c | 7 +
> include/linux/fanotify.h | 128 ++++++++++++
> include/linux/fsnotify_backend.h | 6 +-
> include/uapi/linux/fanotify.h | 36 ++++
> 8 files changed, 520 insertions(+), 4 deletions(-)
> create mode 100644 fs/notify/fanotify/fanotify_filter.c
>
> diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
> index 0e36aaf379b7..abfd59d95f49 100644
> --- a/fs/notify/fanotify/Kconfig
> +++ b/fs/notify/fanotify/Kconfig
> @@ -24,3 +24,16 @@ config FANOTIFY_ACCESS_PERMISSIONS
> hierarchical storage management systems.
>
> If unsure, say N.
> +
> +config FANOTIFY_FILTER
> + bool "fanotify in kernel filter"
> + depends on FANOTIFY
> + default y
> + help
> + Say Y here if you want to use fanotify in kernel filter.
> + The filter can be implemented in a kernel module or a
> + BPF program. The filter can speed up fanotify in many
> + use cases. For example, when the listener is only interested in
> + a subset of events.
> +
> + If unsure, say Y.
> \ No newline at end of file
> diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
> index 25ef222915e5..d95ec0aeffb5 100644
> --- a/fs/notify/fanotify/Makefile
> +++ b/fs/notify/fanotify/Makefile
> @@ -1,2 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_FANOTIFY) += fanotify.o fanotify_user.o
> +obj-$(CONFIG_FANOTIFY_FILTER) += fanotify_filter.o
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 224bccaab4cc..c70184cd2d45 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -18,6 +18,8 @@
>
> #include "fanotify.h"
>
> +extern struct srcu_struct fsnotify_mark_srcu;
> +
> static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
> {
> return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
> @@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> struct fsnotify_event *fsn_event;
> __kernel_fsid_t fsid = {};
> u32 match_mask = 0;
> + struct fanotify_filter_hook *filter_hook __maybe_unused;
>
> BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
> BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
> @@ -921,6 +924,39 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
> group, mask, match_mask);
>
> + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
> + fsid = fanotify_get_fsid(iter_info);
> +
> +#ifdef CONFIG_FANOTIFY_FILTER
> + filter_hook = srcu_dereference(group->fanotify_data.filter_hook, &fsnotify_mark_srcu);
Do we actually need the sleeping rcu protection for calling the hook?
Can regular rcu read side be nested inside srcu read side?
Jan,
I don't remember why srcu is needed since we are not holding it
when waiting for userspace anymore?
> + if (filter_hook) {
> + struct fanotify_filter_event filter_event = {
> + .mask = mask,
> + .data = data,
> + .data_type = data_type,
> + .dir = dir,
> + .file_name = file_name,
> + .fsid = &fsid,
> + .match_mask = match_mask,
> + };
> +
> + ret = filter_hook->ops->filter(group, filter_hook, &filter_event);
> +
> + /*
> + * The filter may return
> + * - FAN_FILTER_RET_SEND_TO_USERSPACE => continue the rest;
> + * - FAN_FILTER_RET_SKIP_EVENT => return 0 now;
> + * - < 0 error => return error now.
> + *
> + * For the latter two cases, we can just return ret.
> + */
> + BUILD_BUG_ON(FAN_FILTER_RET_SKIP_EVENT != 0);
> +
> + if (ret != FAN_FILTER_RET_SEND_TO_USERSPACE)
> + return ret;
> + }
> +#endif
> +
> if (fanotify_is_perm_event(mask)) {
> /*
> * fsnotify_prepare_user_wait() fails if we race with mark
> @@ -930,9 +966,6 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> - if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
> - fsid = fanotify_get_fsid(iter_info);
> -
> event = fanotify_alloc_event(group, mask, data, data_type, dir,
> file_name, &fsid, match_mask);
> ret = -ENOMEM;
> @@ -976,6 +1009,11 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
>
> if (mempool_initialized(&group->fanotify_data.error_events_pool))
> mempool_exit(&group->fanotify_data.error_events_pool);
> +
> +#ifdef CONFIG_FANOTIFY_FILTER
> + if (group->fanotify_data.filter_hook)
> + fanotify_filter_hook_free(group->fanotify_data.filter_hook);
> +#endif
> }
>
> static void fanotify_free_path_event(struct fanotify_event *event)
> diff --git a/fs/notify/fanotify/fanotify_filter.c b/fs/notify/fanotify/fanotify_filter.c
> new file mode 100644
> index 000000000000..9215612e2fcb
> --- /dev/null
> +++ b/fs/notify/fanotify/fanotify_filter.c
> @@ -0,0 +1,289 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/fanotify.h>
> +#include <linux/kobject.h>
> +#include <linux/module.h>
> +
> +#include "fanotify.h"
> +
> +extern struct srcu_struct fsnotify_mark_srcu;
> +
> +static DEFINE_SPINLOCK(filter_list_lock);
> +static LIST_HEAD(filter_list);
> +
> +static struct kobject *fan_filter_root_kobj;
> +
> +static struct {
> + enum fanotify_filter_flags flag;
> + const char *name;
> +} fanotify_filter_flags_names[] = {
> + {
> + .flag = FAN_FILTER_F_SYS_ADMIN_ONLY,
> + .name = "SYS_ADMIN_ONLY",
> + }
> +};
> +
> +static ssize_t flags_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + struct fanotify_filter_ops *ops;
> + ssize_t len = 0;
> + int i;
> +
> + ops = container_of(kobj, struct fanotify_filter_ops, kobj);
> + for (i = 0; i < ARRAY_SIZE(fanotify_filter_flags_names); i++) {
> + if (ops->flags & fanotify_filter_flags_names[i].flag) {
> + len += sysfs_emit_at(buf, len, "%s%s", len ? " " : "",
> + fanotify_filter_flags_names[i].name);
> + }
> + }
> + len += sysfs_emit_at(buf, len, "\n");
> + return len;
> +}
> +
> +static ssize_t desc_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + struct fanotify_filter_ops *ops;
> +
> + ops = container_of(kobj, struct fanotify_filter_ops, kobj);
> +
> + return sysfs_emit(buf, "%s\n", ops->desc ?: "N/A");
> +}
> +
> +static ssize_t init_args_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + struct fanotify_filter_ops *ops;
> +
> + ops = container_of(kobj, struct fanotify_filter_ops, kobj);
> +
> + return sysfs_emit(buf, "%s\n", ops->init_args ?: "N/A");
> +}
> +
> +static struct kobj_attribute flags_kobj_attr = __ATTR_RO(flags);
> +static struct kobj_attribute desc_kobj_attr = __ATTR_RO(desc);
> +static struct kobj_attribute init_args_kobj_attr = __ATTR_RO(init_args);
> +
> +static struct attribute *fan_filter_attrs[] = {
> + &flags_kobj_attr.attr,
> + &desc_kobj_attr.attr,
> + &init_args_kobj_attr.attr,
> + NULL,
> +};
> +ATTRIBUTE_GROUPS(fan_filter);
> +
> +static void fan_filter_kobj_release(struct kobject *kobj)
> +{
> +}
> +
> +static const struct kobj_type fan_filter_ktype = {
> + .release = fan_filter_kobj_release,
> + .sysfs_ops = &kobj_sysfs_ops,
> + .default_groups = fan_filter_groups,
> +};
> +
> +static struct fanotify_filter_ops *fanotify_filter_find(const char *name)
> +{
> + struct fanotify_filter_ops *ops;
> +
> + list_for_each_entry(ops, &filter_list, list) {
> + if (!strcmp(ops->name, name))
> + return ops;
> + }
> + return NULL;
> +}
> +
> +static void __fanotify_filter_unregister(struct fanotify_filter_ops *ops)
> +{
> + spin_lock(&filter_list_lock);
> + list_del_init(&ops->list);
> + spin_unlock(&filter_list_lock);
> +}
> +
> +/*
> + * fanotify_filter_register - Register a new filter.
> + *
> + * Add a filter to the filter_list. These filter are
> + * available for all users in the system.
> + *
> + * @ops: pointer to fanotify_filter_ops to add.
> + *
> + * Returns:
> + * 0 - on success;
> + * -EEXIST - filter of the same name already exists.
> + * -ENODEV - fanotify filter was not properly initialized.
> + */
> +int fanotify_filter_register(struct fanotify_filter_ops *ops)
> +{
> + int ret;
> +
> + if (!fan_filter_root_kobj)
> + return -ENODEV;
> +
> + spin_lock(&filter_list_lock);
> + if (fanotify_filter_find(ops->name)) {
> + /* cannot register two filters with the same name */
> + spin_unlock(&filter_list_lock);
> + return -EEXIST;
> + }
> + list_add_tail(&ops->list, &filter_list);
> + spin_unlock(&filter_list_lock);
> +
> +
> + kobject_init(&ops->kobj, &fan_filter_ktype);
> + ret = kobject_add(&ops->kobj, fan_filter_root_kobj, "%s", ops->name);
> + if (ret) {
> + __fanotify_filter_unregister(ops);
> + return ret;
> + }
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(fanotify_filter_register);
> +
> +/*
> + * fanotify_filter_unregister - Unregister a new filter.
> + *
> + * Remove a filter from filter_list.
> + *
> + * @ops: pointer to fanotify_filter_ops to remove.
> + */
> +void fanotify_filter_unregister(struct fanotify_filter_ops *ops)
> +{
> + kobject_put(&ops->kobj);
> + __fanotify_filter_unregister(ops);
> +}
> +EXPORT_SYMBOL_GPL(fanotify_filter_unregister);
> +
> +/*
> + * fanotify_filter_add - Add a filter to fsnotify_group.
> + *
> + * Add a filter from filter_list to a fsnotify_group.
> + *
> + * @group: fsnotify_group that will have add
> + * @argp: fanotify_filter_args that specifies the filter
> + * and the init arguments of the filter.
> + *
> + * Returns:
> + * 0 - on success;
> + * -EEXIST - filter of the same name already exists.
> + */
> +int fanotify_filter_add(struct fsnotify_group *group,
> + struct fanotify_filter_args __user *argp)
> +{
> + struct fanotify_filter_hook *filter_hook;
> + struct fanotify_filter_ops *filter_ops;
> + struct fanotify_filter_args args;
> + void *init_args = NULL;
> + int ret = 0;
> +
> + ret = copy_from_user(&args, argp, sizeof(args));
> + if (ret)
> + return -EFAULT;
> +
> + if (args.init_args_size > FAN_FILTER_ARGS_MAX)
> + return -EINVAL;
> +
> + args.name[FAN_FILTER_NAME_MAX - 1] = '\0';
> +
> + fsnotify_group_lock(group);
> +
> + if (rcu_access_pointer(group->fanotify_data.filter_hook)) {
> + fsnotify_group_unlock(group);
> + return -EBUSY;
> + }
> +
> + filter_hook = kzalloc(sizeof(*filter_hook), GFP_KERNEL);
> + if (!filter_hook) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + spin_lock(&filter_list_lock);
> + filter_ops = fanotify_filter_find(args.name);
> + if (!filter_ops || !try_module_get(filter_ops->owner)) {
> + spin_unlock(&filter_list_lock);
> + ret = -ENOENT;
> + goto err_free_hook;
> + }
> + spin_unlock(&filter_list_lock);
> +
> + if (!capable(CAP_SYS_ADMIN) && (filter_ops->flags & FAN_FILTER_F_SYS_ADMIN_ONLY)) {
1. feels better to opt-in for UNPRIV (and maybe later on) rather than
make it the default.
2. need to check that filter_ops->flags has only "known" flags
3. probably need to add filter_ops->version check in case we want to
change the ABI
> + ret = -EPERM;
> + goto err_module_put;
> + }
> +
> + if (filter_ops->filter_init) {
> + if (args.init_args_size != filter_ops->init_args_size) {
> + ret = -EINVAL;
> + goto err_module_put;
> + }
> + if (args.init_args_size) {
> + init_args = kzalloc(args.init_args_size, GFP_KERNEL);
> + if (!init_args) {
> + ret = -ENOMEM;
> + goto err_module_put;
> + }
> + if (copy_from_user(init_args, (void __user *)args.init_args,
> + args.init_args_size)) {
> + ret = -EFAULT;
> + goto err_free_args;
> + }
> +
> + }
> + ret = filter_ops->filter_init(group, filter_hook, init_args);
> + if (ret)
> + goto err_free_args;
> + kfree(init_args);
> + }
> + filter_hook->ops = filter_ops;
> + rcu_assign_pointer(group->fanotify_data.filter_hook, filter_hook);
> +
> +out:
> + fsnotify_group_unlock(group);
> + return ret;
> +
> +err_free_args:
> + kfree(init_args);
> +err_module_put:
> + module_put(filter_ops->owner);
> +err_free_hook:
> + kfree(filter_hook);
> + goto out;
> +}
> +
> +void fanotify_filter_hook_free(struct fanotify_filter_hook *filter_hook)
> +{
> + if (filter_hook->ops->filter_free)
> + filter_hook->ops->filter_free(filter_hook);
> +
> + module_put(filter_hook->ops->owner);
> + kfree(filter_hook);
> +}
> +
> +/*
> + * fanotify_filter_del - Delete a filter from fsnotify_group.
> + */
> +void fanotify_filter_del(struct fsnotify_group *group)
> +{
> + struct fanotify_filter_hook *filter_hook;
> +
> + fsnotify_group_lock(group);
> + filter_hook = group->fanotify_data.filter_hook;
> + if (!filter_hook)
> + goto out;
> +
> + rcu_assign_pointer(group->fanotify_data.filter_hook, NULL);
> + fanotify_filter_hook_free(filter_hook);
The read side is protected with srcu and there is no srcu/rcu delay of freeing.
You will either need something along the lines of
fsnotify_connector_destroy_workfn() with synchronize_srcu()
or use regular rcu delay and read side (assuming that it can be nested inside
srcu read side?).
Thanks,
Amir.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-24 5:07 ` Amir Goldstein
@ 2024-11-24 18:59 ` Song Liu
2024-11-27 0:50 ` Alexei Starovoitov
1 sibling, 0 replies; 14+ messages in thread
From: Song Liu @ 2024-11-24 18:59 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
jlayton@kernel.org, Josef Bacik, mic@digikod.net,
gnoack@google.com
> On Nov 23, 2024, at 9:07 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Nov 23, 2024 at 12:00 AM Song Liu <song@kernel.org> wrote:
[...]
>> +}
>> +
>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
>> +{
>> + struct fan_filter_sample_data *data = filter_hook->data;
>> +
>> + path_put(&data->subtree_path);
>> + kfree(data);
>> +}
>> +
>
> Hi Song,
>
> This example looks fine but it raises a question.
> This filter will keep the mount of subtree_path busy until the group is closed
> or the filter is detached.
> This is probably fine for many services that keep the mount busy anyway.
>
> But what if this wasn't the intention?
> What if an Anti-malware engine that watches all mounts wanted to use that
> for configuring some ignore/block subtree filters?
>
> One way would be to use a is_subtree() variant that looks for a
> subtree root inode
> number and then verifies it with a subtree root fid.
> A production subtree filter will need to use a variant of is_subtree()
> anyway that
> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
> multiple paths is a no go.
Maybe some cache mechanism will be sufficient (and maybe also the
best we can do) in this case?
Thanks,
Song
>
> Don't need to change anything in the example, unless other people
> think that we do need to set a better example to begin with...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-24 6:25 ` Amir Goldstein
@ 2024-11-24 19:08 ` Song Liu
2024-11-28 16:37 ` Jan Kara
1 sibling, 0 replies; 14+ messages in thread
From: Song Liu @ 2024-11-24 19:08 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
jlayton@kernel.org, Josef Bacik, mic@digikod.net,
gnoack@google.com
> On Nov 23, 2024, at 10:25 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Nov 23, 2024 at 12:00 AM Song Liu <song@kernel.org> wrote:
[...]
>>
>> To make fanotify filters more flexible, a filter can take arguments at
>> attach time.
>>
>> sysfs entry /sys/kernel/fanotify_filter is added to help users know
>> which fanotify filters are available. At the moment, these files are
>> added for each filter: flags, desc, and init_args.
>
> It's a shame that we have fanotify knobs at /proc/sys/fs/fanotify/ and
> in sysfs, but understand we don't want to make more use of proc for this.
>
> Still I would add the filter files under a new /sys/fs/fanotify/ dir and not
> directly under /sys/kernel/
I don't have a strong preference either way. We can create it under
/sys/fs/fanotify if that makes more sense.
>
>>
>> Signed-off-by: Song Liu <song@kernel.org>
>> ---
>> fs/notify/fanotify/Kconfig | 13 ++
>> fs/notify/fanotify/Makefile | 1 +
>> fs/notify/fanotify/fanotify.c | 44 +++-
>> fs/notify/fanotify/fanotify_filter.c | 289 +++++++++++++++++++++++++++
>> fs/notify/fanotify/fanotify_user.c | 7 +
>> include/linux/fanotify.h | 128 ++++++++++++
>> include/linux/fsnotify_backend.h | 6 +-
>> include/uapi/linux/fanotify.h | 36 ++++
>> 8 files changed, 520 insertions(+), 4 deletions(-)
>> create mode 100644 fs/notify/fanotify/fanotify_filter.c
[...]
>>
>> BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
>> BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
>> @@ -921,6 +924,39 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>> pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
>> group, mask, match_mask);
>>
>> + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
>> + fsid = fanotify_get_fsid(iter_info);
>> +
>> +#ifdef CONFIG_FANOTIFY_FILTER
>> + filter_hook = srcu_dereference(group->fanotify_data.filter_hook, &fsnotify_mark_srcu);
>
> Do we actually need the sleeping rcu protection for calling the hook?
> Can regular rcu read side be nested inside srcu read side?
I was thinking the filter function can still sleep, for example, to
read an xattr.
>
> Jan,
>
> I don't remember why srcu is needed since we are not holding it
> when waiting for userspace anymore?
>
>> + if (filter_hook) {
>> + struct fanotify_filter_event filter_event = {
>> + .mask = mask,
>> + .data = data,
>> + .data_type = data_type,
>> + .dir = dir,
>> + .file_name = file_name,
>> + .fsid = &fsid,
>> + .match_mask = match_mask,
>> + };
[...]
>> +
>> + spin_lock(&filter_list_lock);
>> + filter_ops = fanotify_filter_find(args.name);
>> + if (!filter_ops || !try_module_get(filter_ops->owner)) {
>> + spin_unlock(&filter_list_lock);
>> + ret = -ENOENT;
>> + goto err_free_hook;
>> + }
>> + spin_unlock(&filter_list_lock);
>> +
>> + if (!capable(CAP_SYS_ADMIN) && (filter_ops->flags & FAN_FILTER_F_SYS_ADMIN_ONLY)) {
>
> 1. feels better to opt-in for UNPRIV (and maybe later on) rather than
> make it the default.
Sure.
> 2. need to check that filter_ops->flags has only "known" flags
Agreed.
> 3. probably need to add filter_ops->version check in case we want to
> change the ABI
I think we can let the author of the filter handle versioning
inside filter_init function. Many users may not need any logic
for compatibility.
>
>> + ret = -EPERM;
>> + goto err_module_put;
>> + }
>> +
[...]
>> +
>> +/*
>> + * fanotify_filter_del - Delete a filter from fsnotify_group.
>> + */
>> +void fanotify_filter_del(struct fsnotify_group *group)
>> +{
>> + struct fanotify_filter_hook *filter_hook;
>> +
>> + fsnotify_group_lock(group);
>> + filter_hook = group->fanotify_data.filter_hook;
>> + if (!filter_hook)
>> + goto out;
>> +
>> + rcu_assign_pointer(group->fanotify_data.filter_hook, NULL);
>> + fanotify_filter_hook_free(filter_hook);
>
> The read side is protected with srcu and there is no srcu/rcu delay of freeing.
> You will either need something along the lines of
> fsnotify_connector_destroy_workfn() with synchronize_srcu()
Yeah, we do need a synchronize_srcu() here. I will fix this in
the next version.
> or use regular rcu delay and read side (assuming that it can be nested inside
> srcu read side?).
Thanks for the review!
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
2024-11-24 6:25 ` Amir Goldstein
@ 2024-11-25 11:01 ` kernel test robot
2024-11-26 0:06 ` kernel test robot
2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-11-25 11:01 UTC (permalink / raw)
To: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: oe-kbuild-all, kernel-team, andrii, eddyz87, ast, daniel,
martin.lau, viro, brauner, jack, kpsingh, mattbobrowski, amir73il,
repnop, jlayton, josef, mic, gnoack, Song Liu
Hi Song,
kernel test robot noticed the following build warnings:
[auto build test WARNING on jack-fs/fsnotify]
[also build test WARNING on linus/master v6.12 next-20241125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Song-Liu/fanotify-Introduce-fanotify-filter/20241125-110818
base: https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
patch link: https://lore.kernel.org/r/20241122225958.1775625-2-song%40kernel.org
patch subject: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
config: i386-randconfig-001-20241125 (https://download.01.org/0day-ci/archive/20241125/202411251801.nLqLjFGW-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241125/202411251801.nLqLjFGW-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411251801.nLqLjFGW-lkp@intel.com/
All warnings (new ones prefixed by >>):
fs/notify/fanotify/fanotify_filter.c: In function 'fanotify_filter_add':
>> fs/notify/fanotify/fanotify_filter.c:226:55: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
226 | if (copy_from_user(init_args, (void __user *)args.init_args,
| ^
vim +226 fs/notify/fanotify/fanotify_filter.c
156
157 /*
158 * fanotify_filter_add - Add a filter to fsnotify_group.
159 *
160 * Add a filter from filter_list to a fsnotify_group.
161 *
162 * @group: fsnotify_group that will have add
163 * @argp: fanotify_filter_args that specifies the filter
164 * and the init arguments of the filter.
165 *
166 * Returns:
167 * 0 - on success;
168 * -EEXIST - filter of the same name already exists.
169 */
170 int fanotify_filter_add(struct fsnotify_group *group,
171 struct fanotify_filter_args __user *argp)
172 {
173 struct fanotify_filter_hook *filter_hook;
174 struct fanotify_filter_ops *filter_ops;
175 struct fanotify_filter_args args;
176 void *init_args = NULL;
177 int ret = 0;
178
179 ret = copy_from_user(&args, argp, sizeof(args));
180 if (ret)
181 return -EFAULT;
182
183 if (args.init_args_size > FAN_FILTER_ARGS_MAX)
184 return -EINVAL;
185
186 args.name[FAN_FILTER_NAME_MAX - 1] = '\0';
187
188 fsnotify_group_lock(group);
189
190 if (rcu_access_pointer(group->fanotify_data.filter_hook)) {
191 fsnotify_group_unlock(group);
192 return -EBUSY;
193 }
194
195 filter_hook = kzalloc(sizeof(*filter_hook), GFP_KERNEL);
196 if (!filter_hook) {
197 ret = -ENOMEM;
198 goto out;
199 }
200
201 spin_lock(&filter_list_lock);
202 filter_ops = fanotify_filter_find(args.name);
203 if (!filter_ops || !try_module_get(filter_ops->owner)) {
204 spin_unlock(&filter_list_lock);
205 ret = -ENOENT;
206 goto err_free_hook;
207 }
208 spin_unlock(&filter_list_lock);
209
210 if (!capable(CAP_SYS_ADMIN) && (filter_ops->flags & FAN_FILTER_F_SYS_ADMIN_ONLY)) {
211 ret = -EPERM;
212 goto err_module_put;
213 }
214
215 if (filter_ops->filter_init) {
216 if (args.init_args_size != filter_ops->init_args_size) {
217 ret = -EINVAL;
218 goto err_module_put;
219 }
220 if (args.init_args_size) {
221 init_args = kzalloc(args.init_args_size, GFP_KERNEL);
222 if (!init_args) {
223 ret = -ENOMEM;
224 goto err_module_put;
225 }
> 226 if (copy_from_user(init_args, (void __user *)args.init_args,
227 args.init_args_size)) {
228 ret = -EFAULT;
229 goto err_free_args;
230 }
231
232 }
233 ret = filter_ops->filter_init(group, filter_hook, init_args);
234 if (ret)
235 goto err_free_args;
236 kfree(init_args);
237 }
238 filter_hook->ops = filter_ops;
239 rcu_assign_pointer(group->fanotify_data.filter_hook, filter_hook);
240
241 out:
242 fsnotify_group_unlock(group);
243 return ret;
244
245 err_free_args:
246 kfree(init_args);
247 err_module_put:
248 module_put(filter_ops->owner);
249 err_free_hook:
250 kfree(filter_hook);
251 goto out;
252 }
253
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
2024-11-24 6:25 ` Amir Goldstein
2024-11-25 11:01 ` kernel test robot
@ 2024-11-26 0:06 ` kernel test robot
2 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2024-11-26 0:06 UTC (permalink / raw)
To: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: oe-kbuild-all, kernel-team, andrii, eddyz87, ast, daniel,
martin.lau, viro, brauner, jack, kpsingh, mattbobrowski, amir73il,
repnop, jlayton, josef, mic, gnoack, Song Liu
Hi Song,
kernel test robot noticed the following build warnings:
[auto build test WARNING on jack-fs/fsnotify]
[also build test WARNING on linus/master v6.12 next-20241125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Song-Liu/fanotify-Introduce-fanotify-filter/20241125-110818
base: https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
patch link: https://lore.kernel.org/r/20241122225958.1775625-2-song%40kernel.org
patch subject: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
config: x86_64-randconfig-123-20241125 (https://download.01.org/0day-ci/archive/20241126/202411260746.XEdTrLMj-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241126/202411260746.XEdTrLMj-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411260746.XEdTrLMj-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> fs/notify/fanotify/fanotify_filter.c:271:21: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct fanotify_filter_hook *filter_hook @@ got struct fanotify_filter_hook [noderef] __rcu *filter_hook @@
fs/notify/fanotify/fanotify_filter.c:271:21: sparse: expected struct fanotify_filter_hook *filter_hook
fs/notify/fanotify/fanotify_filter.c:271:21: sparse: got struct fanotify_filter_hook [noderef] __rcu *filter_hook
fs/notify/fanotify/fanotify_filter.c: note: in included file (through include/linux/kobject.h, include/linux/fanotify.h):
include/linux/list.h:83:21: sparse: sparse: self-comparison always evaluates to true
--
>> fs/notify/fanotify/fanotify.c:1015:63: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct fanotify_filter_hook *filter_hook @@ got struct fanotify_filter_hook [noderef] __rcu *filter_hook @@
fs/notify/fanotify/fanotify.c:1015:63: sparse: expected struct fanotify_filter_hook *filter_hook
fs/notify/fanotify/fanotify.c:1015:63: sparse: got struct fanotify_filter_hook [noderef] __rcu *filter_hook
vim +271 fs/notify/fanotify/fanotify_filter.c
262
263 /*
264 * fanotify_filter_del - Delete a filter from fsnotify_group.
265 */
266 void fanotify_filter_del(struct fsnotify_group *group)
267 {
268 struct fanotify_filter_hook *filter_hook;
269
270 fsnotify_group_lock(group);
> 271 filter_hook = group->fanotify_data.filter_hook;
272 if (!filter_hook)
273 goto out;
274
275 rcu_assign_pointer(group->fanotify_data.filter_hook, NULL);
276 fanotify_filter_hook_free(filter_hook);
277
278 out:
279 fsnotify_group_unlock(group);
280 }
281
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-24 5:07 ` Amir Goldstein
2024-11-24 18:59 ` Song Liu
@ 2024-11-27 0:50 ` Alexei Starovoitov
2024-11-27 2:16 ` Song Liu
1 sibling, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2024-11-27 0:50 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, repnop, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack
On Sat, Nov 23, 2024 at 9:07 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > +++ b/samples/fanotify/filter-mod.c
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include <linux/fsnotify.h>
> > +#include <linux/fanotify.h>
> > +#include <linux/module.h>
> > +#include <linux/path.h>
> > +#include <linux/file.h>
> > +#include "filter.h"
> > +
> > +struct fan_filter_sample_data {
> > + struct path subtree_path;
> > + enum fan_filter_sample_mode mode;
> > +};
> > +
> > +static int sample_filter(struct fsnotify_group *group,
> > + struct fanotify_filter_hook *filter_hook,
> > + struct fanotify_filter_event *filter_event)
> > +{
> > + struct fan_filter_sample_data *data;
> > + struct dentry *dentry;
> > +
> > + dentry = fsnotify_data_dentry(filter_event->data, filter_event->data_type);
> > + if (!dentry)
> > + return FAN_FILTER_RET_SEND_TO_USERSPACE;
> > +
> > + data = filter_hook->data;
> > +
> > + if (is_subdir(dentry, data->subtree_path.dentry)) {
> > + if (data->mode == FAN_FILTER_SAMPLE_MODE_BLOCK)
> > + return -EPERM;
> > + return FAN_FILTER_RET_SEND_TO_USERSPACE;
> > + }
> > + return FAN_FILTER_RET_SKIP_EVENT;
> > +}
> > +
> > +static int sample_filter_init(struct fsnotify_group *group,
> > + struct fanotify_filter_hook *filter_hook,
> > + void *argp)
> > +{
> > + struct fan_filter_sample_args *args;
> > + struct fan_filter_sample_data *data;
> > + struct file *file;
> > + int fd;
> > +
> > + args = (struct fan_filter_sample_args *)argp;
> > + fd = args->subtree_fd;
> > +
> > + file = fget(fd);
> > + if (!file)
> > + return -EBADF;
> > + data = kzalloc(sizeof(struct fan_filter_sample_data), GFP_KERNEL);
> > + if (!data) {
> > + fput(file);
> > + return -ENOMEM;
> > + }
> > + path_get(&file->f_path);
> > + data->subtree_path = file->f_path;
> > + fput(file);
> > + data->mode = args->mode;
> > + filter_hook->data = data;
> > + return 0;
> > +}
> > +
> > +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
> > +{
> > + struct fan_filter_sample_data *data = filter_hook->data;
> > +
> > + path_put(&data->subtree_path);
> > + kfree(data);
> > +}
> > +
>
> Hi Song,
>
> This example looks fine but it raises a question.
> This filter will keep the mount of subtree_path busy until the group is closed
> or the filter is detached.
> This is probably fine for many services that keep the mount busy anyway.
>
> But what if this wasn't the intention?
> What if an Anti-malware engine that watches all mounts wanted to use that
> for configuring some ignore/block subtree filters?
>
> One way would be to use a is_subtree() variant that looks for a
> subtree root inode
> number and then verifies it with a subtree root fid.
> A production subtree filter will need to use a variant of is_subtree()
> anyway that
> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
> multiple paths is a no go.
>
> Don't need to change anything in the example, unless other people
> think that we do need to set a better example to begin with...
I think we have to treat this patch as a real filter and not as an example
to make sure that the whole approach is workable end to end.
The point about not holding path/dentry is very valid.
The algorithm needs to support that.
It may very well turn out that the logic of handling many filters
without a loop and not grabbing a path refcnt is too complex for bpf.
Then this subtree filtering would have to stay as a kernel module
or extra flag/feature for fanotify.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-27 0:50 ` Alexei Starovoitov
@ 2024-11-27 2:16 ` Song Liu
2024-11-28 17:10 ` Jan Kara
0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2024-11-27 2:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Amir Goldstein, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 26, 2024, at 4:50 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
[...]
>>> +
>>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
>>> +{
>>> + struct fan_filter_sample_data *data = filter_hook->data;
>>> +
>>> + path_put(&data->subtree_path);
>>> + kfree(data);
>>> +}
>>> +
>>
>> Hi Song,
>>
>> This example looks fine but it raises a question.
>> This filter will keep the mount of subtree_path busy until the group is closed
>> or the filter is detached.
>> This is probably fine for many services that keep the mount busy anyway.
>>
>> But what if this wasn't the intention?
>> What if an Anti-malware engine that watches all mounts wanted to use that
>> for configuring some ignore/block subtree filters?
>>
>> One way would be to use a is_subtree() variant that looks for a
>> subtree root inode
>> number and then verifies it with a subtree root fid.
>> A production subtree filter will need to use a variant of is_subtree()
>> anyway that
>> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
>> multiple paths is a no go.
>>
>> Don't need to change anything in the example, unless other people
>> think that we do need to set a better example to begin with...
>
> I think we have to treat this patch as a real filter and not as an example
> to make sure that the whole approach is workable end to end.
> The point about not holding path/dentry is very valid.
> The algorithm needs to support that.
Hmm.. I am not sure whether we cannot hold a refcount. If that is a
requirement, the algorithm will be more complex.
IIUC, fsnotify_mark on a inode does not hold a refcount to inode.
And when the inode is evicted, the mark is freed. I guess this
requires the user space, the AntiVirus scanner for example, to
hold a reference to the inode? If this is the case, I think it
is OK for the filter, either bpf or kernel module, to hold a
reference to the subtree root.
> It may very well turn out that the logic of handling many filters
> without a loop and not grabbing a path refcnt is too complex for bpf.
> Then this subtree filtering would have to stay as a kernel module
> or extra flag/feature for fanotify.
Handling multiple subtrees is indeed an issue. Since we rely on
the mark in the SB, multiple subtrees under the same SB will share
that mark. Unless we use some cache, accessing a file will
trigger multiple is_subdir() calls.
One possible solution is that have a new helper that checks
is_subdir() for a list of parent subtrees with a single series
of dentry walk. IOW, something like:
bool is_subdir_of_any(struct dentry *new_dentry,
struct list_head *list_of_dentry).
For BPF, one possible solution is to walk the dentry tree
up to the root, under bpf_rcu_read_lock().
Does this sound reasonable?
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter
2024-11-24 6:25 ` Amir Goldstein
2024-11-24 19:08 ` Song Liu
@ 2024-11-28 16:37 ` Jan Kara
1 sibling, 0 replies; 14+ messages in thread
From: Jan Kara @ 2024-11-28 16:37 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, repnop, jlayton, josef,
mic, gnoack
On Sun 24-11-24 07:25:20, Amir Goldstein wrote:
> On Sat, Nov 23, 2024 at 12:00 AM Song Liu <song@kernel.org> wrote:
...
> > @@ -921,6 +924,39 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> > pr_debug("%s: group=%p mask=%x report_mask=%x\n", __func__,
> > group, mask, match_mask);
> >
> > + if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
> > + fsid = fanotify_get_fsid(iter_info);
> > +
> > +#ifdef CONFIG_FANOTIFY_FILTER
> > + filter_hook = srcu_dereference(group->fanotify_data.filter_hook, &fsnotify_mark_srcu);
>
> Do we actually need the sleeping rcu protection for calling the hook?
> Can regular rcu read side be nested inside srcu read side?
Nesting rcu inside srcu is fine.
> Jan,
>
> I don't remember why srcu is needed since we are not holding it
> when waiting for userspace anymore?
You need srcu to access marks or notification group unless you grab a
reference to them (which is what waiting for permission event reply code
does).
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-27 2:16 ` Song Liu
@ 2024-11-28 17:10 ` Jan Kara
2024-12-02 17:38 ` Song Liu
0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2024-11-28 17:10 UTC (permalink / raw)
To: Song Liu
Cc: Alexei Starovoitov, Amir Goldstein, Song Liu, bpf, Linux-Fsdevel,
LKML, LSM List, Kernel Team, Andrii Nakryiko, Eddy Z,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Alexander Viro, Christian Brauner, Jan Kara, KP Singh,
Matt Bobrowski, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
On Wed 27-11-24 02:16:09, Song Liu wrote:
> > On Nov 26, 2024, at 4:50 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
>
> [...]
>
> >>> +
> >>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
> >>> +{
> >>> + struct fan_filter_sample_data *data = filter_hook->data;
> >>> +
> >>> + path_put(&data->subtree_path);
> >>> + kfree(data);
> >>> +}
> >>> +
> >>
> >> Hi Song,
> >>
> >> This example looks fine but it raises a question.
> >> This filter will keep the mount of subtree_path busy until the group is closed
> >> or the filter is detached.
> >> This is probably fine for many services that keep the mount busy anyway.
> >>
> >> But what if this wasn't the intention?
> >> What if an Anti-malware engine that watches all mounts wanted to use that
> >> for configuring some ignore/block subtree filters?
> >>
> >> One way would be to use a is_subtree() variant that looks for a
> >> subtree root inode
> >> number and then verifies it with a subtree root fid.
> >> A production subtree filter will need to use a variant of is_subtree()
> >> anyway that
> >> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
> >> multiple paths is a no go.
> >>
> >> Don't need to change anything in the example, unless other people
> >> think that we do need to set a better example to begin with...
> >
> > I think we have to treat this patch as a real filter and not as an example
> > to make sure that the whole approach is workable end to end.
> > The point about not holding path/dentry is very valid.
> > The algorithm needs to support that.
>
> Hmm.. I am not sure whether we cannot hold a refcount. If that is a
> requirement, the algorithm will be more complex.
Well, for production use that would certainly be a requirement. Many years
ago dnotify (the first fs notification subsystem) was preventing
filesystems from being unmounted because it required open file and it was a
pain.
> IIUC, fsnotify_mark on a inode does not hold a refcount to inode.
The connector (head of the mark list) does hold inode reference. But we
have a hook in the unmount path (fsnotify_unmount_inodes()) which drops all
the marks and connectors for the filesystem.
> And when the inode is evicted, the mark is freed. I guess this
> requires the user space, the AntiVirus scanner for example, to
> hold a reference to the inode? If this is the case, I think it
> is OK for the filter, either bpf or kernel module, to hold a
> reference to the subtree root.
No, fsnotify pins the inodes in memory (which if fine) but releases them
when unmount should happen. Userspace doesn't need to pin anything.
> > It may very well turn out that the logic of handling many filters
> > without a loop and not grabbing a path refcnt is too complex for bpf.
> > Then this subtree filtering would have to stay as a kernel module
> > or extra flag/feature for fanotify.
>
> Handling multiple subtrees is indeed an issue. Since we rely on
> the mark in the SB, multiple subtrees under the same SB will share
> that mark. Unless we use some cache, accessing a file will
> trigger multiple is_subdir() calls.
>
> One possible solution is that have a new helper that checks
> is_subdir() for a list of parent subtrees with a single series
> of dentry walk. IOW, something like:
>
> bool is_subdir_of_any(struct dentry *new_dentry,
> struct list_head *list_of_dentry).
>
> For BPF, one possible solution is to walk the dentry tree
> up to the root, under bpf_rcu_read_lock().
I can see two possible issues with this. Firstly, you don't have list_head
in a dentry you could easily use to pass dentries to a function like this.
Probably you'll need an external array with dentry pointers or something
like that.
Second issue is more inherent in the BPF filter approach - if there would
be more notification groups each watching for some subtree (like users
watching their home dirs, apps watching their subtrees with data etc.), then
we'd still end up traversing the directory tree for each such notification
group. That seems suboptimal but I have to think how much we care how we
could possibly avoid that.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter
2024-11-28 17:10 ` Jan Kara
@ 2024-12-02 17:38 ` Song Liu
0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2024-12-02 17:38 UTC (permalink / raw)
To: Jan Kara
Cc: Song Liu, Alexei Starovoitov, Amir Goldstein, Song Liu, bpf,
Linux-Fsdevel, LKML, LSM List, Kernel Team, Andrii Nakryiko,
Eddy Z, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Alexander Viro, Christian Brauner, KP Singh, Matt Bobrowski,
repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 28, 2024, at 9:10 AM, Jan Kara <jack@suse.cz> wrote:
>
> On Wed 27-11-24 02:16:09, Song Liu wrote:
>>> On Nov 26, 2024, at 4:50 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>
>>
>> [...]
>>
>>>>> +
>>>>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
>>>>> +{
>>>>> + struct fan_filter_sample_data *data = filter_hook->data;
>>>>> +
>>>>> + path_put(&data->subtree_path);
>>>>> + kfree(data);
>>>>> +}
>>>>> +
>>>>
>>>> Hi Song,
>>>>
>>>> This example looks fine but it raises a question.
>>>> This filter will keep the mount of subtree_path busy until the group is closed
>>>> or the filter is detached.
>>>> This is probably fine for many services that keep the mount busy anyway.
>>>>
>>>> But what if this wasn't the intention?
>>>> What if an Anti-malware engine that watches all mounts wanted to use that
>>>> for configuring some ignore/block subtree filters?
>>>>
>>>> One way would be to use a is_subtree() variant that looks for a
>>>> subtree root inode
>>>> number and then verifies it with a subtree root fid.
>>>> A production subtree filter will need to use a variant of is_subtree()
>>>> anyway that
>>>> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
>>>> multiple paths is a no go.
>>>>
>>>> Don't need to change anything in the example, unless other people
>>>> think that we do need to set a better example to begin with...
>>>
>>> I think we have to treat this patch as a real filter and not as an example
>>> to make sure that the whole approach is workable end to end.
>>> The point about not holding path/dentry is very valid.
>>> The algorithm needs to support that.
>>
>> Hmm.. I am not sure whether we cannot hold a refcount. If that is a
>> requirement, the algorithm will be more complex.
>
> Well, for production use that would certainly be a requirement. Many years
> ago dnotify (the first fs notification subsystem) was preventing
> filesystems from being unmounted because it required open file and it was a
> pain.
>
>> IIUC, fsnotify_mark on a inode does not hold a refcount to inode.
>
> The connector (head of the mark list) does hold inode reference. But we
> have a hook in the unmount path (fsnotify_unmount_inodes()) which drops all
> the marks and connectors for the filesystem.
>
>> And when the inode is evicted, the mark is freed. I guess this
>> requires the user space, the AntiVirus scanner for example, to
>> hold a reference to the inode? If this is the case, I think it
>> is OK for the filter, either bpf or kernel module, to hold a
>> reference to the subtree root.
>
> No, fsnotify pins the inodes in memory (which if fine) but releases them
> when unmount should happen. Userspace doesn't need to pin anything.
To get similar logic for the filter (module or BPF), we will need
another call back. We should add this, though we may not use it
in the subtree monitoring case.
>
>>> It may very well turn out that the logic of handling many filters
>>> without a loop and not grabbing a path refcnt is too complex for bpf.
>>> Then this subtree filtering would have to stay as a kernel module
>>> or extra flag/feature for fanotify.
>>
>> Handling multiple subtrees is indeed an issue. Since we rely on
>> the mark in the SB, multiple subtrees under the same SB will share
>> that mark. Unless we use some cache, accessing a file will
>> trigger multiple is_subdir() calls.
>>
>> One possible solution is that have a new helper that checks
>> is_subdir() for a list of parent subtrees with a single series
>> of dentry walk. IOW, something like:
>>
>> bool is_subdir_of_any(struct dentry *new_dentry,
>> struct list_head *list_of_dentry).
>>
>> For BPF, one possible solution is to walk the dentry tree
>> up to the root, under bpf_rcu_read_lock().
>
> I can see two possible issues with this. Firstly, you don't have list_head
> in a dentry you could easily use to pass dentries to a function like this.
> Probably you'll need an external array with dentry pointers or something
> like that.
>
> Second issue is more inherent in the BPF filter approach - if there would
> be more notification groups each watching for some subtree (like users
> watching their home dirs, apps watching their subtrees with data etc.), then
> we'd still end up traversing the directory tree for each such notification
> group. That seems suboptimal but I have to think how much we care how we
> could possibly avoid that.
For the subtree monitoring example, maybe "monitor the whole sb
and filter the events" is not the right approach. Maintaining a
mark on each directory under the subtree might be a better approach.
Any comments or suggestions on this?
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-12-02 17:38 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-22 22:59 [PATCH v3 fanotify 0/2] Fanotify in kernel filter Song Liu
2024-11-22 22:59 ` [PATCH v3 fanotify 1/2] fanotify: Introduce fanotify filter Song Liu
2024-11-24 6:25 ` Amir Goldstein
2024-11-24 19:08 ` Song Liu
2024-11-28 16:37 ` Jan Kara
2024-11-25 11:01 ` kernel test robot
2024-11-26 0:06 ` kernel test robot
2024-11-22 22:59 ` [PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter Song Liu
2024-11-24 5:07 ` Amir Goldstein
2024-11-24 18:59 ` Song Liu
2024-11-27 0:50 ` Alexei Starovoitov
2024-11-27 2:16 ` Song Liu
2024-11-28 17:10 ` Jan Kara
2024-12-02 17:38 ` Song Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).