* [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler
@ 2024-11-14 8:43 Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify " Song Liu
` (6 more replies)
0 siblings, 7 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
Overview of v2:
Patch 1/7 adds logic to write fastpath handlers in kernel modules.
Patch 2/7 adds a sample of a fastpath handler in a kernel module.
Patch 3/7 to 5/7 are preparation work on BPF side.
Patch 6/7 adds logic to write fastpath handlers in bpf programs.
Patch 7/7 is a selftest and example of bpf based fastpath handler.
Changes v1 => v2:
1. Add sysfs entries for fastpath handlers.
2. Rewrite the sample and bpf selftest to handle subtree monitoring.
This requires quite some work from BPF side to properly handle
inode, dentry, etc.
3. Add CONFIG_FANOTIFY_FASTPATH.
4. Add more documentation.
TODO of v2:
1. Enable private (not added to the global list) bpf based fastpath handlers.
2. Man pages.
From v1 RFC:
This RFC set introduces in-kernel fastpath handler for fanotify. The
fastpath handler can be used to handle/filter some events without going
through userspace.
In LPC 2024, multiple talks covered use cases of monitoring a subtree in
the VFS (fanotify: [1], bpf/lsm: [2]). This work is inspired by these
discussions. Reliably monitoring a subtree with low overhead is a hard
problem. We do not claim this set fully solves the problem, but we think
this work can be a very useful building block of the solution.
The fastpath handler can be implemented with built-in logic, in a kernel
module, or a bpf program. The fastpath handler is attached to a fsnotify
group. With the current implementation, fastpath handlers are
maintained in a global list. Only users with CAP_SYS_ADMIN can add
fastpath handlers to the list by loading a kernel module. Users without
CAP_SYS_ADMIN can attach a loaded fastpath handler to fanotify instances.
During the attach operation, the fastpath handler can take an argument.
This enables non-CAP_SYS_ADMIN users to customize/configure the fastpath
handler, for example, with a specific allowlist/denylist.
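For reference, attaching a handler from user space is only a few lines
of code. A minimal sketch in C (assuming the "monitor-subtree" sample
handler from patch 2/7 is loaded and the updated uapi header is
installed; error handling omitted):

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/fanotify.h>
    #include <sys/ioctl.h>

    struct fanotify_fastpath_args args = {
            .name = "monitor-subtree",
            .version = 1,
    };
    int fan_fd, subtree_fd;

    fan_fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);
    subtree_fd = open("/tmp/a/b", O_RDONLY);
    /* this sample handler takes a subtree fd as its init argument */
    args.init_args = (__u64)(uintptr_t)&subtree_fd;
    args.init_args_size = sizeof(subtree_fd);
    ioctl(fan_fd, FAN_IOC_ADD_FP, &args);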
As the patchset has grown to 1000+ lines (including samples and tests), I
would like some feedback before pushing it further.
[1] https://lpc.events/event/18/contributions/1717/
[2] https://lpc.events/event/18/contributions/1940/
Song Liu (7):
fanotify: Introduce fanotify fastpath handler
samples/fanotify: Add a sample fanotify fastpath handler
bpf: Make bpf inode storage available to tracing programs
bpf: fs: Add three kfuncs
bpf: Allow bpf map hold reference on dentry
fanotify: Enable bpf based fanotify fastpath handler
selftests/bpf: Add test for BPF based fanotify fastpath handler
MAINTAINERS | 1 +
fs/Makefile | 2 +-
fs/bpf_fs_kfuncs.c | 51 +-
fs/inode.c | 2 +
fs/notify/fanotify/Kconfig | 13 +
fs/notify/fanotify/Makefile | 1 +
fs/notify/fanotify/fanotify.c | 29 ++
fs/notify/fanotify/fanotify_fastpath.c | 448 ++++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 7 +
include/linux/bpf.h | 9 +
include/linux/bpf_lsm.h | 29 --
include/linux/fanotify.h | 131 +++++
include/linux/fs.h | 4 +
include/linux/fsnotify_backend.h | 4 +
include/uapi/linux/fanotify.h | 25 +
kernel/bpf/Makefile | 3 +-
kernel/bpf/bpf_inode_storage.c | 176 +++++--
kernel/bpf/bpf_lsm.c | 4 -
kernel/bpf/helpers.c | 14 +-
kernel/bpf/verifier.c | 6 +
kernel/trace/bpf_trace.c | 8 +
samples/Kconfig | 20 +-
samples/Makefile | 2 +-
samples/fanotify/.gitignore | 1 +
samples/fanotify/Makefile | 5 +-
samples/fanotify/fastpath-mod.c | 82 ++++
samples/fanotify/fastpath-user.c | 111 +++++
security/bpf/hooks.c | 7 -
tools/testing/selftests/bpf/bpf_kfuncs.h | 5 +
tools/testing/selftests/bpf/config | 2 +
.../testing/selftests/bpf/prog_tests/fan_fp.c | 264 +++++++++++
tools/testing/selftests/bpf/progs/fan_fp.c | 154 ++++++
32 files changed, 1530 insertions(+), 90 deletions(-)
create mode 100644 fs/notify/fanotify/fanotify_fastpath.c
create mode 100644 samples/fanotify/fastpath-mod.c
create mode 100644 samples/fanotify/fastpath-user.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/fan_fp.c
create mode 100644 tools/testing/selftests/bpf/progs/fan_fp.c
--
2.43.5
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify fastpath handler
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-15 8:51 ` Amir Goldstein
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 2/7] samples/fanotify: Add a sample " Song Liu
` (5 subsequent siblings)
6 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
A fanotify fastpath handler enables handling fanotify events within the
kernel, and thus saves a trip to user space. A fastpath handler can be
useful in many use cases. For example, if a user is only interested in
events for some files inside a directory, a fastpath handler can be used
to filter out irrelevant events.
A fanotify fastpath handler is attached to a fsnotify_group. At most one
fastpath handler can be attached to a fsnotify_group. Attach/detach of
fastpath handlers is controlled by two new ioctls on the fanotify fds:
FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
A fanotify fastpath handler is packaged in a kernel module. In the
future, it will also be possible to package a fastpath handler in a BPF
program. Since loading modules requires CAP_SYS_ADMIN, _loading_ a
fanotify fastpath handler from a kernel module is limited to
CAP_SYS_ADMIN. However, non-CAP_SYS_ADMIN users can _attach_ a fastpath
handler loaded by the sys admin to their fanotify fds. To make fastpath
handlers more useful for non-CAP_SYS_ADMIN users, a fastpath handler can
take arguments at attach time.
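For module authors, a handler boils down to filling in struct
fanotify_fastpath_ops and registering it. A minimal sketch (the handler
name and the filter logic here are made up; see patch 2/7 for a
complete sample):

    #include <linux/fanotify.h>
    #include <linux/module.h>

    static int my_fp_handler(struct fsnotify_group *group,
                             struct fanotify_fastpath_hook *fp_hook,
                             struct fanotify_fastpath_event *fp_event)
    {
            /* only forward open events to user space */
            if (fp_event->mask & FAN_OPEN)
                    return FAN_FP_RET_SEND_TO_USERSPACE;
            return FAN_FP_RET_SKIP_EVENT;
    }

    static struct fanotify_fastpath_ops my_fp_ops = {
            .fp_handler = my_fp_handler,
            .name = "open-only",
            .owner = THIS_MODULE,
    };

    /* module_init: return fanotify_fastpath_register(&my_fp_ops); */
    /* module_exit: fanotify_fastpath_unregister(&my_fp_ops); */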
A sysfs entry, /sys/kernel/fanotify_fastpath, is added to help users
know which fastpath handlers are available. At the moment, three files
are added for each fastpath handler: flags, desc, and init_args.
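For example, with the sample handler from patch 2/7 loaded, the
hierarchy looks roughly like:

    /sys/kernel/fanotify_fastpath/monitor-subtree/flags
    /sys/kernel/fanotify_fastpath/monitor-subtree/desc
    /sys/kernel/fanotify_fastpath/monitor-subtree/init_args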
Signed-off-by: Song Liu <song@kernel.org>
---
fs/notify/fanotify/Kconfig | 13 ++
fs/notify/fanotify/Makefile | 1 +
fs/notify/fanotify/fanotify.c | 29 +++
fs/notify/fanotify/fanotify_fastpath.c | 282 +++++++++++++++++++++++++
fs/notify/fanotify/fanotify_user.c | 7 +
include/linux/fanotify.h | 131 ++++++++++++
include/linux/fsnotify_backend.h | 4 +
include/uapi/linux/fanotify.h | 25 +++
8 files changed, 492 insertions(+)
create mode 100644 fs/notify/fanotify/fanotify_fastpath.c
diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
index 0e36aaf379b7..74677d3699a3 100644
--- a/fs/notify/fanotify/Kconfig
+++ b/fs/notify/fanotify/Kconfig
@@ -24,3 +24,16 @@ config FANOTIFY_ACCESS_PERMISSIONS
hierarchical storage management systems.
If unsure, say N.
+
+config FANOTIFY_FASTPATH
+ bool "fanotify fastpath handler"
+ depends on FANOTIFY
+ default y
+ help
+ Say Y here if you want to use fanotify in-kernel fastpath handlers.
+ The fastpath handler can be implemented in a kernel module or a
+ BPF program. The fastpath handler can speed up fanotify in many
+ use cases. For example, when the listener is only interested in
+ a subset of events.
+
+ If unsure, say Y.
\ No newline at end of file
diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
index 25ef222915e5..543cb7aa08fc 100644
--- a/fs/notify/fanotify/Makefile
+++ b/fs/notify/fanotify/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_FANOTIFY) += fanotify.o fanotify_user.o
+obj-$(CONFIG_FANOTIFY_FASTPATH) += fanotify_fastpath.o
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 224bccaab4cc..b395b628a58b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -18,6 +18,8 @@
#include "fanotify.h"
+extern struct srcu_struct fsnotify_mark_srcu;
+
static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
{
return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
@@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
struct fsnotify_event *fsn_event;
__kernel_fsid_t fsid = {};
u32 match_mask = 0;
+ struct fanotify_fastpath_hook *fp_hook __maybe_unused;
BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
@@ -933,6 +936,27 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
fsid = fanotify_get_fsid(iter_info);
+#ifdef CONFIG_FANOTIFY_FASTPATH
+ fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
+ if (fp_hook) {
+ struct fanotify_fastpath_event fp_event = {
+ .mask = mask,
+ .data = data,
+ .data_type = data_type,
+ .dir = dir,
+ .file_name = file_name,
+ .fsid = &fsid,
+ .match_mask = match_mask,
+ };
+
+ ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
+ if (ret == FAN_FP_RET_SKIP_EVENT) {
+ ret = 0;
+ goto finish;
+ }
+ }
+#endif
+
event = fanotify_alloc_event(group, mask, data, data_type, dir,
file_name, &fsid, match_mask);
ret = -ENOMEM;
@@ -976,6 +1000,11 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
if (mempool_initialized(&group->fanotify_data.error_events_pool))
mempool_exit(&group->fanotify_data.error_events_pool);
+
+#ifdef CONFIG_FANOTIFY_FASTPATH
+ if (group->fanotify_data.fp_hook)
+ fanotify_fastpath_hook_free(group->fanotify_data.fp_hook);
+#endif
}
static void fanotify_free_path_event(struct fanotify_event *event)
diff --git a/fs/notify/fanotify/fanotify_fastpath.c b/fs/notify/fanotify/fanotify_fastpath.c
new file mode 100644
index 000000000000..f2aefcf0ca6a
--- /dev/null
+++ b/fs/notify/fanotify/fanotify_fastpath.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fanotify.h>
+#include <linux/kobject.h>
+#include <linux/module.h>
+
+#include "fanotify.h"
+
+extern struct srcu_struct fsnotify_mark_srcu;
+
+static DEFINE_SPINLOCK(fp_list_lock);
+static LIST_HEAD(fp_list);
+
+static struct kobject *fan_fp_root_kobj;
+
+static struct {
+ enum fanotify_fastpath_flags flag;
+ const char *name;
+} fanotify_fastpath_flags_names[] = {
+ {
+ .flag = FAN_FP_F_SYS_ADMIN_ONLY,
+ .name = "SYS_ADMIN_ONLY",
+ }
+};
+
+static ssize_t flags_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_fastpath_ops *ops;
+ ssize_t len = 0;
+ int i;
+
+ ops = container_of(kobj, struct fanotify_fastpath_ops, kobj);
+ for (i = 0; i < ARRAY_SIZE(fanotify_fastpath_flags_names); i++) {
+ if (ops->flags & fanotify_fastpath_flags_names[i].flag) {
+ len += sysfs_emit_at(buf, len, "%s%s", len ? " " : "",
+ fanotify_fastpath_flags_names[i].name);
+ }
+ }
+ len += sysfs_emit_at(buf, len, "\n");
+ return len;
+}
+
+static ssize_t desc_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_fastpath_ops *ops;
+
+ ops = container_of(kobj, struct fanotify_fastpath_ops, kobj);
+
+ return sysfs_emit(buf, "%s\n", ops->desc ?: "N/A");
+}
+
+static ssize_t init_args_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct fanotify_fastpath_ops *ops;
+
+ ops = container_of(kobj, struct fanotify_fastpath_ops, kobj);
+
+ return sysfs_emit(buf, "%s\n", ops->init_args ?: "N/A");
+}
+
+static struct kobj_attribute flags_kobj_attr = __ATTR_RO(flags);
+static struct kobj_attribute desc_kobj_attr = __ATTR_RO(desc);
+static struct kobj_attribute init_args_kobj_attr = __ATTR_RO(init_args);
+
+static struct attribute *fan_fp_attrs[] = {
+ &flags_kobj_attr.attr,
+ &desc_kobj_attr.attr,
+ &init_args_kobj_attr.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(fan_fp);
+
+static void fan_fp_kobj_release(struct kobject *kobj)
+{
+
+}
+
+static const struct kobj_type fan_fp_ktype = {
+ .release = fan_fp_kobj_release,
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_groups = fan_fp_groups,
+};
+
+static struct fanotify_fastpath_ops *fanotify_fastpath_find(const char *name)
+{
+ struct fanotify_fastpath_ops *ops;
+
+ list_for_each_entry(ops, &fp_list, list) {
+ if (!strcmp(ops->name, name))
+ return ops;
+ }
+ return NULL;
+}
+
+static void __fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops)
+{
+ spin_lock(&fp_list_lock);
+ list_del_init(&ops->list);
+ spin_unlock(&fp_list_lock);
+}
+
+/*
+ * fanotify_fastpath_register - Register a new fastpath handler.
+ *
+ * Add a fastpath handler to the fp_list. These fastpath handlers are
+ * available for all users in the system.
+ *
+ * @ops: pointer to fanotify_fastpath_ops to add.
+ *
+ * Returns:
+ * 0 - on success;
+ * -EEXIST - fastpath handler of the same name already exists.
+ */
+int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops)
+{
+ int ret;
+
+ spin_lock(&fp_list_lock);
+ if (fanotify_fastpath_find(ops->name)) {
+ /* cannot register two handlers with the same name */
+ spin_unlock(&fp_list_lock);
+ return -EEXIST;
+ }
+ list_add_tail(&ops->list, &fp_list);
+ spin_unlock(&fp_list_lock);
+
+
+ kobject_init(&ops->kobj, &fan_fp_ktype);
+ ret = kobject_add(&ops->kobj, fan_fp_root_kobj, "%s", ops->name);
+ if (ret) {
+ __fanotify_fastpath_unregister(ops);
+ return ret;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(fanotify_fastpath_register);
+
+/*
+ * fanotify_fastpath_unregister - Unregister a fastpath handler.
+ *
+ * Remove a fastpath handler from fp_list.
+ *
+ * @ops: pointer to fanotify_fastpath_ops to remove.
+ */
+void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops)
+{
+ kobject_put(&ops->kobj);
+ __fanotify_fastpath_unregister(ops);
+}
+EXPORT_SYMBOL_GPL(fanotify_fastpath_unregister);
+
+/*
+ * fanotify_fastpath_add - Add a fastpath handler to fsnotify_group.
+ *
+ * Add a fastpath handler from fp_list to a fsnotify_group.
+ *
+ * @group: fsnotify_group to attach the fastpath handler to
+ * @argp: fanotify_fastpath_args that specifies the fastpath handler
+ * and the init arguments of the fastpath handler.
+ *
+ * Returns:
+ * 0 - on success;
+ * negative error code on failure (e.g. -ENOENT, -EBUSY, -EPERM, -ENOMEM).
+ */
+int fanotify_fastpath_add(struct fsnotify_group *group,
+ struct fanotify_fastpath_args __user *argp)
+{
+ struct fanotify_fastpath_hook *fp_hook;
+ struct fanotify_fastpath_ops *fp_ops;
+ struct fanotify_fastpath_args args;
+ void *init_args = NULL;
+ int ret = 0;
+
+ ret = copy_from_user(&args, argp, sizeof(args));
+ if (ret)
+ return -EFAULT;
+
+ if (args.version != 1 || args.flags || args.init_args_size > FAN_FP_ARGS_MAX)
+ return -EINVAL;
+
+ args.name[FAN_FP_NAME_MAX - 1] = '\0';
+
+ fsnotify_group_lock(group);
+
+ if (rcu_access_pointer(group->fanotify_data.fp_hook)) {
+ fsnotify_group_unlock(group);
+ return -EBUSY;
+ }
+
+ fp_hook = kzalloc(sizeof(*fp_hook), GFP_KERNEL);
+ if (!fp_hook) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ spin_lock(&fp_list_lock);
+ fp_ops = fanotify_fastpath_find(args.name);
+ if (!fp_ops || !try_module_get(fp_ops->owner)) {
+ spin_unlock(&fp_list_lock);
+ ret = -ENOENT;
+ goto err_free_hook;
+ }
+ spin_unlock(&fp_list_lock);
+
+ if (!capable(CAP_SYS_ADMIN) && (fp_ops->flags & FAN_FP_F_SYS_ADMIN_ONLY)) {
+ ret = -EPERM;
+ goto err_module_put;
+ }
+
+ if (fp_ops->fp_init) {
+ if (args.init_args_size) {
+ init_args = kzalloc(args.init_args_size, GFP_KERNEL);
+ if (!init_args) {
+ ret = -ENOMEM;
+ goto err_module_put;
+ }
+ if (copy_from_user(init_args, (void __user *)args.init_args,
+ args.init_args_size)) {
+ ret = -EFAULT;
+ goto err_free_args;
+ }
+
+ }
+ ret = fp_ops->fp_init(fp_hook, init_args);
+ if (ret)
+ goto err_free_args;
+ kfree(init_args);
+ }
+ fp_hook->ops = fp_ops;
+ rcu_assign_pointer(group->fanotify_data.fp_hook, fp_hook);
+
+out:
+ fsnotify_group_unlock(group);
+ return ret;
+
+err_free_args:
+ kfree(init_args);
+err_module_put:
+ module_put(fp_ops->owner);
+err_free_hook:
+ kfree(fp_hook);
+ goto out;
+}
+
+void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
+{
+ if (fp_hook->ops->fp_free)
+ fp_hook->ops->fp_free(fp_hook);
+
+ module_put(fp_hook->ops->owner);
+ kfree(fp_hook);
+}
+
+/*
+ * fanotify_fastpath_del - Delete a fastpath handler from a fsnotify_group.
+ */
+void fanotify_fastpath_del(struct fsnotify_group *group)
+{
+ struct fanotify_fastpath_hook *fp_hook;
+
+ fsnotify_group_lock(group);
+ fp_hook = group->fanotify_data.fp_hook;
+ if (!fp_hook)
+ goto out;
+
+ rcu_assign_pointer(group->fanotify_data.fp_hook, NULL);
+ fanotify_fastpath_hook_free(fp_hook);
+
+out:
+ fsnotify_group_unlock(group);
+}
+
+static int __init fanotify_fastpath_init(void)
+{
+ fan_fp_root_kobj = kobject_create_and_add("fanotify_fastpath", kernel_kobj);
+ if (!fan_fp_root_kobj)
+ return -ENOMEM;
+ return 0;
+}
+device_initcall(fanotify_fastpath_init);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8e2d43fc6f7c..e96cb83f8409 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -987,6 +987,13 @@ static long fanotify_ioctl(struct file *file, unsigned int cmd, unsigned long ar
spin_unlock(&group->notification_lock);
ret = put_user(send_len, (int __user *) p);
break;
+ case FAN_IOC_ADD_FP:
+ ret = fanotify_fastpath_add(group, p);
+ break;
+ case FAN_IOC_DEL_FP:
+ fanotify_fastpath_del(group);
+ ret = 0;
+ break;
}
return ret;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 89ff45bd6f01..8645d0b29e9d 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -2,6 +2,7 @@
#ifndef _LINUX_FANOTIFY_H
#define _LINUX_FANOTIFY_H
+#include <linux/kobject.h>
#include <linux/sysctl.h>
#include <uapi/linux/fanotify.h>
@@ -136,4 +137,134 @@
#undef FAN_ALL_PERM_EVENTS
#undef FAN_ALL_OUTGOING_EVENTS
+struct fsnotify_group;
+struct qstr;
+struct inode;
+struct fanotify_fastpath_hook;
+
+/*
+ * Event passed to fanotify fastpath handler
+ *
+ * @mask: event type and flags
+ * @data: object that event happened on
+ * @data_type: type of object for fanotify_data_XXX() accessors
+ * @dir: optional directory associated with event -
+ * if @file_name is not NULL, this is the directory that
+ * @file_name is relative to
+ * @file_name: optional file name associated with event
+ * @match_mask: mark types of this group that matched the event
+ */
+struct fanotify_fastpath_event {
+ u32 mask;
+ const void *data;
+ int data_type;
+ struct inode *dir;
+ const struct qstr *file_name;
+ __kernel_fsid_t *fsid;
+ u32 match_mask;
+};
+
+/*
+ * fanotify fastpath handler should implement these ops.
+ *
+ * fp_handler - Main call for the fastpath handler.
+ * @group: The group being notified
+ * @fp_hook: fanotify_fastpath_hook attached to @group.
+ * Returns: enum fanotify_fastpath_return.
+ *
+ * fp_init - Initialize the fanotify_fastpath_hook.
+ * @hook: fanotify_fastpath_hook to be initialized
+ * @args: Arguments used to initialize @hook
+ *
+ * fp_free - Free the fanotify_fastpath_hook.
+ * @hook: fanotify_fastpath_hook to be freed.
+ *
+ * @name: Name of the fanotify_fastpath_ops. This needs to be unique
+ * in the system
+ * @owner: Owner module of this fanotify_fastpath_ops
+ * @list: Attach to global list of fanotify_fastpath_ops
+ * @flags: Flags for the fanotify_fastpath_ops
+ * @desc: Description of what this fastpath handler does (optional)
+ * @init_args: Description of the init_args in a string (optional)
+ */
+struct fanotify_fastpath_ops {
+ int (*fp_handler)(struct fsnotify_group *group,
+ struct fanotify_fastpath_hook *fp_hook,
+ struct fanotify_fastpath_event *fp_event);
+ int (*fp_init)(struct fanotify_fastpath_hook *hook, void *args);
+ void (*fp_free)(struct fanotify_fastpath_hook *hook);
+
+ char name[FAN_FP_NAME_MAX];
+ struct module *owner;
+ struct list_head list;
+ u32 flags;
+ const char *desc;
+ const char *init_args;
+
+ /* internal */
+ struct kobject kobj;
+};
+
+/* Flags for fanotify_fastpath_ops->flags */
+enum fanotify_fastpath_flags {
+ /* CAP_SYS_ADMIN is required to use this fastpath handler */
+ FAN_FP_F_SYS_ADMIN_ONLY = BIT(0),
+
+ FAN_FP_F_ALL = FAN_FP_F_SYS_ADMIN_ONLY,
+};
+
+/* Return value of fp_handler */
+enum fanotify_fastpath_return {
+ /* The event should be sent to user space */
+ FAN_FP_RET_SEND_TO_USERSPACE = 0,
+ /* The event should NOT be sent to user space */
+ FAN_FP_RET_SKIP_EVENT = 1,
+};
+
+/*
+ * Hook that attaches fanotify_fastpath_ops to a group.
+ * @ops: the ops
+ * @data: per group data used by the ops
+ */
+struct fanotify_fastpath_hook {
+ struct fanotify_fastpath_ops *ops;
+ void *data;
+};
+
+#ifdef CONFIG_FANOTIFY_FASTPATH
+
+int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops);
+void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops);
+int fanotify_fastpath_add(struct fsnotify_group *group,
+ struct fanotify_fastpath_args __user *args);
+void fanotify_fastpath_del(struct fsnotify_group *group);
+void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook);
+
+#else /* CONFIG_FANOTIFY_FASTPATH */
+
+static inline int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops)
+{
+}
+
+static inline int fanotify_fastpath_add(struct fsnotify_group *group,
+ struct fanotify_fastpath_args __user *args)
+{
+ return -ENOENT;
+}
+
+static inline void fanotify_fastpath_del(struct fsnotify_group *group)
+{
+}
+
+static inline void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
+{
+}
+
+#endif /* CONFIG_FANOTIFY_FASTPATH */
+
#endif /* _LINUX_FANOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 3ecf7768e577..9b22d9b9d0bb 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -117,6 +117,7 @@ struct fsnotify_fname;
struct fsnotify_iter_info;
struct mem_cgroup;
+struct fanotify_fastpath_hook;
/*
* Each group much define these ops. The fsnotify infrastructure will call
@@ -255,6 +256,9 @@ struct fsnotify_group {
int f_flags; /* event_f_flags from fanotify_init() */
struct ucounts *ucounts;
mempool_t error_events_pool;
+#ifdef CONFIG_FANOTIFY_FASTPATH
+ struct fanotify_fastpath_hook __rcu *fp_hook;
+#endif /* CONFIG_FANOTIFY_FASTPATH */
} fanotify_data;
#endif /* CONFIG_FANOTIFY */
};
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 34f221d3a1b9..654d5ab44143 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -3,6 +3,7 @@
#define _UAPI_LINUX_FANOTIFY_H
#include <linux/types.h>
+#include <linux/ioctl.h>
/* the following events that user-space can register for */
#define FAN_ACCESS 0x00000001 /* File was accessed */
@@ -243,4 +244,28 @@ struct fanotify_response_info_audit_rule {
(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
(long)(meta)->event_len <= (long)(len))
+#define FAN_FP_NAME_MAX 64
+#define FAN_FP_ARGS_MAX 64
+
+/* This is the arguments used to add fastpath handler to a group. */
+struct fanotify_fastpath_args {
+ char name[FAN_FP_NAME_MAX];
+
+ __u32 version;
+ __u32 flags;
+
+ /*
+ * user space pointer to the init args of fastpath handler,
+ * up to init_args_len (<= FAN_FP_ARGS_MAX).
+ */
+ __u64 init_args;
+ /* size of init_args */
+ __u32 init_args_size;
+} __attribute__((__packed__));
+
+#define FAN_IOC_MAGIC 'F'
+
+#define FAN_IOC_ADD_FP _IOW(FAN_IOC_MAGIC, 0, struct fanotify_fastpath_args)
+#define FAN_IOC_DEL_FP _IOW(FAN_IOC_MAGIC, 1, char[FAN_FP_NAME_MAX])
+
#endif /* _UAPI_LINUX_FANOTIFY_H */
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 2/7] samples/fanotify: Add a sample fanotify fastpath handler
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify " Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 3/7] bpf: Make bpf inode storage available to tracing programs Song Liu
` (4 subsequent siblings)
6 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
This fastpath handler monitors a subtree inside a mount point.
To use it:
[root]# insmod ./fastpath-mod.ko
[root]# mkdir -p /tmp/a/b/c/d
[root]# ./fastpath-user /tmp/ /tmp/a/b &
[root]# touch /tmp/xx           # Doesn't generate an event
[root]# touch /tmp/a/xxa        # Doesn't generate an event
[root]# touch /tmp/a/b/xxab     # Generates an event
Accessing file xxab             # this is the output from fastpath-user
[root]# touch /tmp/a/b/c/xxabc  # Generates an event
Accessing file xxabc            # this is the output from fastpath-user
Signed-off-by: Song Liu <song@kernel.org>
---
MAINTAINERS | 1 +
samples/Kconfig | 20 +++++-
samples/Makefile | 2 +-
samples/fanotify/.gitignore | 1 +
samples/fanotify/Makefile | 5 +-
samples/fanotify/fastpath-mod.c | 82 +++++++++++++++++++++++
samples/fanotify/fastpath-user.c | 111 +++++++++++++++++++++++++++++++
7 files changed, 219 insertions(+), 3 deletions(-)
create mode 100644 samples/fanotify/fastpath-mod.c
create mode 100644 samples/fanotify/fastpath-user.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 7ad507f49324..8939a48b2d99 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8658,6 +8658,7 @@ S: Maintained
F: fs/notify/fanotify/
F: include/linux/fanotify.h
F: include/uapi/linux/fanotify.h
+F: samples/fanotify/
FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER
M: Linus Walleij <linus.walleij@linaro.org>
diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..b0d3dff48bb0 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR
with it.
See also Documentation/driver-api/connector.rst
+config SAMPLE_FANOTIFY
+ bool "Build fanotify monitoring sample"
+ depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+ help
+ When enabled, this builds samples for fanotify.
+ There are multiple samples for fanotify. Please see the
+ following configs for more details of these
+ samples.
+
config SAMPLE_FANOTIFY_ERROR
bool "Build fanotify error monitoring sample"
- depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+ depends on SAMPLE_FANOTIFY
help
When enabled, this builds an example code that uses the
FAN_FS_ERROR fanotify mechanism to monitor filesystem
errors.
See also Documentation/admin-guide/filesystem-monitoring.rst.
+config SAMPLE_FANOTIFY_FASTPATH
+ tristate "Build fanotify fastpath sample"
+ depends on SAMPLE_FANOTIFY && m
+ help
+ When enabled, this builds a kernel module that contains a
+ sample fanotify fastpath handler.
+ The sample handler filters out events that happen
+ outside a given subtree.
+
config SAMPLE_HIDRAW
bool "hidraw sample"
depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..108360972626 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
obj-$(CONFIG_SAMPLE_CONFIGFS) += configfs/
obj-$(CONFIG_SAMPLE_CONNECTOR) += connector/
-obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fanotify/
+obj-$(CONFIG_SAMPLE_FANOTIFY) += fanotify/
subdir-$(CONFIG_SAMPLE_HIDRAW) += hidraw
obj-$(CONFIG_SAMPLE_HW_BREAKPOINT) += hw_breakpoint/
obj-$(CONFIG_SAMPLE_KDB) += kdb/
diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore
index d74593e8b2de..306e1ddec4e0 100644
--- a/samples/fanotify/.gitignore
+++ b/samples/fanotify/.gitignore
@@ -1 +1,2 @@
fs-monitor
+fastpath-user
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
index e20db1bdde3b..f5bbd7380104 100644
--- a/samples/fanotify/Makefile
+++ b/samples/fanotify/Makefile
@@ -1,5 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += fs-monitor
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor
userccflags += -I usr/include -Wall
+obj-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-mod.o
+
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-user
diff --git a/samples/fanotify/fastpath-mod.c b/samples/fanotify/fastpath-mod.c
new file mode 100644
index 000000000000..7e2e1878f8e7
--- /dev/null
+++ b/samples/fanotify/fastpath-mod.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/fsnotify.h>
+#include <linux/fanotify.h>
+#include <linux/module.h>
+#include <linux/path.h>
+#include <linux/file.h>
+
+static int sample_fp_handler(struct fsnotify_group *group,
+ struct fanotify_fastpath_hook *fp_hook,
+ struct fanotify_fastpath_event *fp_event)
+{
+ struct dentry *dentry;
+ struct path *subtree;
+
+ dentry = fsnotify_data_dentry(fp_event->data, fp_event->data_type);
+ if (!dentry)
+ return FAN_FP_RET_SEND_TO_USERSPACE;
+
+ subtree = fp_hook->data;
+
+ if (is_subdir(dentry, subtree->dentry))
+ return FAN_FP_RET_SEND_TO_USERSPACE;
+ return FAN_FP_RET_SKIP_EVENT;
+}
+
+static int sample_fp_init(struct fanotify_fastpath_hook *fp_hook, void *args)
+{
+ struct path *subtree;
+ struct file *file;
+ int fd;
+
+ fd = *(int *)args;
+
+ file = fget(fd);
+ if (!file)
+ return -EBADF;
+ subtree = kzalloc(sizeof(struct path), GFP_KERNEL);
+ if (!subtree) {
+ fput(file);
+ return -ENOMEM;
+ }
+ path_get(&file->f_path);
+ *subtree = file->f_path;
+ fput(file);
+ fp_hook->data = subtree;
+ return 0;
+}
+
+static void sample_fp_free(struct fanotify_fastpath_hook *fp_hook)
+{
+ struct path *subtree = fp_hook->data;
+
+ path_put(subtree);
+ kfree(subtree);
+}
+
+static struct fanotify_fastpath_ops fan_fp_ignore_a_ops = {
+ .fp_handler = sample_fp_handler,
+ .fp_init = sample_fp_init,
+ .fp_free = sample_fp_free,
+ .name = "monitor-subtree",
+ .owner = THIS_MODULE,
+ .flags = FAN_FP_F_SYS_ADMIN_ONLY,
+ .desc = "only emit events under a subtree",
+ .init_args = "struct {\n\tint subtree_fd;\n};",
+};
+
+static int __init fanotify_fastpath_sample_init(void)
+{
+ return fanotify_fastpath_register(&fan_fp_ignore_a_ops);
+}
+static void __exit fanotify_fastpath_sample_exit(void)
+{
+ fanotify_fastpath_unregister(&fan_fp_ignore_a_ops);
+}
+
+module_init(fanotify_fastpath_sample_init);
+module_exit(fanotify_fastpath_sample_exit);
+
+MODULE_AUTHOR("Song Liu");
+MODULE_DESCRIPTION("Example fanotify fastpath handler");
+MODULE_LICENSE("GPL");
diff --git a/samples/fanotify/fastpath-user.c b/samples/fanotify/fastpath-user.c
new file mode 100644
index 000000000000..abe33a6b6b41
--- /dev/null
+++ b/samples/fanotify/fastpath-user.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <err.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+static int total_event_cnt;
+
+static void handle_notifications(char *buffer, int len)
+{
+ struct fanotify_event_metadata *event =
+ (struct fanotify_event_metadata *) buffer;
+ struct fanotify_event_info_header *info;
+ struct fanotify_event_info_fid *fid;
+ struct file_handle *handle;
+ char *name;
+ int off;
+
+ for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
+ for (off = sizeof(*event) ; off < event->event_len;
+ off += info->len) {
+ info = (struct fanotify_event_info_header *)
+ ((char *) event + off);
+ switch (info->info_type) {
+ case FAN_EVENT_INFO_TYPE_DFID_NAME:
+ fid = (struct fanotify_event_info_fid *) info;
+ handle = (struct file_handle *)&fid->handle;
+ name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
+
+ printf("Accessing file %s\n", name);
+ total_event_cnt++;
+ break;
+ default:
+ break;
+ }
+ }
+ }
+}
+
+int main(int argc, char **argv)
+{
+ struct fanotify_fastpath_args args = {
+ .name = "monitor-subtree",
+ .version = 1,
+ .flags = 0,
+ };
+ char buffer[BUFSIZ];
+ const char *msg;
+ int fanotify_fd;
+ int subtree_fd;
+
+ if (argc < 3) {
+ printf("Usage:\n"
+ "\t %s <mount point> <subtree to monitor>\n",
+ argv[0]);
+ return 1;
+ }
+
+ subtree_fd = open(argv[2], O_RDONLY | O_CLOEXEC);
+
+ if (subtree_fd < 0)
+ errx(1, "open subtree_fd");
+
+ args.init_args = (__u64)&subtree_fd;
+ args.init_args_size = sizeof(int);
+
+ fanotify_fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID,
+ O_RDONLY);
+ if (fanotify_fd < 0) {
+ close(subtree_fd);
+ errx(1, "fanotify_init");
+ }
+
+ if (fanotify_mark(fanotify_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
+ FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
+ AT_FDCWD, argv[1])) {
+ msg = "fanotify_mark";
+ goto err_out;
+ }
+
+ if (ioctl(fanotify_fd, FAN_IOC_ADD_FP, &args)) {
+ msg = "ioctl";
+ goto err_out;
+ }
+
+ while (total_event_cnt < 10) {
+ int n = read(fanotify_fd, buffer, BUFSIZ);
+
+ if (n < 0) {
+ msg = "read";
+ goto err_out;
+ }
+
+ handle_notifications(buffer, n);
+ }
+
+ ioctl(fanotify_fd, FAN_IOC_DEL_FP);
+ close(fanotify_fd);
+ close(subtree_fd);
+ return 0;
+
+err_out:
+ close(fanotify_fd);
+ close(subtree_fd);
+ errx(1, "%s", msg);
+ return 0;
+}
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 3/7] bpf: Make bpf inode storage available to tracing programs
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify " Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 2/7] samples/fanotify: Add a sample " Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 4/7] bpf: fs: Add three kfuncs Song Liu
` (3 subsequent siblings)
6 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
Tracing programs can run from almost any context, including inside the
inode storage helpers themselves, so use the same recursion avoidance
mechanism (a per-CPU busy counter) as task local storage.
Signed-off-by: Song Liu <song@kernel.org>
---
Do not apply this. Another version of this patch, with proper
selftests, has been submitted separately.
---
fs/inode.c | 2 +
include/linux/bpf.h | 9 ++
include/linux/bpf_lsm.h | 29 ------
include/linux/fs.h | 4 +
kernel/bpf/Makefile | 3 +-
kernel/bpf/bpf_inode_storage.c | 176 +++++++++++++++++++++++++--------
kernel/bpf/bpf_lsm.c | 4 -
kernel/trace/bpf_trace.c | 8 ++
security/bpf/hooks.c | 7 --
9 files changed, 159 insertions(+), 83 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index 8dabb224f941..c196a62bd48f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -250,6 +250,8 @@ EXPORT_SYMBOL(free_inode_nonrcu);
static void i_callback(struct rcu_head *head)
{
struct inode *inode = container_of(head, struct inode, i_rcu);
+
+ bpf_inode_storage_free(inode);
if (inode->free_inode)
inode->free_inode(inode);
else
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 19d8ca8ac960..863cb972d1fa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2630,6 +2630,7 @@ struct bpf_link *bpf_link_by_id(u32 id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
const struct bpf_prog *prog);
void bpf_task_storage_free(struct task_struct *task);
+void bpf_inode_storage_free(struct inode *inode);
void bpf_cgrp_storage_free(struct cgroup *cgroup);
bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
const struct btf_func_model *
@@ -2900,6 +2901,10 @@ static inline void bpf_task_storage_free(struct task_struct *task)
{
}
+static inline void bpf_inode_storage_free(struct inode *inode)
+{
+}
+
static inline bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog)
{
return false;
@@ -3263,6 +3268,10 @@ extern const struct bpf_func_proto bpf_task_storage_get_recur_proto;
extern const struct bpf_func_proto bpf_task_storage_get_proto;
extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto;
extern const struct bpf_func_proto bpf_task_storage_delete_proto;
+extern const struct bpf_func_proto bpf_inode_storage_get_proto;
+extern const struct bpf_func_proto bpf_inode_storage_get_recur_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_recur_proto;
extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index aefcd6564251..a819c2f0a062 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -19,31 +19,12 @@
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
-struct bpf_storage_blob {
- struct bpf_local_storage __rcu *storage;
-};
-
-extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
-
int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog);
bool bpf_lsm_is_sleepable_hook(u32 btf_id);
bool bpf_lsm_is_trusted(const struct bpf_prog *prog);
-static inline struct bpf_storage_blob *bpf_inode(
- const struct inode *inode)
-{
- if (unlikely(!inode->i_security))
- return NULL;
-
- return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
-}
-
-extern const struct bpf_func_proto bpf_inode_storage_get_proto;
-extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
-void bpf_inode_storage_free(struct inode *inode);
-
void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
int bpf_lsm_get_retval_range(const struct bpf_prog *prog,
@@ -66,16 +47,6 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return -EOPNOTSUPP;
}
-static inline struct bpf_storage_blob *bpf_inode(
- const struct inode *inode)
-{
- return NULL;
-}
-
-static inline void bpf_inode_storage_free(struct inode *inode)
-{
-}
-
static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
bpf_func_t *bpf_func)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3559446279c1..479097e4dd5b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -79,6 +79,7 @@ struct fs_context;
struct fs_parameter_spec;
struct fileattr;
struct iomap_ops;
+struct bpf_local_storage;
extern void __init inode_init(void);
extern void __init inode_init_early(void);
@@ -648,6 +649,9 @@ struct inode {
#ifdef CONFIG_SECURITY
void *i_security;
#endif
+#ifdef CONFIG_BPF_SYSCALL
+ struct bpf_local_storage __rcu *i_bpf_storage;
+#endif
/* Stat data, not accessed from path walking */
unsigned long i_ino;
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 9b9c151b5c82..a5b7136b4884 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,8 +10,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
-obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
-obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 29da6d3838f6..50c082da8dc5 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -21,16 +21,36 @@
DEFINE_BPF_STORAGE_CACHE(inode_cache);
-static struct bpf_local_storage __rcu **
-inode_storage_ptr(void *owner)
+static DEFINE_PER_CPU(int, bpf_inode_storage_busy);
+
+static void bpf_inode_storage_lock(void)
+{
+ migrate_disable();
+ this_cpu_inc(bpf_inode_storage_busy);
+}
+
+static void bpf_inode_storage_unlock(void)
+{
+ this_cpu_dec(bpf_inode_storage_busy);
+ migrate_enable();
+}
+
+static bool bpf_inode_storage_trylock(void)
+{
+ migrate_disable();
+ if (unlikely(this_cpu_inc_return(bpf_inode_storage_busy) != 1)) {
+ this_cpu_dec(bpf_inode_storage_busy);
+ migrate_enable();
+ return false;
+ }
+ return true;
+}
+
+static struct bpf_local_storage __rcu **inode_storage_ptr(void *owner)
{
struct inode *inode = owner;
- struct bpf_storage_blob *bsb;
- bsb = bpf_inode(inode);
- if (!bsb)
- return NULL;
- return &bsb->storage;
+ return &inode->i_bpf_storage;
}
static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
@@ -39,14 +59,9 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
{
struct bpf_local_storage *inode_storage;
struct bpf_local_storage_map *smap;
- struct bpf_storage_blob *bsb;
-
- bsb = bpf_inode(inode);
- if (!bsb)
- return NULL;
inode_storage =
- rcu_dereference_check(bsb->storage, bpf_rcu_lock_held());
+ rcu_dereference_check(inode->i_bpf_storage, bpf_rcu_lock_held());
if (!inode_storage)
return NULL;
@@ -57,21 +72,18 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
void bpf_inode_storage_free(struct inode *inode)
{
struct bpf_local_storage *local_storage;
- struct bpf_storage_blob *bsb;
-
- bsb = bpf_inode(inode);
- if (!bsb)
- return;
rcu_read_lock();
- local_storage = rcu_dereference(bsb->storage);
+ local_storage = rcu_dereference(inode->i_bpf_storage);
if (!local_storage) {
rcu_read_unlock();
return;
}
+ bpf_inode_storage_lock();
bpf_local_storage_destroy(local_storage);
+ bpf_inode_storage_unlock();
rcu_read_unlock();
}
@@ -83,7 +95,9 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
if (fd_empty(f))
return ERR_PTR(-EBADF);
+ bpf_inode_storage_lock();
sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
+ bpf_inode_storage_unlock();
return sdata ? sdata->data : NULL;
}
@@ -98,13 +112,16 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
if (!inode_storage_ptr(file_inode(fd_file(f))))
return -EBADF;
+ bpf_inode_storage_lock();
sdata = bpf_local_storage_update(file_inode(fd_file(f)),
(struct bpf_local_storage_map *)map,
value, map_flags, GFP_ATOMIC);
+ bpf_inode_storage_unlock();
return PTR_ERR_OR_ZERO(sdata);
}
-static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map,
+ bool nobusy)
{
struct bpf_local_storage_data *sdata;
@@ -112,6 +129,9 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
if (!sdata)
return -ENOENT;
+ if (!nobusy)
+ return -EBUSY;
+
bpf_selem_unlink(SELEM(sdata), false);
return 0;
@@ -119,60 +139,114 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
{
+ int err;
+
CLASS(fd_raw, f)(*(int *)key);
if (fd_empty(f))
return -EBADF;
- return inode_storage_delete(file_inode(fd_file(f)), map);
+ bpf_inode_storage_lock();
+ err = inode_storage_delete(file_inode(fd_file(f)), map, true);
+ bpf_inode_storage_unlock();
+ return err;
}
-/* *gfp_flags* is a hidden argument provided by the verifier */
-BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
- void *, value, u64, flags, gfp_t, gfp_flags)
+static void *__bpf_inode_storage_get(struct bpf_map *map, struct inode *inode,
+ void *value, u64 flags, gfp_t gfp_flags, bool nobusy)
{
struct bpf_local_storage_data *sdata;
- WARN_ON_ONCE(!bpf_rcu_lock_held());
- if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
- return (unsigned long)NULL;
-
/* explicitly check that the inode_storage_ptr is not
* NULL as inode_storage_lookup returns NULL in this case and
* bpf_local_storage_update expects the owner to have a
* valid storage pointer.
*/
if (!inode || !inode_storage_ptr(inode))
- return (unsigned long)NULL;
+ return NULL;
- sdata = inode_storage_lookup(inode, map, true);
+ sdata = inode_storage_lookup(inode, map, nobusy);
if (sdata)
- return (unsigned long)sdata->data;
+ return sdata->data;
- /* This helper must only called from where the inode is guaranteed
- * to have a refcount and cannot be freed.
- */
- if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+ /* only allocate new storage, when the inode is refcounted */
+ if (atomic_read(&inode->i_count) &&
+ flags & BPF_LOCAL_STORAGE_GET_F_CREATE && nobusy) {
sdata = bpf_local_storage_update(
inode, (struct bpf_local_storage_map *)map, value,
BPF_NOEXIST, gfp_flags);
- return IS_ERR(sdata) ? (unsigned long)NULL :
- (unsigned long)sdata->data;
+ return IS_ERR(sdata) ? NULL : sdata->data;
}
- return (unsigned long)NULL;
+ return NULL;
}
-BPF_CALL_2(bpf_inode_storage_delete,
- struct bpf_map *, map, struct inode *, inode)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get_recur, struct bpf_map *, map, struct inode *, inode,
+ void *, value, u64, flags, gfp_t, gfp_flags)
{
+ bool nobusy;
+ void *data;
+
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+ if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+ return (unsigned long)NULL;
+
+ nobusy = bpf_inode_storage_trylock();
+ data = __bpf_inode_storage_get(map, inode, value, flags, gfp_flags, nobusy);
+ if (nobusy)
+ bpf_inode_storage_unlock();
+ return (unsigned long)data;
+}
+
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+ void *, value, u64, flags, gfp_t, gfp_flags)
+{
+ void *data;
+
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+ if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+ return (unsigned long)NULL;
+
+ bpf_inode_storage_lock();
+ data = __bpf_inode_storage_get(map, inode, value, flags, gfp_flags, true);
+ bpf_inode_storage_unlock();
+ return (unsigned long)data;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete_recur, struct bpf_map *, map, struct inode *, inode)
+{
+ bool nobusy;
+ int ret;
+
WARN_ON_ONCE(!bpf_rcu_lock_held());
if (!inode)
return -EINVAL;
+ nobusy = bpf_inode_storage_trylock();
/* This helper must only called from where the inode is guaranteed
* to have a refcount and cannot be freed.
*/
- return inode_storage_delete(inode, map);
+ ret = inode_storage_delete(inode, map, nobusy);
+ bpf_inode_storage_unlock();
+ return ret;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete, struct bpf_map *, map, struct inode *, inode)
+{
+ int ret;
+
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+ if (!inode)
+ return -EINVAL;
+
+ bpf_inode_storage_lock();
+ /* This helper must only called from where the inode is guaranteed
+ * to have a refcount and cannot be freed.
+ */
+ ret = inode_storage_delete(inode, map, true);
+ bpf_inode_storage_unlock();
+ return ret;
}
static int notsupp_get_next_key(struct bpf_map *map, void *key,
@@ -208,6 +282,17 @@ const struct bpf_map_ops inode_storage_map_ops = {
BTF_ID_LIST_SINGLE(bpf_inode_storage_btf_ids, struct, inode)
+const struct bpf_func_proto bpf_inode_storage_get_recur_proto = {
+ .func = bpf_inode_storage_get_recur,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
+ .arg2_btf_id = &bpf_inode_storage_btf_ids[0],
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+};
+
const struct bpf_func_proto bpf_inode_storage_get_proto = {
.func = bpf_inode_storage_get,
.gpl_only = false,
@@ -219,6 +304,15 @@ const struct bpf_func_proto bpf_inode_storage_get_proto = {
.arg4_type = ARG_ANYTHING,
};
+const struct bpf_func_proto bpf_inode_storage_delete_recur_proto = {
+ .func = bpf_inode_storage_delete_recur,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID_OR_NULL,
+ .arg2_btf_id = &bpf_inode_storage_btf_ids[0],
+};
+
const struct bpf_func_proto bpf_inode_storage_delete_proto = {
.func = bpf_inode_storage_delete,
.gpl_only = false,
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 6292ac5f9bd1..51e2de17325a 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -231,10 +231,6 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
}
switch (func_id) {
- case BPF_FUNC_inode_storage_get:
- return &bpf_inode_storage_get_proto;
- case BPF_FUNC_inode_storage_delete:
- return &bpf_inode_storage_delete_proto;
#ifdef CONFIG_NET
case BPF_FUNC_sk_storage_get:
return &bpf_sk_storage_get_proto;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a582cd25ca87..3ec39e6704e2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1529,6 +1529,14 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
if (bpf_prog_check_recur(prog))
return &bpf_task_storage_delete_recur_proto;
return &bpf_task_storage_delete_proto;
+ case BPF_FUNC_inode_storage_get:
+ if (bpf_prog_check_recur(prog))
+ return &bpf_inode_storage_get_recur_proto;
+ return &bpf_inode_storage_get_proto;
+ case BPF_FUNC_inode_storage_delete:
+ if (bpf_prog_check_recur(prog))
+ return &bpf_inode_storage_delete_recur_proto;
+ return &bpf_inode_storage_delete_proto;
case BPF_FUNC_for_each_map_elem:
return &bpf_for_each_map_elem_proto;
case BPF_FUNC_snprintf:
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 3663aec7bcbd..67719a04bb0b 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -12,8 +12,6 @@ static struct security_hook_list bpf_lsm_hooks[] __ro_after_init = {
LSM_HOOK_INIT(NAME, bpf_lsm_##NAME),
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
- LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free),
- LSM_HOOK_INIT(task_free, bpf_task_storage_free),
};
static const struct lsm_id bpf_lsmid = {
@@ -29,12 +27,7 @@ static int __init bpf_lsm_init(void)
return 0;
}
-struct lsm_blob_sizes bpf_lsm_blob_sizes __ro_after_init = {
- .lbs_inode = sizeof(struct bpf_storage_blob),
-};
-
DEFINE_LSM(bpf) = {
.name = "bpf",
.init = bpf_lsm_init,
- .blobs = &bpf_lsm_blob_sizes
};
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 4/7] bpf: fs: Add three kfuncs
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
` (2 preceding siblings ...)
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 3/7] bpf: Make bpf inode storage available to tracing programs Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 5/7] bpf: Allow bpf map hold reference on dentry Song Liu
` (2 subsequent siblings)
6 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
Add the following kfuncs:
- bpf_iput
- bpf_dput
- bpf_is_subdir
These kfuncs can be used by bpf based fanotify fastpath handlers.
Both bpf_iput and bpf_dput are marked as KF_SLEEPABLE | KF_RELEASE.
They will be used to release references on inodes and dentries.
bpf_is_subdir is marked as KF_RCU, so it can take RCU-protected
pointers, for example, a kptr saved in a bpf map.
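For example, a bpf based fastpath handler can use these kfuncs to
filter events by subtree. A rough sketch (only the kfunc signatures
come from this patch; the surrounding program scaffolding, vmlinux.h
types, etc. are assumed):

    extern bool bpf_is_subdir(struct dentry *new_dentry,
                              struct dentry *old_dentry) __ksym;
    extern void bpf_dput(struct dentry *dentry) __ksym;

    /* in the handler: skip events outside the pinned subtree */
    if (!bpf_is_subdir(event_dentry, subtree_dentry))
            return FAN_FP_RET_SKIP_EVENT;

    /* when done with a referenced dentry taken out of a map */
    bpf_dput(old_dentry_ref);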
Signed-off-by: Song Liu <song@kernel.org>
---
fs/bpf_fs_kfuncs.c | 41 +++++++++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 1 +
2 files changed, 42 insertions(+)
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 3fe9f59ef867..03ad3a2faec8 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -152,6 +152,44 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
return bpf_get_dentry_xattr(dentry, name__str, value_p);
}
+/**
+ * bpf_iput - Drop a reference on the inode
+ *
+ * @inode: inode to drop reference.
+ *
+ * Drop a refcount on inode.
+ */
+__bpf_kfunc void bpf_iput(struct inode *inode)
+{
+ iput(inode);
+}
+
+/**
+ * bpf_dput - Drop a reference on the dentry
+ *
+ * @dentry: dentry to drop reference.
+ *
+ * Drop a refcount on dentry.
+ */
+__bpf_kfunc void bpf_dput(struct dentry *dentry)
+{
+ dput(dentry);
+}
+
+/**
+ * bpf_is_subdir - is new dentry a subdirectory of old_dentry
+ * @new_dentry: new dentry
+ * @old_dentry: old dentry
+ *
+ * Returns true if new_dentry is a subdirectory of the parent (at any depth).
+ * Returns false otherwise.
+ * Caller must ensure that "new_dentry" is pinned before calling is_subdir()
+ */
+__bpf_kfunc bool bpf_is_subdir(struct dentry *new_dentry, struct dentry *old_dentry)
+{
+ return is_subdir(new_dentry, old_dentry);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -161,6 +199,9 @@ BTF_ID_FLAGS(func, bpf_put_file, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_path_d_path, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iput, KF_SLEEPABLE | KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_dput, KF_SLEEPABLE | KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_is_subdir, KF_RCU)
BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9a7ed527e47e..65abb2d74ee5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5432,6 +5432,7 @@ BTF_ID(struct, bpf_cpumask)
#endif
BTF_ID(struct, task_struct)
BTF_ID(struct, bpf_crypto_ctx)
+BTF_ID(struct, dentry)
BTF_SET_END(rcu_protected_types)
static bool rcu_protected_object(const struct btf *btf, u32 btf_id)
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 5/7] bpf: Allow bpf map hold reference on dentry
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
` (3 preceding siblings ...)
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 4/7] bpf: fs: Add three kfuncs Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 6/7] fanotify: Enable bpf based fanotify fastpath handler Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF " Song Liu
6 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
To save a dentry in a bpf map, logic is needed to properly release the
dentry on map termination.
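With the dtor registered, a bpf program can keep a referenced dentry in
a map value as a kptr. A sketch of the idiom (the map name and value
layout are made up):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    struct fp_data {
            struct dentry __kptr *subtree;
    };

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, struct fp_data);
    } data_map SEC(".maps");

    /* store a referenced dentry with bpf_kptr_xchg(); the dtor
     * (dput) runs when the map or the old value is released
     */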
Signed-off-by: Song Liu <song@kernel.org>
---
kernel/bpf/helpers.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab28..5e3bf2c188a8 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3004,6 +3004,12 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
return ret + 1;
}
+__bpf_kfunc void bpf_dentry_release_dtor(struct dentry *dentry)
+{
+ dput(dentry);
+}
+CFI_NOSEAL(bpf_dentry_release_dtor);
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(generic_btf_ids)
@@ -3046,6 +3052,8 @@ static const struct btf_kfunc_id_set generic_kfunc_set = {
BTF_ID_LIST(generic_dtor_ids)
BTF_ID(struct, task_struct)
BTF_ID(func, bpf_task_release_dtor)
+BTF_ID(struct, dentry)
+BTF_ID(func, bpf_dentry_release_dtor)
#ifdef CONFIG_CGROUPS
BTF_ID(struct, cgroup)
BTF_ID(func, bpf_cgroup_release_dtor)
@@ -3105,11 +3113,15 @@ static int __init kfunc_init(void)
.btf_id = generic_dtor_ids[0],
.kfunc_btf_id = generic_dtor_ids[1]
},
-#ifdef CONFIG_CGROUPS
{
.btf_id = generic_dtor_ids[2],
.kfunc_btf_id = generic_dtor_ids[3]
},
+#ifdef CONFIG_CGROUPS
+ {
+ .btf_id = generic_dtor_ids[4],
+ .kfunc_btf_id = generic_dtor_ids[5]
+ },
#endif
};
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 6/7] fanotify: Enable bpf based fanotify fastpath handler
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
` (4 preceding siblings ...)
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 5/7] bpf: Allow bpf map hold reference on dentry Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF " Song Liu
6 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
Allow users to write fanotify fastpath handlers in bpf programs.
Major changes:
1. Make kfuncs in fs/bpf_fs_kfuncs.c available to STRUCT_OPS programs.
2. Add kfunc bpf_fanotify_data_inode, bpf_fanotify_data_dentry.
e. Add struct_ops bpf_fanotify_fastpath_ops.
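Expected usage from user space looks roughly like this (see the selftest
in patch 7/7 for a complete example; "my_fp_ops" and "my_fastpath" are
placeholder names):

    /* register the handler by attaching the struct_ops map */
    link = bpf_map__attach_struct_ops(skel->maps.my_fp_ops);

    /* attach the registered handler to a fanotify fd by name */
    struct fanotify_fastpath_args args = {
            .name = "my_fastpath",
            .version = 1,
            .flags = 0,
    };
    err = ioctl(fanotify_fd, FAN_IOC_ADD_FP, &args);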
TODO:
1. With the current logic, the bpf based fastpath handler is added to the
global list, and is thus available to all users. This is similar to
bpf based tcp congestion control algorithms. It is possible to add an API
so that the bpf based handler is not added to the global list, similar
to hid-bpf. I plan to add that API later.
Signed-off-by: Song Liu <song@kernel.org>
---
fs/Makefile | 2 +-
fs/bpf_fs_kfuncs.c | 10 +-
fs/notify/fanotify/fanotify_fastpath.c | 172 ++++++++++++++++++++++++-
kernel/bpf/verifier.c | 5 +
4 files changed, 183 insertions(+), 6 deletions(-)
diff --git a/fs/Makefile b/fs/Makefile
index 61679fd587b7..1043d999262d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,4 +129,4 @@ obj-$(CONFIG_EFIVAR_FS) += efivarfs/
obj-$(CONFIG_EROFS_FS) += erofs/
obj-$(CONFIG_VBOXSF_FS) += vboxsf/
obj-$(CONFIG_ZONEFS_FS) += zonefs/
-obj-$(CONFIG_BPF_LSM) += bpf_fs_kfuncs.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_fs_kfuncs.o
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 03ad3a2faec8..2f7b91f10175 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -207,7 +207,8 @@ BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
{
if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
- prog->type == BPF_PROG_TYPE_LSM)
+ prog->type == BPF_PROG_TYPE_LSM ||
+ prog->type == BPF_PROG_TYPE_STRUCT_OPS)
return 0;
return -EACCES;
}
@@ -220,7 +221,12 @@ static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
static int __init bpf_fs_kfuncs_init(void)
{
- return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+ int ret;
+
+ ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_fs_kfunc_set);
+
+ return ret;
}
late_initcall(bpf_fs_kfuncs_init);
diff --git a/fs/notify/fanotify/fanotify_fastpath.c b/fs/notify/fanotify/fanotify_fastpath.c
index f2aefcf0ca6a..ec7a1143d687 100644
--- a/fs/notify/fanotify/fanotify_fastpath.c
+++ b/fs/notify/fanotify/fanotify_fastpath.c
@@ -2,6 +2,7 @@
#include <linux/fanotify.h>
#include <linux/kobject.h>
#include <linux/module.h>
+#include <linux/bpf.h>
#include "fanotify.h"
@@ -197,7 +198,7 @@ int fanotify_fastpath_add(struct fsnotify_group *group,
spin_lock(&fp_list_lock);
fp_ops = fanotify_fastpath_find(args.name);
- if (!fp_ops || !try_module_get(fp_ops->owner)) {
+ if (!fp_ops || !bpf_try_module_get(fp_ops, fp_ops->owner)) {
spin_unlock(&fp_list_lock);
ret = -ENOENT;
goto err_free_hook;
@@ -238,7 +239,7 @@ int fanotify_fastpath_add(struct fsnotify_group *group,
err_free_args:
kfree(init_args);
err_module_put:
- module_put(fp_ops->owner);
+ bpf_module_put(fp_ops, fp_ops->owner);
err_free_hook:
kfree(fp_hook);
goto out;
@@ -249,7 +250,7 @@ void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
if (fp_hook->ops->fp_free)
fp_hook->ops->fp_free(fp_hook);
- module_put(fp_hook->ops->owner);
+ bpf_module_put(fp_hook->ops, fp_hook->ops->owner);
kfree(fp_hook);
}
@@ -280,3 +281,168 @@ static int __init fanotify_fastpath_init(void)
return 0;
}
device_initcall(fanotify_fastpath_init);
+
+__bpf_kfunc_start_defs();
+
+/**
+ * bpf_fanotify_data_inode - get inode from fanotify_fastpath_event
+ *
+ * @event: fanotify_fastpath_event to get inode from
+ *
+ * Get referenced inode from fanotify_fastpath_event.
+ *
+ * Return: A refcounted inode or NULL.
+ *
+ */
+__bpf_kfunc struct inode *bpf_fanotify_data_inode(struct fanotify_fastpath_event *event)
+{
+ struct inode *inode = fsnotify_data_inode(event->data, event->data_type);
+
+ return inode ? igrab(inode) : NULL;
+}
+
+/**
+ * bpf_fanotify_data_dentry - get dentry from fanotify_fastpath_event
+ *
+ * @event: fanotify_fastpath_event to get dentry from
+ *
+ * Get referenced dentry from fanotify_fastpath_event.
+ *
+ * Return: A refcounted dentry or NULL.
+ *
+ */
+__bpf_kfunc struct dentry *bpf_fanotify_data_dentry(struct fanotify_fastpath_event *event)
+{
+ struct dentry *dentry = fsnotify_data_dentry(event->data, event->data_type);
+
+ return dget(dentry);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_fanotify_kfunc_set_ids)
+BTF_ID_FLAGS(func, bpf_fanotify_data_inode,
+ KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_fanotify_data_dentry,
+ KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_KFUNCS_END(bpf_fanotify_kfunc_set_ids)
+
+static const struct btf_kfunc_id_set bpf_fanotify_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &bpf_fanotify_kfunc_set_ids,
+};
+
+static const struct bpf_func_proto *
+bpf_fanotify_fastpath_get_func_proto(enum bpf_func_id func_id,
+ const struct bpf_prog *prog)
+{
+ return tracing_prog_func_proto(func_id, prog);
+}
+
+static bool bpf_fanotify_fastpath_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
+ return false;
+
+ return true;
+}
+
+static int bpf_fanotify_fastpath_btf_struct_access(struct bpf_verifier_log *log,
+ const struct bpf_reg_state *reg,
+ int off, int size)
+{
+ return 0;
+}
+
+static const struct bpf_verifier_ops bpf_fanotify_fastpath_verifier_ops = {
+ .get_func_proto = bpf_fanotify_fastpath_get_func_proto,
+ .is_valid_access = bpf_fanotify_fastpath_is_valid_access,
+ .btf_struct_access = bpf_fanotify_fastpath_btf_struct_access,
+};
+
+static int bpf_fanotify_fastpath_reg(void *kdata, struct bpf_link *link)
+{
+ return fanotify_fastpath_register(kdata);
+}
+
+static void bpf_fanotify_fastpath_unreg(void *kdata, struct bpf_link *link)
+{
+ fanotify_fastpath_unregister(kdata);
+}
+
+static int bpf_fanotify_fastpath_init(struct btf *btf)
+{
+ return 0;
+}
+
+static int bpf_fanotify_fastpath_init_member(const struct btf_type *t,
+ const struct btf_member *member,
+ void *kdata, const void *udata)
+{
+ const struct fanotify_fastpath_ops *uops;
+ struct fanotify_fastpath_ops *ops;
+ u32 moff;
+ int ret;
+
+ uops = (const struct fanotify_fastpath_ops *)udata;
+ ops = (struct fanotify_fastpath_ops *)kdata;
+
+ moff = __btf_member_bit_offset(t, member) / 8;
+ switch (moff) {
+ case offsetof(struct fanotify_fastpath_ops, name):
+ ret = bpf_obj_name_cpy(ops->name, uops->name,
+ sizeof(ops->name));
+ if (ret <= 0)
+ return -EINVAL;
+ return 1;
+ }
+
+ return 0;
+}
+
+static int __bpf_fan_fp_handler(struct fsnotify_group *group,
+ struct fanotify_fastpath_hook *fp_hook,
+ struct fanotify_fastpath_event *fp_event)
+{
+ return 0;
+}
+
+static int __bpf_fan_fp_init(struct fanotify_fastpath_hook *hook, void *args)
+{
+ return 0;
+}
+
+static void __bpf_fan_fp_free(struct fanotify_fastpath_hook *hook)
+{
+}
+
+/* For bpf_struct_ops->cfi_stubs */
+static struct fanotify_fastpath_ops __bpf_fanotify_fastpath_ops = {
+ .fp_handler = __bpf_fan_fp_handler,
+ .fp_init = __bpf_fan_fp_init,
+ .fp_free = __bpf_fan_fp_free,
+};
+
+static struct bpf_struct_ops bpf_fanotify_fastpath_ops = {
+ .verifier_ops = &bpf_fanotify_fastpath_verifier_ops,
+ .reg = bpf_fanotify_fastpath_reg,
+ .unreg = bpf_fanotify_fastpath_unreg,
+ .init = bpf_fanotify_fastpath_init,
+ .init_member = bpf_fanotify_fastpath_init_member,
+ .name = "fanotify_fastpath_ops",
+ .cfi_stubs = &__bpf_fanotify_fastpath_ops,
+ .owner = THIS_MODULE,
+};
+
+static int __init bpf_fanotify_fastpath_struct_ops_init(void)
+{
+ int ret;
+
+ ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_fanotify_kfunc_set);
+ ret = ret ?: register_bpf_struct_ops(&bpf_fanotify_fastpath_ops, fanotify_fastpath_ops);
+ return ret;
+}
+late_initcall(bpf_fanotify_fastpath_struct_ops_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 65abb2d74ee5..cf7af86118fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6529,6 +6529,10 @@ BTF_TYPE_SAFE_TRUSTED(struct dentry) {
struct inode *d_inode;
};
+BTF_TYPE_SAFE_TRUSTED(struct fanotify_fastpath_event) {
+ struct inode *dir;
+};
+
BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) {
struct sock *sk;
};
@@ -6564,6 +6568,7 @@ static bool type_is_trusted(struct bpf_verifier_env *env,
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct linux_binprm));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct file));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct dentry));
+ BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct fanotify_fastpath_event));
return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_trusted");
}
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
` (5 preceding siblings ...)
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 6/7] fanotify: Enable bpf based fanotify fastpath handler Song Liu
@ 2024-11-14 8:43 ` Song Liu
2024-11-14 20:14 ` Alexei Starovoitov
6 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-14 8:43 UTC (permalink / raw)
To: bpf, linux-fsdevel, linux-kernel, linux-security-module
Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
josef, mic, gnoack, Song Liu
This test shows simplified logic that monitors a subtree. It is
simplified in that it doesn't handle all the scenarios, such as:
1) moving a sub-subtree into/out of the monitored subtree;
2) mount points inside the monitored subtree.
Therefore, this is not meant to show a way to reliably monitor a subtree.
Instead, it tests the functionality of the bpf based fastpath.
To monitor a subtree reliably, we will need more complex logic.
Overview of the logic:
1. fanotify is created for the whole file system (/tmp).
2. The dentry of the subtree root is saved in map subdir_root.
3. bpf_is_subdir() is used to check whether a fanotify event happened
inside the subtree. Only events from inside the subtree are passed
to userspace.
4. A bpf map (inode_storage_map) is used to cache results from
bpf_is_subdir().
5. Moving a sub-subtree is not handled, because we don't yet have a
good way to walk a subtree from BPF (something similar to d_walk()).
Signed-off-by: Song Liu <song@kernel.org>
---
tools/testing/selftests/bpf/bpf_kfuncs.h | 5 +
tools/testing/selftests/bpf/config | 2 +
.../testing/selftests/bpf/prog_tests/fan_fp.c | 264 ++++++++++++++++++
tools/testing/selftests/bpf/progs/fan_fp.c | 154 ++++++++++
4 files changed, 425 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/fan_fp.c
create mode 100644 tools/testing/selftests/bpf/progs/fan_fp.c
diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index 2eb3483f2fb0..6ccfef9685e1 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -87,4 +87,9 @@ struct dentry;
*/
extern int bpf_get_dentry_xattr(struct dentry *dentry, const char *name,
struct bpf_dynptr *value_ptr) __ksym __weak;
+
+struct fanotify_fastpath_event;
+extern struct inode *bpf_fanotify_data_inode(struct fanotify_fastpath_event *event) __ksym __weak;
+extern void bpf_iput(struct inode *inode) __ksym __weak;
+extern bool bpf_is_subdir(struct dentry *new_dentry, struct dentry *old_dentry) __ksym __weak;
#endif
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 4ca84c8d9116..505327f53f07 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -24,6 +24,8 @@ CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_DUMMY=y
CONFIG_DYNAMIC_FTRACE=y
+CONFIG_FANOTIFY=y
+CONFIG_FANOTIFY_FASTPATH=y
CONFIG_FPROBE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_FUNCTION_ERROR_INJECTION=y
diff --git a/tools/testing/selftests/bpf/prog_tests/fan_fp.c b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
new file mode 100644
index 000000000000..92929b811282
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+
+#include <test_progs.h>
+
+#include "fan_fp.skel.h"
+
+#define TEST_FS "/tmp/"
+#define TEST_DIR "/tmp/fanotify_test/"
+
+static int create_test_subtree(void)
+{
+ int err;
+
+ err = mkdir(TEST_DIR, 0777);
+ if (err && errno != EEXIST)
+ return err;
+
+ return open(TEST_DIR, O_RDONLY);
+}
+
+static int create_fanotify_fd(void)
+{
+ int fanotify_fd, err;
+
+ fanotify_fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID,
+ O_RDONLY);
+
+ if (!ASSERT_OK_FD(fanotify_fd, "fanotify_init"))
+ return -1;
+
+ err = fanotify_mark(fanotify_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
+ FAN_CREATE | FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
+ AT_FDCWD, TEST_FS);
+ if (!ASSERT_OK(err, "fanotify_mark")) {
+ close(fanotify_fd);
+ return -1;
+ }
+
+ return fanotify_fd;
+}
+
+static int attach_global_fastpath(int fanotify_fd)
+{
+ struct fanotify_fastpath_args args = {
+ .name = "_tmp_test_sub_tree",
+ .version = 1,
+ .flags = 0,
+ };
+
+ if (ioctl(fanotify_fd, FAN_IOC_ADD_FP, &args))
+ return -1;
+
+ return 0;
+}
+
+#define EVENT_BUFFER_SIZE 4096
+struct file_access_result {
+ char name_prefix[16];
+ bool accessed;
+} access_results[3] = {
+ {"aa", false},
+ {"bb", false},
+ {"cc", false},
+};
+
+static void update_access_results(char *name)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(access_results); i++) {
+ if (strcmp(name, access_results[i].name_prefix) == 0)
+ access_results[i].accessed = true;
+ }
+}
+
+static void parse_event(char *buffer, int len)
+{
+ struct fanotify_event_metadata *event =
+ (struct fanotify_event_metadata *) buffer;
+ struct fanotify_event_info_header *info;
+ struct fanotify_event_info_fid *fid;
+ struct file_handle *handle;
+ char *name;
+ int off;
+
+ for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
+ for (off = sizeof(*event) ; off < event->event_len;
+ off += info->len) {
+ info = (struct fanotify_event_info_header *)
+ ((char *) event + off);
+ switch (info->info_type) {
+ case FAN_EVENT_INFO_TYPE_DFID_NAME:
+ fid = (struct fanotify_event_info_fid *) info;
+ handle = (struct file_handle *)&fid->handle;
+ name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
+ update_access_results(name);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+}
+
+static void touch_file(const char *path)
+{
+ int fd;
+
+ fd = open(path, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666);
+ if (!ASSERT_OK_FD(fd, "open"))
+ goto cleanup;
+ close(fd);
+cleanup:
+ unlink(path);
+}
+
+static void generate_and_test_event(int fanotify_fd, struct fan_fp *skel)
+{
+ char buffer[EVENT_BUFFER_SIZE];
+ int len, err, fd;
+
+ /* Open the dir, so initialize_subdir_root can work */
+ fd = open(TEST_DIR, O_RDONLY);
+ close(fd);
+
+ if (!ASSERT_EQ(skel->bss->initialized, true, "initialized"))
+ goto cleanup;
+
+ /* access /tmp/fanotify_test/aa, this will generate event */
+ touch_file(TEST_DIR "aa");
+
+ /* create /tmp/fanotify_test/subdir, this will get the tag from the
+ * parent directory (added by the bpf program on fsnotify_mkdir)
+ */
+ err = mkdir(TEST_DIR "subdir", 0777);
+ ASSERT_OK(err, "mkdir");
+
+ /* access /tmp/fanotify_test/subdir/bb, this will generate event */
+ touch_file(TEST_DIR "subdir/bb");
+
+ /* access /tmp/cc, this will NOT generate event, as the BPF
+ * fastpath filtered this event out. (Because /tmp doesn't have
+ * the tag.)
+ */
+ touch_file(TEST_FS "cc");
+
+ /* read and parse the events */
+ len = read(fanotify_fd, buffer, EVENT_BUFFER_SIZE);
+ if (!ASSERT_GE(len, 0, "read event"))
+ goto cleanup;
+ parse_event(buffer, len);
+
+ /* verify we generated events for aa and bb, but filtered out the
+ * event for cc.
+ */
+ ASSERT_TRUE(access_results[0].accessed, "access aa");
+ ASSERT_TRUE(access_results[1].accessed, "access bb");
+ ASSERT_FALSE(access_results[2].accessed, "access cc");
+
+ /* Each touch_file() generates two events: FAN_CREATE then
+ * FAN_OPEN. The second event will hit the cache.
+ * open(TEST_DIR) also hits the cache, as we updated the cache
+ * entry for TEST_DIR from userspace.
+ * Therefore, we expect 4 cache hits: aa, bb, cc, and TEST_DIR.
+ */
+ ASSERT_EQ(skel->bss->cache_hit, 4, "cache_hit");
+
+cleanup:
+ rmdir(TEST_DIR "subdir");
+ rmdir(TEST_DIR);
+}
+
+/* This test shows simplified logic that monitors a subtree. It is
+ * simplified in that it doesn't handle all the scenarios, such as:
+ *
+ * 1) moving a sub-subtree into/out of the monitored subtree;
+ * 2) mount points inside the monitored subtree.
+ *
+ * Therefore, this is not meant to show a way to reliably monitor a
+ * subtree. Instead, it tests the functionality of the bpf based fastpath.
+ *
+ * Overview of the logic:
+ * 1. fanotify is created for the whole file system (/tmp);
+ * 2. A bpf map (inode_storage_map) is used to tag directories to
+ * monitor (starting from /tmp/fanotify_test);
+ * 3. On fsnotify_mkdir, the tag is propagated to newly created sub
+ * directories (/tmp/fanotify_test/subdir);
+ * 4. The bpf fastpath checks whether the event happens in a directory
+ * with the tag. If yes, the event is sent to user space; otherwise,
+ * the event is dropped.
+ */
+static void test_monitor_subtree(void)
+{
+ struct bpf_link *link;
+ struct fan_fp *skel;
+ int test_root_fd;
+ int zero = 0;
+ int err, fanotify_fd;
+ struct stat st;
+
+ test_root_fd = create_test_subtree();
+
+ if (!ASSERT_OK_FD(test_root_fd, "create_test_subtree"))
+ return;
+
+ err = fstat(test_root_fd, &st);
+ if (!ASSERT_OK(err, "fstat test_root_fd"))
+ goto close_test_root_fd;
+
+ skel = fan_fp__open_and_load();
+
+ if (!ASSERT_OK_PTR(skel, "fan_fp__open_and_load"))
+ goto close_test_root_fd;
+
+ skel->bss->root_ino = st.st_ino;
+
+ /* Add tag to /tmp/fanotify_test/ */
+ err = bpf_map_update_elem(bpf_map__fd(skel->maps.inode_storage_map),
+ &test_root_fd, &zero, BPF_ANY);
+ if (!ASSERT_OK(err, "bpf_map_update_elem"))
+ goto destroy_skel;
+ link = bpf_map__attach_struct_ops(skel->maps.bpf_fanotify_fastpath_ops);
+ if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops"))
+ goto destroy_skel;
+
+ fanotify_fd = create_fanotify_fd();
+ if (!ASSERT_OK_FD(fanotify_fd, "create_fanotify_fd"))
+ goto destroy_link;
+
+ err = attach_global_fastpath(fanotify_fd);
+ if (!ASSERT_OK(err, "attach_global_fastpath"))
+ goto close_fanotify_fd;
+
+ generate_and_test_event(fanotify_fd, skel);
+
+close_fanotify_fd:
+ close(fanotify_fd);
+
+destroy_link:
+ bpf_link__destroy(link);
+destroy_skel:
+ fan_fp__destroy(skel);
+
+close_test_root_fd:
+ close(test_root_fd);
+ rmdir(TEST_DIR);
+}
+
+void test_bpf_fanotify_fastpath(void)
+{
+ if (test__start_subtest("subtree"))
+ test_monitor_subtree();
+}
diff --git a/tools/testing/selftests/bpf/progs/fan_fp.c b/tools/testing/selftests/bpf/progs/fan_fp.c
new file mode 100644
index 000000000000..97e7d0b9e644
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fan_fp.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_kfuncs.h"
+
+struct __dentry_kptr_value {
+ struct dentry __kptr * dentry;
+};
+
+/* subdir_root map holds a single dentry pointer to the subtree root.
+ * This pointer is used to call bpf_is_subdir().
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct __dentry_kptr_value);
+ __uint(max_entries, 1);
+} subdir_root SEC(".maps");
+
+/* inode_storage_map serves as a cache for bpf_is_subdir() results.
+ * Inode local storage has O(1) access time, so it is preferred over
+ * calling bpf_is_subdir() on every event.
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, int);
+} inode_storage_map SEC(".maps");
+
+unsigned long root_ino;
+bool initialized;
+
+/* This function initializes the subdir_root map. The logic is a bit ugly.
+ * First, user space sets root_ino. Then a fanotify event is triggered.
+ * If the event dentry matches root_ino, we take a reference on the
+ * dentry and save it in the subdir_root map. The reference will be freed
+ * on the termination of the subdir_root map.
+ */
+static void initialize_subdir_root(struct fanotify_fastpath_event *fp_event)
+{
+ struct __dentry_kptr_value *v;
+ struct dentry *dentry, *old;
+ int zero = 0;
+
+ if (initialized)
+ return;
+
+ dentry = bpf_fanotify_data_dentry(fp_event);
+ if (!dentry)
+ return;
+
+ if (dentry->d_inode->i_ino != root_ino) {
+ bpf_dput(dentry);
+ return;
+ }
+
+ v = bpf_map_lookup_elem(&subdir_root, &zero);
+ if (!v) {
+ bpf_dput(dentry);
+ return;
+ }
+
+ old = bpf_kptr_xchg(&v->dentry, dentry);
+ if (old)
+ bpf_dput(old);
+ initialized = true;
+}
+
+int cache_hit;
+
+/* bpf_fp_handler is sleepable, as it calls bpf_dput() */
+SEC("struct_ops.s")
+int BPF_PROG(bpf_fp_handler,
+ struct fsnotify_group *group,
+ struct fanotify_fastpath_hook *fp_hook,
+ struct fanotify_fastpath_event *fp_event)
+{
+ struct __dentry_kptr_value *v;
+ struct dentry *dentry;
+ int zero = 0;
+ int *value;
+ int ret;
+
+ initialize_subdir_root(fp_event);
+
+ /* Before the subdir_root map is initialized, send all events to
+ * user space.
+ */
+ if (!initialized)
+ return FAN_FP_RET_SEND_TO_USERSPACE;
+
+ dentry = bpf_fanotify_data_dentry(fp_event);
+ if (!dentry)
+ return FAN_FP_RET_SEND_TO_USERSPACE;
+
+ /* If inode_storage_map has cached value, just return it */
+ value = bpf_inode_storage_get(&inode_storage_map, dentry->d_inode, 0, 0);
+ if (value) {
+ bpf_dput(dentry);
+ cache_hit++;
+ return *value;
+ }
+
+ /* Hold rcu read lock for bpf_is_subdir */
+ bpf_rcu_read_lock();
+ v = bpf_map_lookup_elem(&subdir_root, &zero);
+ if (!v || !v->dentry) {
+ /* This shouldn't happen, but we need this to pass
+ * the verifier.
+ */
+ ret = FAN_FP_RET_SEND_TO_USERSPACE;
+ goto out;
+ }
+
+ if (bpf_is_subdir(dentry, v->dentry))
+ ret = FAN_FP_RET_SEND_TO_USERSPACE;
+ else
+ ret = FAN_FP_RET_SKIP_EVENT;
+out:
+ bpf_rcu_read_unlock();
+
+ /* Save current result to the inode_storage_map */
+ value = bpf_inode_storage_get(&inode_storage_map, dentry->d_inode, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (value)
+ *value = ret;
+ bpf_dput(dentry);
+ return ret;
+}
+
+SEC("struct_ops")
+int BPF_PROG(bpf_fp_init, struct fanotify_fastpath_hook *hook, const char *args)
+{
+ return 0;
+}
+
+SEC("struct_ops")
+void BPF_PROG(bpf_fp_free, struct fanotify_fastpath_hook *hook)
+{
+}
+
+SEC(".struct_ops.link")
+struct fanotify_fastpath_ops bpf_fanotify_fastpath_ops = {
+ .fp_handler = (void *)bpf_fp_handler,
+ .fp_init = (void *)bpf_fp_init,
+ .fp_free = (void *)bpf_fp_free,
+ .name = "_tmp_test_sub_tree",
+};
+
+char _license[] SEC("license") = "GPL";
--
2.43.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF " Song Liu
@ 2024-11-14 20:14 ` Alexei Starovoitov
2024-11-14 23:02 ` Song Liu
2024-11-15 7:26 ` Amir Goldstein
0 siblings, 2 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-14 20:14 UTC (permalink / raw)
To: Song Liu
Cc: bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team, Andrii Nakryiko,
Eddy Z, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Alexander Viro, Christian Brauner, Jan Kara, KP Singh,
Matt Bobrowski, Amir Goldstein, repnop, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack
On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
>
> +
> + if (bpf_is_subdir(dentry, v->dentry))
> + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> + else
> + ret = FAN_FP_RET_SKIP_EVENT;
It seems to me that all these patches and feature additions
to fanotify, new kfuncs, etc are done just to do the above
filtering by subdir ?
If so, just hard code this logic as an extra flag to fanotify ?
So it can filter all events by subdir.
bpf programmability makes sense when it needs to express
user space policy. Here it's just a filter by subdir.
bpf hammer doesn't look like the right tool for this use case.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-14 20:14 ` Alexei Starovoitov
@ 2024-11-14 23:02 ` Song Liu
2024-11-15 0:41 ` Alexei Starovoitov
2024-11-15 7:26 ` Amir Goldstein
1 sibling, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-14 23:02 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
Jeff Layton, Josef Bacik, Mickaël Salaün,
gnoack@google.com
> On Nov 14, 2024, at 12:14 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
>>
>> +
>> + if (bpf_is_subdir(dentry, v->dentry))
>> + ret = FAN_FP_RET_SEND_TO_USERSPACE;
>> + else
>> + ret = FAN_FP_RET_SKIP_EVENT;
>
> It seems to me that all these patches and feature additions
> to fanotify, new kfuncs, etc are done just to do the above
> filtering by subdir ?
>
> If so, just hard code this logic as an extra flag to fanotify ?
> So it can filter all events by subdir.
> bpf programmability makes sense when it needs to express
> user space policy. Here it's just a filter by subdir.
> bpf hammer doesn't look like the right tool for this use case.
Current version is indeed tailored towards the subtree
monitoring use case. This is mostly because feedback on v1
focused on this use case. V1 itself actually had some
other use cases.
In practice, fanotify fastpath can benefit from bpf
programmability. For example, with bpf programmability, we
can combine fanotify and BPF LSM in some security use cases.
If some security rules only apply to a few files, a
directory, or a subtree, we can use fanotify to only monitor
these files. LSM hooks, such as security_file_open(), are
always global. The overhead is higher if we are only
interested in a few files.
Does this make sense?
Thanks,
Song
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-14 23:02 ` Song Liu
@ 2024-11-15 0:41 ` Alexei Starovoitov
2024-11-15 1:10 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-15 0:41 UTC (permalink / raw)
To: Song Liu
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
Jeff Layton, Josef Bacik, Mickaël Salaün,
gnoack@google.com
On Thu, Nov 14, 2024 at 3:02 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Nov 14, 2024, at 12:14 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
> >>
> >> +
> >> + if (bpf_is_subdir(dentry, v->dentry))
> >> + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> >> + else
> >> + ret = FAN_FP_RET_SKIP_EVENT;
> >
> > It seems to me that all these patches and feature additions
> > to fanotify, new kfuncs, etc are done just to do the above
> > filtering by subdir ?
> >
> > If so, just hard code this logic as an extra flag to fanotify ?
> > So it can filter all events by subdir.
> > bpf programmability makes sense when it needs to express
> > user space policy. Here it's just a filter by subdir.
> > bpf hammer doesn't look like the right tool for this use case.
>
> Current version is indeed tailored towards the subtree
> monitoring use case. This is mostly because feedback on v1
> focused on this use case. V1 itself actually had some
> other use cases.
like?
> In practice, fanotify fastpath can benefit from bpf
> programmability. For example, with bpf programmability, we
> can combine fanotify and BPF LSM in some security use cases.
> If some security rules only apply to a few files, a
> directory, or a subtree, we can use fanotify to only monitor
> these files. LSM hooks, such as security_file_open(), are
> always global. The overhead is higher if we are only
> interested in a few files.
>
> Does this make sense?
Not yet.
This fanotify bpf filtering only reduces the number of events
sent to user space.
How is it supposed to interact with bpf-lsm?
Say, security policy applies to /usr/bin/*
so lsm suppose to act on all files and subdirs in there.
How fanotify helps ?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 0:41 ` Alexei Starovoitov
@ 2024-11-15 1:10 ` Song Liu
2024-11-15 1:31 ` Alexei Starovoitov
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-15 1:10 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
Amir Goldstein, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 14, 2024, at 4:41 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 3:02 PM Song Liu <songliubraving@meta.com> wrote:
>>
>>
>>
>>> On Nov 14, 2024, at 12:14 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>
>>> On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
>>>>
>>>> +
>>>> + if (bpf_is_subdir(dentry, v->dentry))
>>>> + ret = FAN_FP_RET_SEND_TO_USERSPACE;
>>>> + else
>>>> + ret = FAN_FP_RET_SKIP_EVENT;
>>>
>>> It seems to me that all these patches and feature additions
>>> to fanotify, new kfuncs, etc are done just to do the above
>>> filtering by subdir ?
>>>
>>> If so, just hard code this logic as an extra flag to fanotify ?
>>> So it can filter all events by subdir.
>>> bpf programmability makes sense when it needs to express
>>> user space policy. Here it's just a filter by subdir.
>>> bpf hammer doesn't look like the right tool for this use case.
>>
>> Current version is indeed tailored towards the subtree
>> monitoring use case. This is mostly because feedback on v1
>> focused on this use case. V1 itself actually had some
>> other use cases.
>
> like?
samples/fanotify in v1 shows a pattern that matches a file-name prefix
(no BPF). selftests/bpf in v1 shows a pattern where we
propagate a flag in inode local storage from a parent directory
to newly created child directories.
>
>> In practice, fanotify fastpath can benefit from bpf
>> programmability. For example, with bpf programmability, we
>> can combine fanotify and BPF LSM in some security use cases.
>> If some security rules only apply to a few files, a
>> directory, or a subtree, we can use fanotify to only monitor
>> these files. LSM hooks, such as security_file_open(), are
>> always global. The overhead is higher if we are only
>> interested in a few files.
>>
>> Does this make sense?
>
> Not yet.
> This fanotify bpf filtering only reduces the number of events
> sent to user space.
> How is it supposed to interact with bpf-lsm?
Ah, I didn't explain this part. fanotify+bpf fastpath can
do more than reduce the number of events sent to user space. It can
also be used in tracing use cases. For example, we can
implement a filetop tool that only monitors a specific
directory, a specific device, or a specific mount point.
It can also reject some file access (fanotify permission
mode). I should have shown these features in a sample
and/or selftest.
> Say, security policy applies to /usr/bin/*
> so lsm suppose to act on all files and subdirs in there.
> How fanotify helps ?
LSM hooks are always global. It is up to the BPF program
to filter out irrelevant events. This filtering is
sometimes expensive (matching d_path()) and inaccurate
(maintain a map of target inodes, etc.). OTOH, fanotify
has built-in filtering before the BPF program triggers.
When multiple BPF programs are monitoring open() for
different subdirectories, fanotify based solution will
not trigger all these BPF programs for all the open()
in the system.
Does this answer the questions?
Thanks,
Song
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 1:10 ` Song Liu
@ 2024-11-15 1:31 ` Alexei Starovoitov
2024-11-15 7:01 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-15 1:31 UTC (permalink / raw)
To: Song Liu
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
Jeff Layton, Josef Bacik, Mickaël Salaün,
gnoack@google.com
On Thu, Nov 14, 2024 at 5:10 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Nov 14, 2024, at 4:41 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 14, 2024 at 3:02 PM Song Liu <songliubraving@meta.com> wrote:
> >>
> >>
> >>
> >>> On Nov 14, 2024, at 12:14 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >>>
> >>> On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
> >>>>
> >>>> +
> >>>> + if (bpf_is_subdir(dentry, v->dentry))
> >>>> + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> >>>> + else
> >>>> + ret = FAN_FP_RET_SKIP_EVENT;
> >>>
> >>> It seems to me that all these patches and feature additions
> >>> to fanotify, new kfuncs, etc are done just to do the above
> >>> filtering by subdir ?
> >>>
> >>> If so, just hard code this logic as an extra flag to fanotify ?
> >>> So it can filter all events by subdir.
> >>> bpf programmability makes sense when it needs to express
> >>> user space policy. Here it's just a filter by subdir.
> >>> bpf hammer doesn't look like the right tool for this use case.
> >>
> >> Current version is indeed tailored towards the subtree
> >> monitoring use case. This is mostly because feedback on v1
> >> focused on this use case. V1 itself actually had some
> >> other use cases.
> >
> > like?
>
> samples/fanotify in v1 shows a pattern that matches a file-name prefix
> (no BPF). selftests/bpf in v1 shows a pattern where we
> propagate a flag in inode local storage from a parent directory
> to newly created child directories.
>
> >
> >> In practice, fanotify fastpath can benefit from bpf
> >> programmability. For example, with bpf programmability, we
> >> can combine fanotify and BPF LSM in some security use cases.
> >> If some security rules only apply to a few files, a
> >> directory, or a subtree, we can use fanotify to only monitor
> >> these files. LSM hooks, such as security_file_open(), are
> >> always global. The overhead is higher if we are only
> >> interested in a few files.
> >>
> >> Does this make sense?
> >
> > Not yet.
> > This fanotify bpf filtering only reduces the number of events
> > sent to user space.
> > How is it supposed to interact with bpf-lsm?
>
> Ah, I didn't explain this part. fanotify+bpf fastpath can
> do more than reduce the number of events sent to user space. It can
> also be used in tracing use cases. For example, we can
> implement a filetop tool that only monitors a specific
> directory, a specific device, or a specific mount point.
> It can also reject some file access (fanotify permission
> mode). I should have shown these features in a sample
> and/or selftest.
I bet bpf tracing can do filetop already.
Not as efficient, but tracing isn't going to be running 24/7.
>
> > Say, security policy applies to /usr/bin/*
> > so lsm suppose to act on all files and subdirs in there.
> > How fanotify helps ?
>
> LSM hooks are always global. It is up to the BPF program
> to filter out irrelevant events. This filtering is
> sometimes expensive (matching d_path()) and inaccurate
> (maintain a map of target inodes, etc.). OTOH, fanotify
> has built-in filtering before the BPF program triggers.
> When multiple BPF programs are monitoring open() for
> different subdirectories, fanotify based solution will
> not trigger all these BPF programs for all the open()
> in the system.
>
> Does this answer the questions?
No. Above is too much hand waving.
I think bpf-lsm hook fires before fanotify, so bpf-lsm prog
implementing some security policy has to decide right
at the moment what to do with, say, security_file_open().
fanotify with or without bpf fastpath is too late.
In general fanotify is not for security. It's notifying
user space of events that already happened, so I don't see
how these two can be combined.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 1:31 ` Alexei Starovoitov
@ 2024-11-15 7:01 ` Song Liu
2024-11-15 19:41 ` Alexei Starovoitov
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-15 7:01 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
Amir Goldstein, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 14, 2024, at 5:31 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
>>>
>>> Not yet.
>>> This fanotify bpf filtering only reduces the number of events
>>> sent to user space.
>>> How is it supposed to interact with bpf-lsm?
>>
>> Ah, I didn't explain this part. fanotify+bpf fastpath can
>> do more than reduce the number of events sent to user space. It can
>> also be used in tracing use cases. For example, we can
>> implement a filetop tool that only monitors a specific
>> directory, a specific device, or a specific mount point.
>> It can also reject some file access (fanotify permission
>> mode). I should have shown these features in a sample
>> and/or selftest.
>
> I bet bpf tracing can do filetop already.
> Not as efficient, but tracing isn't going to be running 24/7.
>
>>
>>> Say, security policy applies to /usr/bin/*
>>> so lsm suppose to act on all files and subdirs in there.
>>> How fanotify helps ?
>>
>> LSM hooks are always global. It is up to the BPF program
>> to filter out irrelevant events. This filtering is
>> sometimes expensive (matching d_path()) and inaccurate
>> (maintain a map of target inodes, etc.). OTOH, fanotify
>> has built-in filtering before the BPF program triggers.
>> When multiple BPF programs are monitoring open() for
>> different subdirectories, fanotify based solution will
>> not trigger all these BPF programs for all the open()
>> in the system.
>>
>> Does this answer the questions?
>
> No. Above is too much hand waving.
>
> I think bpf-lsm hook fires before fanotify, so bpf-lsm prog
> implementing some security policy has to decide right
> at the moment what to do with, say, security_file_open().
> fanotify with or without bpf fastpath is too late.
Actually, fanotify in permission mode can stop a file open.
In current upstream code, fsnotify hook fsnotify_open_perm
is actually part of security_file_open(). It will be moved
to do_dentry_open(), right after security_file_open(). This
move is done by 1cda52f1b461 in linux-next.
In practice, we are not likely to use BPF LSM and fanotify
on the same hook at the same time. Instead, we can use
BPF LSM hooks to gather information and use fanotify to
make allow/deny decision, or vice versa.
> In general fanotify is not for security. It's notifying
> user space of events that already happened, so I don't see
> how these two can be combined.
fanotify is actually used by anti-virus software. For
example, ClamAV (https://www.clamav.net/) uses fanotify
for its Linux version (it also supports Windows and macOS).
I guess I didn't state the motivation clearly. So let me
try it now.
Tracing is a critical part of a security solution. With
LSM, blocking an operation is straightforward. However,
knowing which operation should be blocked is not always
easy. Also, security hooks (LSM or fanotify) sit in the
critical path of user requests. It is very important to
optimize the latency of a security hook. Ideally, the
tracing logic should gather all the information ahead
of time, and make the actual hook fast.
For example, if security_file_open() only needs to read
a flag from inode local storage, the overhead is minimal
and predictable. If security_file_open() has to walk the
dentry tree, or call d_path(), the overhead will be
much higher. fanotify_file_perm() provides another
level of optimization over security_file_open(). If a
file is not being monitored, fanotify will not generate
the event.
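Here is a minimal sketch of that pattern (untested; the map and
program names are made up for illustration):

    /* inode local storage holding a precomputed verdict */
    struct {
            __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
            __uint(map_flags, BPF_F_NO_PREALLOC);
            __type(key, int);
            __type(value, int);
    } verdict_map SEC(".maps");

    SEC("lsm/file_open")
    int BPF_PROG(cheap_file_open, struct file *file)
    {
            int *verdict;

            /* O(1) lookup; the expensive classification (d_path(),
             * dentry tree walks, etc.) was done ahead of time by
             * the tracing side, off the hot path.
             */
            verdict = bpf_inode_storage_get(&verdict_map,
                                            file->f_inode, 0, 0);
            return verdict ? *verdict : 0;
    }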
Security solutions hold the tracing logic to a higher bar:
- It needs to be accurate, as false positives and false
negatives can be very annoying and/or harmful.
- It needs to be efficient, as security daemons run 24/7.
Given these requirements of security solutions, I believe
it is important to optimize tracing logic as much as
possible. And BPF based fanotify fastpath handler can
bring non-trivial benefits to BPF based security solutions.
fanotify also has a feature that LSM doesn't provide.
When a file is accessed, user space daemon can get a
fd on this file from fanotify. OTOH, LSM can only send
an ino or a path to user space, which is not always
reliable.
I hope this starts to make sense...
Thanks,
Song
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-14 20:14 ` Alexei Starovoitov
2024-11-14 23:02 ` Song Liu
@ 2024-11-15 7:26 ` Amir Goldstein
2024-11-15 20:04 ` Alexei Starovoitov
1 sibling, 1 reply; 26+ messages in thread
From: Amir Goldstein @ 2024-11-15 7:26 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, repnop, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack
On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
> >
> > +
> > + if (bpf_is_subdir(dentry, v->dentry))
> > + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> > + else
> > + ret = FAN_FP_RET_SKIP_EVENT;
>
> It seems to me that all these patches and feature additions
> to fanotify, new kfuncs, etc are done just to do the above
> filtering by subdir ?
>
> If so, just hard code this logic as an extra flag to fanotify ?
> So it can filter all events by subdir.
> bpf programmability makes sense when it needs to express
> user space policy. Here it's just a filter by subdir.
> bpf hammer doesn't look like the right tool for this use case.
Good question.
Speaking as someone who has made several attempts to design
efficient subtree filtering in fanotify, it is not as easy as it sounds.
I recently implemented a method that could be used for "practical"
subdir filtering in userspace, but not before Jan questioned whether we
should go directly to subtree filtering with bpf [1].
This is not the only filter that has been proposed for fanotify where a
bpf filter came up as an alternative [2], but subtree filtering is by far
the most wanted filter.
The problem with implementing a naive is_subtree() filter in fanotify
is the unbounded cost to be paid by every user for every fs access
when M such filters are installed deep in the fs tree.
Making this more efficient then becomes a matter of trading off
memory (inode/path cache size) against performance, and depends
on the size and depth of the watched filesystem.
This engineering decision *is* the userspace policy that can be
expressed by a bpf program.
As you may know, Linux is lagging behind Windows and macOS w.r.t.
subtree filtering for fs events.
macOS/FreeBSD took the userspace approach with fseventsd [3].
If you Google "fseventsd", you will get results with "High CPU and
Memory Usage" for as far as the browser can scroll.
I hope the bpf-aided early subtree filtering technology would be
able to reduce some of this overhead to facilitate a better engineering
solution, but that remains to be proven...
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@quack3.lan/
[2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@quack2.suse.cz/
[3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify fastpath handler
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify " Song Liu
@ 2024-11-15 8:51 ` Amir Goldstein
2024-11-15 17:11 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Amir Goldstein @ 2024-11-15 8:51 UTC (permalink / raw)
To: Song Liu
Cc: bpf, linux-fsdevel, linux-kernel, linux-security-module,
kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
brauner, jack, kpsingh, mattbobrowski, repnop, jlayton, josef,
mic, gnoack
On Thu, Nov 14, 2024 at 9:44 AM Song Liu <song@kernel.org> wrote:
>
> fanotify fastpath handler enables handling fanotify events within the
> kernel, and thus saves a trip to the user space. fanotify fastpath handler
> can be useful in many use cases. For example, if a user is only interested
> in events for some files inside a directory, a fastpath handler can be
> used to filter out irrelevant events.
>
> fanotify fastpath handler is attached to fsnotify_group. At most one
> fastpath handler can be attached to a fsnotify_group. The attach/detach
> of fastpath handlers are controlled by two new ioctls on the fanotify fds:
> FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
>
> fanotify fastpath handler is packaged in a kernel module. In the future,
> it is also possible to package fastpath handler in a BPF program. Since
> loading modules requires CAP_SYS_ADMIN, _loading_ fanotify fastpath
> handler in kernel modules is limited to CAP_SYS_ADMIN. However,
> non-CAP_SYS_ADMIN users can _attach_ a fastpath handler loaded by a sysadmin
> to their fanotify fds. To make fanotify fastpath handler more useful
> for non-CAP_SYS_ADMIN users, a fastpath handler can take arguments at
> attach time.
>
> sysfs entry /sys/kernel/fanotify_fastpath is added to help users know
> which fastpath handlers are available. At the moment, files are added for
> each fastpath handler: flags, desc, and init_args.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
> fs/notify/fanotify/Kconfig | 13 ++
> fs/notify/fanotify/Makefile | 1 +
> fs/notify/fanotify/fanotify.c | 29 +++
> fs/notify/fanotify/fanotify_fastpath.c | 282 +++++++++++++++++++++++++
> fs/notify/fanotify/fanotify_user.c | 7 +
> include/linux/fanotify.h | 131 ++++++++++++
> include/linux/fsnotify_backend.h | 4 +
> include/uapi/linux/fanotify.h | 25 +++
> 8 files changed, 492 insertions(+)
> create mode 100644 fs/notify/fanotify/fanotify_fastpath.c
>
> diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig
> index 0e36aaf379b7..74677d3699a3 100644
> --- a/fs/notify/fanotify/Kconfig
> +++ b/fs/notify/fanotify/Kconfig
> @@ -24,3 +24,16 @@ config FANOTIFY_ACCESS_PERMISSIONS
> hierarchical storage management systems.
>
> If unsure, say N.
> +
> +config FANOTIFY_FASTPATH
> + bool "fanotify fastpath handler"
> + depends on FANOTIFY
> + default y
> + help
> + Say Y here if you want to use fanotify in-kernel fastpath handlers.
> + The fastpath handler can be implemented in a kernel module or a
> + BPF program. The fastpath handler can speed up fanotify in many
> + use cases. For example, when the listener is only interested in
> + a subset of events.
> +
> + If unsure, say Y.
> \ No newline at end of file
> diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
> index 25ef222915e5..543cb7aa08fc 100644
> --- a/fs/notify/fanotify/Makefile
> +++ b/fs/notify/fanotify/Makefile
> @@ -1,2 +1,3 @@
> # SPDX-License-Identifier: GPL-2.0-only
> obj-$(CONFIG_FANOTIFY) += fanotify.o fanotify_user.o
> +obj-$(CONFIG_FANOTIFY_FASTPATH) += fanotify_fastpath.o
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 224bccaab4cc..b395b628a58b 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -18,6 +18,8 @@
>
> #include "fanotify.h"
>
> +extern struct srcu_struct fsnotify_mark_srcu;
> +
> static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
> {
> return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
> @@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> struct fsnotify_event *fsn_event;
> __kernel_fsid_t fsid = {};
> u32 match_mask = 0;
> + struct fanotify_fastpath_hook *fp_hook __maybe_unused;
>
> BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
> BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
> @@ -933,6 +936,27 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
> fsid = fanotify_get_fsid(iter_info);
>
> +#ifdef CONFIG_FANOTIFY_FASTPATH
> + fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
> + if (fp_hook) {
> + struct fanotify_fastpath_event fp_event = {
> + .mask = mask,
> + .data = data,
> + .data_type = data_type,
> + .dir = dir,
> + .file_name = file_name,
> + .fsid = &fsid,
> + .match_mask = match_mask,
> + };
> +
> + ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
> + if (ret == FAN_FP_RET_SKIP_EVENT) {
> + ret = 0;
> + goto finish;
> + }
> + }
> +#endif
> +
To me it makes sense that the fastpath module could also return a negative
(deny) result for permission events.
Is there a specific reason that you did not handle this or just didn't think
of this option?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify fastpath handler
2024-11-15 8:51 ` Amir Goldstein
@ 2024-11-15 17:11 ` Song Liu
2024-11-15 17:32 ` Amir Goldstein
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-15 17:11 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
jlayton@kernel.org, Josef Bacik, mic@digikod.net,
gnoack@google.com
Hi Amir,
> On Nov 15, 2024, at 12:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
[...]
>>
>> +#ifdef CONFIG_FANOTIFY_FASTPATH
>> + fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
>> + if (fp_hook) {
>> + struct fanotify_fastpath_event fp_event = {
>> + .mask = mask,
>> + .data = data,
>> + .data_type = data_type,
>> + .dir = dir,
>> + .file_name = file_name,
>> + .fsid = &fsid,
>> + .match_mask = match_mask,
>> + };
>> +
>> + ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
>> + if (ret == FAN_FP_RET_SKIP_EVENT) {
>> + ret = 0;
>> + goto finish;
>> + }
>> + }
>> +#endif
>> +
>
> To me it makes sense that the fastpath module could also return a negative
> (deny) result for permission events.
Yes, this should just work. And I actually plan to use it.
> Is there a specific reason that you did not handle this or just didn't think
> of this option?
But I haven't tested permission events yet. At first glance, maybe we just
need to change the above code a bit, as:
>> + if (ret == FAN_FP_RET_SKIP_EVENT) {
>> + ret = 0;
>> + goto finish;
>> + }
	if (ret != FAN_FP_RET_SEND_TO_USERSPACE) {
		if (ret == FAN_FP_RET_SKIP_EVENT)
			ret = 0;
		goto finish;
	}
Well, I guess we should change the value of FAN_FP_RET_SEND_TO_USERSPACE,
so that this condition will look better.
We may also consider reordering the code so that we do not call
fsnotify_prepare_user_wait() when the fastpath handles the event.
Does this look reasonable?
Thanks,
Song
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify fastpath handler
2024-11-15 17:11 ` Song Liu
@ 2024-11-15 17:32 ` Amir Goldstein
0 siblings, 0 replies; 26+ messages in thread
From: Amir Goldstein @ 2024-11-15 17:32 UTC (permalink / raw)
To: Song Liu
Cc: Song Liu, bpf@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org, Kernel Team,
andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
daniel@iogearbox.net, martin.lau@linux.dev,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
jlayton@kernel.org, Josef Bacik, mic@digikod.net,
gnoack@google.com
On Fri, Nov 15, 2024 at 6:11 PM Song Liu <songliubraving@meta.com> wrote:
>
> Hi Amir,
>
> > On Nov 15, 2024, at 12:51 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> [...]
>
> >>
> >> +#ifdef CONFIG_FANOTIFY_FASTPATH
> >> + fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
> >> + if (fp_hook) {
> >> + struct fanotify_fastpath_event fp_event = {
> >> + .mask = mask,
> >> + .data = data,
> >> + .data_type = data_type,
> >> + .dir = dir,
> >> + .file_name = file_name,
> >> + .fsid = &fsid,
> >> + .match_mask = match_mask,
> >> + };
> >> +
> >> + ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
> >> + if (ret == FAN_FP_RET_SKIP_EVENT) {
> >> + ret = 0;
> >> + goto finish;
> >> + }
> >> + }
> >> +#endif
> >> +
> >
> > To me it makes sense that the fastpath module could also return a negative
> > (deny) result for permission events.
>
> Yes, this should just work. And I actually plan to use it.
>
> > Is there a specific reason that you did not handle this or just didn't think
> > of this option?
>
> But I haven't tested permission events yet. At first glance, maybe we just
> need to change the above code a bit, as:
>
>
> >> f (ret == FAN_FP_RET_SKIP_EVENT) {
> >> + ret = 0;
> >> + goto finish;
> >> + }
>
> if (ret != FAN_FP_RET_SEND_TO_USERSPACE) {
> if (ret == FAN_FP_RET_SKIP_EVENT)
> ret = 0;
> goto finish;
> }
>
> Well, I guess we should change the value of FAN_FP_RET_SEND_TO_USERSPACE,
> so that this condition will look better.
>
> We may also consider reorder the code so that we do not call
> fsnotify_prepare_user_wait() when the fastpath handles the event.
>
> Does this look reasonable?
Yes.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 7:01 ` Song Liu
@ 2024-11-15 19:41 ` Alexei Starovoitov
2024-11-15 21:05 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-15 19:41 UTC (permalink / raw)
To: Song Liu
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
Jeff Layton, Josef Bacik, Mickaël Salaün,
gnoack@google.com
On Thu, Nov 14, 2024 at 11:01 PM Song Liu <songliubraving@meta.com> wrote:
>
> >
> > I think bpf-lsm hook fires before fanotify, so bpf-lsm prog
> > implementing some security policy has to decide right
> > at the moment what to do with, say, security_file_open().
> > fanotify with or without bpf fastpath is too late.
>
> Actually, fanotify in permission mode can stop a file open.
The proposed patch 1 did:
+/* Return value of fp_handler */
+enum fanotify_fastpath_return {
+ /* The event should be sent to user space */
+ FAN_FP_RET_SEND_TO_USERSPACE = 0,
+ /* The event should NOT be sent to user space */
+ FAN_FP_RET_SKIP_EVENT = 1,
+};
It looked like a read-only notification to user space
where the bpf prog is merely a filter.
> In current upstream code, fsnotify hook fsnotify_open_perm
> is actually part of security_file_open(). It will be moved
> to do_dentry_open(), right after security_file_open(). This
> move is done by 1cda52f1b461 in linux-next.
Separating fsnotify from LSM makes sense.
> In practice, we are not likely to use BPF LSM and fanotify
> on the same hook at the same time. Instead, we can use
> BPF LSM hooks to gather information and use fanotify to
> make allow/deny decision, or vice versa.
Pick one.
If the proposal is changing to let the fsnotify-bpf prog deny
file_open, then it's a completely different discussion.
In such a case make it clear upfront that fsnotify will
rely on CONFIG_FANOTIFY_ACCESS_PERMISSIONS and
bpf-lsm part of file access will not be used,
since interaction of two callbacks at file_open makes little sense.
> > In general fanotify is not for security. It's notifying
> > user space of events that already happened, so I don't see
> > how these two can be combined.
>
> fanotify is actually used by antivirus software. For
> example, ClamAV (https://www.clamav.net/) uses fanotify
> for its Linux version (it also supports Windows and macOS).
It's relying on user space to send back FANOTIFY_PERM_EVENTS?
fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
is a pretty long path to call bpf prog and
preparing a giant 'struct fanotify_fastpath_event'
is not going to be fast either.
If we want to accelerate that with bpf it needs to be done
sooner with negligible overhead.
>
> I guess I didn't state the motivation clearly. So let me
> try it now.
>
> Tracing is a critical part of a security solution. With
> LSM, blocking an operation is straightforward. However,
> knowing which operation should be blocked is not always
> easy. Also, security hooks (LSM or fanotify) sit in the
> critical path of user requests. It is very important to
> optimize the latency of a security hook. Ideally, the
> tracing logic should gather all the information ahead
> of time, and make the actual hook fast.
>
> For example, if security_file_open() only needs to read
> a flag from inode local storage, the overhead is minimal
> and predictable. If security_file_open() has to walk the
> dentry tree, or call d_path(), the overhead will be
> much higher. fanotify_file_perm() provides another
> level of optimization over security_file_open(). If a
> file is not being monitored, fanotify will not generate
> the event.
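>
> A minimal sketch of that fast check (the map and flag names are made
> up for illustration):
>
> #include "vmlinux.h"
> #include <errno.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
>
> char LICENSE[] SEC("license") = "GPL";
>
> struct {
>         __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
>         __uint(map_flags, BPF_F_NO_PREALLOC);
>         __type(key, int);
>         __type(value, int);
> } inode_flags SEC(".maps");
>
> SEC("lsm/file_open")
> int BPF_PROG(fast_file_open, struct file *file)
> {
>         /* O(1) lookup; no dentry walk, no d_path() */
>         int *flag = bpf_inode_storage_get(&inode_flags,
>                                           file->f_inode, 0, 0);
>
>         if (flag && *flag)
>                 return -EPERM;
>         return 0;
> }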
I agree with motivation, but don't see this in the patches.
The overhead to call into bpf prog is big.
Even if prog does nothing it's still going to be slower.
> Security solutions set a higher bar for the tracing logic:
>
> - It needs to be accurate, as false positives and false
> negatives can be very annoying and/or harmful.
> - It needs to be efficient, as security daemons run 24/7.
>
> Given these requirements of security solutions, I believe
> it is important to optimize tracing logic as much as
> possible. And a BPF based fanotify fastpath handler can
> bring non-trivial benefits to BPF based security solutions.
Doing everything in the kernel is certainly faster than
going back and forth to user space,
but bpf-lsm should be able to do the same already.
Without patch 1 and only patches 4,5 that add few kfuncs,
bpf-lsm prog will be able to remember subtree dentry and
do the same is_subdir() to deny.
The patch 7 stays pretty much as-is. All in bpf-lsm.
Close to zero overhead without long chain of fsnotify callbacks.
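To be concrete, a sketch of that bpf-lsm alternative (bpf_is_subdir()
is the kfunc used by the patch 7 selftest; the single-entry map below
is illustrative and assumes patch 5 lets the map value hold a dentry
kptr):

#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* kfunc added by patches 4/5 */
extern bool bpf_is_subdir(struct dentry *dentry,
                          struct dentry *ancestor) __ksym;

struct dentry_value {
        struct dentry __kptr *dentry;
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct dentry_value);
} watched_roots SEC(".maps");

SEC("lsm/file_open")
int BPF_PROG(subtree_deny, struct file *file)
{
        __u32 key = 0;
        struct dentry_value *v;

        v = bpf_map_lookup_elem(&watched_roots, &key);
        if (v && v->dentry &&
            bpf_is_subdir(file->f_path.dentry, v->dentry))
                return -EPERM;  /* deny opens inside the watched subtree */
        return 0;
}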
> fanotify also has a feature that LSM doesn't provide.
> When a file is accessed, user space daemon can get a
> fd on this file from fanotify. OTOH, LSM can only send
> an ino or a path to user space, which is not always
> reliable.
That sounds useful, but we're mixing too many things.
If user space cares about fd it will be using the existing
mechanism with all accompanied overhead. fsnotify-bpf can
barely accelerate anything, since user space makes
ultimate decisions.
If user space is not in the driving seat then existing bpf-lsm
plus few kfuncs to remember dentry and call is_subdir()
will do the job and no need for patch 1.
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 7:26 ` Amir Goldstein
@ 2024-11-15 20:04 ` Alexei Starovoitov
0 siblings, 0 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-15 20:04 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, repnop@google.com, Jeff Layton,
Josef Bacik, Mickaël Salaün, gnoack@google.com
On Thu, Nov 14, 2024 at 11:26 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@kernel.org> wrote:
> > >
> > > +
> > > + if (bpf_is_subdir(dentry, v->dentry))
> > > + ret = FAN_FP_RET_SEND_TO_USERSPACE;
> > > + else
> > > + ret = FAN_FP_RET_SKIP_EVENT;
> >
> > It seems to me that all these patches and feature additions
> > to fanotify, new kfuncs, etc. are done just to do the above
> > filtering by subdir?
> >
> > If so, just hard-code this logic as an extra flag to fanotify?
> > So it can filter all events by subdir.
> > bpf programmability makes sense when it needs to express
> > user space policy. Here it's just a filter by subdir.
> > bpf hammer doesn't look like the right tool for this use case.
>
> Good question.
>
> Speaking as someone who has made several attempts to design
> efficient subtree filtering in fanotify, it is not as easy as it sounds.
>
> I recently implemented a method that could be used for "practical"
> subdir filtering in userspace, but not before Jan questioned whether we
> should go directly to subtree filtering with bpf [1].
>
> This is not the only proposed fanotify filter for which a bpf filter
> came up as an alternative [2], but subtree filtering is by far
> the most wanted one.
Interesting :)
Thanks for the links. Long threads back then.
Looks like bpf filtering was proposed 2 and 4 years ago and
Song's current attempt is the first real prototype that actually
implements what was discussed for so long?
Cool. I guess it has legs then.
> The problem with implementing a naive is_subtree() filter in fanotify
> is the unbounded cost to be paid by every user for every fs access
> when M such filters are installed deep in the fs tree.
Agree that the concern is real.
I just don't see how filtering in bpf vs filtering in the kernel
is going to be any different.
is_subdir() is fast, so either kernel or bpf prog calling it
on every file_open doesn't seem like a big deal.
Are you saying that 'naive is_subdir' would call is_subdir()
M times, once for each such filter?
I guess it can be optimized. When these M filters are installed
the kernel can check whether they're subdirs of each other
and filter a subset of M filters with a single is_subdir() call.
Same logic can be in either bpf prog or in the kernel.
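Roughly (illustrative only; assumes the filter roots are kept sorted
so that an ancestor comes right before its descendants, with
nr_descendants precomputed at install time, and deliver_to() standing
in for whatever delivery path is used):

for (i = 0; i < nr_filters; i++) {
        if (is_subdir(dentry, filters[i].root)) {
                deliver_to(&filters[i]);
        } else {
                /* not under this root, so not under any of its
                 * descendant roots either: skip them all */
                i += filters[i].nr_descendants;
        }
}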
> Making this more efficient then becomes a matter of trading off
> memory (inode/path cache size) against performance, and depends
> on the size and depth of the watched filesystem.
> This engineering decision *is* the userspace policy that can be
> expressed by a bpf program.
>
> As you may know, Linux is lagging behind Windows and macOS w.r.t.
> subtree filtering for fs events.
>
> macOS/FreeBSD took the userspace approach with fseventsd [3].
> If you Google "fseventsd", you will get results with "High CPU and
> Memory Usage" for as far as the browser can scroll.
Yeah. Fun.
Looking at your 2 year old attempt, it seems the key part was to
make sure inodes are not pinned for filtering purposes?
How would you call is_subdir() then, if the parent dentry is not
held with dget()?
Or is the meaning of 'pinning' different?
> I hope the bpf-aided early subtree filtering technology would be
> able to reduce some of this overhead to facilitate a better engineering
> solution, but that remains to be proven...
Sure. Let's explore this further.
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@quack3.lan/
> [2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@quack2.suse.cz/
> [3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 19:41 ` Alexei Starovoitov
@ 2024-11-15 21:05 ` Song Liu
2024-11-18 20:51 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-15 21:05 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
Amir Goldstein, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 15, 2024, at 11:41 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 11:01 PM Song Liu <songliubraving@meta.com> wrote:
>>
>>>
>>> I think bpf-lsm hook fires before fanotify, so bpf-lsm prog
>>> implementing some security policy has to decide right
>>> at the moment what to do with, say, security_file_open().
>>> fanotify with or without bpf fastpath is too late.
>>
>> Actually, fanotify in permission mode can stop a file open.
>
> The proposed patch 1 did:
>
> +/* Return value of fp_handler */
> +enum fanotify_fastpath_return {
> + /* The event should be sent to user space */
> + FAN_FP_RET_SEND_TO_USERSPACE = 0,
> + /* The event should NOT be sent to user space */
> + FAN_FP_RET_SKIP_EVENT = 1,
> +};
>
> It looked like a read-only notification to user space
> where bpf prog is merely a filter.
Yep. As Amir also pointed out, this part needs more work and
clarifications.
>
>> In current upstream code, fsnotify hook fsnotify_open_perm
>> is actually part of security_file_open(). It will be moved
>> to do_dentry_open(), right after security_file_open(). This
>> move is done by 1cda52f1b461 in linux-next.
>
> Separating fsnotify from LSM makes sense.
>
>> In practice, we are not likely to use BPF LSM and fanotify
>> on the same hook at the same time. Instead, we can use
>> BPF LSM hooks to gather information and use fanotify to
>> make allow/deny decision, or vice versa.
>
> Pick one.
> If the proposal is changing to let the fsnotify-bpf prog deny
> file_open, then it's a completely different discussion.
>
> In such a case make it clear upfront that fsnotify will
> rely on CONFIG_FANOTIFY_ACCESS_PERMISSIONS and
I am not sure whether we should limit fsnotify-bpf to
only work with CONFIG_FANOTIFY_ACCESS_PERMISSIONS. I still
think it can be useful just as a filter. But I guess we can
start with this dependency.
> bpf-lsm part of file access will not be used,
> since interaction of two callbacks at file_open makes little sense.
Agreed that having two hooks on file_open doesn't make sense.
I was actually thinking about combining fsnotify-bpf with
other LSM hooks. But I agree we should start as simple as
possible.
>
>>> In general fanotify is not for security. It's notifying
>>> user space of events that already happened, so I don't see
>>> how these two can be combined.
>>
>> fanotify is actually used by antivirus software. For
>> example, ClamAV (https://www.clamav.net/) uses fanotify
>> for its Linux version (it also supports Windows and macOS).
>
> It's relying on user space to send back FANOTIFY_PERM_EVENTS?
Yes, it goes all the way to another process, and comes back.
>
> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
>
> is a pretty long path to call bpf prog and
> preparing a giant 'struct fanotify_fastpath_event'
> is not going to be fast either.
>
> If we want to accelerate that with bpf it needs to be done
> sooner with negligible overhead.
Agreed. This is actually something I have been thinking
about since the beginning of this work: should it be
fanotify-bpf or fsnotify-bpf? Given we have more material,
this is a good time to have broader discussions on this.
@all, please chime in on whether we should redo this as
fsnotify-bpf. AFAICT:
Pros of fanotify-bpf:
- There is existing user space that we can leverage/reuse.
Pros of fsnotify-bpf:
- Faster fast path.
Are there other major pros/cons that I missed?
>
>> I guess I didn't state the motivation clearly. So let me
>> try it now.
>>
>> Tracing is a critical part of a security solution. With
>> LSM, blocking an operation is straightforward. However,
>> knowing which operation should be blocked is not always
>> easy. Also, security hooks (LSM or fanotify) sit in the
>> critical path of user requests. It is very important to
>> optimize the latency of a security hook. Ideally, the
>> tracing logic should gather all the information ahead
>> of time, and make the actual hook fast.
>>
>> For example, if security_file_open() only needs to read
>> a flag from inode local storage, the overhead is minimal
>> and predictable. If security_file_open() has to walk the
>> dentry tree, or call d_path(), the overhead will be
>> much higher. fanotify_file_perm() provides another
>> level of optimization over security_file_open(). If a
>> file is not being monitored, fanotify will not generate
>> the event.
>
> I agree with motivation, but don't see this in the patches.
Agreed. I should definitely do a better job on this.
> The overhead to call into bpf prog is big.
> Even if prog does nothing it's still going to be slower.
>
>> Security solutions set a higher bar for the tracing logic:
>>
>> - It needs to be accurate, as false positives and false
>> negatives can be very annoying and/or harmful.
>> - It needs to be efficient, as security daemons run 24/7.
>>
>> Given these requirements of security solutions, I believe
>> it is important to optimize tracing logic as much as
>> possible. And a BPF based fanotify fastpath handler can
>> bring non-trivial benefits to BPF based security solutions.
>
> Doing everything in the kernel is certainly faster than
> going back and forth to user space,
> but bpf-lsm should be able to do the same already.
>
> Without patch 1 and only patches 4,5 that add few kfuncs,
> bpf-lsm prog will be able to remember subtree dentry and
> do the same is_subdir() to deny.
> The patch 7 stays pretty much as-is. All in bpf-lsm.
> Close to zero overhead without long chain of fsnotify callbacks.
One of the advantages of fanotify-bpf or fsnotify-bpf is
that it only calls the BPF program for files that are being
monitored. OTOH, a BPF LSM program is always global.
>
>> fanotify also has a feature that LSM doesn't provide.
>> When a file is accessed, user space daemon can get a
>> fd on this file from fanotify. OTOH, LSM can only send
>> an ino or a path to user space, which is not always
>> reliable.
>
> That sounds useful, but we're mixing too many things.
> If user space cares about fd it will be using the existing
> mechanism with all accompanied overhead. fsnotify-bpf can
> barely accelerate anything, since user space makes
> ultimate decisions.
> If user space is not in the driving seat then existing bpf-lsm
> plus few kfuncs to remember dentry and call is_subdir()
> will do the job and no need for patch 1.
In many cases, user space only needs to look into the
file when necessary. For example, when a binary is
first written, the user space daemon will scan through
it (for viruses, etc.) and mark it as safe/dangerous in
some BPF maps. Later, when the file is opened for
execution, f[s|a]notify-bpf can make the allow/deny
decision based on the information in the BPF maps.
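A rough sketch of the in-kernel half of that flow (a fragment only;
the map name and verdict values are made up, and FAN_FP_RET_DENY_EVENT
assumes the permission-event extension discussed earlier in the
thread):

struct {
        __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
        __uint(map_flags, BPF_F_NO_PREALLOC);
        __type(key, int);
        __type(value, int);  /* 0 unknown, 1 safe, 2 dangerous */
} scan_verdict SEC(".maps");

/* called from the fastpath handler for each event */
static int check_verdict(struct inode *inode)
{
        int *v = bpf_inode_storage_get(&scan_verdict, inode, 0, 0);

        if (!v || *v == 0)
                return FAN_FP_RET_SEND_TO_USERSPACE; /* let daemon scan */
        if (*v == 2)
                return FAN_FP_RET_DENY_EVENT;        /* hypothetical */
        return FAN_FP_RET_SKIP_EVENT;                /* known safe */
}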
Thanks again for your review and inputs!
Song
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-15 21:05 ` Song Liu
@ 2024-11-18 20:51 ` Song Liu
2024-11-19 0:10 ` Alexei Starovoitov
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-18 20:51 UTC (permalink / raw)
To: Song Liu
Cc: Alexei Starovoitov, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
Amir Goldstein, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
> On Nov 15, 2024, at 1:05 PM, Song Liu <songliubraving@meta.com> wrote:
[...]
>
>>
>> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
>>
>> is a pretty long path to call bpf prog and
>> preparing a giant 'struct fanotify_fastpath_event'
>> is not going to be fast either.
>>
>> If we want to accelerate that with bpf it needs to be done
>> sooner with negligible overhead.
>
> Agreed. This is actually something I have been thinking
> about since the beginning of this work: should it be
> fanotify-bpf or fsnotify-bpf? Given we have more material,
> this is a good time to have broader discussions on this.
>
> @all, please chime in on whether we should redo this as
> fsnotify-bpf. AFAICT:
>
> Pros of fanotify-bpf:
> - There is existing user space that we can leverage/reuse.
>
> Pros of fsnotify-bpf:
> - Faster fast path.
>
> Are there other major pros/cons that I missed?
Adding more thoughts on this: I think it makes more sense to
go with fanotify-bpf. This is because one of the benefits of
fsnotify/fanotify over LSM solutions is the built-in
filtering of events. While this call chain is a bit long:
fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
there is built-in filtering in fsnotify() and
send_to_group(), so the logic in the call chain is useful.
struct fanotify_fastpath_event is indeed big. But I think
we need to pass this information to the fastpath handler
either way.
Overall, I think current fastpath design makes sense,
though there are things we need to fix (as Amir and Alexei
pointed out). Please let me know comments and suggestions
on this.
Thanks,
Song
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-18 20:51 ` Song Liu
@ 2024-11-19 0:10 ` Alexei Starovoitov
2024-11-19 1:10 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-11-19 0:10 UTC (permalink / raw)
To: Song Liu
Cc: Song Liu, bpf, Linux-Fsdevel, LKML, LSM List, Kernel Team,
Andrii Nakryiko, Eddy Z, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Alexander Viro, Christian Brauner, Jan Kara,
KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
Jeff Layton, Josef Bacik, Mickaël Salaün,
gnoack@google.com
On Mon, Nov 18, 2024 at 12:51 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Nov 15, 2024, at 1:05 PM, Song Liu <songliubraving@meta.com> wrote:
>
> [...]
> >
> >>
> >> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
> >>
> >> is a pretty long path to call bpf prog and
> >> preparing a giant 'struct fanotify_fastpath_event'
> >> is not going to be fast either.
> >>
> >> If we want to accelerate that with bpf it needs to be done
> >> sooner with negligible overhead.
> >
> > Agreed. This is actually something I have been thinking
> > about since the beginning of this work: should it be
> > fanotify-bpf or fsnotify-bpf? Given we have more material,
> > this is a good time to have broader discussions on this.
> >
> > @all, please chime in on whether we should redo this as
> > fsnotify-bpf. AFAICT:
> >
> > Pros of fanotify-bpf:
> > - There is existing user space that we can leverage/reuse.
> >
> > Pros of fsnotify-bpf:
> > - Faster fast path.
> >
> > Are there other major pros/cons that I missed?
>
> Adding more thoughts on this: I think it makes more sense to
> go with fanotify-bpf. This is because one of the benefits of
> fsnotify/fanotify over LSM solutions is the built-in
> filtering of events. While this call chain is a bit long:
>
> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
>
> there is built-in filtering in fsnotify() and
> send_to_group(), so the logic in the call chain is useful.
fsnotify_marks based filtering happens in fsnotify.
No need to do more indirect calls to get to fanotify.
I would add the bpf struct_ops hook right before send_to_group
or inside of it.
Not sure whether fsnotify_group concept should be reused
or avoided.
Per inode mark/mask filter should stay.
> struct fanotify_fastpath_event is indeed big. But I think
> we need to pass this information to the fastpath handler
> either way.
Disagree.
That was the old way of hooking bpf bits in.
uapi/bpf.h is full of such "context" structs.
xdp_md, bpf_tcp_sock, etc.
They pack fields into one struct only because
old style bpf has one input argument: ctx.
struct_ops doesn't have this limitation.
Pass things like path/dentry/inode/whatever pointers directly.
No need to pack into fanotify_fastpath_event.
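I.e. something like (a hypothetical signature, just to illustrate the
struct_ops shape, not a proposal for the exact argument list):

struct fanotify_fastpath_ops {
        int (*fp_handler)(struct fsnotify_group *group,
                          struct inode *inode,
                          struct dentry *dentry,
                          u32 mask, int data_type);
};

The verifier gives a struct_ops prog typed access to each argument
directly, so nothing needs to be packed into a context struct.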
> Overall, I think current fastpath design makes sense,
> though there are things we need to fix (as Amir and Alexei
> pointed out). Please let me know comments and suggestions
> on this.
On one side you're arguing that avoiding the extra indirection for
inode local storage through inode->i_security is needed for
performance, but on the other side you're not worried about the deep
call stack of fsnotify->fanotify and argument packing
which add way more overhead than the i_security hop.
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-19 0:10 ` Alexei Starovoitov
@ 2024-11-19 1:10 ` Song Liu
2024-11-19 7:59 ` Amir Goldstein
0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-11-19 1:10 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Song Liu, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
Amir Goldstein, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
Hi Alexei,
> On Nov 18, 2024, at 4:10 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
[...]
>>>
>>> Agreed. This is actually something I have been thinking
>>> about since the beginning of this work: should it be
>>> fanotify-bpf or fsnotify-bpf? Given we have more material,
>>> this is a good time to have broader discussions on this.
>>>
>>> @all, please chime in on whether we should redo this as
>>> fsnotify-bpf. AFAICT:
>>>
>>> Pros of fanotify-bpf:
>>> - There is existing user space that we can leverage/reuse.
>>>
>>> Pros of fsnotify-bpf:
>>> - Faster fast path.
>>>
>>> Are there other major pros/cons that I missed?
>>
>> Adding more thoughts on this: I think it makes more sense to
>> go with fanotify-bpf. This is because one of the benefits of
>> fsnotify/fanotify over LSM solutions is the built-in
>> filtering of events. While this call chain is a bit long:
>>
>> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
>>
>> there is built-in filtering in fsnotify() and
>> send_to_group(), so the logic in the call chain is useful.
>
> fsnotify_marks based filtering happens in fsnotify.
> No need to do more indirect calls to get to fanotify.
>
> I would add the bpf struct_ops hook right before send_to_group
> or inside of it.
> Not sure whether fsnotify_group concept should be reused
> or avoided.
> Per inode mark/mask filter should stay.
We still need fsnotify_group. Each fanotify file
descriptor maps to one group.
Moving struct_ops hook inside send_to_group does save
us an indirect call. But this also means we need to
introduce the fastpath concept to both fsnotify and
fanotify. I personally don't really like duplications
like this (see the big BUILD_BUG_ON array in
fanotify_handle_event).
OTOH, maybe the benefit of one fewer indirect call
justifies the extra complexity. Let me think more
about it.
>
>> struct fanotify_fastpath_event is indeed big. But I think
>> we need to pass this information to the fastpath handler
>> either way.
>
> Disagree.
> That was the old way of hooking bpf bits in.
> uapi/bpf.h is full of such "context" structs.
> xdp_md, bpf_tcp_sock, etc.
> They pack fields into one struct only because
> old style bpf has one input argument: ctx.
> struct_ops doesn't have this limitation.
> Pass things like path/dentry/inode/whatever pointers directly.
> No need to pack into fanotify_fastpath_event.
OK. I am convinced on this one. I will adjust the
code to remove fanotify_fastpath_event.
>
>> Overall, I think current fastpath design makes sense,
>> though there are things we need to fix (as Amir and Alexei
>> pointed out). Please let me know comments and suggestions
>> on this.
>
> On one side you're arguing that avoiding the extra indirection for
> inode local storage through inode->i_security is needed
> for performance,
The biggest issue with inode_local_storage in i_security
is not the extra indirection and thus the extra latency.
The bigger problem is, once we make inode local storage
available to tracing programs, we will not disable it
at boot time. Therefore, the extra indirection through
i_security doesn't give us any memory savings. Instead,
it will cause latency and memory fragmentation. IOW,
we are paying the cost for no benefit at all.
> but on the other side you're not worried about the deep
> call stack of fsnotify->fanotify and argument packing
> which add way more overhead than the i_security hop.
I think the difference is one indirect call. But that
may be worth the work. I will think more about it. Also,
I would really appreciate other folks' input.
Thanks again for your review!
Song
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-19 1:10 ` Song Liu
@ 2024-11-19 7:59 ` Amir Goldstein
2024-11-19 8:35 ` Song Liu
0 siblings, 1 reply; 26+ messages in thread
From: Amir Goldstein @ 2024-11-19 7:59 UTC (permalink / raw)
To: Song Liu
Cc: Alexei Starovoitov, Song Liu, bpf, Linux-Fsdevel, LKML, LSM List,
Kernel Team, Andrii Nakryiko, Eddy Z, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Alexander Viro,
Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
On Tue, Nov 19, 2024 at 2:10 AM Song Liu <songliubraving@meta.com> wrote:
>
> Hi Alexei,
>
> > On Nov 18, 2024, at 4:10 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> [...]
> >>>
> >>> Agreed. This is actually something I have been thinking
> >>> about since the beginning of this work: should it be
> >>> fanotify-bpf or fsnotify-bpf? Given we have more material,
> >>> this is a good time to have broader discussions on this.
> >>>
> >>> @all, please chime in on whether we should redo this as
> >>> fsnotify-bpf. AFAICT:
> >>>
> >>> Pros of fanotify-bpf:
> >>> - There is existing user space that we can leverage/reuse.
> >>>
> >>> Pros of fsnotify-bpf:
> >>> - Faster fast path.
> >>>
> >>> Are there other major pros/cons that I missed?
Sorry, I forgot to address this earlier.
First and foremost, I don't like the brand name "fast path" for this
feature. The word "fast" implies that avoiding an indirect call is
meaningful and I don't think this is the case here.
I would rather rebrand this feature as "fanotify filter program".
I am going to make a controversial argument:
I anticipate that in the more common use of kernel filter for fanotify
the filter will be *likely* to skip sending events to userspace,
for example, when watching a small subtree in a large filesystem.
In that case, the *likely* result is that a context switch to userspace
is avoided, hence, the addition of one indirect call is insignificant
in comparison.
If we later decide that my assumption is wrong and it is important
to optimize out the indirect call when skipping userspace, we could
do that - it is not a problem, but for now, this discussion seems
like premature optimization to me.
> >>
> >> Adding more thoughts on this: I think it makes more sense to
> >> go with fanotify-bpf. This is because one of the benefits of
> >> fsnotify/fanotify over LSM solutions is the built-in
> >> filtering of events. While this call chain is a bit long:
> >>
> >> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event.
> >>
> >> there is built-in filtering in fsnotify() and
> >> send_to_group(), so the logic in the call chain is useful.
> >
> > fsnotify_marks based filtering happens in fsnotify.
> > No need to do more indirect calls to get to fanotify.
> >
> > I would add the bpf struct_ops hook right before send_to_group
> > or inside of it.
> > Not sure whether fsnotify_group concept should be reused
> > or avoided.
> > Per inode mark/mask filter should stay.
>
> We still need fsnotify_group. Each fanotify file
> descriptor maps to one group.
>
We need the filter to be per fsnotify_group.
Unlike LSM, fsnotify/fanotify is a pubsub service for many
fs event consumers (a.k.a groups).
The feature is a filter per event consumer, not a global filter
for all event consumers.
> Moving struct_ops hook inside send_to_group does save
> us an indirect call. But this also means we need to
> introduce the fastpath concept to both fsnotify and
> fanotify. I personally don't really like duplications
> like this (see the big BUILD_BUG_ON array in
> fanotify_handle_event).
>
> OTOH, maybe the benefit of one fewer indirect call
> justifies the extra complexity. Let me think more
> about it.
>
I need to explain something about fsnotify vs. fanotify
in order to argue why the feature should be "fanotify", but the
bottom line is that it should not be too hard to avoid the indirect
call even if the feature is introduced through the fanotify API, as I
think it should be.
TLDR:
The fsnotify_backend abstraction has become somewhat
of a theater of abstraction over time, because the feature
distance between the fanotify backend and all the rest has grown
quite large.
The logic in send_to_group() is *seemingly* the generic fsnotify
logic, but not really, because only fanotify has ignore masks
and only fanotify has mark types (inode,mount,sb).
This difference is encoded by the group->ops->handle_event()
operation that only fanotify implements.
All the rest of the backends implement the simpler ->handle_inode_event().
Similarly, the group->private union is always dominated by the size
of group->fanotify_data, so there is no big difference if we place
group->fp_hook (or ->filter_hook) outside of fanotify_data, so that
we can query and call it from send_to_group() saving the indirect call
to ->handle_event().
That still leaves the question of whether we need to call
fanotify_group_event_mask() before the filter hook.
fanotify_group_event_mask() does several things, but I think that
the only thing relevant before the filter hook is this line:
/*
* Send the event depending on event flags in mark mask.
*/
if (!fsnotify_mask_applicable(mark->mask, ondir, type))
continue;
This code is related to the two "built-in fanotify filters", namely
FAN_ONDIR and FAN_EVENT_ON_CHILD.
These built-in filters are so lame that they serve as a good example
why a programmable filter is a better idea.
For example, users need to opt-in for events on directories, but they
cannot request events only on directories.
Historically, the "generic" abstraction in send_to_group() has dealt
with the non-generic fanotify ignore mask, but has not dealt with
these non-generic fanotify built-in filters.
However, since commit 31a371e419c8 ("fanotify: prepare for setting
event flags in ignore mask"), send_to_group() is already aware of those
fanotify built-in filters.
So unless I am missing something, if we align the marks iteration
loop in send_to_group() to look exactly like the marks iteration loop in
fanotify_group_event_mask(), there should be no problem calling
the filter hook directly, right before calling ->handle_event().
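Roughly, the tail of send_to_group() could then look like this (a
sketch only, reusing the fp_handler signature and fp_event from
patch 1):

fp_hook = srcu_dereference(group->fp_hook, &fsnotify_mark_srcu);
if (fp_hook &&
    fp_hook->ops->fp_handler(group, fp_hook, &fp_event) ==
                                        FAN_FP_RET_SKIP_EVENT)
        return 0;

return group->ops->handle_event(group, mask, data, data_type,
                                dir, file_name, cookie, iter_info);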
Hope this brain dump helps.
Thanks,
Amir.
* Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler
2024-11-19 7:59 ` Amir Goldstein
@ 2024-11-19 8:35 ` Song Liu
0 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2024-11-19 8:35 UTC (permalink / raw)
To: Amir Goldstein
Cc: Song Liu, Alexei Starovoitov, Song Liu, bpf, Linux-Fsdevel, LKML,
LSM List, Kernel Team, Andrii Nakryiko, Eddy Z,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Alexander Viro, Christian Brauner, Jan Kara, KP Singh,
Matt Bobrowski, repnop@google.com, Jeff Layton, Josef Bacik,
Mickaël Salaün, gnoack@google.com
Hi Amir,
Thanks for sharing these thoughts. I will take a closer look
later, as I am not awake enough to fully understand everything.
> On Nov 18, 2024, at 11:59 PM, Amir Goldstein <amir73il@gmail.com> wrote:
[...]
>> Moving struct_ops hook inside send_to_group does save
>> us an indirect call. But this also means we need to
>> introduce the fastpath concept to both fsnotify and
>> fanotify. I personally don't really like duplications
>> like this (see the big BUILD_BUG_ON array in
>> fanotify_handle_event).
>>
>> OTOH, maybe the benefit of one fewer indirect call
>> justifies the extra complexity. Let me think more
>> about it.
>>
>
> I need to explain something about fsnotify vs. fanotify
> in order to argue why the feature should be "fanotify", but the
> bottom line is that it should not be too hard to avoid the indirect
> call even if the feature is introduced through the fanotify API, as I
> think it should be.
When I first looked into this, I thought about "whether
there will be a use case that uses fsnotify but not fanotify".
I didn't reach any conclusion on this back then. But according
to this thread, I think we are pretty confident that future
use cases (such as FAN_PRE_ACCESS) will have a fanotify part.
If this is the case, I think fanotify-bpf makes more sense.
> TLDR:
> The fsnotify_backend abstraction has become somewhat
> of a theater of abstraction over time, because the feature
> distance between fanotify backend and all the rest has grew
> quite large.
>
> The logic in send_to_group() is *seemingly* the generic fsnotify
> logic, but not really, because only fanotify has ignore masks
> and only fanotify has mark types (inode,mount,sb).
>
> This difference is encoded by the group->ops->handle_event()
> operation that only fanotify implements.
> All the rest of the backends implement the simpler ->handle_inode_event().
>
> Similarly, the group->private union is always dominated by the size
> of group->fanotify_data, so there is no big difference if we place
> group->fp_hook (or ->filter_hook) outside of fanotify_data, so that
> we can query and call it from send_to_group() saving the indirect call
> to ->handle_event().
>
> That still leaves the question of whether we need to call
> fanotify_group_event_mask() before the filter hook.
I was trying Alexei's idea to move the API to fsnotify, and got
stuck at fanotify_group_event_mask(). It appears we should
always honor these built-in filters.
> fanotify_group_event_mask() does several things, but I think that
> the only thing relevant before the filter hook is this line:
> /*
> * Send the event depending on event flags in mark mask.
> */
> if (!fsnotify_mask_applicable(mark->mask, ondir, type))
> continue;
>
> This code is related to the two "built-in fanotify filters", namely
> FAN_ONDIR and FAN_EVENT_ON_CHILD.
> These built-in filters are so lame that they serve as a good example
> why a programmable filter is a better idea.
> For example, users need to opt-in for events on directories, but they
> cannot request events only on directories.
>
> Historically, the "generic" abstraction in send_to_group() has dealt
> with the non-generic fanotify ignore mask, but has not dealt with
> these non-generic fanotify built-in filters.
>
> However, since commit 31a371e419c8 ("fanotify: prepare for setting
> event flags in ignore mask"), send_to_group() is already aware of those
> fanotify built-in filters.
I will continue on this tomorrow. It is time to get some sleep. :)
>
> So unless I am missing something, if we align the marks iteration
> loop in send_to_group() to look exactly like the marks iteration loop in
> fanotify_group_event_mask(), there should be no problem calling
> the filter hook directly, right before calling ->handle_event().
>
> Hope this brain dump helps.
Thanks again for your input!
Song
>
> Thanks,
> Amir.
end of thread (newest: 2024-11-19 8:35 UTC)
Thread overview: 26+ messages
2024-11-14 8:43 [RFC/PATCH v2 bpf-next fanotify 0/7] Fanotify fastpath handler Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 1/7] fanotify: Introduce fanotify " Song Liu
2024-11-15 8:51 ` Amir Goldstein
2024-11-15 17:11 ` Song Liu
2024-11-15 17:32 ` Amir Goldstein
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 2/7] samples/fanotify: Add a sample " Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 3/7] bpf: Make bpf inode storage available to tracing programs Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 4/7] bpf: fs: Add three kfuncs Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 5/7] bpf: Allow bpf map hold reference on dentry Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 6/7] fanotify: Enable bpf based fanotify fastpath handler Song Liu
2024-11-14 8:43 ` [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF " Song Liu
2024-11-14 20:14 ` Alexei Starovoitov
2024-11-14 23:02 ` Song Liu
2024-11-15 0:41 ` Alexei Starovoitov
2024-11-15 1:10 ` Song Liu
2024-11-15 1:31 ` Alexei Starovoitov
2024-11-15 7:01 ` Song Liu
2024-11-15 19:41 ` Alexei Starovoitov
2024-11-15 21:05 ` Song Liu
2024-11-18 20:51 ` Song Liu
2024-11-19 0:10 ` Alexei Starovoitov
2024-11-19 1:10 ` Song Liu
2024-11-19 7:59 ` Amir Goldstein
2024-11-19 8:35 ` Song Liu
2024-11-15 7:26 ` Amir Goldstein
2024-11-15 20:04 ` Alexei Starovoitov