public inbox for linux-kernel@vger.kernel.org
* [RFC bpf-next fanotify 0/5] Fanotify fastpath handler
@ 2024-10-29 23:12 Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

This RFC set introduces an in-kernel fastpath handler for fanotify. The
fastpath handler can be used to handle or filter some events without
going through user space.

At LPC 2024, multiple talks covered use cases of monitoring a subtree of
the VFS (fanotify: [1], bpf/lsm: [2]). This work is inspired by those
discussions. Reliably monitoring a subtree with low overhead is a hard
problem. We do not claim this set fully solves the problem, but we think
this work can be a very useful building block of the solution.

A fastpath handler can be implemented with built-in logic, in a kernel
module, or as a bpf program. The fastpath handler is attached to a
fsnotify group. In the current implementation, fastpath handlers are
maintained in a global list. Only users with CAP_SYS_ADMIN can add
fastpath handlers to the list, by loading a kernel module. Users without
CAP_SYS_ADMIN can attach a loaded fastpath handler to fanotify instances.
During the attach operation, the fastpath handler can take an argument.
This enables non-CAP_SYS_ADMIN users to customize/configure the fastpath
handler, for example, with a specific allowlist/denylist.

As the patchset has grown to 1000+ lines (including samples and tests),
I would like some feedback before pushing it further.

Overview:

Patch 1/5 adds logic to write fastpath handlers in kernel modules.
Patch 2/5 adds a sample of a fastpath handler in a kernel module.
Patch 3/5 is some preparation work on the BPF side.
Patch 4/5 adds logic to write fastpath handlers in bpf programs.
Patch 5/5 is a selftest and example of bpf based fastpath handler.

TODO:
1. Add some mechanism to help users discover available fastpath
   handlers. For example, we can add a sysctl which is similar to
   net.ipv4.tcp_available_congestion_control, or we can add some sysfs
   entries.
2. Enable private (not added to the global list) bpf based fastpath handlers.
3. More testing for inode local storage.
4. Man pages.

[1] https://lpc.events/event/18/contributions/1717/
[2] https://lpc.events/event/18/contributions/1940/

Song Liu (5):
  fanotify: Introduce fanotify fastpath handler
  samples/fanotify: Add a sample fanotify fastpath handler
  bpf: Make bpf inode storage available to tracing programs
  fanotify: Enable bpf based fanotify fastpath handler
  selftests/bpf: Add test for BPF based fanotify fastpath handler

 MAINTAINERS                                   |   1 +
 fs/Makefile                                   |   2 +-
 fs/bpf_fs_kfuncs.c                            |  23 +-
 fs/notify/fanotify/Makefile                   |   2 +-
 fs/notify/fanotify/fanotify.c                 |  25 ++
 fs/notify/fanotify/fanotify_fastpath.c        | 318 ++++++++++++++++++
 fs/notify/fanotify/fanotify_user.c            |   7 +
 include/linux/bpf.h                           |   9 +
 include/linux/bpf_lsm.h                       |  29 --
 include/linux/fanotify.h                      |  45 +++
 include/linux/fs.h                            |   4 +
 include/linux/fsnotify_backend.h              |   3 +
 include/uapi/linux/fanotify.h                 |  26 ++
 kernel/bpf/Makefile                           |   3 +-
 kernel/bpf/bpf_inode_storage.c                | 174 +++++++---
 kernel/bpf/bpf_lsm.c                          |   4 -
 kernel/bpf/verifier.c                         |   5 +
 kernel/trace/bpf_trace.c                      |   8 +
 samples/Kconfig                               |  20 +-
 samples/Makefile                              |   2 +-
 samples/fanotify/.gitignore                   |   1 +
 samples/fanotify/Makefile                     |   5 +-
 samples/fanotify/fastpath-mod.c               | 138 ++++++++
 samples/fanotify/fastpath-user.c              |  90 +++++
 security/bpf/hooks.c                          |   5 -
 tools/testing/selftests/bpf/bpf_kfuncs.h      |   4 +
 tools/testing/selftests/bpf/config            |   1 +
 .../testing/selftests/bpf/prog_tests/fan_fp.c | 245 ++++++++++++++
 tools/testing/selftests/bpf/progs/fan_fp.c    |  77 +++++
 29 files changed, 1189 insertions(+), 87 deletions(-)
 create mode 100644 fs/notify/fanotify/fanotify_fastpath.c
 create mode 100644 samples/fanotify/fastpath-mod.c
 create mode 100644 samples/fanotify/fastpath-user.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fan_fp.c
 create mode 100644 tools/testing/selftests/bpf/progs/fan_fp.c

--
2.43.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify fastpath handler
  2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
@ 2024-10-29 23:12 ` Song Liu
  2024-10-30 12:45   ` Jeff Layton
  2024-11-07 10:48   ` Jan Kara
  2024-10-29 23:12 ` [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample " Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

A fanotify fastpath handler enables handling fanotify events within the
kernel, and thus saves a round trip to user space. A fastpath handler
can be useful in many use cases. For example, if a user is only
interested in events for some files inside a directory, a fastpath
handler can be used to filter out irrelevant events.

A fanotify fastpath handler is attached to a fsnotify_group. At most one
fastpath handler can be attached to a fsnotify_group. The attach/detach
of fastpath handlers is controlled by two new ioctls on fanotify fds:
FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.

A fanotify fastpath handler is packaged in a kernel module. In the
future, it will also be possible to package a fastpath handler in a BPF
program. Since loading modules requires CAP_SYS_ADMIN, _loading_ a
fanotify fastpath handler from a kernel module is limited to
CAP_SYS_ADMIN. However, non-CAP_SYS_ADMIN users can _attach_ a fastpath
handler loaded by the system administrator to their fanotify fds. To
make fanotify fastpath handlers more useful for non-CAP_SYS_ADMIN
users, a fastpath handler can take arguments at attach time.

TODO: Add some mechanism to help users discover available fastpath
handlers. For example, we can add a sysctl which is similar to
net.ipv4.tcp_available_congestion_control, or we can add some sysfs
entries.

Signed-off-by: Song Liu <song@kernel.org>
---
 fs/notify/fanotify/Makefile            |   2 +-
 fs/notify/fanotify/fanotify.c          |  25 ++++
 fs/notify/fanotify/fanotify_fastpath.c | 171 +++++++++++++++++++++++++
 fs/notify/fanotify/fanotify_user.c     |   7 +
 include/linux/fanotify.h               |  45 +++++++
 include/linux/fsnotify_backend.h       |   3 +
 include/uapi/linux/fanotify.h          |  26 ++++
 7 files changed, 278 insertions(+), 1 deletion(-)
 create mode 100644 fs/notify/fanotify/fanotify_fastpath.c

diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
index 25ef222915e5..fddab88dde37 100644
--- a/fs/notify/fanotify/Makefile
+++ b/fs/notify/fanotify/Makefile
@@ -1,2 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_FANOTIFY)		+= fanotify.o fanotify_user.o
+obj-$(CONFIG_FANOTIFY)		+= fanotify.o fanotify_user.o fanotify_fastpath.o
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 224bccaab4cc..a40ec06d0218 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -18,6 +18,8 @@
 
 #include "fanotify.h"
 
+extern struct srcu_struct fsnotify_mark_srcu;
+
 static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
 {
 	return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
@@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	struct fsnotify_event *fsn_event;
 	__kernel_fsid_t fsid = {};
 	u32 match_mask = 0;
+	struct fanotify_fastpath_hook *fp_hook;
 
 	BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
 	BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
@@ -933,6 +936,25 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
 	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
 		fsid = fanotify_get_fsid(iter_info);
 
+	fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
+	if (fp_hook) {
+		struct fanotify_fastpath_event fp_event = {
+			.mask = mask,
+			.data = data,
+			.data_type = data_type,
+			.dir = dir,
+			.file_name = file_name,
+			.fsid = &fsid,
+			.match_mask = match_mask,
+		};
+
+		ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
+		if (ret == FAN_FP_RET_SKIP_EVENT) {
+			ret = 0;
+			goto finish;
+		}
+	}
+
 	event = fanotify_alloc_event(group, mask, data, data_type, dir,
 				     file_name, &fsid, match_mask);
 	ret = -ENOMEM;
@@ -976,6 +998,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
 
 	if (mempool_initialized(&group->fanotify_data.error_events_pool))
 		mempool_exit(&group->fanotify_data.error_events_pool);
+
+	if (group->fanotify_data.fp_hook)
+		fanotify_fastpath_hook_free(group->fanotify_data.fp_hook);
 }
 
 static void fanotify_free_path_event(struct fanotify_event *event)
diff --git a/fs/notify/fanotify/fanotify_fastpath.c b/fs/notify/fanotify/fanotify_fastpath.c
new file mode 100644
index 000000000000..0453a1ac25b1
--- /dev/null
+++ b/fs/notify/fanotify/fanotify_fastpath.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/fanotify.h>
+#include <linux/module.h>
+
+#include "fanotify.h"
+
+extern struct srcu_struct fsnotify_mark_srcu;
+
+static DEFINE_SPINLOCK(fp_list_lock);
+static LIST_HEAD(fp_list);
+
+static struct fanotify_fastpath_ops *fanotify_fastpath_find(const char *name)
+{
+	struct fanotify_fastpath_ops *ops;
+
+	list_for_each_entry(ops, &fp_list, list) {
+		if (!strcmp(ops->name, name))
+			return ops;
+	}
+	return NULL;
+}
+
+
+/*
+ * fanotify_fastpath_register - Register a new fastpath handler.
+ *
+ * Add a fastpath handler to the fp_list. These fastpath handlers are
+ * available for all users in the system.
+ *
+ * @ops:	pointer to fanotify_fastpath_ops to add.
+ *
+ * Returns:
+ *	0	- on success;
+ *	-EEXIST	- fastpath handler of the same name already exists.
+ */
+int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops)
+{
+	spin_lock(&fp_list_lock);
+	if (fanotify_fastpath_find(ops->name)) {
+		/* cannot register two handlers with the same name */
+		spin_unlock(&fp_list_lock);
+		return -EEXIST;
+	}
+	list_add_tail(&ops->list, &fp_list);
+	spin_unlock(&fp_list_lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(fanotify_fastpath_register);
+
+/*
+ * fanotify_fastpath_unregister - Unregister a fastpath handler.
+ *
+ * Remove a fastpath handler from fp_list.
+ *
+ * @ops:	pointer to fanotify_fastpath_ops to remove.
+ */
+void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops)
+{
+	spin_lock(&fp_list_lock);
+	list_del_init(&ops->list);
+	spin_unlock(&fp_list_lock);
+}
+EXPORT_SYMBOL_GPL(fanotify_fastpath_unregister);
+
+/*
+ * fanotify_fastpath_add - Add a fastpath handler to fsnotify_group.
+ *
+ * Add a fastpath handler from fp_list to a fsnotify_group.
+ *
+ * @group:	fsnotify_group to attach the fastpath handler to.
+ * @argp:	fanotify_fastpath_args that specifies the fastpath handler
+ *		and the init arguments of the fastpath handler.
+ *
+ * Returns:
+ *	0	- on success;
+ *	negative errno on failure (e.g. -EBUSY, -ENOENT, -ENOMEM).
+ */
+int fanotify_fastpath_add(struct fsnotify_group *group,
+			  struct fanotify_fastpath_args __user *argp)
+{
+	struct fanotify_fastpath_hook *fp_hook;
+	struct fanotify_fastpath_ops *fp_ops;
+	struct fanotify_fastpath_args args;
+	int ret = 0;
+
+	ret = copy_from_user(&args, argp, sizeof(args));
+	if (ret)
+		return -EFAULT;
+
+	if (args.version != 1 || args.flags || args.init_args_len > FAN_FP_ARGS_MAX)
+		return -EINVAL;
+
+	args.name[FAN_FP_NAME_MAX - 1] = '\0';
+
+	fsnotify_group_lock(group);
+
+	if (rcu_access_pointer(group->fanotify_data.fp_hook)) {
+		fsnotify_group_unlock(group);
+		return -EBUSY;
+	}
+
+	fp_hook = kzalloc(sizeof(*fp_hook), GFP_KERNEL);
+	if (!fp_hook) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	spin_lock(&fp_list_lock);
+	fp_ops = fanotify_fastpath_find(args.name);
+	if (!fp_ops || !try_module_get(fp_ops->owner)) {
+		spin_unlock(&fp_list_lock);
+		ret = -ENOENT;
+		goto err_free_hook;
+	}
+	spin_unlock(&fp_list_lock);
+
+	if (fp_ops->fp_init) {
+		char *init_args = NULL;
+
+		if (args.init_args_len) {
+			init_args = strndup_user(u64_to_user_ptr(args.init_args),
+						 args.init_args_len);
+			if (IS_ERR(init_args)) {
+				ret = PTR_ERR(init_args);
+				if (ret == -EINVAL)
+					ret = -E2BIG;
+				goto err_module_put;
+			}
+		}
+		ret = fp_ops->fp_init(fp_hook, init_args);
+		kfree(init_args);
+		if (ret)
+			goto err_module_put;
+	}
+	fp_hook->ops = fp_ops;
+	rcu_assign_pointer(group->fanotify_data.fp_hook, fp_hook);
+
+out:
+	fsnotify_group_unlock(group);
+	return ret;
+
+err_module_put:
+	module_put(fp_ops->owner);
+err_free_hook:
+	kfree(fp_hook);
+	goto out;
+}
+
+void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
+{
+	if (fp_hook->ops->fp_free)
+		fp_hook->ops->fp_free(fp_hook);
+
+	module_put(fp_hook->ops->owner);
+}
+
+void fanotify_fastpath_del(struct fsnotify_group *group)
+{
+	struct fanotify_fastpath_hook *fp_hook;
+
+	fsnotify_group_lock(group);
+	fp_hook = group->fanotify_data.fp_hook;
+	if (!fp_hook)
+		goto out;
+
+	rcu_assign_pointer(group->fanotify_data.fp_hook, NULL);
+	fanotify_fastpath_hook_free(fp_hook);
+
+out:
+	fsnotify_group_unlock(group);
+}
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8e2d43fc6f7c..e96cb83f8409 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -987,6 +987,13 @@ static long fanotify_ioctl(struct file *file, unsigned int cmd, unsigned long ar
 		spin_unlock(&group->notification_lock);
 		ret = put_user(send_len, (int __user *) p);
 		break;
+	case FAN_IOC_ADD_FP:
+		ret = fanotify_fastpath_add(group, p);
+		break;
+	case FAN_IOC_DEL_FP:
+		fanotify_fastpath_del(group);
+		ret = 0;
+		break;
 	}
 
 	return ret;
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 89ff45bd6f01..cea95307a580 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -136,4 +136,49 @@
 #undef FAN_ALL_PERM_EVENTS
 #undef FAN_ALL_OUTGOING_EVENTS
 
+struct fsnotify_group;
+struct qstr;
+struct inode;
+struct fanotify_fastpath_hook;
+
+struct fanotify_fastpath_event {
+	u32 mask;
+	const void *data;
+	int data_type;
+	struct inode *dir;
+	const struct qstr *file_name;
+	__kernel_fsid_t *fsid;
+	u32 match_mask;
+};
+
+struct fanotify_fastpath_ops {
+	int (*fp_handler)(struct fsnotify_group *group,
+			  struct fanotify_fastpath_hook *fp_hook,
+			  struct fanotify_fastpath_event *fp_event);
+	int (*fp_init)(struct fanotify_fastpath_hook *hook, const char *args);
+	void (*fp_free)(struct fanotify_fastpath_hook *hook);
+
+	char name[FAN_FP_NAME_MAX];
+	struct module *owner;
+	struct list_head list;
+	int flags;
+};
+
+enum fanotify_fastpath_return {
+	FAN_FP_RET_SEND_TO_USERSPACE = 0,
+	FAN_FP_RET_SKIP_EVENT = 1,
+};
+
+struct fanotify_fastpath_hook {
+	struct fanotify_fastpath_ops *ops;
+	void *data;
+};
+
+int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops);
+void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops);
+int fanotify_fastpath_add(struct fsnotify_group *group,
+			  struct fanotify_fastpath_args __user *args);
+void fanotify_fastpath_del(struct fsnotify_group *group);
+void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook);
+
 #endif /* _LINUX_FANOTIFY_H */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 3ecf7768e577..ef251b4e4e6f 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -117,6 +117,7 @@ struct fsnotify_fname;
 struct fsnotify_iter_info;
 
 struct mem_cgroup;
+struct fanotify_fastpath_hook;
 
 /*
  * Each group much define these ops.  The fsnotify infrastructure will call
@@ -255,6 +256,8 @@ struct fsnotify_group {
 			int f_flags; /* event_f_flags from fanotify_init() */
 			struct ucounts *ucounts;
 			mempool_t error_events_pool;
+
+			struct fanotify_fastpath_hook __rcu *fp_hook;
 		} fanotify_data;
 #endif /* CONFIG_FANOTIFY */
 	};
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index 34f221d3a1b9..9c30baeebae0 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -3,6 +3,7 @@
 #define _UAPI_LINUX_FANOTIFY_H
 
 #include <linux/types.h>
+#include <linux/ioctl.h>
 
 /* the following events that user-space can register for */
 #define FAN_ACCESS		0x00000001	/* File was accessed */
@@ -243,4 +244,29 @@ struct fanotify_response_info_audit_rule {
 				(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
 				(long)(meta)->event_len <= (long)(len))
 
+#define FAN_FP_NAME_MAX 64
+#define FAN_FP_ARGS_MAX 1024
+
+/* This is the arguments used to add fastpath handler to a group. */
+struct fanotify_fastpath_args {
+	/* user space pointer to the name of fastpath handler */
+	char name[FAN_FP_NAME_MAX];
+
+	__u32 version;
+	__u32 flags;
+
+	/*
+	 * user space pointer to the init args of fastpath handler,
+	 * up to init_args_len (<= FAN_FP_ARGS_MAX).
+	 */
+	__u64 init_args;
+	/* length of init_args */
+	__u32 init_args_len;
+} __attribute__((__packed__));
+
+#define FAN_IOC_MAGIC 'F'
+
+#define FAN_IOC_ADD_FP _IOW(FAN_IOC_MAGIC, 0, struct fanotify_fastpath_args)
+#define FAN_IOC_DEL_FP _IOW(FAN_IOC_MAGIC, 1, char[FAN_FP_NAME_MAX])
+
 #endif /* _UAPI_LINUX_FANOTIFY_H */
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
@ 2024-10-29 23:12 ` Song Liu
  2024-10-30  0:11   ` Al Viro
  2024-10-30 13:03   ` Jeff Layton
  2024-10-29 23:12 ` [RFC bpf-next fanotify 3/5] bpf: Make bpf inode storage available to tracing programs Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

This fastpath handler filters out events for files with certain prefixes.
To use it:

  [root] insmod fastpath-mod.ko    # This requires root.

  [user] ./fastpath-user /tmp a,b,c &    # Root is not needed
  [user] touch /tmp/aa   # a is in the prefix list (a,b,c), no events
  [user] touch /tmp/xx   # x is not in the prefix list, generates events

  Accessing file xx   # this is the output from fastpath_user

Signed-off-by: Song Liu <song@kernel.org>
---
 MAINTAINERS                      |   1 +
 samples/Kconfig                  |  20 ++++-
 samples/Makefile                 |   2 +-
 samples/fanotify/.gitignore      |   1 +
 samples/fanotify/Makefile        |   5 +-
 samples/fanotify/fastpath-mod.c  | 138 +++++++++++++++++++++++++++++++
 samples/fanotify/fastpath-user.c |  90 ++++++++++++++++++++
 7 files changed, 254 insertions(+), 3 deletions(-)
 create mode 100644 samples/fanotify/fastpath-mod.c
 create mode 100644 samples/fanotify/fastpath-user.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ad507f49324..8939a48b2d99 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8658,6 +8658,7 @@ S:	Maintained
 F:	fs/notify/fanotify/
 F:	include/linux/fanotify.h
 F:	include/uapi/linux/fanotify.h
+F:	samples/fanotify/
 
 FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER
 M:	Linus Walleij <linus.walleij@linaro.org>
diff --git a/samples/Kconfig b/samples/Kconfig
index b288d9991d27..b0d3dff48bb0 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR
 	  with it.
 	  See also Documentation/driver-api/connector.rst
 
+config SAMPLE_FANOTIFY
+	bool "Build fanotify monitoring sample"
+	depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+	help
+	  When enabled, this builds samples for fanotify.
+	  There are multiple fanotify samples. Please see
+	  the following configs for more details on these
+	  samples.
+
 config SAMPLE_FANOTIFY_ERROR
 	bool "Build fanotify error monitoring sample"
-	depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
+	depends on SAMPLE_FANOTIFY
 	help
 	  When enabled, this builds an example code that uses the
 	  FAN_FS_ERROR fanotify mechanism to monitor filesystem
 	  errors.
 	  See also Documentation/admin-guide/filesystem-monitoring.rst.
 
+config SAMPLE_FANOTIFY_FASTPATH
+	tristate "Build fanotify fastpath sample"
+	depends on SAMPLE_FANOTIFY && m
+	help
+	  When enabled, this builds a kernel module that contains a
+	  fanotify fastpath handler.
+	  The fastpath handler filters out certain filename
+	  prefixes for the fanotify user.
+
 config SAMPLE_HIDRAW
 	bool "hidraw sample"
 	depends on CC_CAN_LINK && HEADERS_INSTALL
diff --git a/samples/Makefile b/samples/Makefile
index b85fa64390c5..108360972626 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
 subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
 obj-$(CONFIG_SAMPLE_CONFIGFS)		+= configfs/
 obj-$(CONFIG_SAMPLE_CONNECTOR)		+= connector/
-obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR)	+= fanotify/
+obj-$(CONFIG_SAMPLE_FANOTIFY)		+= fanotify/
 subdir-$(CONFIG_SAMPLE_HIDRAW)		+= hidraw
 obj-$(CONFIG_SAMPLE_HW_BREAKPOINT)	+= hw_breakpoint/
 obj-$(CONFIG_SAMPLE_KDB)		+= kdb/
diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore
index d74593e8b2de..306e1ddec4e0 100644
--- a/samples/fanotify/.gitignore
+++ b/samples/fanotify/.gitignore
@@ -1 +1,2 @@
 fs-monitor
+fastpath-user
diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
index e20db1bdde3b..f5bbd7380104 100644
--- a/samples/fanotify/Makefile
+++ b/samples/fanotify/Makefile
@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
-userprogs-always-y += fs-monitor
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor
 
 userccflags += -I usr/include -Wall
 
+obj-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-mod.o
+
+userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-user
diff --git a/samples/fanotify/fastpath-mod.c b/samples/fanotify/fastpath-mod.c
new file mode 100644
index 000000000000..06c4b42ff114
--- /dev/null
+++ b/samples/fanotify/fastpath-mod.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/fsnotify.h>
+#include <linux/fanotify.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/string.h>
+
+struct prefix_item {
+	const char *prefix;
+	struct list_head list;
+};
+
+struct sample_fp_data {
+	/*
+	 * str_table contains all the prefixes to ignore. For example,
+	 * "prefix1\0prefix2\0prefix3"
+	 */
+	char *str_table;
+
+	/* item->prefix points to different prefixes in the str_table. */
+	struct list_head item_list;
+};
+
+static int sample_fp_handler(struct fsnotify_group *group,
+			     struct fanotify_fastpath_hook *fp_hook,
+			     struct fanotify_fastpath_event *fp_event)
+{
+	const struct qstr *file_name = fp_event->file_name;
+	struct sample_fp_data *fp_data;
+	struct prefix_item *item;
+
+	if (!file_name)
+		return FAN_FP_RET_SEND_TO_USERSPACE;
+	fp_data = fp_hook->data;
+
+	list_for_each_entry(item, &fp_data->item_list, list) {
+		if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
+			return FAN_FP_RET_SKIP_EVENT;
+	}
+
+	return FAN_FP_RET_SEND_TO_USERSPACE;
+}
+
+static int add_item(struct sample_fp_data *fp_data, const char *prev)
+{
+	struct prefix_item *item;
+
+	item = kzalloc(sizeof(*item), GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+	item->prefix = prev;
+	list_add_tail(&item->list, &fp_data->item_list);
+	return 0;
+}
+
+static void free_sample_fp_data(struct sample_fp_data *fp_data)
+{
+	struct prefix_item *item, *tmp;
+
+	list_for_each_entry_safe(item, tmp, &fp_data->item_list, list) {
+		list_del_init(&item->list);
+		kfree(item);
+	}
+	kfree(fp_data->str_table);
+	kfree(fp_data);
+}
+
+static int sample_fp_init(struct fanotify_fastpath_hook *fp_hook, const char *args)
+{
+	struct sample_fp_data *fp_data = kzalloc(sizeof(struct sample_fp_data), GFP_KERNEL);
+	char *p, *prev;
+	int ret;
+
+	if (!fp_data)
+		return -ENOMEM;
+
+	/* Make a copy of the list of prefixes to ignore */
+	fp_data->str_table = kstrndup(args, FAN_FP_ARGS_MAX, GFP_KERNEL);
+	if (!fp_data->str_table) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	INIT_LIST_HEAD(&fp_data->item_list);
+	prev = fp_data->str_table;
+	p = fp_data->str_table;
+
+	/* Split the table: replace each ',' with '\0' and record items */
+	while ((p = strchr(p, ',')) != NULL) {
+		*p = '\0';
+		ret = add_item(fp_data, prev);
+		if (ret)
+			goto err_out;
+		p = p + 1;
+		prev = p;
+	}
+
+	ret = add_item(fp_data, prev);
+	if (ret)
+		goto err_out;
+
+	fp_hook->data = fp_data;
+
+	return 0;
+
+err_out:
+	free_sample_fp_data(fp_data);
+	return ret;
+}
+
+static void sample_fp_free(struct fanotify_fastpath_hook *fp_hook)
+{
+	free_sample_fp_data(fp_hook->data);
+}
+
+static struct fanotify_fastpath_ops fan_fp_ignore_a_ops = {
+	.fp_handler = sample_fp_handler,
+	.fp_init = sample_fp_init,
+	.fp_free = sample_fp_free,
+	.name = "ignore-prefix",
+	.owner = THIS_MODULE,
+};
+
+static int __init fanotify_fastpath_sample_init(void)
+{
+	return fanotify_fastpath_register(&fan_fp_ignore_a_ops);
+}
+static void __exit fanotify_fastpath_sample_exit(void)
+{
+	fanotify_fastpath_unregister(&fan_fp_ignore_a_ops);
+}
+
+module_init(fanotify_fastpath_sample_init);
+module_exit(fanotify_fastpath_sample_exit);
+
+MODULE_AUTHOR("Song Liu");
+MODULE_DESCRIPTION("Example fanotify fastpath handler");
+MODULE_LICENSE("GPL");
diff --git a/samples/fanotify/fastpath-user.c b/samples/fanotify/fastpath-user.c
new file mode 100644
index 000000000000..f301c4e0d21a
--- /dev/null
+++ b/samples/fanotify/fastpath-user.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <err.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+
+static int total_event_cnt;
+
+static void handle_notifications(char *buffer, int len)
+{
+	struct fanotify_event_metadata *event =
+		(struct fanotify_event_metadata *) buffer;
+	struct fanotify_event_info_header *info;
+	struct fanotify_event_info_fid *fid;
+	struct file_handle *handle;
+	char *name;
+	int off;
+
+	for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
+		for (off = sizeof(*event) ; off < event->event_len;
+		     off += info->len) {
+			info = (struct fanotify_event_info_header *)
+				((char *) event + off);
+			switch (info->info_type) {
+			case FAN_EVENT_INFO_TYPE_DFID_NAME:
+				fid = (struct fanotify_event_info_fid *) info;
+				handle = (struct file_handle *)&fid->handle;
+				name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
+
+				printf("Accessing file %s\n", name);
+				total_event_cnt++;
+				break;
+			default:
+				break;
+			}
+		}
+	}
+}
+
+int main(int argc, char **argv)
+{
+	struct fanotify_fastpath_args args = {
+		.name = "ignore-prefix",
+		.version = 1,
+		.flags = 0,
+	};
+	char buffer[BUFSIZ];
+	int fd;
+
+	if (argc < 3) {
+		printf("Usage\n"
+		       "\t %s <path to monitor> <prefix to ignore>\n",
+			argv[0]);
+		return 1;
+	}
+
+	args.init_args = (__u64)argv[2];
+	args.init_args_len = strlen(argv[2]) + 1;
+
+	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID, O_RDONLY);
+	if (fd < 0)
+		errx(1, "fanotify_init");
+
+	if (fanotify_mark(fd, FAN_MARK_ADD,
+			  FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
+			  AT_FDCWD, argv[1])) {
+		errx(1, "fanotify_mark");
+	}
+
+	if (ioctl(fd, FAN_IOC_ADD_FP, &args))
+		errx(1, "ioctl");
+
+	while (total_event_cnt < 10) {
+		int n = read(fd, buffer, BUFSIZ);
+
+		if (n < 0)
+			errx(1, "read");
+
+		handle_notifications(buffer, n);
+	}
+
+	ioctl(fd, FAN_IOC_DEL_FP);
+	close(fd);
+
+	return 0;
+}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next fanotify 3/5] bpf: Make bpf inode storage available to tracing programs
  2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample " Song Liu
@ 2024-10-29 23:12 ` Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 4/5] fanotify: Enable bpf based fanotify fastpath handler Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF " Song Liu
  4 siblings, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

Use the same recursion avoidance mechanism as task local storage.

TODO: Better testing, add selftests.

Signed-off-by: Song Liu <song@kernel.org>
---
 include/linux/bpf.h            |   9 ++
 include/linux/bpf_lsm.h        |  29 ------
 include/linux/fs.h             |   4 +
 kernel/bpf/Makefile            |   3 +-
 kernel/bpf/bpf_inode_storage.c | 174 +++++++++++++++++++++++++--------
 kernel/bpf/bpf_lsm.c           |   4 -
 kernel/trace/bpf_trace.c       |   8 ++
 security/bpf/hooks.c           |   5 -
 8 files changed, 156 insertions(+), 80 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 19d8ca8ac960..863cb972d1fa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2630,6 +2630,7 @@ struct bpf_link *bpf_link_by_id(u32 id);
 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
 						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
+void bpf_inode_storage_free(struct inode *inode);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
@@ -2900,6 +2901,10 @@ static inline void bpf_task_storage_free(struct task_struct *task)
 {
 }
 
+static inline void bpf_inode_storage_free(struct inode *inode)
+{
+}
+
 static inline bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog)
 {
 	return false;
@@ -3263,6 +3268,10 @@ extern const struct bpf_func_proto bpf_task_storage_get_recur_proto;
 extern const struct bpf_func_proto bpf_task_storage_get_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
+extern const struct bpf_func_proto bpf_inode_storage_get_proto;
+extern const struct bpf_func_proto bpf_inode_storage_get_recur_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_recur_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index aefcd6564251..a819c2f0a062 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -19,31 +19,12 @@
 #include <linux/lsm_hook_defs.h>
 #undef LSM_HOOK
 
-struct bpf_storage_blob {
-	struct bpf_local_storage __rcu *storage;
-};
-
-extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
-
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog);
 
 bool bpf_lsm_is_sleepable_hook(u32 btf_id);
 bool bpf_lsm_is_trusted(const struct bpf_prog *prog);
 
-static inline struct bpf_storage_blob *bpf_inode(
-	const struct inode *inode)
-{
-	if (unlikely(!inode->i_security))
-		return NULL;
-
-	return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
-}
-
-extern const struct bpf_func_proto bpf_inode_storage_get_proto;
-extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
-void bpf_inode_storage_free(struct inode *inode);
-
 void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func);
 
 int bpf_lsm_get_retval_range(const struct bpf_prog *prog,
@@ -66,16 +47,6 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 	return -EOPNOTSUPP;
 }
 
-static inline struct bpf_storage_blob *bpf_inode(
-	const struct inode *inode)
-{
-	return NULL;
-}
-
-static inline void bpf_inode_storage_free(struct inode *inode)
-{
-}
-
 static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
 					   bpf_func_t *bpf_func)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3559446279c1..479097e4dd5b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -79,6 +79,7 @@ struct fs_context;
 struct fs_parameter_spec;
 struct fileattr;
 struct iomap_ops;
+struct bpf_local_storage;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -648,6 +649,9 @@ struct inode {
 #ifdef CONFIG_SECURITY
 	void			*i_security;
 #endif
+#ifdef CONFIG_BPF_SYSCALL
+	struct bpf_local_storage __rcu *i_bpf_storage;
+#endif
 
 	/* Stat data, not accessed from path walking */
 	unsigned long		i_ino;
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 9b9c151b5c82..a5b7136b4884 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,8 +10,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
-obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
-obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 29da6d3838f6..79bf7ebb329c 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -21,16 +21,36 @@
 
 DEFINE_BPF_STORAGE_CACHE(inode_cache);
 
-static struct bpf_local_storage __rcu **
-inode_storage_ptr(void *owner)
+static DEFINE_PER_CPU(int, bpf_inode_storage_busy);
+
+static void bpf_inode_storage_lock(void)
+{
+	migrate_disable();
+	this_cpu_inc(bpf_inode_storage_busy);
+}
+
+static void bpf_inode_storage_unlock(void)
+{
+	this_cpu_dec(bpf_inode_storage_busy);
+	migrate_enable();
+}
+
+static bool bpf_inode_storage_trylock(void)
+{
+	migrate_disable();
+	if (unlikely(this_cpu_inc_return(bpf_inode_storage_busy) != 1)) {
+		this_cpu_dec(bpf_inode_storage_busy);
+		migrate_enable();
+		return false;
+	}
+	return true;
+}
+
+static struct bpf_local_storage __rcu **inode_storage_ptr(void *owner)
 {
 	struct inode *inode = owner;
-	struct bpf_storage_blob *bsb;
 
-	bsb = bpf_inode(inode);
-	if (!bsb)
-		return NULL;
-	return &bsb->storage;
+	return &inode->i_bpf_storage;
 }
 
 static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
@@ -39,14 +59,9 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
 {
 	struct bpf_local_storage *inode_storage;
 	struct bpf_local_storage_map *smap;
-	struct bpf_storage_blob *bsb;
-
-	bsb = bpf_inode(inode);
-	if (!bsb)
-		return NULL;
 
 	inode_storage =
-		rcu_dereference_check(bsb->storage, bpf_rcu_lock_held());
+		rcu_dereference_check(inode->i_bpf_storage, bpf_rcu_lock_held());
 	if (!inode_storage)
 		return NULL;
 
@@ -57,21 +72,18 @@ static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
 void bpf_inode_storage_free(struct inode *inode)
 {
 	struct bpf_local_storage *local_storage;
-	struct bpf_storage_blob *bsb;
-
-	bsb = bpf_inode(inode);
-	if (!bsb)
-		return;
 
 	rcu_read_lock();
 
-	local_storage = rcu_dereference(bsb->storage);
+	local_storage = rcu_dereference(inode->i_bpf_storage);
 	if (!local_storage) {
 		rcu_read_unlock();
 		return;
 	}
 
+	bpf_inode_storage_lock();
 	bpf_local_storage_destroy(local_storage);
+	bpf_inode_storage_unlock();
 	rcu_read_unlock();
 }
 
@@ -83,7 +95,9 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
 	if (fd_empty(f))
 		return ERR_PTR(-EBADF);
 
+	bpf_inode_storage_lock();
 	sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
+	bpf_inode_storage_unlock();
 	return sdata ? sdata->data : NULL;
 }
 
@@ -98,13 +112,16 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
 	if (!inode_storage_ptr(file_inode(fd_file(f))))
 		return -EBADF;
 
+	bpf_inode_storage_lock();
 	sdata = bpf_local_storage_update(file_inode(fd_file(f)),
 					 (struct bpf_local_storage_map *)map,
 					 value, map_flags, GFP_ATOMIC);
+	bpf_inode_storage_unlock();
 	return PTR_ERR_OR_ZERO(sdata);
 }
 
-static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map,
+				bool nobusy)
 {
 	struct bpf_local_storage_data *sdata;
 
@@ -112,6 +129,9 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
 	if (!sdata)
 		return -ENOENT;
 
+	if (!nobusy)
+		return -EBUSY;
+
 	bpf_selem_unlink(SELEM(sdata), false);
 
 	return 0;
@@ -119,60 +139,114 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
 
 static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
 {
+	int err;
+
 	CLASS(fd_raw, f)(*(int *)key);
 
 	if (fd_empty(f))
 		return -EBADF;
-	return inode_storage_delete(file_inode(fd_file(f)), map);
+	bpf_inode_storage_lock();
+	err = inode_storage_delete(file_inode(fd_file(f)), map, true);
+	bpf_inode_storage_unlock();
+	return err;
 }
 
-/* *gfp_flags* is a hidden argument provided by the verifier */
-BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
-	   void *, value, u64, flags, gfp_t, gfp_flags)
+static void *__bpf_inode_storage_get(struct bpf_map *map, struct inode *inode,
+				     void *value, u64 flags, gfp_t gfp_flags, bool nobusy)
 {
 	struct bpf_local_storage_data *sdata;
 
-	WARN_ON_ONCE(!bpf_rcu_lock_held());
-	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
-		return (unsigned long)NULL;
-
 	/* explicitly check that the inode_storage_ptr is not
 	 * NULL as inode_storage_lookup returns NULL in this case and
 	 * bpf_local_storage_update expects the owner to have a
 	 * valid storage pointer.
 	 */
 	if (!inode || !inode_storage_ptr(inode))
-		return (unsigned long)NULL;
+		return NULL;
 
 	sdata = inode_storage_lookup(inode, map, true);
 	if (sdata)
-		return (unsigned long)sdata->data;
+		return sdata->data;
 
-	/* This helper must only called from where the inode is guaranteed
-	 * to have a refcount and cannot be freed.
-	 */
-	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+	/* only allocate new storage when the inode is refcounted */
+	if (atomic_read(&inode->i_count) &&
+	    flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
 		sdata = bpf_local_storage_update(
 			inode, (struct bpf_local_storage_map *)map, value,
 			BPF_NOEXIST, gfp_flags);
-		return IS_ERR(sdata) ? (unsigned long)NULL :
-					     (unsigned long)sdata->data;
+		return IS_ERR(sdata) ? NULL : sdata->data;
 	}
 
-	return (unsigned long)NULL;
+	return NULL;
 }
 
-BPF_CALL_2(bpf_inode_storage_delete,
-	   struct bpf_map *, map, struct inode *, inode)
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get_recur, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
 {
+	bool nobusy;
+	void *data;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+		return (unsigned long)NULL;
+
+	nobusy = bpf_inode_storage_trylock();
+	data = __bpf_inode_storage_get(map, inode, value, flags, gfp_flags, nobusy);
+	if (nobusy)
+		bpf_inode_storage_unlock();
+	return (unsigned long)data;
+}
+
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags, gfp_t, gfp_flags)
+{
+	void *data;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+		return (unsigned long)NULL;
+
+	bpf_inode_storage_lock();
+	data = __bpf_inode_storage_get(map, inode, value, flags, gfp_flags, true);
+	bpf_inode_storage_unlock();
+	return (unsigned long)data;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete_recur, struct bpf_map *, map, struct inode *, inode)
+{
+	bool nobusy;
+	int ret;
+
 	WARN_ON_ONCE(!bpf_rcu_lock_held());
 	if (!inode)
 		return -EINVAL;
 
+	nobusy = bpf_inode_storage_trylock();
 	/* This helper must only called from where the inode is guaranteed
 	 * to have a refcount and cannot be freed.
 	 */
-	return inode_storage_delete(inode, map);
+	ret = inode_storage_delete(inode, map, nobusy);
+	if (nobusy) bpf_inode_storage_unlock();
+	return ret;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete, struct bpf_map *, map, struct inode *, inode)
+{
+	int ret;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+	if (!inode)
+		return -EINVAL;
+
+	bpf_inode_storage_lock();
+	/* This helper must only be called from where the inode is guaranteed
+	 * to have a refcount and cannot be freed.
+	 */
+	ret = inode_storage_delete(inode, map, true);
+	bpf_inode_storage_unlock();
+	return ret;
 }
 
 static int notsupp_get_next_key(struct bpf_map *map, void *key,
@@ -208,6 +282,17 @@ const struct bpf_map_ops inode_storage_map_ops = {
 
 BTF_ID_LIST_SINGLE(bpf_inode_storage_btf_ids, struct, inode)
 
+const struct bpf_func_proto bpf_inode_storage_get_recur_proto = {
+	.func		= bpf_inode_storage_get_recur,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id	= &bpf_inode_storage_btf_ids[0],
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_inode_storage_get_proto = {
 	.func		= bpf_inode_storage_get,
 	.gpl_only	= false,
@@ -219,6 +304,15 @@ const struct bpf_func_proto bpf_inode_storage_get_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+const struct bpf_func_proto bpf_inode_storage_delete_recur_proto = {
+	.func		= bpf_inode_storage_delete_recur,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id	= &bpf_inode_storage_btf_ids[0],
+};
+
 const struct bpf_func_proto bpf_inode_storage_delete_proto = {
 	.func		= bpf_inode_storage_delete,
 	.gpl_only	= false,
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 6292ac5f9bd1..51e2de17325a 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -231,10 +231,6 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 
 	switch (func_id) {
-	case BPF_FUNC_inode_storage_get:
-		return &bpf_inode_storage_get_proto;
-	case BPF_FUNC_inode_storage_delete:
-		return &bpf_inode_storage_delete_proto;
 #ifdef CONFIG_NET
 	case BPF_FUNC_sk_storage_get:
 		return &bpf_sk_storage_get_proto;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index a582cd25ca87..3ec39e6704e2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1529,6 +1529,14 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		if (bpf_prog_check_recur(prog))
 			return &bpf_task_storage_delete_recur_proto;
 		return &bpf_task_storage_delete_proto;
+	case BPF_FUNC_inode_storage_get:
+		if (bpf_prog_check_recur(prog))
+			return &bpf_inode_storage_get_recur_proto;
+		return &bpf_inode_storage_get_proto;
+	case BPF_FUNC_inode_storage_delete:
+		if (bpf_prog_check_recur(prog))
+			return &bpf_inode_storage_delete_recur_proto;
+		return &bpf_inode_storage_delete_proto;
 	case BPF_FUNC_for_each_map_elem:
 		return &bpf_for_each_map_elem_proto;
 	case BPF_FUNC_snprintf:
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 3663aec7bcbd..625e0cc7027a 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -29,12 +29,7 @@ static int __init bpf_lsm_init(void)
 	return 0;
 }
 
-struct lsm_blob_sizes bpf_lsm_blob_sizes __ro_after_init = {
-	.lbs_inode = sizeof(struct bpf_storage_blob),
-};
-
 DEFINE_LSM(bpf) = {
 	.name = "bpf",
 	.init = bpf_lsm_init,
-	.blobs = &bpf_lsm_blob_sizes
 };
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next fanotify 4/5] fanotify: Enable bpf based fanotify fastpath handler
  2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
                   ` (2 preceding siblings ...)
  2024-10-29 23:12 ` [RFC bpf-next fanotify 3/5] bpf: Make bpf inode storage available to tracing programs Song Liu
@ 2024-10-29 23:12 ` Song Liu
  2024-10-29 23:12 ` [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF " Song Liu
  4 siblings, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

Allow users to write fanotify fastpath handlers with bpf programs.

Major changes:
1. Make kfuncs in fs/bpf_fs_kfuncs.c available to STRUCT_OPS programs.
2. Add kfunc bpf_iput.
3. Add kfunc bpf_fanotify_data_inode.
4. Add struct_ops bpf_fanotify_fastpath_ops.

TODO:
1. Maybe split this into multiple patches.
2. With the current logic, the bpf based fastpath handler is added to the
   global list, and is thus available to all users. This is similar to
   bpf based tcp congestion control algorithms. It is possible to add an
   API so that a bpf based handler is not added to the global list,
   similar to hid-bpf. I plan to add that API later.

Signed-off-by: Song Liu <song@kernel.org>
---
 fs/Makefile                            |   2 +-
 fs/bpf_fs_kfuncs.c                     |  23 +++-
 fs/notify/fanotify/fanotify_fastpath.c | 153 ++++++++++++++++++++++++-
 kernel/bpf/verifier.c                  |   5 +
 4 files changed, 177 insertions(+), 6 deletions(-)

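Before the diff, a rough user-space model of the decision a fastpath handler makes. Assumption here: a zero return from the handler lets the event through to user space and a nonzero return drops it in the kernel — the exact return convention, and the real struct layouts, are defined earlier in the series:

```c
/* Hypothetical, simplified stand-in for the kernel-side event type. */
struct fp_event {
	int dir_is_tagged;	/* inode-storage tag on the event's dir */
};

/* Model of a fastpath handler: deliver only events from tagged
 * directories; everything else is filtered before reaching user
 * space (0 = deliver, nonzero = drop). */
int fp_handler(const struct fp_event *ev)
{
	return ev->dir_is_tagged ? 0 : -1;
}
```

The point of the struct_ops wiring below is exactly to let a bpf program supply this decision function, with the same kfuncs available to it as to LSM programs.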
diff --git a/fs/Makefile b/fs/Makefile
index 61679fd587b7..1043d999262d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -129,4 +129,4 @@ obj-$(CONFIG_EFIVAR_FS)		+= efivarfs/
 obj-$(CONFIG_EROFS_FS)		+= erofs/
 obj-$(CONFIG_VBOXSF_FS)		+= vboxsf/
 obj-$(CONFIG_ZONEFS_FS)		+= zonefs/
-obj-$(CONFIG_BPF_LSM)		+= bpf_fs_kfuncs.o
+obj-$(CONFIG_BPF_SYSCALL)	+= bpf_fs_kfuncs.o
diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 3fe9f59ef867..8110276faff9 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -152,6 +152,18 @@ __bpf_kfunc int bpf_get_file_xattr(struct file *file, const char *name__str,
 	return bpf_get_dentry_xattr(dentry, name__str, value_p);
 }
 
+/**
+ * bpf_iput - Drop a reference on the inode
+ *
+ * @inode: inode to drop reference.
+ *
+ * Drop a refcount on inode.
+ */
+__bpf_kfunc void bpf_iput(struct inode *inode)
+{
+	iput(inode);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_fs_kfunc_set_ids)
@@ -161,12 +173,14 @@ BTF_ID_FLAGS(func, bpf_put_file, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_path_d_path, KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_get_dentry_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iput, KF_RELEASE)
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
 static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
 {
 	if (!btf_id_set8_contains(&bpf_fs_kfunc_set_ids, kfunc_id) ||
-	    prog->type == BPF_PROG_TYPE_LSM)
+	    prog->type == BPF_PROG_TYPE_LSM ||
+	    prog->type == BPF_PROG_TYPE_STRUCT_OPS)
 		return 0;
 	return -EACCES;
 }
@@ -179,7 +193,12 @@ static const struct btf_kfunc_id_set bpf_fs_kfunc_set = {
 
 static int __init bpf_fs_kfuncs_init(void)
 {
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM, &bpf_fs_kfunc_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_fs_kfunc_set);
+
+	return ret;
 }
 
 late_initcall(bpf_fs_kfuncs_init);
diff --git a/fs/notify/fanotify/fanotify_fastpath.c b/fs/notify/fanotify/fanotify_fastpath.c
index 0453a1ac25b1..4781270e7b6a 100644
--- a/fs/notify/fanotify/fanotify_fastpath.c
+++ b/fs/notify/fanotify/fanotify_fastpath.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/fanotify.h>
 #include <linux/module.h>
+#include <linux/bpf.h>
 
 #include "fanotify.h"
 
@@ -107,7 +108,7 @@ int fanotify_fastpath_add(struct fsnotify_group *group,
 
 	spin_lock(&fp_list_lock);
 	fp_ops = fanotify_fastpath_find(args.name);
-	if (!fp_ops || !try_module_get(fp_ops->owner)) {
+	if (!fp_ops || !bpf_try_module_get(fp_ops, fp_ops->owner)) {
 		spin_unlock(&fp_list_lock);
 		ret = -ENOENT;
 		goto err_free_hook;
@@ -140,7 +141,7 @@ int fanotify_fastpath_add(struct fsnotify_group *group,
 	return ret;
 
 err_module_put:
-	module_put(fp_ops->owner);
+	bpf_module_put(fp_ops, fp_ops->owner);
 err_free_hook:
 	kfree(fp_hook);
 	goto out;
@@ -151,7 +152,7 @@ void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
 	if (fp_hook->ops->fp_free)
 		fp_hook->ops->fp_free(fp_hook);
 
-	module_put(fp_hook->ops->owner);
+	bpf_module_put(fp_hook->ops, fp_hook->ops->owner);
 }
 
 void fanotify_fastpath_del(struct fsnotify_group *group)
@@ -169,3 +170,149 @@ void fanotify_fastpath_del(struct fsnotify_group *group)
 out:
 	fsnotify_group_unlock(group);
 }
+
+__bpf_kfunc_start_defs();
+
+/**
+ * bpf_fanotify_data_inode - get inode from fanotify_fastpath_event
+ *
+ * @event: fanotify_fastpath_event to get inode from
+ *
+ * Get referenced inode from fanotify_fastpath_event.
+ *
+ * Return: A refcounted inode or NULL.
+ *
+ */
+__bpf_kfunc struct inode *bpf_fanotify_data_inode(struct fanotify_fastpath_event *event)
+{
+	struct inode *inode = fsnotify_data_inode(event->data, event->data_type);
+
+	return inode ? igrab(inode) : NULL;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(bpf_fanotify_kfunc_set_ids)
+BTF_ID_FLAGS(func, bpf_fanotify_data_inode,
+	     KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_KFUNCS_END(bpf_fanotify_kfunc_set_ids)
+
+static const struct btf_kfunc_id_set bpf_fanotify_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &bpf_fanotify_kfunc_set_ids,
+};
+
+static const struct bpf_func_proto *
+bpf_fanotify_fastpath_get_func_proto(enum bpf_func_id func_id,
+				     const struct bpf_prog *prog)
+{
+	return tracing_prog_func_proto(func_id, prog);
+}
+
+static bool bpf_fanotify_fastpath_is_valid_access(int off, int size,
+						  enum bpf_access_type type,
+						  const struct bpf_prog *prog,
+						  struct bpf_insn_access_aux *info)
+{
+	if (!bpf_tracing_btf_ctx_access(off, size, type, prog, info))
+		return false;
+
+	return true;
+}
+
+static int bpf_fanotify_fastpath_btf_struct_access(struct bpf_verifier_log *log,
+						   const struct bpf_reg_state *reg,
+						   int off, int size)
+{
+	return 0;
+}
+
+static const struct bpf_verifier_ops bpf_fanotify_fastpath_verifier_ops = {
+	.get_func_proto		= bpf_fanotify_fastpath_get_func_proto,
+	.is_valid_access	= bpf_fanotify_fastpath_is_valid_access,
+	.btf_struct_access	= bpf_fanotify_fastpath_btf_struct_access,
+};
+
+static int bpf_fanotify_fastpath_reg(void *kdata, struct bpf_link *link)
+{
+	return fanotify_fastpath_register(kdata);
+}
+
+static void bpf_fanotify_fastpath_unreg(void *kdata, struct bpf_link *link)
+{
+	fanotify_fastpath_unregister(kdata);
+}
+
+static int bpf_fanotify_fastpath_init(struct btf *btf)
+{
+	return 0;
+}
+
+static int bpf_fanotify_fastpath_init_member(const struct btf_type *t,
+					     const struct btf_member *member,
+					     void *kdata, const void *udata)
+{
+	const struct fanotify_fastpath_ops *uops;
+	struct fanotify_fastpath_ops *ops;
+	u32 moff;
+	int ret;
+
+	uops = (const struct fanotify_fastpath_ops *)udata;
+	ops = (struct fanotify_fastpath_ops *)kdata;
+
+	moff = __btf_member_bit_offset(t, member) / 8;
+	switch (moff) {
+	case offsetof(struct fanotify_fastpath_ops, name):
+		ret = bpf_obj_name_cpy(ops->name, uops->name,
+				       sizeof(ops->name));
+		if (ret <= 0)
+			return -EINVAL;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int __bpf_fan_fp_handler(struct fsnotify_group *group,
+				struct fanotify_fastpath_hook *fp_hook,
+				struct fanotify_fastpath_event *fp_event)
+{
+	return 0;
+}
+
+static int __bpf_fan_fp_init(struct fanotify_fastpath_hook *hook, const char *args)
+{
+	return 0;
+}
+
+static void __bpf_fan_fp_free(struct fanotify_fastpath_hook *hook)
+{
+}
+
+/* For bpf_struct_ops->cfi_stubs */
+static struct fanotify_fastpath_ops __bpf_fanotify_fastpath_ops = {
+	.fp_handler = __bpf_fan_fp_handler,
+	.fp_init = __bpf_fan_fp_init,
+	.fp_free = __bpf_fan_fp_free,
+};
+
+static struct bpf_struct_ops bpf_fanotify_fastpath_ops = {
+	.verifier_ops = &bpf_fanotify_fastpath_verifier_ops,
+	.reg = bpf_fanotify_fastpath_reg,
+	.unreg = bpf_fanotify_fastpath_unreg,
+	.init = bpf_fanotify_fastpath_init,
+	.init_member = bpf_fanotify_fastpath_init_member,
+	.name = "fanotify_fastpath_ops",
+	.cfi_stubs = &__bpf_fanotify_fastpath_ops,
+	.owner = THIS_MODULE,
+};
+
+static int __init bpf_fanotify_fastpath_struct_ops_init(void)
+{
+	int ret;
+
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_fanotify_kfunc_set);
+	ret = ret ?: register_bpf_struct_ops(&bpf_fanotify_fastpath_ops, fanotify_fastpath_ops);
+	return ret;
+}
+late_initcall(bpf_fanotify_fastpath_struct_ops_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9a7ed527e47e..cbca27d24ae5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6528,6 +6528,10 @@ BTF_TYPE_SAFE_TRUSTED(struct dentry) {
 	struct inode *d_inode;
 };
 
+BTF_TYPE_SAFE_TRUSTED(struct fanotify_fastpath_event) {
+	struct inode *dir;
+};
+
 BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket) {
 	struct sock *sk;
 };
@@ -6563,6 +6567,7 @@ static bool type_is_trusted(struct bpf_verifier_env *env,
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct linux_binprm));
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct file));
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct dentry));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct fanotify_fastpath_event));
 
 	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id, "__safe_trusted");
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
                   ` (3 preceding siblings ...)
  2024-10-29 23:12 ` [RFC bpf-next fanotify 4/5] fanotify: Enable bpf based fanotify fastpath handler Song Liu
@ 2024-10-29 23:12 ` Song Liu
  2024-11-07 11:10   ` Amir Goldstein
  4 siblings, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-10-29 23:12 UTC (permalink / raw)
  To: bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, jlayton,
	josef, Song Liu

This test shows a simplified logic that monitors a subtree. It is
simplified in that it doesn't handle all the scenarios, such as:

  1) moving a sub-subtree into/out of the monitored subtree;
  2) mount points inside the monitored subtree.

Therefore, this is not meant to show a way to reliably monitor a subtree.
Instead, it tests the functionality of the bpf based fastpath. To really
monitor a subtree reliably, we will need more complex logic.

Overview of the logic:
  1. fanotify is created for the whole file system (/tmp);
  2. A bpf map (inode_storage_map) is used to tag directories to
     monitor (starting from /tmp/fanotify_test);
  3. On fsnotify_mkdir, the tag is propagated to newly created sub
     directories (/tmp/fanotify_test/subdir);
  4. The bpf fastpath checks whether the fanotify event happens in a
     directory with the tag. If yes, the event is sent to user space;
     otherwise, the event is dropped.
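
Steps 3 and 4 above can be modelled in a few lines of plain C — a toy map from inode number to tag, where mkdir copies the parent's tag to the child (the names and the int-indexed map are illustrative only, not the kernel or BPF API):

```c
#define MAX_INODES 16

/* toy stand-in for the BPF inode storage map, indexed by inode number */
int tags[MAX_INODES];

/* step 3: on fsnotify_mkdir, the child directory inherits the
 * parent's tag, so the whole subtree stays marked */
void on_mkdir(int parent_ino, int child_ino)
{
	if (tags[parent_ino])
		tags[child_ino] = tags[parent_ino];
}

/* step 4: deliver the event only when the directory carries the tag */
int should_deliver(int dir_ino)
{
	return tags[dir_ino] != 0;
}
```

In the real test the map is BPF_MAP_TYPE_INODE_STORAGE, the mkdir hook is the fastpath handler itself, and only tagged-directory events reach the fanotify fd.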

Signed-off-by: Song Liu <song@kernel.org>
---
 tools/testing/selftests/bpf/bpf_kfuncs.h      |   4 +
 tools/testing/selftests/bpf/config            |   1 +
 .../testing/selftests/bpf/prog_tests/fan_fp.c | 245 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/fan_fp.c    |  77 ++++++
 4 files changed, 327 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fan_fp.c
 create mode 100644 tools/testing/selftests/bpf/progs/fan_fp.c

diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
index 2eb3483f2fb0..44dcf4991244 100644
--- a/tools/testing/selftests/bpf/bpf_kfuncs.h
+++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
@@ -87,4 +87,8 @@ struct dentry;
  */
 extern int bpf_get_dentry_xattr(struct dentry *dentry, const char *name,
 			      struct bpf_dynptr *value_ptr) __ksym __weak;
+
+struct fanotify_fastpath_event;
+extern struct inode *bpf_fanotify_data_inode(struct fanotify_fastpath_event *event) __ksym __weak;
+extern void bpf_iput(struct inode *inode) __ksym __weak;
 #endif
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 4ca84c8d9116..392cbcad8a8b 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -24,6 +24,7 @@ CONFIG_DEBUG_INFO_BTF=y
 CONFIG_DEBUG_INFO_DWARF4=y
 CONFIG_DUMMY=y
 CONFIG_DYNAMIC_FTRACE=y
+CONFIG_FANOTIFY=y
 CONFIG_FPROBE=y
 CONFIG_FTRACE_SYSCALLS=y
 CONFIG_FUNCTION_ERROR_INJECTION=y
diff --git a/tools/testing/selftests/bpf/prog_tests/fan_fp.c b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
new file mode 100644
index 000000000000..ea57ed366647
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/fanotify.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+
+#include <test_progs.h>
+
+#include "fan_fp.skel.h"
+
+#define TEST_FS "/tmp/"
+#define TEST_DIR "/tmp/fanotify_test/"
+
+static int create_test_subtree(void)
+{
+	int err;
+
+	err = mkdir(TEST_DIR, 0777);
+	if (err && errno != EEXIST)
+		return err;
+
+	return open(TEST_DIR, O_RDONLY);
+}
+
+static int create_fanotify_fd(void)
+{
+	int fanotify_fd, err;
+
+	fanotify_fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID,
+				    O_RDONLY);
+
+	if (!ASSERT_OK_FD(fanotify_fd, "fanotify_init"))
+		return -1;
+
+	err = fanotify_mark(fanotify_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
+			    FAN_CREATE | FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
+			    AT_FDCWD, TEST_FS);
+	if (!ASSERT_OK(err, "fanotify_mark")) {
+		close(fanotify_fd);
+		return -1;
+	}
+
+	return fanotify_fd;
+}
+
+static int attach_global_fastpath(int fanotify_fd)
+{
+	struct fanotify_fastpath_args args = {
+		.name = "_tmp_test_sub_tree",
+		.version = 1,
+		.flags = 0,
+	};
+
+	if (ioctl(fanotify_fd, FAN_IOC_ADD_FP, &args))
+		return -1;
+
+	return 0;
+}
+
+#define EVENT_BUFFER_SIZE 4096
+struct file_access_result {
+	char name_prefix[16];
+	bool accessed;
+} access_results[3] = {
+	{"aa", false},
+	{"bb", false},
+	{"cc", false},
+};
+
+static void update_access_results(char *name)
+{
+	int i;
+
+	for (i = 0; i < 3; i++) {
+		if (strstr(name, access_results[i].name_prefix))
+			access_results[i].accessed = true;
+	}
+}
+
+static void parse_event(char *buffer, int len)
+{
+	struct fanotify_event_metadata *event =
+		(struct fanotify_event_metadata *) buffer;
+	struct fanotify_event_info_header *info;
+	struct fanotify_event_info_fid *fid;
+	struct file_handle *handle;
+	char *name;
+	int off;
+
+	for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
+		for (off = sizeof(*event) ; off < event->event_len;
+		     off += info->len) {
+			info = (struct fanotify_event_info_header *)
+				((char *) event + off);
+			switch (info->info_type) {
+			case FAN_EVENT_INFO_TYPE_DFID_NAME:
+				fid = (struct fanotify_event_info_fid *) info;
+				handle = (struct file_handle *)&fid->handle;
+				name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
+				update_access_results(name);
+				break;
+			default:
+				break;
+			}
+		}
+	}
+}
+
+static void touch_file(const char *path)
+{
+	int fd;
+
+	fd = open(path, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666);
+	if (!ASSERT_OK_FD(fd, "open"))
+		goto cleanup;
+	close(fd);
+cleanup:
+	unlink(path);
+}
+
+static void generate_and_test_event(int fanotify_fd)
+{
+	char buffer[EVENT_BUFFER_SIZE];
+	int len, err;
+
+	/* access /tmp/fanotify_test/aa, this will generate event */
+	touch_file(TEST_DIR "aa");
+
+	/* create /tmp/fanotify_test/subdir, this will get the tag from the
+	 * parent directory (added in the bpf program on fsnotify_mkdir)
+	 */
+	err = mkdir(TEST_DIR "subdir", 0777);
+	ASSERT_OK(err, "mkdir");
+
+	/* access /tmp/fanotify_test/subdir/bb, this will generate event */
+	touch_file(TEST_DIR "subdir/bb");
+
+	/* access /tmp/cc, this will NOT generate event, as the BPF
+	 * fastpath filtered this event out. (Because /tmp doesn't have
+	 * the tag.)
+	 */
+	touch_file(TEST_FS "cc");
+
+	/* read and parse the events */
+	len = read(fanotify_fd, buffer, EVENT_BUFFER_SIZE);
+	if (!ASSERT_GE(len, 0, "read event"))
+		goto cleanup;
+	parse_event(buffer, len);
+
+	/* verify we generated events for aa and bb, but filtered out the
+	 * event for cc.
+	 */
+	ASSERT_TRUE(access_results[0].accessed, "access aa");
+	ASSERT_TRUE(access_results[1].accessed, "access bb");
+	ASSERT_FALSE(access_results[2].accessed, "access cc");
+
+cleanup:
+	rmdir(TEST_DIR "subdir");
+	rmdir(TEST_DIR);
+}
+
+/* This test shows a simplified logic that monitors a subtree. It is
+ * simplified in that it doesn't handle all the scenarios, such as:
+ *
+ *  1) moving a sub-subtree into/out of the monitored subtree;
+ *  2) mount points inside the monitored subtree.
+ *
+ * Therefore, this is not meant to show a way to reliably monitor a
+ * subtree. Instead, it tests the functionality of the bpf based fastpath.
+ *
+ * Overview of the logic:
+ * 1. fanotify is created for the whole file system (/tmp);
+ * 2. A bpf map (inode_storage_map) is used to tag directories to
+ *    monitor (starting from /tmp/fanotify_test);
+ * 3. On fsnotify_mkdir, the tag is propagated to newly created sub
+ *    directories (/tmp/fanotify_test/subdir);
+ * 4. The bpf fastpath checks whether the event happens in a directory
+ *    with the tag. If yes, the event is sent to user space; otherwise,
+ *    the event is dropped.
+ */
+static void test_monitor_subtree(void)
+{
+	struct bpf_link *link;
+	struct fan_fp *skel;
+	int test_root_fd;
+	__u32 one = 1;
+	int err, fanotify_fd;
+
+	test_root_fd = create_test_subtree();
+
+	if (!ASSERT_OK_FD(test_root_fd, "create_test_subtree"))
+		return;
+
+	skel = fan_fp__open_and_load();
+
+	if (!ASSERT_OK_PTR(skel, "fan_fp__open_and_load"))
+		goto close_test_root_fd;
+
+	/* Add tag to /tmp/fanotify_test/ */
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.inode_storage_map),
+				  &test_root_fd, &one, BPF_ANY);
+	if (!ASSERT_OK(err, "bpf_map_update_elem"))
+		goto destroy_skel;
+	link = bpf_map__attach_struct_ops(skel->maps.bpf_fanotify_fastpath_ops);
+	if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops"))
+		goto destroy_skel;
+
+	fanotify_fd = create_fanotify_fd();
+	if (!ASSERT_OK_FD(fanotify_fd, "create_fanotify_fd"))
+		goto destroy_link;
+
+	err = attach_global_fastpath(fanotify_fd);
+	if (!ASSERT_OK(err, "attach_global_fastpath"))
+		goto close_fanotify_fd;
+
+	generate_and_test_event(fanotify_fd);
+
+	ASSERT_EQ(skel->bss->added_inode_storage, 1, "added_inode_storage");
+
+close_fanotify_fd:
+	close(fanotify_fd);
+
+destroy_link:
+	bpf_link__destroy(link);
+destroy_skel:
+	fan_fp__destroy(skel);
+
+close_test_root_fd:
+	close(test_root_fd);
+	rmdir(TEST_DIR);
+}
+
+void test_bpf_fanotify_fastpath(void)
+{
+	if (test__start_subtest("subtree"))
+		test_monitor_subtree();
+}
diff --git a/tools/testing/selftests/bpf/progs/fan_fp.c b/tools/testing/selftests/bpf/progs/fan_fp.c
new file mode 100644
index 000000000000..ee86dc189e38
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fan_fp.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+#define FS_CREATE		0x00000100	/* Subfile was created */
+#define FS_ISDIR		0x40000000	/* event occurred against dir */
+
+struct {
+	__uint(type, BPF_MAP_TYPE_INODE_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, __u32);
+} inode_storage_map SEC(".maps");
+
+int added_inode_storage;
+
+SEC("struct_ops")
+int BPF_PROG(bpf_fp_handler,
+	     struct fsnotify_group *group,
+	     struct fanotify_fastpath_hook *fp_hook,
+	     struct fanotify_fastpath_event *fp_event)
+{
+	struct inode *dir;
+	__u32 *value;
+
+	dir = fp_event->dir;
+
+	value = bpf_inode_storage_get(&inode_storage_map, dir, 0, 0);
+
+	/* if dir doesn't have the tag, skip the event */
+	if (!value)
+		return FAN_FP_RET_SKIP_EVENT;
+
+	/* propagate tag to subdir on fsnotify_mkdir */
+	if (fp_event->mask == (FS_CREATE | FS_ISDIR) &&
+	    fp_event->data_type == FSNOTIFY_EVENT_DENTRY) {
+		struct inode *new_inode;
+
+		new_inode = bpf_fanotify_data_inode(fp_event);
+		if (!new_inode)
+			goto out;
+
+		value = bpf_inode_storage_get(&inode_storage_map, new_inode, 0,
+					      BPF_LOCAL_STORAGE_GET_F_CREATE);
+		if (value) {
+			*value = 1;
+			added_inode_storage++;
+		}
+		bpf_iput(new_inode);
+	}
+out:
+	return FAN_FP_RET_SEND_TO_USERSPACE;
+}
+
+SEC("struct_ops")
+int BPF_PROG(bpf_fp_init, struct fanotify_fastpath_hook *hook, const char *args)
+{
+	return 0;
+}
+
+SEC("struct_ops")
+void BPF_PROG(bpf_fp_free, struct fanotify_fastpath_hook *hook)
+{
+}
+
+SEC(".struct_ops.link")
+struct fanotify_fastpath_ops bpf_fanotify_fastpath_ops = {
+	.fp_handler = (void *)bpf_fp_handler,
+	.fp_init = (void *)bpf_fp_init,
+	.fp_free = (void *)bpf_fp_free,
+	.name = "_tmp_test_sub_tree",
+};
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-29 23:12 ` [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample " Song Liu
@ 2024-10-30  0:11   ` Al Viro
  2024-10-30  1:48     ` Al Viro
  2024-10-30  2:07     ` Song Liu
  2024-10-30 13:03   ` Jeff Layton
  1 sibling, 2 replies; 29+ messages in thread
From: Al Viro @ 2024-10-30  0:11 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, brauner, jack, kpsingh, mattbobrowski,
	amir73il, repnop, jlayton, josef

On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote:
> +		if (strstr(file_name->name, item->prefix) == (char *)file_name->name)

	Huh?  "Find the first substring (if any) equal to item->prefix and
then check if that happens to be in the very beginning"???

	And you are placing that into the place where it's most likely to cause
the maximal braindamage and spread all over the tree.  Wonderful ;-/

	Where does that "idiom" come from, anyway?  Java?  Not the first time
I see that kind of garbage; typecast is an unusual twist, though...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-30  0:11   ` Al Viro
@ 2024-10-30  1:48     ` Al Viro
  2024-10-30  2:07     ` Song Liu
  1 sibling, 0 replies; 29+ messages in thread
From: Al Viro @ 2024-10-30  1:48 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, brauner, jack, kpsingh, mattbobrowski,
	amir73il, repnop, jlayton, josef

On Wed, Oct 30, 2024 at 12:11:55AM +0000, Al Viro wrote:
> On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote:
> > +		if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
> 
> 	Huh?  "Find the first substring (if any) equal to item->prefix and
> then check if that happens to be in the very beginning"???
> 
> 	And you are placing that into the place where it's most likely to cause
> the maximal braindamage and spread all over the tree.  Wonderful ;-/
> 
> 	Where does that "idiom" come from, anyway?  Java?  Not the first time
> I see that kind of garbage; typecast is an unusual twist, though...

	After some coproarchaeology: it's probably even worse than java - in
javashit indexOf() predates startsWith() by quite a few years; java itself
had both methods from the very beginning.  So that's probably where it had
been cargo-culted from...

	The thing is, performance requirements of some garbage script from
a javascript-infested webshite are somewhat different from what you want
to see anywhere near the kernel-side filesystem event handling...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-30  0:11   ` Al Viro
  2024-10-30  1:48     ` Al Viro
@ 2024-10-30  2:07     ` Song Liu
  2024-10-30  2:35       ` Song Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-10-30  2:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Christian Brauner, Jan Kara, KP Singh,
	Matt Bobrowski, Amir Goldstein, repnop@google.com,
	jlayton@kernel.org, Josef Bacik



> On Oct 29, 2024, at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote:
>> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
> 
> Huh?  "Find the first substring (if any) equal to item->prefix and
> then check if that happens to be in the very beginning"???

Replaced it with strcmp locally, assuming strncmp is not necessary.

> 
> And you are placing that into the place where it's most likely to cause
> the maximal braindamage and spread all over the tree.  Wonderful ;-/
> 
> Where does that "idiom" come from, anyway?  Java?  Not the first time
> I see that kind of garbage; typecast is an unusual twist, though...

The strstr() was probably from lack of coffee. The cast is most likely 
from the compiler yelling at sleepy me. 

BTW: Why do we need "const unsigned char *" in qstr instead of 
"const char *"? I do notice git doesn't show the real history of it..

Thanks,
Song


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-30  2:07     ` Song Liu
@ 2024-10-30  2:35       ` Song Liu
  0 siblings, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-10-30  2:35 UTC (permalink / raw)
  To: Song Liu
  Cc: Al Viro, Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Christian Brauner, Jan Kara,
	KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
	jlayton@kernel.org, Josef Bacik



> On Oct 29, 2024, at 7:07 PM, Song Liu <songliubraving@meta.com> wrote:
> 
> 
> 
>> On Oct 29, 2024, at 5:11 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> 
>> On Tue, Oct 29, 2024 at 04:12:41PM -0700, Song Liu wrote:
>>> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
>> 
>> Huh?  "Find the first substring (if any) equal to item->prefix and
>> then check if that happens to be in the very beginning"???
> 
> Replaced it with strcmp locally, assuming strncmp is not necessary.

Oops, we do need strncmp and more coffee..



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify fastpath handler
  2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
@ 2024-10-30 12:45   ` Jeff Layton
  2024-11-07 10:48   ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Layton @ 2024-10-30 12:45 UTC (permalink / raw)
  To: Song Liu, bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, josef

On Tue, 2024-10-29 at 16:12 -0700, Song Liu wrote:
> A fanotify fastpath handler enables handling fanotify events within the
> kernel, and thus saves a trip to user space. A fastpath handler can be
> useful in many use cases. For example, if a user is only interested in
> events for some files inside a directory, a fastpath handler can be used
> to filter out irrelevant events.
> 
> A fanotify fastpath handler is attached to a fsnotify_group. At most one
> fastpath handler can be attached to a fsnotify_group. Attaching and
> detaching fastpath handlers is controlled by two new ioctls on fanotify fds:
> FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
> 
> A fanotify fastpath handler is packaged in a kernel module. In the future,
> it will also be possible to package a fastpath handler in a BPF program.
> Since loading modules requires CAP_SYS_ADMIN, _loading_ fanotify fastpath
> handlers from kernel modules is limited to CAP_SYS_ADMIN. However,
> non-CAP_SYS_ADMIN users can _attach_ fastpath handlers loaded by the
> sysadmin to their fanotify fds. To make fanotify fastpath handlers more
> useful for non-CAP_SYS_ADMIN users, a fastpath handler can take arguments
> at attach time.
> 
> TODO: Add some mechanism to help users discover available fastpath
> handlers. For example, we can add a sysctl which is similar to
> net.ipv4.tcp_available_congestion_control, or we can add some sysfs
> entries.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  fs/notify/fanotify/Makefile            |   2 +-
>  fs/notify/fanotify/fanotify.c          |  25 ++++
>  fs/notify/fanotify/fanotify_fastpath.c | 171 +++++++++++++++++++++++++
>  fs/notify/fanotify/fanotify_user.c     |   7 +
>  include/linux/fanotify.h               |  45 +++++++
>  include/linux/fsnotify_backend.h       |   3 +
>  include/uapi/linux/fanotify.h          |  26 ++++
>  7 files changed, 278 insertions(+), 1 deletion(-)
>  create mode 100644 fs/notify/fanotify/fanotify_fastpath.c
> 
> diff --git a/fs/notify/fanotify/Makefile b/fs/notify/fanotify/Makefile
> index 25ef222915e5..fddab88dde37 100644
> --- a/fs/notify/fanotify/Makefile
> +++ b/fs/notify/fanotify/Makefile
> @@ -1,2 +1,2 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-$(CONFIG_FANOTIFY)		+= fanotify.o fanotify_user.o
> +obj-$(CONFIG_FANOTIFY)		+= fanotify.o fanotify_user.o fanotify_fastpath.o

This should probably be a compile-time option. Some people might be
spooked by this and we'd want a way to disable it if it was found to
cause a big issue later.
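A Kconfig knob along those lines might look like the following sketch (the symbol name FANOTIFY_FASTPATH and the help text are hypothetical, not from the patch):

```
config FANOTIFY_FASTPATH
	bool "fanotify fastpath handler support"
	depends on FANOTIFY
	default y
	help
	  Allow in-kernel fanotify fastpath handlers to filter events
	  before they are queued to user space. Handlers are provided
	  by kernel modules and attached per fanotify group.

	  If unsure, say Y.
```

The Makefile line would then become `obj-$(CONFIG_FANOTIFY_FASTPATH) += fanotify_fastpath.o`, and the `fp_hook` field and its call sites would be guarded by the same symbol.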

> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 224bccaab4cc..a40ec06d0218 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -18,6 +18,8 @@
>  
>  #include "fanotify.h"
>  
> +extern struct srcu_struct fsnotify_mark_srcu;
> +
>  static bool fanotify_path_equal(const struct path *p1, const struct path *p2)
>  {
>  	return p1->mnt == p2->mnt && p1->dentry == p2->dentry;
> @@ -888,6 +890,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>  	struct fsnotify_event *fsn_event;
>  	__kernel_fsid_t fsid = {};
>  	u32 match_mask = 0;
> +	struct fanotify_fastpath_hook *fp_hook;
>  
>  	BUILD_BUG_ON(FAN_ACCESS != FS_ACCESS);
>  	BUILD_BUG_ON(FAN_MODIFY != FS_MODIFY);
> @@ -933,6 +936,25 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
>  	if (FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS))
>  		fsid = fanotify_get_fsid(iter_info);
>  
> +	fp_hook = srcu_dereference(group->fanotify_data.fp_hook, &fsnotify_mark_srcu);
> +	if (fp_hook) {
> +		struct fanotify_fastpath_event fp_event = {
> +			.mask = mask,
> +			.data = data,
> +			.data_type = data_type,
> +			.dir = dir,
> +			.file_name = file_name,
> +			.fsid = &fsid,
> +			.match_mask = match_mask,
> +		};
> +
> +		ret = fp_hook->ops->fp_handler(group, fp_hook, &fp_event);
> +		if (ret == FAN_FP_RET_SKIP_EVENT) {
> +			ret = 0;
> +			goto finish;
> +		}
> +	}
> +
>  	event = fanotify_alloc_event(group, mask, data, data_type, dir,
>  				     file_name, &fsid, match_mask);
>  	ret = -ENOMEM;
> @@ -976,6 +998,9 @@ static void fanotify_free_group_priv(struct fsnotify_group *group)
>  
>  	if (mempool_initialized(&group->fanotify_data.error_events_pool))
>  		mempool_exit(&group->fanotify_data.error_events_pool);
> +
> +	if (group->fanotify_data.fp_hook)
> +		fanotify_fastpath_hook_free(group->fanotify_data.fp_hook);
>  }
>  
>  static void fanotify_free_path_event(struct fanotify_event *event)
> diff --git a/fs/notify/fanotify/fanotify_fastpath.c b/fs/notify/fanotify/fanotify_fastpath.c
> new file mode 100644
> index 000000000000..0453a1ac25b1
> --- /dev/null
> +++ b/fs/notify/fanotify/fanotify_fastpath.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/fanotify.h>
> +#include <linux/module.h>
> +
> +#include "fanotify.h"
> +
> +extern struct srcu_struct fsnotify_mark_srcu;
> +
> +static DEFINE_SPINLOCK(fp_list_lock);
> +static LIST_HEAD(fp_list);
> +
> +static struct fanotify_fastpath_ops *fanotify_fastpath_find(const char *name)
> +{
> +	struct fanotify_fastpath_ops *ops;
> +
> +	list_for_each_entry(ops, &fp_list, list) {
> +		if (!strcmp(ops->name, name))
> +			return ops;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * fanotify_fastpath_register - Register a new fastpath handler.
> + *
> + * Add a fastpath handler to the fp_list. These fastpath handlers are
> + * available for all users in the system.
> + *
> + * @ops:	pointer to fanotify_fastpath_ops to add.
> + *
> + * Returns:
> + *	0	- on success;
> + *	-EEXIST	- fastpath handler of the same name already exists.
> + */
> +int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops)
> +{
> +	spin_lock(&fp_list_lock);
> +	if (fanotify_fastpath_find(ops->name)) {
> +		/* cannot register two handlers with the same name */
> +		spin_unlock(&fp_list_lock);
> +		return -EEXIST;
> +	}
> +	list_add_tail(&ops->list, &fp_list);
> +	spin_unlock(&fp_list_lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(fanotify_fastpath_register);
> +
> +/*
> + * fanotify_fastpath_unregister - Unregister a fastpath handler.
> + *
> + * Remove a fastpath handler from fp_list.
> + *
> + * @ops:	pointer to fanotify_fastpath_ops to remove.
> + */
> +void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops)
> +{
> +	spin_lock(&fp_list_lock);
> +	list_del_init(&ops->list);
> +	spin_unlock(&fp_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(fanotify_fastpath_unregister);
> +
> +/*
> + * fanotify_fastpath_add - Add a fastpath handler to fsnotify_group.
> + *
> + * Add a fastpath handler from fp_list to a fsnotify_group.
> + *
> + * @group:	fsnotify_group to attach the fastpath handler to
> + * @argp:	fanotify_fastpath_args that specifies the fastpath handler
> + *		and the init arguments of the fastpath handler.
> + *
> + * Returns:
> + *	0	- on success;
> + *	-EBUSY	- a fastpath handler is already attached to this group;
> + *	-ENOENT	- no fastpath handler with the given name is registered.
> + */
> +int fanotify_fastpath_add(struct fsnotify_group *group,
> +			  struct fanotify_fastpath_args __user *argp)
> +{
> +	struct fanotify_fastpath_hook *fp_hook;
> +	struct fanotify_fastpath_ops *fp_ops;
> +	struct fanotify_fastpath_args args;
> +	int ret = 0;
> +
> +	ret = copy_from_user(&args, argp, sizeof(args));
> +	if (ret)
> +		return -EFAULT;
> +
> +	if (args.version != 1 || args.flags || args.init_args_len > FAN_FP_ARGS_MAX)
> +		return -EINVAL;
> +
> +	args.name[FAN_FP_NAME_MAX - 1] = '\0';
> +
> +	fsnotify_group_lock(group);
> +
> +	if (rcu_access_pointer(group->fanotify_data.fp_hook)) {
> +		fsnotify_group_unlock(group);
> +		return -EBUSY;
> +	}
> +
> +	fp_hook = kzalloc(sizeof(*fp_hook), GFP_KERNEL);
> +	if (!fp_hook) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	spin_lock(&fp_list_lock);
> +	fp_ops = fanotify_fastpath_find(args.name);
> +	if (!fp_ops || !try_module_get(fp_ops->owner)) {
> +		spin_unlock(&fp_list_lock);
> +		ret = -ENOENT;
> +		goto err_free_hook;
> +	}
> +	spin_unlock(&fp_list_lock);
> +
> +	if (fp_ops->fp_init) {
> +		char *init_args = NULL;
> +
> +		if (args.init_args_len) {
> +			init_args = strndup_user(u64_to_user_ptr(args.init_args),
> +						 args.init_args_len);
> +			if (IS_ERR(init_args)) {
> +				ret = PTR_ERR(init_args);
> +				if (ret == -EINVAL)
> +					ret = -E2BIG;
> +				goto err_module_put;
> +			}
> +		}
> +		ret = fp_ops->fp_init(fp_hook, init_args);
> +		kfree(init_args);
> +		if (ret)
> +			goto err_module_put;
> +	}
> +	fp_hook->ops = fp_ops;
> +	rcu_assign_pointer(group->fanotify_data.fp_hook, fp_hook);
> +
> +out:
> +	fsnotify_group_unlock(group);
> +	return ret;
> +
> +err_module_put:
> +	module_put(fp_ops->owner);
> +err_free_hook:
> +	kfree(fp_hook);
> +	goto out;
> +}
> +
> +void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook)
> +{
> +	if (fp_hook->ops->fp_free)
> +		fp_hook->ops->fp_free(fp_hook);
> +
> +	module_put(fp_hook->ops->owner);
> +}
> +
> +void fanotify_fastpath_del(struct fsnotify_group *group)
> +{
> +	struct fanotify_fastpath_hook *fp_hook;
> +
> +	fsnotify_group_lock(group);
> +	fp_hook = group->fanotify_data.fp_hook;
> +	if (!fp_hook)
> +		goto out;
> +
> +	rcu_assign_pointer(group->fanotify_data.fp_hook, NULL);
> +	fanotify_fastpath_hook_free(fp_hook);
> +
> +out:
> +	fsnotify_group_unlock(group);
> +}
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 8e2d43fc6f7c..e96cb83f8409 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -987,6 +987,13 @@ static long fanotify_ioctl(struct file *file, unsigned int cmd, unsigned long ar
>  		spin_unlock(&group->notification_lock);
>  		ret = put_user(send_len, (int __user *) p);
>  		break;
> +	case FAN_IOC_ADD_FP:
> +		ret = fanotify_fastpath_add(group, p);
> +		break;
> +	case FAN_IOC_DEL_FP:
> +		fanotify_fastpath_del(group);
> +		ret = 0;
> +		break;
>  	}
>  
>  	return ret;
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 89ff45bd6f01..cea95307a580 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -136,4 +136,49 @@
>  #undef FAN_ALL_PERM_EVENTS
>  #undef FAN_ALL_OUTGOING_EVENTS
>  
> +struct fsnotify_group;
> +struct qstr;
> +struct inode;
> +struct fanotify_fastpath_hook;
> +

This would be a nice place for a doc header that describes what all of
these fields represent. Most of it should probably be the same as the
comment header over the struct fsnotify_ops definition:

> +struct fanotify_fastpath_event {
> +	u32 mask;
> +	const void *data;
> +	int data_type;
> +	struct inode *dir;
> +	const struct qstr *file_name;
> +	__kernel_fsid_t *fsid;
> +	u32 match_mask;
> +};
> +

Ditto here. This should have a kerneldoc comment that describes the
purpose, arguments and the "contract" around these new operations.
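For instance, a kerneldoc header along those lines could read (a sketch; the field descriptions are inferred from the patch, not taken from it):

```c
/**
 * struct fanotify_fastpath_ops - operations of a fanotify fastpath handler
 * @fp_handler: called for each event before it is queued; return
 *              %FAN_FP_RET_SEND_TO_USERSPACE to queue the event, or
 *              %FAN_FP_RET_SKIP_EVENT to drop it
 * @fp_init:    optional; called at FAN_IOC_ADD_FP time with the
 *              user-supplied argument string (may be NULL)
 * @fp_free:    optional; called when the handler is detached from a group
 * @name:       unique name used by FAN_IOC_ADD_FP to look up the handler
 * @owner:      owning module; pinned while the handler is attached
 * @list:       entry in the global list of registered handlers
 * @flags:      currently unused, must be 0
 */
```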

> +struct fanotify_fastpath_ops {
> +	int (*fp_handler)(struct fsnotify_group *group,
> +			  struct fanotify_fastpath_hook *fp_hook,
> +			  struct fanotify_fastpath_event *fp_event);
> +	int (*fp_init)(struct fanotify_fastpath_hook *hook, const char *args);
> +	void (*fp_free)(struct fanotify_fastpath_hook *hook);
> +
> +	char name[FAN_FP_NAME_MAX];
> +	struct module *owner;
> +	struct list_head list;
> +	int flags;
> +};
> +
> +enum fanotify_fastpath_return {
> +	FAN_FP_RET_SEND_TO_USERSPACE = 0,
> +	FAN_FP_RET_SKIP_EVENT = 1,
> +};
> +
> +struct fanotify_fastpath_hook {
> +	struct fanotify_fastpath_ops *ops;
> +	void *data;
> +};
> +
> +int fanotify_fastpath_register(struct fanotify_fastpath_ops *ops);
> +void fanotify_fastpath_unregister(struct fanotify_fastpath_ops *ops);
> +int fanotify_fastpath_add(struct fsnotify_group *group,
> +			  struct fanotify_fastpath_args __user *args);
> +void fanotify_fastpath_del(struct fsnotify_group *group);
> +void fanotify_fastpath_hook_free(struct fanotify_fastpath_hook *fp_hook);
> +
>  #endif /* _LINUX_FANOTIFY_H */
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 3ecf7768e577..ef251b4e4e6f 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -117,6 +117,7 @@ struct fsnotify_fname;
>  struct fsnotify_iter_info;
>  
>  struct mem_cgroup;
> +struct fanotify_fastpath_hook;
>  
>  /*
>   * Each group much define these ops.  The fsnotify infrastructure will call
> @@ -255,6 +256,8 @@ struct fsnotify_group {
>  			int f_flags; /* event_f_flags from fanotify_init() */
>  			struct ucounts *ucounts;
>  			mempool_t error_events_pool;
> +
> +			struct fanotify_fastpath_hook __rcu *fp_hook;
>  		} fanotify_data;
>  #endif /* CONFIG_FANOTIFY */
>  	};
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 34f221d3a1b9..9c30baeebae0 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -3,6 +3,7 @@
>  #define _UAPI_LINUX_FANOTIFY_H
>  
>  #include <linux/types.h>
> +#include <linux/ioctl.h>
>  
>  /* the following events that user-space can register for */
>  #define FAN_ACCESS		0x00000001	/* File was accessed */
> @@ -243,4 +244,29 @@ struct fanotify_response_info_audit_rule {
>  				(long)(meta)->event_len >= (long)FAN_EVENT_METADATA_LEN && \
>  				(long)(meta)->event_len <= (long)(len))
>  
> +#define FAN_FP_NAME_MAX 64
> +#define FAN_FP_ARGS_MAX 1024
> +
> +/* Arguments used to add a fastpath handler to a group. */
> +struct fanotify_fastpath_args {
> +	/* name of the fastpath handler (NUL-terminated) */
> +	char name[FAN_FP_NAME_MAX];
> +
> +	__u32 version;
> +	__u32 flags;
> +
> +	/*
> +	 * user space pointer to the init args of fastpath handler,
> +	 * up to init_args_len (<= FAN_FP_ARGS_MAX).
> +	 */
> +	__u64 init_args;
> +	/* length of init_args */
> +	__u32 init_args_len;
> +} __attribute__((__packed__));
> +
> +#define FAN_IOC_MAGIC 'F'
> +
> +#define FAN_IOC_ADD_FP _IOW(FAN_IOC_MAGIC, 0, struct fanotify_fastpath_args)
> +#define FAN_IOC_DEL_FP _IOW(FAN_IOC_MAGIC, 1, char[FAN_FP_NAME_MAX])
> +
>  #endif /* _UAPI_LINUX_FANOTIFY_H */

Otherwise, this looks fairly straightforward. Nice work, Song!
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-29 23:12 ` [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample " Song Liu
  2024-10-30  0:11   ` Al Viro
@ 2024-10-30 13:03   ` Jeff Layton
  2024-10-30 20:30     ` Song Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2024-10-30 13:03 UTC (permalink / raw)
  To: Song Liu, bpf, linux-fsdevel, linux-kernel
  Cc: kernel-team, andrii, eddyz87, ast, daniel, martin.lau, viro,
	brauner, jack, kpsingh, mattbobrowski, amir73il, repnop, josef

On Tue, 2024-10-29 at 16:12 -0700, Song Liu wrote:
> This fastpath handler filters out events for files with certain prefixes.
> To use it:
> 
>   [root] insmod fastpath-mod.ko    # This requires root.
> 
>   [user] ./fastpath-user /tmp a,b,c &    # Root is not needed
>   [user] touch /tmp/aa   # a is in the prefix list (a,b,c), no events
>   [user] touch /tmp/xx   # x is not in the prefix list, generates events
> 
>   Accessing file xx   # this is the output from fastpath_user
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  MAINTAINERS                      |   1 +
>  samples/Kconfig                  |  20 ++++-
>  samples/Makefile                 |   2 +-
>  samples/fanotify/.gitignore      |   1 +
>  samples/fanotify/Makefile        |   5 +-
>  samples/fanotify/fastpath-mod.c  | 138 +++++++++++++++++++++++++++++++
>  samples/fanotify/fastpath-user.c |  90 ++++++++++++++++++++
>  7 files changed, 254 insertions(+), 3 deletions(-)
>  create mode 100644 samples/fanotify/fastpath-mod.c
>  create mode 100644 samples/fanotify/fastpath-user.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7ad507f49324..8939a48b2d99 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8658,6 +8658,7 @@ S:	Maintained
>  F:	fs/notify/fanotify/
>  F:	include/linux/fanotify.h
>  F:	include/uapi/linux/fanotify.h
> +F:	samples/fanotify/
>  
>  FARADAY FOTG210 USB2 DUAL-ROLE CONTROLLER
>  M:	Linus Walleij <linus.walleij@linaro.org>
> diff --git a/samples/Kconfig b/samples/Kconfig
> index b288d9991d27..b0d3dff48bb0 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -149,15 +149,33 @@ config SAMPLE_CONNECTOR
>  	  with it.
>  	  See also Documentation/driver-api/connector.rst
>  
> +config SAMPLE_FANOTIFY
> +	bool "Build fanotify monitoring sample"
> +	depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
> +	help
> +	  When enabled, this builds samples for fanotify.
> +	  There are multiple samples for fanotify. Please see
> +	  the following configs for more details on these
> +	  samples.
> +
>  config SAMPLE_FANOTIFY_ERROR
>  	bool "Build fanotify error monitoring sample"
> -	depends on FANOTIFY && CC_CAN_LINK && HEADERS_INSTALL
> +	depends on SAMPLE_FANOTIFY
>  	help
>  	  When enabled, this builds an example code that uses the
>  	  FAN_FS_ERROR fanotify mechanism to monitor filesystem
>  	  errors.
>  	  See also Documentation/admin-guide/filesystem-monitoring.rst.
>  
> +config SAMPLE_FANOTIFY_FASTPATH
> +	tristate "Build fanotify fastpath sample"
> +	depends on SAMPLE_FANOTIFY && m
> +	help
> +	  When enabled, this builds a kernel module that contains
> +	  a fanotify fastpath handler.
> +	  The fastpath handler filters out events for files whose
> +	  names match certain prefixes.
> +
>  config SAMPLE_HIDRAW
>  	bool "hidraw sample"
>  	depends on CC_CAN_LINK && HEADERS_INSTALL
> diff --git a/samples/Makefile b/samples/Makefile
> index b85fa64390c5..108360972626 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -6,7 +6,7 @@ subdir-$(CONFIG_SAMPLE_ANDROID_BINDERFS) += binderfs
>  subdir-$(CONFIG_SAMPLE_CGROUP) += cgroup
>  obj-$(CONFIG_SAMPLE_CONFIGFS)		+= configfs/
>  obj-$(CONFIG_SAMPLE_CONNECTOR)		+= connector/
> -obj-$(CONFIG_SAMPLE_FANOTIFY_ERROR)	+= fanotify/
> +obj-$(CONFIG_SAMPLE_FANOTIFY)		+= fanotify/
>  subdir-$(CONFIG_SAMPLE_HIDRAW)		+= hidraw
>  obj-$(CONFIG_SAMPLE_HW_BREAKPOINT)	+= hw_breakpoint/
>  obj-$(CONFIG_SAMPLE_KDB)		+= kdb/
> diff --git a/samples/fanotify/.gitignore b/samples/fanotify/.gitignore
> index d74593e8b2de..306e1ddec4e0 100644
> --- a/samples/fanotify/.gitignore
> +++ b/samples/fanotify/.gitignore
> @@ -1 +1,2 @@
>  fs-monitor
> +fastpath-user
> diff --git a/samples/fanotify/Makefile b/samples/fanotify/Makefile
> index e20db1bdde3b..f5bbd7380104 100644
> --- a/samples/fanotify/Makefile
> +++ b/samples/fanotify/Makefile
> @@ -1,5 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -userprogs-always-y += fs-monitor
> +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_ERROR) += fs-monitor
>  
>  userccflags += -I usr/include -Wall
>  
> +obj-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-mod.o
> +
> +userprogs-always-$(CONFIG_SAMPLE_FANOTIFY_FASTPATH) += fastpath-user
> diff --git a/samples/fanotify/fastpath-mod.c b/samples/fanotify/fastpath-mod.c
> new file mode 100644
> index 000000000000..06c4b42ff114
> --- /dev/null
> +++ b/samples/fanotify/fastpath-mod.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/fsnotify.h>
> +#include <linux/fanotify.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/string.h>
> +
> +struct prefix_item {
> +	const char *prefix;
> +	struct list_head list;
> +};
> +
> +struct sample_fp_data {
> +	/*
> +	 * str_table contains all the prefixes to ignore. For example,
> +	 * "prefix1\0prefix2\0prefix3"
> +	 */
> +	char *str_table;
> +
> +	/* item->prefix points to different prefixes in the str_table. */
> +	struct list_head item_list;
> +};
> +
> +static int sample_fp_handler(struct fsnotify_group *group,
> +			     struct fanotify_fastpath_hook *fp_hook,
> +			     struct fanotify_fastpath_event *fp_event)
> +{
> +	const struct qstr *file_name = fp_event->file_name;
> +	struct sample_fp_data *fp_data;
> +	struct prefix_item *item;
> +
> +	if (!file_name)
> +		return FAN_FP_RET_SEND_TO_USERSPACE;
> +	fp_data = fp_hook->data;
> +
> +	list_for_each_entry(item, &fp_data->item_list, list) {
> +		if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
> +			return FAN_FP_RET_SKIP_EVENT;
> +	}
> +
> +	return FAN_FP_RET_SEND_TO_USERSPACE;
> +}

The sample is a little underwhelming and everyone hates string parsing
in the kernel ;). It'd be nice to see a more real-world use-case for
this.

Could this be used to implement subtree filtering? I guess you'd have
to walk back up the directory tree and see whether it had a given
ancestor?

> +
> +static int add_item(struct sample_fp_data *fp_data, const char *prev)
> +{
> +	struct prefix_item *item;
> +
> +	item = kzalloc(sizeof(*item), GFP_KERNEL);
> +	if (!item)
> +		return -ENOMEM;
> +	item->prefix = prev;
> +	list_add_tail(&item->list, &fp_data->item_list);
> +	return 0;
> +}
> +
> +static void free_sample_fp_data(struct sample_fp_data *fp_data)
> +{
> +	struct prefix_item *item, *tmp;
> +
> +	list_for_each_entry_safe(item, tmp, &fp_data->item_list, list) {
> +		list_del_init(&item->list);
> +		kfree(item);
> +	}
> +	kfree(fp_data->str_table);
> +	kfree(fp_data);
> +}
> +
> +static int sample_fp_init(struct fanotify_fastpath_hook *fp_hook, const char *args)
> +{
> +	struct sample_fp_data *fp_data = kzalloc(sizeof(struct sample_fp_data), GFP_KERNEL);
> +	char *p, *prev;
> +	int ret;
> +
> +	if (!fp_data)
> +		return -ENOMEM;
> +
> +	/* Make a copy of the list of prefixes to ignore */
> +	fp_data->str_table = kstrndup(args, FAN_FP_ARGS_MAX, GFP_KERNEL);
> +	if (!fp_data->str_table) {
> +		ret = -ENOMEM;
> +		goto err_out;
> +	}
> +
> +	INIT_LIST_HEAD(&fp_data->item_list);
> +	prev = fp_data->str_table;
> +	p = fp_data->str_table;
> +
> +	/* Split the args on ',', replacing each ',' with '\0' */
> +	while ((p = strchr(p, ',')) != NULL) {
> +		*p = '\0';
> +		ret = add_item(fp_data, prev);
> +		if (ret)
> +			goto err_out;
> +		p = p + 1;
> +		prev = p;
> +	}
> +
> +	ret = add_item(fp_data, prev);
> +	if (ret)
> +		goto err_out;
> +
> +	fp_hook->data = fp_data;
> +
> +	return 0;
> +
> +err_out:
> +	free_sample_fp_data(fp_data);
> +	return ret;
> +}
> +
> +static void sample_fp_free(struct fanotify_fastpath_hook *fp_hook)
> +{
> +	free_sample_fp_data(fp_hook->data);
> +}
> +
> +static struct fanotify_fastpath_ops fan_fp_ignore_a_ops = {
> +	.fp_handler = sample_fp_handler,
> +	.fp_init = sample_fp_init,
> +	.fp_free = sample_fp_free,
> +	.name = "ignore-prefix",
> +	.owner = THIS_MODULE,
> +};
> +
> +static int __init fanotify_fastpath_sample_init(void)
> +{
> +	return fanotify_fastpath_register(&fan_fp_ignore_a_ops);
> +}
> +static void __exit fanotify_fastpath_sample_exit(void)
> +{
> +	fanotify_fastpath_unregister(&fan_fp_ignore_a_ops);
> +}
> +
> +module_init(fanotify_fastpath_sample_init);
> +module_exit(fanotify_fastpath_sample_exit);
> +
> +MODULE_AUTHOR("Song Liu");
> +MODULE_DESCRIPTION("Example fanotify fastpath handler");
> +MODULE_LICENSE("GPL");
> diff --git a/samples/fanotify/fastpath-user.c b/samples/fanotify/fastpath-user.c
> new file mode 100644
> index 000000000000..f301c4e0d21a
> --- /dev/null
> +++ b/samples/fanotify/fastpath-user.c
> @@ -0,0 +1,90 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <err.h>
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/fanotify.h>
> +#include <unistd.h>
> +#include <sys/ioctl.h>
> +
> +static int total_event_cnt;
> +
> +static void handle_notifications(char *buffer, int len)
> +{
> +	struct fanotify_event_metadata *event =
> +		(struct fanotify_event_metadata *) buffer;
> +	struct fanotify_event_info_header *info;
> +	struct fanotify_event_info_fid *fid;
> +	struct file_handle *handle;
> +	char *name;
> +	int off;
> +
> +	for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
> +		for (off = sizeof(*event) ; off < event->event_len;
> +		     off += info->len) {
> +			info = (struct fanotify_event_info_header *)
> +				((char *) event + off);
> +			switch (info->info_type) {
> +			case FAN_EVENT_INFO_TYPE_DFID_NAME:
> +				fid = (struct fanotify_event_info_fid *) info;
> +				handle = (struct file_handle *)&fid->handle;
> +				name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
> +
> +				printf("Accessing file %s\n", name);
> +				total_event_cnt++;
> +				break;
> +			default:
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	struct fanotify_fastpath_args args = {
> +		.name = "ignore-prefix",
> +		.version = 1,
> +		.flags = 0,
> +	};
> +	char buffer[BUFSIZ];
> +	int fd;
> +
> +	if (argc < 3) {
> +		printf("Usage\n"
> +		       "\t %s <path to monitor> <prefix to ignore>\n",
> +			argv[0]);
> +		return 1;
> +	}
> +
> +	args.init_args = (__u64)argv[2];
> +	args.init_args_len = strlen(argv[2]) + 1;
> +
> +	fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID, O_RDONLY);
> +	if (fd < 0)
> +		errx(1, "fanotify_init");
> +
> +	if (fanotify_mark(fd, FAN_MARK_ADD,
> +			  FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
> +			  AT_FDCWD, argv[1])) {
> +		errx(1, "fanotify_mark");
> +	}
> +
> +	if (ioctl(fd, FAN_IOC_ADD_FP, &args))
> +		errx(1, "ioctl");
> +
> +	while (total_event_cnt < 10) {
> +		int n = read(fd, buffer, BUFSIZ);
> +
> +		if (n < 0)
> +			errx(1, "read");
> +
> +		handle_notifications(buffer, n);
> +	}
> +
> +	ioctl(fd, FAN_IOC_DEL_FP);
> +	close(fd);
> +
> +	return 0;
> +}

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-30 13:03   ` Jeff Layton
@ 2024-10-30 20:30     ` Song Liu
  2024-10-31  0:23       ` Jeff Layton
  0 siblings, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-10-30 20:30 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Al Viro, Christian Brauner, Jan Kara, KP Singh,
	Matt Bobrowski, Amir Goldstein, repnop@google.com, Josef Bacik

Hi Jeff, 

Thanks for your review!

I will update 1/2 based on the feedback. (Replying here to save everyone 
an email..)

> On Oct 30, 2024, at 6:03 AM, Jeff Layton <jlayton@kernel.org> wrote:

[...]

>> +
>> +static int sample_fp_handler(struct fsnotify_group *group,
>> +     struct fanotify_fastpath_hook *fp_hook,
>> +     struct fanotify_fastpath_event *fp_event)
>> +{
>> + const struct qstr *file_name = fp_event->file_name;
>> + struct sample_fp_data *fp_data;
>> + struct prefix_item *item;
>> +
>> + if (!file_name)
>> + return FAN_FP_RET_SEND_TO_USERSPACE;
>> + fp_data = fp_hook->data;
>> +
>> + list_for_each_entry(item, &fp_data->item_list, list) {
>> + if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
>> + return FAN_FP_RET_SKIP_EVENT;
>> + }
>> +
>> + return FAN_FP_RET_SEND_TO_USERSPACE;
>> +}
> 
> The sample is a little underwhelming and everyone hates string parsing
> in the kernel ;). It'd be nice to see a more real-world use-case for
> this.
> 
> Could this be used to implement subtree filtering? I guess you'd have
> to walk back up the directory tree and see whether it had a given
> ancestor?

If the subtree is all in the same file system, we can attach fanotify to 
the whole file system, and then use some dget_parent() and follow_up() 
to walk up the directory tree in the fastpath handler. However, if there
are other mount points in the subtree, we will need more logic to handle
these mount points. 

@Christian, I would like to know your thoughts on this (walking up the 
directory tree in the fanotify fastpath handler). It can be expensive for 
a very deep subtree. 

How should we pass in the subtree? I guess we can just use the full path in
a string as the argument.

Thanks,
Song 



* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-30 20:30     ` Song Liu
@ 2024-10-31  0:23       ` Jeff Layton
  2024-10-31  1:52         ` Song Liu
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Layton @ 2024-10-31  0:23 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Al Viro, Christian Brauner, Jan Kara, KP Singh,
	Matt Bobrowski, Amir Goldstein, repnop@google.com, Josef Bacik

On Wed, 2024-10-30 at 20:30 +0000, Song Liu wrote:
> Hi Jeff, 
> 
> Thanks for your review!
> 
> I will update 1/2 based on the feedback. (Replying here to save everyone 
> an email..)
> 
> > On Oct 30, 2024, at 6:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> [...]
> 
> > > +
> > > +static int sample_fp_handler(struct fsnotify_group *group,
> > > +     struct fanotify_fastpath_hook *fp_hook,
> > > +     struct fanotify_fastpath_event *fp_event)
> > > +{
> > > + const struct qstr *file_name = fp_event->file_name;
> > > + struct sample_fp_data *fp_data;
> > > + struct prefix_item *item;
> > > +
> > > + if (!file_name)
> > > + return FAN_FP_RET_SEND_TO_USERSPACE;
> > > + fp_data = fp_hook->data;
> > > +
> > > + list_for_each_entry(item, &fp_data->item_list, list) {
> > > + if (strstr(file_name->name, item->prefix) == (char *)file_name->name)
> > > + return FAN_FP_RET_SKIP_EVENT;
> > > + }
> > > +
> > > + return FAN_FP_RET_SEND_TO_USERSPACE;
> > > +}
> > 
> > The sample is a little underwhelming and everyone hates string parsing
> > in the kernel ;). It'd be nice to see a more real-world use-case for
> > this.
> > 
> > Could this be used to implement subtree filtering? I guess you'd have
> > to walk back up the directory tree and see whether it had a given
> > ancestor?
> 
> If the subtree is all in the same file system, we can attach fanotify to 
> the whole file system, and then use some dget_parent() and follow_up() 
> to walk up the directory tree in the fastpath handler. However, if there
> are other mount points in the subtree, we will need more logic to handle
> these mount points. 
> 

My 2 cents...

I'd just confine it to a single vfsmount. If you want to monitor in
several submounts, then you need to add new fanotify watches.

Alternately, maybe there is some way to designate that an entire
vfsmount is a child of a watched (or ignored) directory?

> @Christian, I would like to know your thoughts on this (walking up the 
> directory tree in fanotify fastpath handler). It can be expensive for 
> very very deep subtree. 
> 

I'm not Christian, but I'll make the case for it. It's basically a
bunch of pointer chasing. That's probably not "cheap", but if you can
do it under RCU it might not be too awful. It might still suck with
really deep paths, but this is a sample module. It's not expected that
everyone will want to use it anyway.

> How should we pass in the subtree? I guess we can just use full path in
> a string as the argument.
> 

I'd stay away from string parsing. How about this instead?

Allow a process to open a directory fd, and then hand that fd to an
fanotify ioctl that says that you want to ignore everything that has
that directory as an ancestor. Or, maybe make it so that you only watch
dentries that have that directory as an ancestor? I'm not sure what
makes the most sense.

Then, when you get a dentry-based event, you just walk the d_parent
pointers back up to the root of the vfsmount. If one of them matches
the dentry in your fd then you're done. If you hit the root of the
vfsmount, then you're also done (and know that it's not a child of that
dentry).
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-31  0:23       ` Jeff Layton
@ 2024-10-31  1:52         ` Song Liu
  2024-11-06 19:40           ` Amir Goldstein
  2024-11-07 11:19           ` Amir Goldstein
  0 siblings, 2 replies; 29+ messages in thread
From: Song Liu @ 2024-10-31  1:52 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Song Liu, Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Al Viro, Christian Brauner,
	Jan Kara, KP Singh, Matt Bobrowski, Amir Goldstein,
	repnop@google.com, Josef Bacik

Hi Jeff, 

> On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote:

[...]

>> If the subtree is all in the same file system, we can attach fanotify to 
>> the whole file system, and then use some dget_parent() and follow_up() 
>> to walk up the directory tree in the fastpath handler. However, if there
>> are other mount points in the subtree, we will need more logic to handle
>> these mount points. 
>> 
> 
> My 2 cents...
> 
> I'd just confine it to a single vfsmount. If you want to monitor in
> several submounts, then you need to add new fanotify watches.
> 
> Alternately, maybe there is some way to designate that an entire
> vfsmount is a child of a watched (or ignored) directory?
> 
>> @Christian, I would like to know your thoughts on this (walking up the 
>> directory tree in fanotify fastpath handler). It can be expensive for 
>> very very deep subtree. 
>> 
> 
> I'm not Christian, but I'll make the case for it. It's basically a
> bunch of pointer chasing. That's probably not "cheap", but if you can
> do it under RCU it might not be too awful. It might still suck with
> really deep paths, but this is a sample module. It's not expected that
> everyone will want to use it anyway.

Thanks for the suggestion! I will try to do it under RCU. 

> 
>> How should we pass in the subtree? I guess we can just use full path in
>> a string as the argument.
>> 
> 
> I'd stay away from string parsing. How about this instead?
> 
> Allow a process to open a directory fd, and then hand that fd to an
> fanotify ioctl that says that you want to ignore everything that has
> that directory as an ancestor. Or, maybe make it so that you only watch
> dentries that have that directory as an ancestor? I'm not sure what
> makes the most sense.

Yes, directory fd is another option. Currently, the "attach to group"
function only takes a string as input. I guess it makes sense to allow
taking an fd, or maybe we should allow an arbitrary format (pass in a
pointer to a structure). Let me give it a try. 

Thanks again!

Song

> 
> Then, when you get a dentry-based event, you just walk the d_parent
> pointers back up to the root of the vfsmount. If one of them matches
> the dentry in your fd then you're done. If you hit the root of the
> vfsmount, then you're also done (and know that it's not a child of that
> dentry).
> -- 
> Jeff Layton <jlayton@kernel.org>



* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-31  1:52         ` Song Liu
@ 2024-11-06 19:40           ` Amir Goldstein
  2024-11-06 22:10             ` Song Liu
  2024-11-07 10:41             ` Jan Kara
  2024-11-07 11:19           ` Amir Goldstein
  1 sibling, 2 replies; 29+ messages in thread
From: Amir Goldstein @ 2024-11-06 19:40 UTC (permalink / raw)
  To: Song Liu
  Cc: Jeff Layton, Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Al Viro, Christian Brauner,
	Jan Kara, KP Singh, Matt Bobrowski, repnop@google.com,
	Josef Bacik

On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
>
> Hi Jeff,
>
> > On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> [...]
>
> >> If the subtree is all in the same file system, we can attach fanotify to
> >> the whole file system, and then use some dget_parent() and follow_up()
> >> to walk up the directory tree in the fastpath handler. However, if there
> >> are other mount points in the subtree, we will need more logic to handle
> >> these mount points.
> >>
> >
> > My 2 cents...
> >
> > I'd just confine it to a single vfsmount. If you want to monitor in
> > several submounts, then you need to add new fanotify watches.
> >

Agreed.

> > Alternately, maybe there is some way to designate that an entire
> > vfsmount is a child of a watched (or ignored) directory?
> >
> >> @Christian, I would like to know your thoughts on this (walking up the
> >> directory tree in fanotify fastpath handler). It can be expensive for
> >> very very deep subtree.
> >>
> >
> > I'm not Christian, but I'll make the case for it. It's basically a
> > bunch of pointer chasing. That's probably not "cheap", but if you can
> > do it under RCU it might not be too awful. It might still suck with
> > really deep paths, but this is a sample module. It's not expected that
> > everyone will want to use it anyway.
>
> Thanks for the suggestion! I will try to do it under RCU.
>

That's the cost of doing a subtree filter.
Not sure how it could be avoided?

> >
> >> How should we pass in the subtree? I guess we can just use full path in
> >> a string as the argument.
> >>
> >
> > I'd stay away from string parsing. How about this instead?
> >
> > Allow a process to open a directory fd, and then hand that fd to an
> > fanotify ioctl that says that you want to ignore everything that has
> > that directory as an ancestor. Or, maybe make it so that you only watch
> > dentries that have that directory as an ancestor? I'm not sure what
> > makes the most sense.
>
> Yes, directory fd is another option. Currently, the "attach to group"
> function only takes a string as input. I guess it makes sense to allow
> taking a fd, or maybe we should allow any random format (pass in a
> pointer to a structure. Let me give it a try.
>
> Thanks again!
>
> Song
>
> >
> > Then, when you get a dentry-based event, you just walk the d_parent
> > pointers back up to the root of the vfsmount. If one of them matches
> > the dentry in your fd then you're done. If you hit the root of the
> > vfsmount, then you're also done (and know that it's not a child of that
> > dentry).

Agreed.

Thanks,
Amir.


* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-11-06 19:40           ` Amir Goldstein
@ 2024-11-06 22:10             ` Song Liu
  2024-11-07 10:41             ` Jan Kara
  1 sibling, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-11-06 22:10 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Song Liu, Jeff Layton, Song Liu, bpf, Linux-Fsdevel, LKML,
	Kernel Team, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Al Viro,
	Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
	repnop@google.com, Josef Bacik

Hi Amir, 

> On Nov 6, 2024, at 11:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
>> 
>> Hi Jeff,
>> 
>>> On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote:
>> 
>> [...]
>> 
>>>> If the subtree is all in the same file system, we can attach fanotify to
>>>> the whole file system, and then use some dget_parent() and follow_up()
>>>> to walk up the directory tree in the fastpath handler. However, if there
>>>> are other mount points in the subtree, we will need more logic to handle
>>>> these mount points.
>>>> 
>>> 
>>> My 2 cents...
>>> 
>>> I'd just confine it to a single vfsmount. If you want to monitor in
>>> several submounts, then you need to add new fanotify watches.
>>> 
> 
> Agreed.
> 
>>> Alternately, maybe there is some way to designate that an entire
>>> vfsmount is a child of a watched (or ignored) directory?
>>> 
>>>> @Christian, I would like to know your thoughts on this (walking up the
>>>> directory tree in fanotify fastpath handler). It can be expensive for
>>>> very very deep subtree.
>>>> 
>>> 
>>> I'm not Christian, but I'll make the case for it. It's basically a
>>> bunch of pointer chasing. That's probably not "cheap", but if you can
>>> do it under RCU it might not be too awful. It might still suck with
>>> really deep paths, but this is a sample module. It's not expected that
>>> everyone will want to use it anyway.
>> 
>> Thanks for the suggestion! I will try to do it under RCU.
>> 
> 
> That's the cost of doing a subtree filter.
> Not sure how it could be avoided?

[...]

>> 
>>> 
>>> Then, when you get a dentry-based event, you just walk the d_parent
>>> pointers back up to the root of the vfsmount. If one of them matches
>>> the dentry in your fd then you're done. If you hit the root of the
>>> vfsmount, then you're also done (and know that it's not a child of that
>>> dentry).

Thanks for your feedback. I will update the set based on Jeff's suggestion. 

Song



* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-11-06 19:40           ` Amir Goldstein
  2024-11-06 22:10             ` Song Liu
@ 2024-11-07 10:41             ` Jan Kara
  2024-11-07 19:39               ` Song Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2024-11-07 10:41 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Song Liu, Jeff Layton, Song Liu, bpf, Linux-Fsdevel, LKML,
	Kernel Team, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Al Viro,
	Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
	repnop@google.com, Josef Bacik

On Wed 06-11-24 20:40:50, Amir Goldstein wrote:
> On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
> > > Alternately, maybe there is some way to designate that an entire
> > > vfsmount is a child of a watched (or ignored) directory?
> > >
> > >> @Christian, I would like to know your thoughts on this (walking up the
> > >> directory tree in fanotify fastpath handler). It can be expensive for
> > >> very very deep subtree.
> > >>
> > >
> > > I'm not Christian, but I'll make the case for it. It's basically a
> > > bunch of pointer chasing. That's probably not "cheap", but if you can
> > > do it under RCU it might not be too awful. It might still suck with
> > > really deep paths, but this is a sample module. It's not expected that
> > > everyone will want to use it anyway.
> >
> > Thanks for the suggestion! I will try to do it under RCU.
> >
> 
> That's the cost of doing a subtree filter.
> Not sure how it could be avoided?

Yes. For a real solution (not this example), we'd probably have to limit
the parent walk to say 16 steps and if we can reach neither root nor our
dir in that number of steps, we'll just chicken out and pass the event to
userspace to deal with it. This way the kernel filter will deal with most
cases anyway and we won't risk livelocking or excessive performance
overhead from the filter.

For this example, I think using fs/dcache.c:is_subdir() will be OK for
demonstration purposes.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify fastpath handler
  2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
  2024-10-30 12:45   ` Jeff Layton
@ 2024-11-07 10:48   ` Jan Kara
  2024-11-07 19:13     ` Song Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Kara @ 2024-11-07 10:48 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, viro, brauner, jack, kpsingh,
	mattbobrowski, amir73il, repnop, jlayton, josef

On Tue 29-10-24 16:12:40, Song Liu wrote:
> A fanotify fastpath handler enables handling fanotify events within the
> kernel, and thus saves a trip to userspace. A fanotify fastpath handler
> can be useful in many use cases. For example, if a user is only interested
> in events for some files inside a directory, a fastpath handler can be
> used to filter out irrelevant events.
> 
> fanotify fastpath handler is attached to fsnotify_group. At most one
> fastpath handler can be attached to a fsnotify_group. The attach/detach
> of fastpath handlers is controlled by two new ioctls on the fanotify fds:
> FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
> 
> fanotify fastpath handler is packaged in a kernel module. In the future,
> it is also possible to package fastpath handler in a BPF program. Since
> loading modules requires CAP_SYS_ADMIN, _loading_ fanotify fastpath
> handlers in kernel modules is limited to CAP_SYS_ADMIN. However,
> non-CAP_SYS_ADMIN users can _attach_ a fastpath handler loaded by the
> sysadmin to their fanotify fds. To make fanotify fastpath handlers more useful
> for non-CAP_SYS_ADMIN users, a fastpath handler can take arguments at
> attach time.

Hum, I'm not sure I'd be fine as a sysadmin to allow arbitrary users to
attach arbitrary filters to their groups. I might want some filters for
privileged programs which know what they are doing (e.g. because the
filters are expensive) and other filters may be fine for anybody. But
overall I'd think we'll soon hit requirements for permission control over
who can attach what... Somebody must have created a solution for this
already?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-10-29 23:12 ` [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF " Song Liu
@ 2024-11-07 11:10   ` Amir Goldstein
  2024-11-07 19:53     ` Song Liu
  0 siblings, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2024-11-07 11:10 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, linux-fsdevel, linux-kernel, kernel-team, andrii, eddyz87,
	ast, daniel, martin.lau, viro, brauner, jack, kpsingh,
	mattbobrowski, repnop, jlayton, josef

On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@kernel.org> wrote:
>
> This test shows a simplified logic that monitors a subtree. This is
> simplified as it doesn't handle all the scenarios, such as:
>
>   1) moving a sub-subtree into or out of the monitored subtree;

There is a solution for that (see below)

>   2) a mount point inside the monitored subtree

For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
but those have been requested by userspace anyway.

>
> Therefore, this is not to show a way to reliably monitor a subtree.
> Instead, this is to test the functionality of the BPF-based fastpath.
> To really monitor a subtree reliably, we will need more complex logic.

Actually, this example is the foundation of my vision for efficient and
race-free subtree filtering:

1. The inode map is to be treated as a cache for the is_subdir() query
2. Cache entries can also have a "distance from root" (i.e. depth) value
3. Each unknown queried path can call is_subdir() and populate the cache
    entries for all ancestors
4. The cache/map size should be limited and when limit is reached,
    evicting entries by depth priority makes sense
5. A rename event for a directory whose inode is in the map and whose
   new parent is not in the map or has a different value than old parent
   needs to invalidate the entire map
6. fast_path also needs a hook from inode evict to clear cache entries

This is similar to, but more efficient and race-free than, what could
already be achieved in userspace using FAN_MARK_EVICTABLE.

Food for thought.

Thanks,
Amir.

>
> Overview of the logic:
>   1. fanotify is created for the whole file system (/tmp);
>   2. A bpf map (inode_storage_map) is used to tag directories to
>      monitor (starting from /tmp/fanotify_test);
>   3) On fsnotify_mkdir, the tag is propagated to newly created
>      subdirectories (/tmp/fanotify_test/subdir);
>   4. The bpf fastpath checks whether the fanotify event happens in a
>      directory with the tag. If yes, the event is sent to user space;
>      otherwise, the event is dropped.
>
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  tools/testing/selftests/bpf/bpf_kfuncs.h      |   4 +
>  tools/testing/selftests/bpf/config            |   1 +
>  .../testing/selftests/bpf/prog_tests/fan_fp.c | 245 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/fan_fp.c    |  77 ++++++
>  4 files changed, 327 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/fan_fp.c
>  create mode 100644 tools/testing/selftests/bpf/progs/fan_fp.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_kfuncs.h b/tools/testing/selftests/bpf/bpf_kfuncs.h
> index 2eb3483f2fb0..44dcf4991244 100644
> --- a/tools/testing/selftests/bpf/bpf_kfuncs.h
> +++ b/tools/testing/selftests/bpf/bpf_kfuncs.h
> @@ -87,4 +87,8 @@ struct dentry;
>   */
>  extern int bpf_get_dentry_xattr(struct dentry *dentry, const char *name,
>                               struct bpf_dynptr *value_ptr) __ksym __weak;
> +
> +struct fanotify_fastpath_event;
> +extern struct inode *bpf_fanotify_data_inode(struct fanotify_fastpath_event *event) __ksym __weak;
> +extern void bpf_iput(struct inode *inode) __ksym __weak;
>  #endif
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 4ca84c8d9116..392cbcad8a8b 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -24,6 +24,7 @@ CONFIG_DEBUG_INFO_BTF=y
>  CONFIG_DEBUG_INFO_DWARF4=y
>  CONFIG_DUMMY=y
>  CONFIG_DYNAMIC_FTRACE=y
> +CONFIG_FANOTIFY=y
>  CONFIG_FPROBE=y
>  CONFIG_FTRACE_SYSCALLS=y
>  CONFIG_FUNCTION_ERROR_INJECTION=y
> diff --git a/tools/testing/selftests/bpf/prog_tests/fan_fp.c b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
> new file mode 100644
> index 000000000000..ea57ed366647
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/fan_fp.c
> @@ -0,0 +1,245 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
> +
> +#define _GNU_SOURCE
> +#include <err.h>
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/fanotify.h>
> +#include <unistd.h>
> +#include <sys/ioctl.h>
> +#include <sys/stat.h>
> +
> +#include <test_progs.h>
> +
> +#include "fan_fp.skel.h"
> +
> +#define TEST_FS "/tmp/"
> +#define TEST_DIR "/tmp/fanotify_test/"
> +
> +static int create_test_subtree(void)
> +{
> +       int err;
> +
> +       err = mkdir(TEST_DIR, 0777);
> +       if (err && errno != EEXIST)
> +               return err;
> +
> +       return open(TEST_DIR, O_RDONLY);
> +}
> +
> +static int create_fanotify_fd(void)
> +{
> +       int fanotify_fd, err;
> +
> +       fanotify_fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_NAME | FAN_REPORT_DIR_FID,
> +                                   O_RDONLY);
> +
> +       if (!ASSERT_OK_FD(fanotify_fd, "fanotify_init"))
> +               return -1;
> +
> +       err = fanotify_mark(fanotify_fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
> +                           FAN_CREATE | FAN_OPEN | FAN_ONDIR | FAN_EVENT_ON_CHILD,
> +                           AT_FDCWD, TEST_FS);
> +       if (!ASSERT_OK(err, "fanotify_mark")) {
> +               close(fanotify_fd);
> +               return -1;
> +       }
> +
> +       return fanotify_fd;
> +}
> +
> +static int attach_global_fastpath(int fanotify_fd)
> +{
> +       struct fanotify_fastpath_args args = {
> +               .name = "_tmp_test_sub_tree",
> +               .version = 1,
> +               .flags = 0,
> +       };
> +
> +       if (ioctl(fanotify_fd, FAN_IOC_ADD_FP, &args))
> +               return -1;
> +
> +       return 0;
> +}
> +
> +#define EVENT_BUFFER_SIZE 4096
> +struct file_access_result {
> +       char name_prefix[16];
> +       bool accessed;
> +} access_results[3] = {
> +       {"aa", false},
> +       {"bb", false},
> +       {"cc", false},
> +};
> +
> +static void update_access_results(char *name)
> +{
> +       int i;
> +
> +       for (i = 0; i < 3; i++) {
> +               if (strstr(name, access_results[i].name_prefix))
> +                       access_results[i].accessed = true;
> +       }
> +}
> +
> +static void parse_event(char *buffer, int len)
> +{
> +       struct fanotify_event_metadata *event =
> +               (struct fanotify_event_metadata *) buffer;
> +       struct fanotify_event_info_header *info;
> +       struct fanotify_event_info_fid *fid;
> +       struct file_handle *handle;
> +       char *name;
> +       int off;
> +
> +       for (; FAN_EVENT_OK(event, len); event = FAN_EVENT_NEXT(event, len)) {
> +               for (off = sizeof(*event) ; off < event->event_len;
> +                    off += info->len) {
> +                       info = (struct fanotify_event_info_header *)
> +                               ((char *) event + off);
> +                       switch (info->info_type) {
> +                       case FAN_EVENT_INFO_TYPE_DFID_NAME:
> +                               fid = (struct fanotify_event_info_fid *) info;
> +                               handle = (struct file_handle *)&fid->handle;
> +                               name = (char *)handle + sizeof(*handle) + handle->handle_bytes;
> +                               update_access_results(name);
> +                               break;
> +                       default:
> +                               break;
> +                       }
> +               }
> +       }
> +}
> +
> +static void touch_file(const char *path)
> +{
> +       int fd;
> +
> +       fd = open(path, O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666);
> +       if (!ASSERT_OK_FD(fd, "open"))
> +               goto cleanup;
> +       close(fd);
> +cleanup:
> +       unlink(path);
> +}
> +
> +static void generate_and_test_event(int fanotify_fd)
> +{
> +       char buffer[EVENT_BUFFER_SIZE];
> +       int len, err;
> +
> +       /* access /tmp/fanotify_test/aa, this will generate event */
> +       touch_file(TEST_DIR "aa");
> +
> +       /* create /tmp/fanotify_test/subdir, this will get the tag from the
> +        * parent directory (added in the bpf program on fsnotify_mkdir)
> +        */
> +       err = mkdir(TEST_DIR "subdir", 0777);
> +       ASSERT_OK(err, "mkdir");
> +
> +       /* access /tmp/fanotify_test/subdir/bb, this will generate event */
> +       touch_file(TEST_DIR "subdir/bb");
> +
> +       /* access /tmp/cc, this will NOT generate event, as the BPF
> +        * fastpath filtered this event out. (Because /tmp doesn't have
> +        * the tag.)
> +        */
> +       touch_file(TEST_FS "cc");
> +
> +       /* read and parse the events */
> +       len = read(fanotify_fd, buffer, EVENT_BUFFER_SIZE);
> +       if (!ASSERT_GE(len, 0, "read event"))
> +               goto cleanup;
> +       parse_event(buffer, len);
> +
> +       /* verify we generated events for aa and bb, but filtered out the
> +        * event for cc.
> +        */
> +       ASSERT_TRUE(access_results[0].accessed, "access aa");
> +       ASSERT_TRUE(access_results[1].accessed, "access bb");
> +       ASSERT_FALSE(access_results[2].accessed, "access cc");
> +
> +cleanup:
> +       rmdir(TEST_DIR "subdir");
> +       rmdir(TEST_DIR);
> +}
> +
> +/* This test shows simplified logic that monitors a subtree. It is
> + * simplified in that it doesn't handle all scenarios, such as:
> + *
> + *  1) moving a sub-subtree into/out of the monitored subtree;
> + *  2) mount points inside the monitored subtree.
> + *
> + * Therefore, this is not a way to reliably monitor a subtree.
> + * Instead, it tests the functionality of the BPF based fastpath.
> + *
> + * Overview of the logic:
> + * 1. fanotify is created for the whole file system (/tmp);
> + * 2. A bpf map (inode_storage_map) is used to tag directories to
> + *    monitor (starting from /tmp/fanotify_test);
> + * 3. On fsnotify_mkdir, the tag is propagated to newly created
> + *    subdirectories (/tmp/fanotify_test/subdir);
> + * 4. The bpf fastpath checks whether the event happens in a directory
> + *    with the tag. If yes, the event is sent to user space; otherwise,
> + *    the event is dropped.
> + */
> +static void test_monitor_subtree(void)
> +{
> +       struct bpf_link *link;
> +       struct fan_fp *skel;
> +       int test_root_fd;
> +       __u32 one = 1;
> +       int err, fanotify_fd;
> +
> +       test_root_fd = create_test_subtree();
> +
> +       if (!ASSERT_OK_FD(test_root_fd, "create_test_subtree"))
> +               return;
> +
> +       skel = fan_fp__open_and_load();
> +
> +       if (!ASSERT_OK_PTR(skel, "fan_fp__open_and_load"))
> +               goto close_test_root_fd;
> +
> +       /* Add tag to /tmp/fanotify_test/ */
> +       err = bpf_map_update_elem(bpf_map__fd(skel->maps.inode_storage_map),
> +                                 &test_root_fd, &one, BPF_ANY);
> +       if (!ASSERT_OK(err, "bpf_map_update_elem"))
> +               goto destroy_skel;
> +       link = bpf_map__attach_struct_ops(skel->maps.bpf_fanotify_fastpath_ops);
> +       if (!ASSERT_OK_PTR(link, "bpf_map__attach_struct_ops"))
> +               goto destroy_skel;
> +
> +       fanotify_fd = create_fanotify_fd();
> +       if (!ASSERT_OK_FD(fanotify_fd, "create_fanotify_fd"))
> +               goto destroy_link;
> +
> +       err = attach_global_fastpath(fanotify_fd);
> +       if (!ASSERT_OK(err, "attach_global_fastpath"))
> +               goto close_fanotify_fd;
> +
> +       generate_and_test_event(fanotify_fd);
> +
> +       ASSERT_EQ(skel->bss->added_inode_storage, 1, "added_inode_storage");
> +
> +close_fanotify_fd:
> +       close(fanotify_fd);
> +
> +destroy_link:
> +       bpf_link__destroy(link);
> +destroy_skel:
> +       fan_fp__destroy(skel);
> +
> +close_test_root_fd:
> +       close(test_root_fd);
> +       rmdir(TEST_DIR);
> +}
> +
> +void test_bpf_fanotify_fastpath(void)
> +{
> +       if (test__start_subtest("subtree"))
> +               test_monitor_subtree();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/fan_fp.c b/tools/testing/selftests/bpf/progs/fan_fp.c
> new file mode 100644
> index 000000000000..ee86dc189e38
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/fan_fp.c
> @@ -0,0 +1,77 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Facebook */
> +
> +#include "vmlinux.h"
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#define FS_CREATE              0x00000100      /* Subfile was created */
> +#define FS_ISDIR               0x40000000      /* event occurred against dir */
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
> +       __uint(map_flags, BPF_F_NO_PREALLOC);
> +       __type(key, int);
> +       __type(value, __u32);
> +} inode_storage_map SEC(".maps");
> +
> +int added_inode_storage;
> +
> +SEC("struct_ops")
> +int BPF_PROG(bpf_fp_handler,
> +            struct fsnotify_group *group,
> +            struct fanotify_fastpath_hook *fp_hook,
> +            struct fanotify_fastpath_event *fp_event)
> +{
> +       struct inode *dir;
> +       __u32 *value;
> +
> +       dir = fp_event->dir;
> +
> +       value = bpf_inode_storage_get(&inode_storage_map, dir, 0, 0);
> +
> +       /* if dir doesn't have the tag, skip the event */
> +       if (!value)
> +               return FAN_FP_RET_SKIP_EVENT;
> +
> +       /* propagate tag to subdir on fsnotify_mkdir */
> +       if (fp_event->mask == (FS_CREATE | FS_ISDIR) &&
> +           fp_event->data_type == FSNOTIFY_EVENT_DENTRY) {
> +               struct inode *new_inode;
> +
> +               new_inode = bpf_fanotify_data_inode(fp_event);
> +               if (!new_inode)
> +                       goto out;
> +
> +               value = bpf_inode_storage_get(&inode_storage_map, new_inode, 0,
> +                                             BPF_LOCAL_STORAGE_GET_F_CREATE);
> +               if (value) {
> +                       *value = 1;
> +                       added_inode_storage++;
> +               }
> +               bpf_iput(new_inode);
> +       }
> +out:
> +       return FAN_FP_RET_SEND_TO_USERSPACE;
> +}
> +
> +SEC("struct_ops")
> +int BPF_PROG(bpf_fp_init, struct fanotify_fastpath_hook *hook, const char *args)
> +{
> +       return 0;
> +}
> +
> +SEC("struct_ops")
> +void BPF_PROG(bpf_fp_free, struct fanotify_fastpath_hook *hook)
> +{
> +}
> +
> +SEC(".struct_ops.link")
> +struct fanotify_fastpath_ops bpf_fanotify_fastpath_ops = {
> +       .fp_handler = (void *)bpf_fp_handler,
> +       .fp_init = (void *)bpf_fp_init,
> +       .fp_free = (void *)bpf_fp_free,
> +       .name = "_tmp_test_sub_tree",
> +};
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.43.5
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-10-31  1:52         ` Song Liu
  2024-11-06 19:40           ` Amir Goldstein
@ 2024-11-07 11:19           ` Amir Goldstein
  2024-11-07 19:39             ` Song Liu
  1 sibling, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2024-11-07 11:19 UTC (permalink / raw)
  To: Song Liu
  Cc: Jeff Layton, Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Al Viro, Christian Brauner,
	Jan Kara, KP Singh, Matt Bobrowski, repnop@google.com,
	Josef Bacik

On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
>
> Hi Jeff,
>
> > On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> [...]
>
> >> If the subtree is all in the same file system, we can attach fanotify to
> >> the whole file system, and then use some dget_parent() and follow_up()
> >> to walk up the directory tree in the fastpath handler. However, if there
> >> are other mount points in the subtree, we will need more logic to handle
> >> these mount points.
> >>
> >
> > My 2 cents...
> >
> > I'd just confine it to a single vfsmount. If you want to monitor in
> > several submounts, then you need to add new fanotify watches.
> >
> > Alternately, maybe there is some way to designate that an entire
> > vfsmount is a child of a watched (or ignored) directory?
> >
> >> @Christian, I would like to know your thoughts on this (walking up the
> >> directory tree in fanotify fastpath handler). It can be expensive for
> >> very very deep subtree.
> >>
> >
> > I'm not Christian, but I'll make the case for it. It's basically a
> > bunch of pointer chasing. That's probably not "cheap", but if you can
> > do it under RCU it might not be too awful. It might still suck with
> > really deep paths, but this is a sample module. It's not expected that
> > everyone will want to use it anyway.
>
> Thanks for the suggestion! I will try to do it under RCU.
>
> >
> >> How should we pass in the subtree? I guess we can just use full path in
> >> a string as the argument.
> >>
> >
> > I'd stay away from string parsing. How about this instead?
> >
> > Allow a process to open a directory fd, and then hand that fd to an
> > fanotify ioctl that says that you want to ignore everything that has
> > that directory as an ancestor. Or, maybe make it so that you only watch
> > dentries that have that directory as an ancestor? I'm not sure what
> > makes the most sense.
>
> Yes, directory fd is another option. Currently, the "attach to group"
> function only takes a string as input. I guess it makes sense to allow
> taking a fd, or maybe we should allow any random format (pass in a
> pointer to a structure. Let me give it a try.
>

IIUC, the BPF program example uses another API to configure the filter
(i.e. the inode map).

IMO, passing any single argument during setup time is not scalable
and any filter should have its own way to reconfigure its parameters
in runtime (i.e. add/remove watched subtree).

Assuming that the same module/bpf_prog serves multiple fanotify
groups and each group may have a different filter config, I think that
passing an integer arg to identify the config (be it fd or something else)
is the most we need for this minimal API.
If we need something more elaborate, we can extend the ioctl size
or add a new ioctl later.

Thanks,
Amir.


* Re: [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify fastpath handler
  2024-11-07 10:48   ` Jan Kara
@ 2024-11-07 19:13     ` Song Liu
  2024-11-11 14:09       ` Jan Kara
  0 siblings, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-11-07 19:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Al Viro, Christian Brauner, KP Singh,
	Matt Bobrowski, Amir Goldstein, repnop@google.com,
	jlayton@kernel.org, Josef Bacik



> On Nov 7, 2024, at 2:48 AM, Jan Kara <jack@suse.cz> wrote:
> 
> On Tue 29-10-24 16:12:40, Song Liu wrote:
>> fanotify fastpath handler enables handling fanotify events within the
>> kernel, and thus saves a trip to the user space. fanotify fastpath handler
>> can be useful in many use cases. For example, if a user is only interested
>> in events for some files inside a directory, a fastpath handler can be
>> used to filter out irrelevant events.
>> 
>> fanotify fastpath handler is attached to fsnotify_group. At most one
>> fastpath handler can be attached to a fsnotify_group. The attach/detach
>> of fastpath handlers are controlled by two new ioctls on the fanotify fds:
>> FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
>> 
>> fanotify fastpath handler is packaged in a kernel module. In the future,
>> it is also possible to package fastpath handler in a BPF program. Since
>> loading modules requires CAP_SYS_ADMIN, _loading_ fanotify fastpath
>> handler in kernel modules is limited to CAP_SYS_ADMIN. However,
>> non-CAP_SYS_ADMIN users can _attach_ fastpath handlers loaded by the sysadmin
>> to their fanotify fds. To make fanotify fastpath handler more useful
>> for non-CAP_SYS_ADMIN users, a fastpath handler can take arguments at
>> attach time.
> 
> Hum, I'm not sure I'd be fine as a sysadmin allowing arbitrary users to
> attach arbitrary filters to their groups. I might want some filters for
> privileged programs which know what they are doing (e.g. because the
> filters are expensive) and other filters may be fine for anybody. But
> overall I'd think we'll soon hit requirements for permission control over
> who can attach what... Somebody must have created a solution for this
> already?

I have "flags" in fanotify_fastpath_ops. An earlier version of my
local code actually had a "SYS_ADMIN_ONLY" flag specifying that some
filters are only available to users with CAP_SYS_ADMIN. I removed this
flag before sending the first RFC, for simplicity.

The model here (fast path loaded in kernel modules) is similar to that
of TCP congestion control algorithms. A regular user can choose which
algorithm to use for each TCP connection. This model is
straightforward because the kernel modules are global. With BPF, we
have the option not to add the fast path to a global list, so that
whoever loads the fast path can attach it to a specific group (I didn't
include this model in the RFC).

For the first version, I think a SYS_ADMIN_ONLY flag would be good
enough?

Thanks,
Song




* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-11-07 11:19           ` Amir Goldstein
@ 2024-11-07 19:39             ` Song Liu
  0 siblings, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-11-07 19:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Song Liu, Jeff Layton, Song Liu, bpf, Linux-Fsdevel, LKML,
	Kernel Team, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Al Viro,
	Christian Brauner, Jan Kara, KP Singh, Matt Bobrowski,
	repnop@google.com, Josef Bacik



> On Nov 7, 2024, at 3:19 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
>> 
>> Hi Jeff,
>> 
>>> On Oct 30, 2024, at 5:23 PM, Jeff Layton <jlayton@kernel.org> wrote:
>> 
>> [...]
>> 
>>>> If the subtree is all in the same file system, we can attach fanotify to
>>>> the whole file system, and then use some dget_parent() and follow_up()
>>>> to walk up the directory tree in the fastpath handler. However, if there
>>>> are other mount points in the subtree, we will need more logic to handle
>>>> these mount points.
>>>> 
>>> 
>>> My 2 cents...
>>> 
>>> I'd just confine it to a single vfsmount. If you want to monitor in
>>> several submounts, then you need to add new fanotify watches.
>>> 
>>> Alternately, maybe there is some way to designate that an entire
>>> vfsmount is a child of a watched (or ignored) directory?
>>> 
>>>> @Christian, I would like to know your thoughts on this (walking up the
>>>> directory tree in fanotify fastpath handler). It can be expensive for
>>>> very very deep subtree.
>>>> 
>>> 
>>> I'm not Christian, but I'll make the case for it. It's basically a
>>> bunch of pointer chasing. That's probably not "cheap", but if you can
>>> do it under RCU it might not be too awful. It might still suck with
>>> really deep paths, but this is a sample module. It's not expected that
>>> everyone will want to use it anyway.
>> 
>> Thanks for the suggestion! I will try to do it under RCU.
>> 
>>> 
>>>> How should we pass in the subtree? I guess we can just use full path in
>>>> a string as the argument.
>>>> 
>>> 
>>> I'd stay away from string parsing. How about this instead?
>>> 
>>> Allow a process to open a directory fd, and then hand that fd to an
>>> fanotify ioctl that says that you want to ignore everything that has
>>> that directory as an ancestor. Or, maybe make it so that you only watch
>>> dentries that have that directory as an ancestor? I'm not sure what
>>> makes the most sense.
>> 
>> Yes, directory fd is another option. Currently, the "attach to group"
>> function only takes a string as input. I guess it makes sense to allow
>> taking a fd, or maybe we should allow any random format (pass in a
>> pointer to a structure. Let me give it a try.
>> 
> 
> IIUC, the BFP program example uses another API to configure the filter
> (i.e. the inode map).

With BPF, users can configure the filter via different BPF maps.
The inode map is just one example; we can also use a task map to create
a different filter for each task (the task that generates the event).

> IMO, passing any single argument during setup time is not scalable
> and any filter should have its own way to reconfigure its parameters
> in runtime (i.e. add/remove watched subtree).
> 
> Assuming that the same module/bfp_prog serves multiple fanotify
> groups and each group may have a different filter config, I think that
> passing an integer arg to identify the config (be it fd or something else)
> is the most we need for this minimal API.
> If we need something more elaborate, we can extend the ioctl size
> or add a new ioctl later.

With my local code, which is slightly different from the RFC, I have
the ioctl pass in a pointer to fanotify_fastpath_args.

struct fanotify_fastpath_args {
        char name[FAN_FP_NAME_MAX];

        __u32 version;
        __u32 flags;

        /*
         * user space pointer to the init args of fastpath handler,
         * up to init_args_len (<= FAN_FP_ARGS_MAX).
         */
        __u64 init_args;
        /* size of init_args */
        __u32 init_args_size;
} __attribute__((__packed__));

fanotify_fastpath_args->init_args is a user pointer to a custom (per
fastpath) structure, which is then passed to
fanotify_fastpath_ops->fp_init().

I think this is flexible enough for the "attach fast path to a group"
operation. If we want to reconfigure the fast path later, we may 
need another API. 

Thanks,
Song



* Re: [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample fanotify fastpath handler
  2024-11-07 10:41             ` Jan Kara
@ 2024-11-07 19:39               ` Song Liu
  0 siblings, 0 replies; 29+ messages in thread
From: Song Liu @ 2024-11-07 19:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Amir Goldstein, Song Liu, Jeff Layton, Song Liu, bpf,
	Linux-Fsdevel, LKML, Kernel Team, Andrii Nakryiko,
	Eduard Zingerman, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Al Viro, Christian Brauner, KP Singh,
	Matt Bobrowski, repnop@google.com, Josef Bacik



> On Nov 7, 2024, at 2:41 AM, Jan Kara <jack@suse.cz> wrote:
> 
> On Wed 06-11-24 20:40:50, Amir Goldstein wrote:
>> On Thu, Oct 31, 2024 at 2:52 AM Song Liu <songliubraving@meta.com> wrote:
>>>> Alternately, maybe there is some way to designate that an entire
>>>> vfsmount is a child of a watched (or ignored) directory?
>>>> 
>>>>> @Christian, I would like to know your thoughts on this (walking up the
>>>>> directory tree in fanotify fastpath handler). It can be expensive for
>>>>> very very deep subtree.
>>>>> 
>>>> 
>>>> I'm not Christian, but I'll make the case for it. It's basically a
>>>> bunch of pointer chasing. That's probably not "cheap", but if you can
>>>> do it under RCU it might not be too awful. It might still suck with
>>>> really deep paths, but this is a sample module. It's not expected that
>>>> everyone will want to use it anyway.
>>> 
>>> Thanks for the suggestion! I will try to do it under RCU.
>>> 
>> 
>> That's the cost of doing a subtree filter.
>> Not sure how it could be avoided?
> 
> Yes. For a real solution (not this example), we'd probably have to limit
> the parent walk to say 16 steps and if we can reach neither root nor our
> dir in that number of steps, we'll just chicken out and pass the event to
> userspace to deal with it. This way the kernel filter will deal with most
> cases anyway and we won't risk livelocking or too big performance overhead
> of the filter.
> 
> For this example, I think using fs/dcache.c:is_subdir() will be OK for
> demonstration purposes.

Thanks for the pointer! It does look like a good solution for this example.

Song



* Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-11-07 11:10   ` Amir Goldstein
@ 2024-11-07 19:53     ` Song Liu
  2024-11-07 20:33       ` Amir Goldstein
  0 siblings, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-11-07 19:53 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
	jlayton@kernel.org, Josef Bacik



> On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> 
> On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@kernel.org> wrote:
>> 
>> This test shows a simplified logic that monitors a subtree. This is
>> simplified as it doesn't handle all the scenarios, such as:
>> 
>>  1) moving a subsubtree into/outof the being monitoring subtree;
> 
> There is a solution for that (see below)
> 
>>  2) mount point inside the being monitored subtree
> 
> For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> but those have been requested by userspace anyway.
> 
>> 
>> Therefore, this is not to show a way to reliably monitor a subtree.
>> Instead, this is to test the functionalities of bpf based fastpath.
>> To really monitor a subtree reliably, we will need more complex logic.
> 
> Actually, this example is the foundation of my vision for efficient and race
> free subtree filtering:
> 
> 1. The inode map is to be treated as a cache for the is_subdir() query

Using is_subdir() as the source of truth and managing the cache in the
inode map seems promising to me.

> 2. Cache entries can also have a "distance from root" (i.e. depth) value
> 3. Each unknown queried path can call is_subdir() and populate the cache
>    entries for all ancestors
> 4. The cache/map size should be limited and when limit is reached,
>    evicting entries by depth priority makes sense
> 5. A rename event for a directory whose inode is in the map and whose
>   new parent is not in the map or has a different value than old parent
>   needs to invalidate the entire map
> 6. fast_path also needs a hook from inode evict to clear cache entries

The inode map is physically attached to the inode itself, so the evict
event is handled automatically: an inode's entry in the inode map is
removed when the inode is freed. For the same reason, we don't need to
limit the map size or add eviction logic. Of course, this works on the
assumption that we don't use too much memory per inode, which I think
holds.

> 
> This is similar, but more efficient and race free than what could already
> be achieved in userspace using FAN_MARK_EVICTABLE.
> 
> Food for thought.

Thanks,
Song



* Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-11-07 19:53     ` Song Liu
@ 2024-11-07 20:33       ` Amir Goldstein
  2024-11-07 20:48         ` Song Liu
  0 siblings, 1 reply; 29+ messages in thread
From: Amir Goldstein @ 2024-11-07 20:33 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
	jlayton@kernel.org, Josef Bacik

On Thu, Nov 7, 2024 at 8:53 PM Song Liu <songliubraving@meta.com> wrote:
>
>
>
> > On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@kernel.org> wrote:
> >>
> >> This test shows a simplified logic that monitors a subtree. This is
> >> simplified as it doesn't handle all the scenarios, such as:
> >>
> >>  1) moving a subsubtree into/outof the being monitoring subtree;
> >
> > There is a solution for that (see below)
> >
> >>  2) mount point inside the being monitored subtree
> >
> > For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> > but those have been requested by userspace anyway.
> >
> >>
> >> Therefore, this is not to show a way to reliably monitor a subtree.
> >> Instead, this is to test the functionalities of bpf based fastpath.
> >> To really monitor a subtree reliably, we will need more complex logic.
> >
> > Actually, this example is the foundation of my vision for efficient and race
> > free subtree filtering:
> >
> > 1. The inode map is to be treated as a cache for the is_subdir() query
>
> Using is_subdir() as the truth and managing the cache in inode map seems
> promising to me.
>
> > 2. Cache entries can also have a "distance from root" (i.e. depth) value
> > 3. Each unknown queried path can call is_subdir() and populate the cache
> >    entries for all ancestors
> > 4. The cache/map size should be limited and when limit is reached,
> >    evicting entries by depth priority makes sense
> > 5. A rename event for a directory whose inode is in the map and whose
> >   new parent is not in the map or has a different value than old parent
> >   needs to invalidate the entire map
> > 6. fast_path also needs a hook from inode evict to clear cache entries
>
> The inode map is physically attached to the inode itself. So the evict
> event is automatically handled. IOW, an inode's entry in the inode map
> is automatically removed when the inode is freed. For the same reason,
> we don't need to set a limit in map size and add evicting logic. Of
> course, this works based on the assumption that we don't use too much
> memory for each inode. I think this assumption is true.

Oh no, that assumption is definitely wrong: each inode is around 1K,
and this was the main incentive to implement FAN_MARK_EVICTABLE and
FAN_MARK_FILESYSTEM in the first place, because recursive pinning of
all inodes in a large tree is not scalable.

Thanks,
Amir.


* Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-11-07 20:33       ` Amir Goldstein
@ 2024-11-07 20:48         ` Song Liu
  2024-11-08  8:18           ` Amir Goldstein
  0 siblings, 1 reply; 29+ messages in thread
From: Song Liu @ 2024-11-07 20:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
	jlayton@kernel.org, Josef Bacik

On Thu, Nov 7, 2024 at 12:34 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 8:53 PM Song Liu <songliubraving@meta.com> wrote:
> >
> >
> >
> > > On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@kernel.org> wrote:
> > >>
> > >> This test shows a simplified logic that monitors a subtree. This is
> > >> simplified as it doesn't handle all the scenarios, such as:
> > >>
> > >>  1) moving a subsubtree into/outof the being monitoring subtree;
> > >
> > > There is a solution for that (see below)
> > >
> > >>  2) mount point inside the being monitored subtree
> > >
> > > For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> > > but those have been requested by userspace anyway.
> > >
> > >>
> > >> Therefore, this is not to show a way to reliably monitor a subtree.
> > >> Instead, this is to test the functionalities of bpf based fastpath.
> > >> To really monitor a subtree reliably, we will need more complex logic.
> > >
> > > Actually, this example is the foundation of my vision for efficient and race
> > > free subtree filtering:
> > >
> > > 1. The inode map is to be treated as a cache for the is_subdir() query
> >
> > Using is_subdir() as the truth and managing the cache in inode map seems
> > promising to me.
> >
> > > 2. Cache entries can also have a "distance from root" (i.e. depth) value
> > > 3. Each unknown queried path can call is_subdir() and populate the cache
> > >    entries for all ancestors
> > > 4. The cache/map size should be limited and when limit is reached,
> > >    evicting entries by depth priority makes sense
> > > 5. A rename event for a directory whose inode is in the map and whose
> > >   new parent is not in the map or has a different value than old parent
> > >   needs to invalidate the entire map
> > > 6. fast_path also needs a hook from inode evict to clear cache entries
> >
> > The inode map is physically attached to the inode itself. So the evict
> > event is automatically handled. IOW, an inode's entry in the inode map
> > is automatically removed when the inode is freed. For the same reason,
> > we don't need to set a limit in map size and add evicting logic. Of
> > course, this works based on the assumption that we don't use too much
> > memory for each inode. I think this assumption is true.
>
> Oh no, it is definitely wrong each inode is around 1K and this was the
> main incentive to implement FAN_MARK_EVICTABLE and
> FAN_MARK_FILESYSTEK in the first place, because recursive
> pinning of all inodes in a large tree is not scalable to large trees.

The inode map does not pin the inode in memory. The inode will still
be evicted as normal, and its entry in the map is freed together with
the inode. The overhead I mentioned was the extra few bytes (for the
flag, etc.) per inode. When an evicted inode is loaded again, we will
use is_subdir() to recreate its entry in the inode map.

Does this make sense and address the concern?

Thanks,
Song


* Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler
  2024-11-07 20:48         ` Song Liu
@ 2024-11-08  8:18           ` Amir Goldstein
  0 siblings, 0 replies; 29+ messages in thread
From: Amir Goldstein @ 2024-11-08  8:18 UTC (permalink / raw)
  To: Song Liu
  Cc: Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	andrii@kernel.org, eddyz87@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, martin.lau@linux.dev,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	kpsingh@kernel.org, mattbobrowski@google.com, repnop@google.com,
	jlayton@kernel.org, Josef Bacik

On Thu, Nov 7, 2024 at 9:48 PM Song Liu <song@kernel.org> wrote:
>
> On Thu, Nov 7, 2024 at 12:34 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 8:53 PM Song Liu <songliubraving@meta.com> wrote:
> > >
> > >
> > >
> > > > On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@kernel.org> wrote:
> > > >>
> > > >> This test shows a simplified logic that monitors a subtree. This is
> > > >> simplified as it doesn't handle all the scenarios, such as:
> > > >>
> > > >>  1) moving a subsubtree into/outof the being monitoring subtree;
> > > >
> > > > There is a solution for that (see below)
> > > >
> > > >>  2) mount point inside the being monitored subtree
> > > >
> > > > For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> > > > but those have been requested by userspace anyway.
> > > >
> > > >>
> > > >> Therefore, this is not to show a way to reliably monitor a subtree.
> > > >> Instead, this is to test the functionalities of bpf based fastpath.
> > > >> To really monitor a subtree reliably, we will need more complex logic.
> > > >
> > > > Actually, this example is the foundation of my vision for efficient and race
> > > > free subtree filtering:
> > > >
> > > > 1. The inode map is to be treated as a cache for the is_subdir() query
> > >
> > > Using is_subdir() as the truth and managing the cache in inode map seems
> > > promising to me.
> > >
> > > > 2. Cache entries can also have a "distance from root" (i.e. depth) value
> > > > 3. Each unknown queried path can call is_subdir() and populate the cache
> > > >    entries for all ancestors
> > > > 4. The cache/map size should be limited and when limit is reached,
> > > >    evicting entries by depth priority makes sense
> > > > 5. A rename event for a directory whose inode is in the map, and whose
> > > >   new parent is not in the map or has a different value than the old
> > > >   parent, needs to invalidate the entire map
> > > > 6. fast_path also needs a hook from inode evict to clear cache entries
> > >
> > > The inode map is physically attached to the inode itself, so the evict
> > > event is handled automatically. IOW, an inode's entry in the inode map
> > > is automatically removed when the inode is freed. For the same reason,
> > > we don't need to set a limit on map size or add eviction logic. Of
> > > course, this works on the assumption that we don't use too much
> > > memory per inode, and I think that assumption holds.
> >
> > Oh no, that is definitely wrong: each inode is around 1K, and this was
> > the main incentive to implement FAN_MARK_EVICTABLE and
> > FAN_MARK_FILESYSTEM in the first place, because recursively pinning
> > all inodes in a large tree does not scale to large trees.
>
> The inode map does not pin the inode in memory. The inode will
> still be evicted as normal. The entry associated with the evicted
> inode in the map will be freed when the inode is freed. The
> overhead I mentioned was the extra few bytes (for the flag, etc.)
> per inode. When an evicted inode is loaded again, we will use
> is_subdir() to recreate its entry in the inode map.
>
> Does this make sense and address the concern?
>

Yes, that sounds good.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 29+ messages in thread
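
The cache design agreed on in the message above (points 1-6) can be sketched as a
userspace simulation. This is a hedged illustration, not the kernel implementation:
`Inode`, `SubtreeCache`, and the depth rule are made-up names, and a
`WeakKeyDictionary` stands in for per-inode storage that is freed together with
the inode (which is why no explicit evict hook or map-size pinning is needed).

```python
import weakref

class Inode:
    """Stand-in for a kernel inode; only the parent pointer matters here."""
    def __init__(self, parent=None):
        self.parent = parent  # parent directory, None at the filesystem top

def is_subdir(inode, root):
    """Ground truth, like the kernel's is_subdir(): walk up the parent chain."""
    d = inode
    while d is not None:
        if d is root:
            return True
        d = d.parent
    return False

class SubtreeCache:
    """Treat a per-inode map as a cache of is_subdir() answers (point 1)."""
    def __init__(self, root, limit=64):
        self.root = root
        self.limit = limit
        # WeakKeyDictionary models entries being freed with the inode itself,
        # similar to how BPF inode-local storage behaves.
        self.map = weakref.WeakKeyDictionary()  # inode -> (in_subtree, depth)

    def _depth(self, inode):
        # Distance from the filesystem top (point 2); deeper entries are
        # evicted first (point 4's depth priority).
        n, d = 0, inode
        while d.parent is not None:
            n, d = n + 1, d.parent
        return n

    def lookup(self, inode):
        hit = self.map.get(inode)
        if hit is not None:
            return hit[0]
        # Miss: compute the answer and populate all ancestors (point 3).
        d = inode
        while d is not None:
            self.map[d] = (is_subdir(d, self.root), self._depth(d))
            d = d.parent
        while len(self.map) > self.limit:  # point 4: bounded cache
            deepest = max(self.map.items(), key=lambda kv: kv[1][1])[0]
            del self.map[deepest]
        return is_subdir(inode, self.root)

    def rename(self, dir_inode, new_parent):
        # Point 5: a rename of a cached directory whose new parent is unknown,
        # or disagrees with the old parent, invalidates the entire cache.
        old_parent = dir_inode.parent
        dir_inode.parent = new_parent
        if dir_inode in self.map:
            old = self.map.get(old_parent)
            new = self.map.get(new_parent)
            if new is None or old is None or new[0] != old[0]:
                self.map.clear()
```

A lookup on a cold cache costs one ancestor walk and warms every ancestor, so
subsequent events in the same directory are O(1); a boundary-crossing rename
falls back to the conservative full invalidation described in point 5.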

* Re: [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify fastpath handler
  2024-11-07 19:13     ` Song Liu
@ 2024-11-11 14:09       ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2024-11-11 14:09 UTC (permalink / raw)
  To: Song Liu
  Cc: Jan Kara, Song Liu, bpf, Linux-Fsdevel, LKML, Kernel Team,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Al Viro, Christian Brauner,
	KP Singh, Matt Bobrowski, Amir Goldstein, repnop@google.com,
	jlayton@kernel.org, Josef Bacik

On Thu 07-11-24 19:13:23, Song Liu wrote:
> > On Nov 7, 2024, at 2:48 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 29-10-24 16:12:40, Song Liu wrote:
> >> The fanotify fastpath handler enables handling fanotify events within
> >> the kernel, and thus saves a trip to userspace. A fastpath handler can
> >> be useful in many use cases. For example, if a user is only interested
> >> in events for some files inside a directory, a fastpath handler can be
> >> used to filter out irrelevant events.
> >> 
> >> The fanotify fastpath handler is attached to an fsnotify_group. At most
> >> one fastpath handler can be attached to an fsnotify_group. The attach and
> >> detach of fastpath handlers is controlled by two new ioctls on the
> >> fanotify fds: FAN_IOC_ADD_FP and FAN_IOC_DEL_FP.
> >> 
> >> The fanotify fastpath handler is packaged in a kernel module. In the
> >> future, it will also be possible to package a fastpath handler in a BPF
> >> program. Since loading modules requires CAP_SYS_ADMIN, _loading_ a
> >> fanotify fastpath handler from a kernel module is limited to
> >> CAP_SYS_ADMIN. However, non-CAP_SYS_ADMIN users can _attach_ a fastpath
> >> handler loaded by the sysadmin to their fanotify fds. To make fastpath
> >> handlers more useful for non-CAP_SYS_ADMIN users, a fastpath handler can
> >> take arguments at attach time.
> > 
> > Hum, I'm not sure I'd be fine as a sysadmin with allowing arbitrary users
> > to attach arbitrary filters to their groups. I might want some filters
> > reserved for privileged programs which know what they are doing (e.g.
> > because the filters are expensive), while other filters may be fine for
> > anybody. But overall I'd think we'll soon hit requirements for permission
> > control over who can attach what... Somebody must have created a solution
> > for this already?
> 
> I have "flags" in fanotify_fastpath_ops. In an earlier version of my
> local code, I actually had a "SYS_ADMIN_ONLY" flag that marked some
> filters as available only to users with CAP_SYS_ADMIN. I removed this
> flag before sending the first RFC, for simplicity.
> 
> The model here (fastpath handlers loaded in kernel modules) is similar to
> different TCP congestion control algorithms. Regular users can choose
> which algorithm to use for each TCP connection. This model is
> straightforward because the kernel modules are global. With BPF, we
> have the option not to add the fastpath handler to a global list, so that
> whoever loads it can attach it to a specific group (I didn't
> include this model in the RFC).
> 
> For the first version, I think a SYS_ADMIN_ONLY flag would be good
> enough?

Yes, initially that should be enough.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread
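
The permission model settled on in this exchange (a global handler registry, like
TCP congestion control algorithms, plus a SYS_ADMIN_ONLY attach restriction) can
be sketched as follows. All names here are illustrative stand-ins, not the RFC's
actual API; the real loading path is module loading and the real attach path is
the FAN_IOC_ADD_FP ioctl.

```python
# Illustrative flag, mirroring the SYS_ADMIN_ONLY idea from the thread.
SYS_ADMIN_ONLY = 0x1

class FastpathRegistry:
    """Hypothetical global list of fastpath handlers, keyed by name."""
    def __init__(self):
        self._handlers = {}  # name -> flags

    def register(self, name, flags=0, has_cap_sys_admin=False):
        # Registering a handler means loading a kernel module, so it is
        # always limited to CAP_SYS_ADMIN.
        if not has_cap_sys_admin:
            raise PermissionError("registering requires CAP_SYS_ADMIN")
        self._handlers[name] = flags

    def attach(self, name, has_cap_sys_admin=False):
        # Any user may attach a registered handler to their own group,
        # unless the handler opted into SYS_ADMIN_ONLY.
        if name not in self._handlers:
            raise KeyError(name)
        if (self._handlers[name] & SYS_ADMIN_ONLY) and not has_cap_sys_admin:
            raise PermissionError("handler is restricted to CAP_SYS_ADMIN")
        return name  # stands in for wiring the handler to the fsnotify group
```

This keeps the cheap, general-purpose filters usable by everyone while giving the
sysadmin a per-handler knob for the expensive or sensitive ones, which is the
compromise Jan accepts above for a first version.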

end of thread, other threads:[~2024-11-11 14:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-29 23:12 [RFC bpf-next fanotify 0/5] Fanotify fastpath handler Song Liu
2024-10-29 23:12 ` [RFC bpf-next fanotify 1/5] fanotify: Introduce fanotify " Song Liu
2024-10-30 12:45   ` Jeff Layton
2024-11-07 10:48   ` Jan Kara
2024-11-07 19:13     ` Song Liu
2024-11-11 14:09       ` Jan Kara
2024-10-29 23:12 ` [RFC bpf-next fanotify 2/5] samples/fanotify: Add a sample " Song Liu
2024-10-30  0:11   ` Al Viro
2024-10-30  1:48     ` Al Viro
2024-10-30  2:07     ` Song Liu
2024-10-30  2:35       ` Song Liu
2024-10-30 13:03   ` Jeff Layton
2024-10-30 20:30     ` Song Liu
2024-10-31  0:23       ` Jeff Layton
2024-10-31  1:52         ` Song Liu
2024-11-06 19:40           ` Amir Goldstein
2024-11-06 22:10             ` Song Liu
2024-11-07 10:41             ` Jan Kara
2024-11-07 19:39               ` Song Liu
2024-11-07 11:19           ` Amir Goldstein
2024-11-07 19:39             ` Song Liu
2024-10-29 23:12 ` [RFC bpf-next fanotify 3/5] bpf: Make bpf inode storage available to tracing programs Song Liu
2024-10-29 23:12 ` [RFC bpf-next fanotify 4/5] fanotify: Enable bpf based fanotify fastpath handler Song Liu
2024-10-29 23:12 ` [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF " Song Liu
2024-11-07 11:10   ` Amir Goldstein
2024-11-07 19:53     ` Song Liu
2024-11-07 20:33       ` Amir Goldstein
2024-11-07 20:48         ` Song Liu
2024-11-08  8:18           ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox