* [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
@ 2022-09-26 23:17 Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 01/26] bpf: verifier: Allow for multiple packets Daniel Rosenberg
                   ` (25 more replies)
  0 siblings, 26 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:17 UTC
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

These patches extend FUSE to be able to act as a stacked filesystem. This
allows pure passthrough, where the FUSE filesystem simply reflects the lower
filesystem, and also allows optional pre- and post-filtering in BPF and/or the
userspace daemon as needed. This can dramatically reduce or even eliminate
transitions to and from userspace.

Currently, we set the backing file/bpf either at mount time, at the root
level, or at lookup time, via an optional block appended to the lookup reply.
That block contains an fd for the backing file/folder and, if necessary, one
for the bpf program, or a signal to clear or inherit the parent's values. We
currently only support setting the backing to a pre-existing file, although
naturally you can still create new files, so we're looking into two options
for extending this to mkdir/mknod/etc. The first: when we're doing a lookup
for create, we could pass an fd for the parent dir and the name of the backing
file we're creating. This avoids an additional call to userspace, but requires
hanging on to some data in a negative dentry, where there is no elegant place
to store it. The second is to add the same block we added to lookup to the
create-type opcodes. This keeps that code more uniform, but means userspace
must implement that logic in more areas.
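
As a rough illustration of the lookup block described above, the sketch below
fills it in for the three cases mentioned: replace the backing file/bpf, clear
it, or inherit the parent's values. The struct layout and FUSE_ACTION_* values
are copied from patch 03. The helper fill_entry_bpf_out(), its "negative fd
means nothing supplied" convention, and the mapping of "inherit" onto
FUSE_ACTION_KEEP are assumptions made for illustration, not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values and layout copied from patch 03 (include/uapi/linux/fuse.h). */
#define FUSE_ACTION_KEEP	0
#define FUSE_ACTION_REMOVE	1
#define FUSE_ACTION_REPLACE	2

struct fuse_entry_bpf_out {
	uint64_t	backing_action;
	uint64_t	backing_fd;
	uint64_t	bpf_action;
	uint64_t	bpf_fd;
};

/*
 * Hypothetical daemon-side helper (not part of the series): build the
 * optional block appended to a lookup reply. A negative fd means "no new
 * object supplied"; 'clear' then selects REMOVE instead of KEEP.
 */
static void fill_entry_bpf_out(struct fuse_entry_bpf_out *out,
			       int backing_fd, int bpf_fd, int clear)
{
	uint64_t no_fd_action = clear ? FUSE_ACTION_REMOVE : FUSE_ACTION_KEEP;

	assert(out);
	memset(out, 0, sizeof(*out));

	if (backing_fd >= 0) {
		out->backing_action = FUSE_ACTION_REPLACE;
		out->backing_fd = (uint64_t)backing_fd;
	} else {
		out->backing_action = no_fd_action;
	}

	if (bpf_fd >= 0) {
		out->bpf_action = FUSE_ACTION_REPLACE;
		out->bpf_fd = (uint64_t)bpf_fd;
	} else {
		out->bpf_action = no_fd_action;
	}
}
```

A daemon that wants a new backing file but the parent's bpf program would call
fill_entry_bpf_out(&out, fd, -1, 0) under these assumptions.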

As is, the patches definitely need some work before they're ready. We still
need to go through and ensure we respect changed filter values/disallow changes
that don't make sense. We aren't currently calling mnt_want_write for the lower
calls where appropriate, and we don't have an override_creds layer either. We
also plan to add to our read/write iter filters to allow for more interesting
use cases. There are also probably some node id inconsistencies. For nodes that
will be completely passthrough, we give an id of 0.

For the BPF verification side, we have currently set things up in the old
style, with a new bpf program type and helper functions. From LPC, my
understanding is that newer bpf additions are done in a new style, so I imagine
much of that will need to be redone as well, but hopefully these patches get
across what our needs there are.

For testing, we've provided the selftest code we have been using. We also have
a mode to run with no userspace daemon in a pure passthrough mode that I have
been running xfstests over to get some coverage on the backing operation code.
I had to modify mounts/unmounts to get that running, along with some other
small touch ups. The most notable failure I currently see there is in
generic/126, which I suspect is likely related to override_creds.


Alessio Balsini (1):
  fs: Generic function to convert iocb to rw flags

Daniel Rosenberg (25):
  bpf: verifier: Allow for multiple packets
  bpf: verifier: Allow single packet invalidation
  fuse-bpf: Update uapi for fuse-bpf
  fuse-bpf: Add BPF supporting functions
  bpf: Export bpf_prog_fops
  fuse-bpf: Prepare for fuse-bpf patch
  fuse: Add fuse-bpf, a stacked fs extension for FUSE
  fuse-bpf: Don't support export_operations
  fuse-bpf: Partially add mapping support
  fuse-bpf: Add lseek support
  fuse-bpf: Add support for fallocate
  fuse-bpf: Support file/dir open/close
  fuse-bpf: Support mknod/unlink/mkdir/rmdir
  fuse-bpf: Add support for read/write iter
  fuse-bpf: support FUSE_READDIR
  fuse-bpf: Add support for sync operations
  fuse-bpf: Add Rename support
  fuse-bpf: Add attr support
  fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
  fuse-bpf: Add xattr support
  fuse-bpf: Add symlink/link support
  fuse-bpf: allow mounting with no userspace daemon
  fuse-bpf: Call bpf for pre/post filters
  fuse-bpf: Add userspace pre/post filters
  fuse-bpf: Add selftests

 fs/fuse/Kconfig                               |   10 +
 fs/fuse/Makefile                              |    1 +
 fs/fuse/backing.c                             | 2753 +++++++++++++++++
 fs/fuse/control.c                             |    2 +-
 fs/fuse/dev.c                                 |   33 +-
 fs/fuse/dir.c                                 |  443 ++-
 fs/fuse/file.c                                |  125 +-
 fs/fuse/fuse_i.h                              |  804 ++++-
 fs/fuse/inode.c                               |  292 +-
 fs/fuse/ioctl.c                               |    2 +-
 fs/fuse/readdir.c                             |   22 +
 fs/fuse/xattr.c                               |   36 +
 fs/overlayfs/file.c                           |   23 +-
 include/linux/bpf.h                           |    4 +
 include/linux/bpf_fuse.h                      |   64 +
 include/linux/bpf_types.h                     |    4 +
 include/linux/bpf_verifier.h                  |    5 +-
 include/linux/fs.h                            |    5 +
 include/uapi/linux/bpf.h                      |   33 +
 include/uapi/linux/fuse.h                     |   19 +-
 kernel/bpf/Makefile                           |    4 +
 kernel/bpf/bpf_fuse.c                         |  342 ++
 kernel/bpf/btf.c                              |    1 +
 kernel/bpf/core.c                             |    5 +
 kernel/bpf/syscall.c                          |    1 +
 kernel/bpf/verifier.c                         |  144 +-
 tools/include/uapi/linux/bpf.h                |   33 +
 tools/include/uapi/linux/fuse.h               | 1066 +++++++
 .../selftests/filesystems/fuse/.gitignore     |    2 +
 .../selftests/filesystems/fuse/Makefile       |   41 +
 .../testing/selftests/filesystems/fuse/OWNERS |    2 +
 .../selftests/filesystems/fuse/bpf_loader.c   |  798 +++++
 .../testing/selftests/filesystems/fuse/fd.txt |   21 +
 .../selftests/filesystems/fuse/fd_bpf.c       |  370 +++
 .../selftests/filesystems/fuse/fuse_daemon.c  |  294 ++
 .../selftests/filesystems/fuse/fuse_test.c    | 2147 +++++++++++++
 .../selftests/filesystems/fuse/test_bpf.c     |  800 +++++
 .../filesystems/fuse/test_framework.h         |  173 ++
 .../selftests/filesystems/fuse/test_fuse.h    |  328 ++
 39 files changed, 11017 insertions(+), 235 deletions(-)
 create mode 100644 fs/fuse/backing.c
 create mode 100644 include/linux/bpf_fuse.h
 create mode 100644 kernel/bpf/bpf_fuse.c
 create mode 100644 tools/include/uapi/linux/fuse.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/.gitignore
 create mode 100644 tools/testing/selftests/filesystems/fuse/Makefile
 create mode 100644 tools/testing/selftests/filesystems/fuse/OWNERS
 create mode 100644 tools/testing/selftests/filesystems/fuse/bpf_loader.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd.txt
 create mode 100644 tools/testing/selftests/filesystems/fuse/fd_bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_daemon.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/fuse_test.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_bpf.c
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_framework.h
 create mode 100644 tools/testing/selftests/filesystems/fuse/test_fuse.h


base-commit: bf682942cd26ce9cd5e87f73ae099b383041e782
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 01/26] bpf: verifier: Allow for multiple packets
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
@ 2022-09-26 23:17 ` Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 02/26] bpf: verifier: Allow single packet invalidation Daniel Rosenberg
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:17 UTC
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This allows multiple PTR_TO_PACKETs for a single bpf program. Fuse bpf
uses this to handle the various input and output types it has.
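
The effect of the data_id check can be sketched with a toy model of the
verifier state. This is not kernel code: the types and the function
match_pkt_pointers() are simplified stand-ins invented for illustration. It
only shows the rule that a comparison against a packet end establishes a
readable range when both pointers carry the same data_id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the verifier's register types. */
enum reg_type { SCALAR_VALUE, PTR_TO_PACKET, PTR_TO_PACKET_END };

struct reg_state {
	enum reg_type type;
	int range;		/* bytes proven readable, 0 if unknown */
	uint32_t data_id;	/* which packet this pointer belongs to */
};

/*
 * Toy version of the rule: "data + off <= data_end" only proves that
 * 'off' bytes are readable when both registers refer to the same packet.
 * Returns true if a range was recorded on 'data'.
 */
static bool match_pkt_pointers(struct reg_state *data,
			       const struct reg_state *end, int off)
{
	assert(data && end);
	if (data->type != PTR_TO_PACKET || end->type != PTR_TO_PACKET_END)
		return false;
	if (data->data_id != end->data_id)
		return false;	/* bounds of another packet prove nothing */
	data->range = off;
	return true;
}
```

Comparing a pointer into packet 1 against the end of packet 2 leaves its range
untouched, so a later access through it would still be rejected.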

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 include/linux/bpf.h          |  1 +
 include/linux/bpf_verifier.h |  5 ++-
 kernel/bpf/verifier.c        | 60 +++++++++++++++++++++++-------------
 3 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 20c26aed7896..07086e375487 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -633,6 +633,7 @@ struct bpf_insn_access_aux {
 			struct btf *btf;
 			u32 btf_id;
 		};
+		int data_id;
 	};
 	struct bpf_verifier_log *log; /* for verbose logs */
 };
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 2e3bad8640dc..feae965e08a4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -50,7 +50,10 @@ struct bpf_reg_state {
 	s32 off;
 	union {
 		/* valid when type == PTR_TO_PACKET */
-		int range;
+		struct {
+			int range;
+			u32 data_id;
+		};
 
 		/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 		 *   PTR_TO_MAP_VALUE_OR_NULL
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3eadb14e090b..d28cb22d5ee5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3544,8 +3544,9 @@ static int __check_mem_access(struct bpf_verifier_env *env, int regno,
 	case PTR_TO_PACKET:
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET_END:
-		verbose(env, "invalid access to packet, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
-			off, size, regno, reg->id, off, mem_size);
+		verbose(env,
+			"invalid access to packet %d, off=%d size=%d, R%d(id=%d,off=%d,r=%d)\n",
+			reg->data_id, off, size, regno, reg->id, off, mem_size);
 		break;
 	case PTR_TO_MEM:
 	default:
@@ -3938,7 +3939,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
-			    struct btf **btf, u32 *btf_id)
+			    struct btf **btf, u32 *btf_id, u32 *data_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
@@ -3959,6 +3960,8 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 		if (base_type(*reg_type) == PTR_TO_BTF_ID) {
 			*btf = info.btf;
 			*btf_id = info.btf_id;
+		} else if (*reg_type == PTR_TO_PACKET || *reg_type == PTR_TO_PACKET_END) {
+			*data_id = info.data_id;
 		} else {
 			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 		}
@@ -4788,6 +4791,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
 		struct btf *btf = NULL;
 		u32 btf_id = 0;
+		u32 data_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -4800,7 +4804,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			return err;
 
 		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf,
-				       &btf_id);
+				       &btf_id, &data_id);
 		if (err)
 			verbose_linfo(env, insn_idx, "; ");
 		if (!err && t == BPF_READ && value_regno >= 0) {
@@ -4824,6 +4828,10 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				if (base_type(reg_type) == PTR_TO_BTF_ID) {
 					regs[value_regno].btf = btf;
 					regs[value_regno].btf_id = btf_id;
+				} else if (reg_type == PTR_TO_PACKET ||
+				    reg_type == PTR_TO_PACKET_END ||
+				    reg_type == PTR_TO_PACKET_META) {
+					regs[value_regno].data_id = data_id;
 				}
 			}
 			regs[value_regno].type = reg_type;
@@ -9921,18 +9929,20 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 
 	switch (BPF_OP(insn->code)) {
 	case BPF_JGT:
-		if ((dst_reg->type == PTR_TO_PACKET &&
+		if (dst_reg->data_id == src_reg->data_id &&
+		    ((dst_reg->type == PTR_TO_PACKET &&
 		     src_reg->type == PTR_TO_PACKET_END) ||
 		    (dst_reg->type == PTR_TO_PACKET_META &&
-		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
+		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET)))) {
 			/* pkt_data' > pkt_end, pkt_meta' > pkt_data */
 			find_good_pkt_pointers(this_branch, dst_reg,
 					       dst_reg->type, false);
 			mark_pkt_end(other_branch, insn->dst_reg, true);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
+		} else if (dst_reg->data_id == src_reg->data_id &&
+			   ((dst_reg->type == PTR_TO_PACKET_END &&
 			    src_reg->type == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    src_reg->type == PTR_TO_PACKET_META))) {
 			/* pkt_end > pkt_data', pkt_data > pkt_meta' */
 			find_good_pkt_pointers(other_branch, src_reg,
 					       src_reg->type, true);
@@ -9942,18 +9952,20 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 		}
 		break;
 	case BPF_JLT:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
+		if (dst_reg->data_id == src_reg->data_id &&
+		    ((dst_reg->type == PTR_TO_PACKET &&
+		     src_reg->type == PTR_TO_PACKET_END) ||
 		    (dst_reg->type == PTR_TO_PACKET_META &&
-		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
+		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET)))) {
 			/* pkt_data' < pkt_end, pkt_meta' < pkt_data */
 			find_good_pkt_pointers(other_branch, dst_reg,
 					       dst_reg->type, true);
 			mark_pkt_end(this_branch, insn->dst_reg, false);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
+		} else if (dst_reg->data_id == src_reg->data_id &&
+			   ((dst_reg->type == PTR_TO_PACKET_END &&
 			    src_reg->type == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    src_reg->type == PTR_TO_PACKET_META))) {
 			/* pkt_end < pkt_data', pkt_data > pkt_meta' */
 			find_good_pkt_pointers(this_branch, src_reg,
 					       src_reg->type, false);
@@ -9963,18 +9975,20 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 		}
 		break;
 	case BPF_JGE:
-		if ((dst_reg->type == PTR_TO_PACKET &&
+		if (dst_reg->data_id == src_reg->data_id &&
+		    ((dst_reg->type == PTR_TO_PACKET &&
 		     src_reg->type == PTR_TO_PACKET_END) ||
 		    (dst_reg->type == PTR_TO_PACKET_META &&
-		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
+		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET)))) {
 			/* pkt_data' >= pkt_end, pkt_meta' >= pkt_data */
 			find_good_pkt_pointers(this_branch, dst_reg,
 					       dst_reg->type, true);
 			mark_pkt_end(other_branch, insn->dst_reg, false);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
+		} else if (dst_reg->data_id == src_reg->data_id &&
+			   ((dst_reg->type == PTR_TO_PACKET_END &&
 			    src_reg->type == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    src_reg->type == PTR_TO_PACKET_META))) {
 			/* pkt_end >= pkt_data', pkt_data >= pkt_meta' */
 			find_good_pkt_pointers(other_branch, src_reg,
 					       src_reg->type, false);
@@ -9984,18 +9998,20 @@ static bool try_match_pkt_pointers(const struct bpf_insn *insn,
 		}
 		break;
 	case BPF_JLE:
-		if ((dst_reg->type == PTR_TO_PACKET &&
-		     src_reg->type == PTR_TO_PACKET_END) ||
+		if (dst_reg->data_id == src_reg->data_id &&
+		    ((dst_reg->type == PTR_TO_PACKET &&
+		     src_reg->type == PTR_TO_PACKET_END) ||
 		    (dst_reg->type == PTR_TO_PACKET_META &&
-		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET))) {
+		     reg_is_init_pkt_pointer(src_reg, PTR_TO_PACKET)))) {
 			/* pkt_data' <= pkt_end, pkt_meta' <= pkt_data */
 			find_good_pkt_pointers(other_branch, dst_reg,
 					       dst_reg->type, false);
 			mark_pkt_end(this_branch, insn->dst_reg, true);
-		} else if ((dst_reg->type == PTR_TO_PACKET_END &&
+		} else if (dst_reg->data_id == src_reg->data_id &&
+			   ((dst_reg->type == PTR_TO_PACKET_END &&
 			    src_reg->type == PTR_TO_PACKET) ||
 			   (reg_is_init_pkt_pointer(dst_reg, PTR_TO_PACKET) &&
-			    src_reg->type == PTR_TO_PACKET_META)) {
+			    src_reg->type == PTR_TO_PACKET_META))) {
 			/* pkt_end <= pkt_data', pkt_data <= pkt_meta' */
 			find_good_pkt_pointers(this_branch, src_reg,
 					       src_reg->type, true);
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 02/26] bpf: verifier: Allow single packet invalidation
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 01/26] bpf: verifier: Allow for multiple packets Daniel Rosenberg
@ 2022-09-26 23:17 ` Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf Daniel Rosenberg
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:17 UTC
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Previously there could only be one packet. Helper functions that may
modify the packet could simply invalidate all packets. Now that we
support multiple packets, we should allow helpers to invalidate specific
packets.

This leaves the default global invalidation in place in case that's
still useful. All existing packets use the default id of '0', and could
be transitioned to the specific packet code with no change in behavior.

This also adds ARG_PTR_TO_PACKET, to allow packets to be passed to
helper functions at all. This is required to inform the verifier which
packets should be invalidated. Currently only one packet is allowed per
helper.
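
The difference between the two invalidation modes can be sketched with a toy
register file. The types and the functions toy_clear_all() and
toy_clear_specific() are simplified stand-ins invented for illustration; the
real verifier also walks spilled registers and every call frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the verifier's register types. */
enum reg_type { SCALAR_VALUE, PTR_TO_PACKET, PTR_TO_PACKET_END };

struct reg_state {
	enum reg_type type;
	uint32_t data_id;	/* which packet this pointer belongs to */
};

/* Global invalidation: forget every packet pointer. Returns the count. */
static int toy_clear_all(struct reg_state *regs, size_t n)
{
	int cleared = 0;
	size_t i;

	assert(regs);
	for (i = 0; i < n; i++) {
		if (regs[i].type == SCALAR_VALUE)
			continue;
		regs[i].type = SCALAR_VALUE;
		cleared++;
	}
	return cleared;
}

/* Specific invalidation: only forget pointers into packet 'data_id'. */
static int toy_clear_specific(struct reg_state *regs, size_t n,
			      uint32_t data_id)
{
	int cleared = 0;
	size_t i;

	assert(regs);
	for (i = 0; i < n; i++) {
		if (regs[i].type == SCALAR_VALUE || regs[i].data_id != data_id)
			continue;
		regs[i].type = SCALAR_VALUE;
		cleared++;
	}
	return cleared;
}
```

With pointers into packets 1 and 2 live, toy_clear_specific(regs, n, 1) leaves
the packet 2 pointers usable, which is the property the helper-specific path
is meant to preserve.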

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 include/linux/bpf.h      |  1 +
 include/linux/bpf_fuse.h | 11 ++++++
 kernel/bpf/core.c        |  5 +++
 kernel/bpf/verifier.c    | 83 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/bpf_fuse.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07086e375487..4e6bfcfd8fea 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -456,6 +456,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
 	ARG_PTR_TO_KPTR,	/* pointer to referenced kptr */
 	ARG_PTR_TO_DYNPTR,      /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */
+	ARG_PTR_TO_PACKET,	/* pointer to packet */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
diff --git a/include/linux/bpf_fuse.h b/include/linux/bpf_fuse.h
new file mode 100644
index 000000000000..18e2ec5bf453
--- /dev/null
+++ b/include/linux/bpf_fuse.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2022 Google LLC.
+ */
+
+#ifndef _BPF_FUSE_H
+#define _BPF_FUSE_H
+
+bool bpf_helper_changes_one_pkt_data(void *func);
+
+#endif /* _BPF_FUSE_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3d9eb3ae334c..2ac3597ec932 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2685,6 +2685,11 @@ bool __weak bpf_helper_changes_pkt_data(void *func)
 	return false;
 }
 
+bool __weak bpf_helper_changes_one_pkt_data(void *func)
+{
+	return false;
+}
+
 /* Return TRUE if the JIT backend wants verifier to enable sub-register usage
  * analysis code and wants explicit zero extension inserted by verifier.
  * Otherwise, return FALSE.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d28cb22d5ee5..2884650904fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23,6 +23,7 @@
 #include <linux/error-injection.h>
 #include <linux/bpf_lsm.h>
 #include <linux/btf_ids.h>
+#include <linux/bpf_fuse.h>
 
 #include "disasm.h"
 
@@ -263,6 +264,7 @@ struct bpf_call_arg_meta {
 	u32 subprogno;
 	struct bpf_map_value_off_desc *kptr_off_desc;
 	u8 uninit_dynptr_regno;
+	u32 data_id;
 };
 
 struct btf *btf_vmlinux;
@@ -1396,6 +1398,12 @@ static bool reg_is_pkt_pointer_any(const struct bpf_reg_state *reg)
 	       reg->type == PTR_TO_PACKET_END;
 }
 
+static bool reg_is_specific_pkt_pointer_any(const struct bpf_reg_state *reg, u32 id)
+{
+	return (reg_is_pkt_pointer(reg) ||
+	       reg->type == PTR_TO_PACKET_END) && reg->data_id == id;
+}
+
 /* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. */
 static bool reg_is_init_pkt_pointer(const struct bpf_reg_state *reg,
 				    enum bpf_reg_type which)
@@ -5664,6 +5672,7 @@ static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK }
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types packet_ptr_types = { .types = { PTR_TO_PACKET } };
 
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
@@ -5691,6 +5700,7 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_TIMER]		= &timer_types,
 	[ARG_PTR_TO_KPTR]		= &kptr_types,
 	[ARG_PTR_TO_DYNPTR]		= &stack_ptr_types,
+	[ARG_PTR_TO_PACKET]		= &packet_ptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
@@ -5800,7 +5810,8 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 		/* Some of the argument types nevertheless require a
 		 * zero register offset.
 		 */
-		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM)
+		if (base_type(arg_type) != ARG_PTR_TO_ALLOC_MEM &&
+			base_type(arg_type) != ARG_PTR_TO_PACKET)
 			return 0;
 		break;
 	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
@@ -6135,6 +6146,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		if (process_kptr_func(env, regno, meta))
 			return -EACCES;
 		break;
+	case ARG_PTR_TO_PACKET:
+		meta->data_id = reg->data_id;
+		break;
 	}
 
 	return err;
@@ -6509,13 +6523,36 @@ static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	return true;
 }
 
+static bool check_packet_ok(const struct bpf_func_proto *fn)
+{
+	int count = 0;
+
+	if (fn->arg1_type == ARG_PTR_TO_PACKET)
+		count++;
+	if (fn->arg2_type == ARG_PTR_TO_PACKET)
+		count++;
+	if (fn->arg3_type == ARG_PTR_TO_PACKET)
+		count++;
+	if (fn->arg4_type == ARG_PTR_TO_PACKET)
+		count++;
+	if (fn->arg5_type == ARG_PTR_TO_PACKET)
+		count++;
+
+	/* We only support one arg being a packet at the moment,
+	 * which is sufficient for the helper functions we have right now.
+	 */
+	return count <= 1;
+}
+
+
 static int check_func_proto(const struct bpf_func_proto *fn, int func_id,
 			    struct bpf_call_arg_meta *meta)
 {
 	return check_raw_mode_ok(fn) &&
 	       check_arg_pair_ok(fn) &&
 	       check_btf_id_ok(fn) &&
-	       check_refcount_ok(fn, func_id) ? 0 : -EINVAL;
+	       check_refcount_ok(fn, func_id) &&
+	       check_packet_ok(fn) ? 0 : -EINVAL;
 }
 
 /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END]
@@ -6539,6 +6576,25 @@ static void __clear_all_pkt_pointers(struct bpf_verifier_env *env,
 	}
 }
 
+static void __clear_specific_pkt_pointers(struct bpf_verifier_env *env,
+				     struct bpf_func_state *state,
+				     u32 data_id)
+{
+	struct bpf_reg_state *regs = state->regs, *reg;
+	int i;
+
+	for (i = 0; i < MAX_BPF_REG; i++)
+		if (reg_is_specific_pkt_pointer_any(&regs[i], data_id))
+			mark_reg_unknown(env, regs, i);
+
+	bpf_for_each_spilled_reg(i, state, reg) {
+		if (!reg)
+			continue;
+		if (reg_is_specific_pkt_pointer_any(reg, data_id))
+			__mark_reg_unknown(env, reg);
+	}
+}
+
 static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 {
 	struct bpf_verifier_state *vstate = env->cur_state;
@@ -6548,6 +6604,15 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
 		__clear_all_pkt_pointers(env, vstate->frame[i]);
 }
 
+static void clear_specific_pkt_pointers(struct bpf_verifier_env *env, u32 data_id)
+{
+	struct bpf_verifier_state *vstate = env->cur_state;
+	int i;
+
+	for (i = 0; i <= vstate->curframe; i++)
+		__clear_specific_pkt_pointers(env, vstate->frame[i], data_id);
+}
+
 enum {
 	AT_PKT_END = -1,
 	BEYOND_PKT_END = -2,
@@ -7187,6 +7252,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	struct bpf_call_arg_meta meta;
 	int insn_idx = *insn_idx_p;
 	bool changes_data;
+	bool changes_specific_data;
 	int i, err, func_id;
 
 	/* find function prototype */
@@ -7224,6 +7290,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		return -EINVAL;
 	}
 
+	changes_specific_data = bpf_helper_changes_one_pkt_data(fn->func);
+	if (changes_specific_data && fn->arg1_type != ARG_PTR_TO_PACKET &&
+			    fn->arg2_type != ARG_PTR_TO_PACKET &&
+			    fn->arg3_type != ARG_PTR_TO_PACKET &&
+			    fn->arg4_type != ARG_PTR_TO_PACKET &&
+			    fn->arg5_type != ARG_PTR_TO_PACKET) {
+		verbose(env, "kernel subsystem misconfigured func %s#%d: no packet arg\n",
+			func_id_name(func_id), func_id);
+		return -EINVAL;
+	}
+
 	memset(&meta, 0, sizeof(meta));
 	meta.pkt_access = fn->pkt_access;
 
@@ -7534,6 +7611,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 	if (changes_data)
 		clear_all_pkt_pointers(env);
+	if (changes_specific_data)
+		clear_specific_pkt_pointers(env, meta.data_id);
 	return 0;
 }
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 01/26] bpf: verifier: Allow for multiple packets Daniel Rosenberg
  2022-09-26 23:17 ` [PATCH 02/26] bpf: verifier: Allow single packet invalidation Daniel Rosenberg
@ 2022-09-26 23:17 ` Daniel Rosenberg
  2022-09-27 18:19   ` Miklos Szeredi
  2022-09-26 23:18 ` [PATCH 04/26] fuse-bpf: Add BPF supporting functions Daniel Rosenberg
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:17 UTC
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This adds the bpf prog type for fuse-bpf, and the associated structures.
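
As a hedged sketch of how the new constants might fit together, the code below
decodes an opcode word and picks a return code. The constants are copied from
this patch, and FUSE_LOOKUP is opcode 1 in the existing fuse.h. That the
opcode seen by a program is the FUSE opcode OR'd with FUSE_PREFILTER or
FUSE_POSTFILTER is inferred from the "Op Code Filter values" comment, and the
meaning given to BPF_FUSE_USER is inferred from its name. The function names
and the policy itself are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from this patch (include/uapi/linux/bpf.h). */
#define BPF_FUSE_CONTINUE		0
#define BPF_FUSE_USER			1
#define BPF_FUSE_USER_PREFILTER		2
#define BPF_FUSE_POSTFILTER		3
#define BPF_FUSE_USER_POSTFILTER	4

#define FUSE_OPCODE_FILTER	0x0ffff
#define FUSE_PREFILTER		0x10000
#define FUSE_POSTFILTER		0x20000

/* FUSE_LOOKUP is opcode 1 in include/uapi/linux/fuse.h. */
#define FUSE_LOOKUP		1

/* Strip the filter-stage bits, leaving the plain FUSE opcode. */
static uint32_t fuse_bpf_opcode(uint32_t opcode)
{
	return opcode & FUSE_OPCODE_FILTER;
}

static bool fuse_bpf_is_prefilter(uint32_t opcode)
{
	return (opcode & FUSE_PREFILTER) != 0;
}

/*
 * Hypothetical policy, for illustration only: send lookups to the
 * userspace daemon from the prefilter, let everything else continue.
 */
static uint32_t example_filter(uint32_t opcode)
{
	/* Reject bits outside the opcode and stage masks. */
	assert((opcode & ~(uint32_t)(FUSE_OPCODE_FILTER | FUSE_PREFILTER |
				     FUSE_POSTFILTER)) == 0);

	if (fuse_bpf_is_prefilter(opcode) &&
	    fuse_bpf_opcode(opcode) == FUSE_LOOKUP)
		return BPF_FUSE_USER;
	return BPF_FUSE_CONTINUE;
}
```

A real program would receive the opcode through its context rather than as a
plain argument; the flat function here just keeps the decoding logic testable.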

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 include/linux/bpf_fuse.h  | 50 +++++++++++++++++++++++++++++++++++++++
 include/linux/bpf_types.h |  4 ++++
 include/uapi/linux/bpf.h  | 31 ++++++++++++++++++++++++
 include/uapi/linux/fuse.h | 13 +++++++++-
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_fuse.h b/include/linux/bpf_fuse.h
index 18e2ec5bf453..9d22205c9ae0 100644
--- a/include/linux/bpf_fuse.h
+++ b/include/linux/bpf_fuse.h
@@ -6,6 +6,56 @@
 #ifndef _BPF_FUSE_H
 #define _BPF_FUSE_H
 
+/*
+ * Fuse BPF Args
+ *
+ * Used to communicate with bpf programs to allow checking or altering certain values.
+ * end_offset marks the end of the buffer, which lets the bpf verifier check bounds
+ * statically. size is the length of the buffer that is actually in use.
+ *
+ * In order to write to the output args, you must use the pointer returned by
+ * bpf_fuse_get_writeable.
+ *
+ */
+
+#define FUSE_MAX_ARGS_IN 3
+#define FUSE_MAX_ARGS_OUT 2
+
+struct bpf_fuse_arg {
+	void *value;		// Start of the buffer
+	void *end_offset;	// End of the buffer
+	uint32_t size;		// Used size of the buffer
+	uint32_t max_size;	// Max permitted size, if buffer is resizable. Otherwise 0
+	uint32_t flags;		// Flags indicating buffer status
+};
+
+#define FUSE_BPF_FORCE (1 << 0)
+#define FUSE_BPF_OUT_ARGVAR (1 << 6)
+
+struct bpf_fuse_args {
+	uint64_t nodeid;
+	uint32_t opcode;
+	uint32_t error_in;
+	uint32_t in_numargs;
+	uint32_t out_numargs;
+	uint32_t flags;
+	struct bpf_fuse_arg in_args[FUSE_MAX_ARGS_IN];
+	struct bpf_fuse_arg out_args[FUSE_MAX_ARGS_OUT];
+};
+
+/* These flags are used internally to track information about the fuse buffers.
+ * Fuse sets some of the flags in init. The helper functions set others, depending on what
+ * was requested by the bpf program.
+ */
+// Flags set by FUSE
+#define BPF_FUSE_IMMUTABLE	(1 << 0) // Buffer may not be written to
+#define BPF_FUSE_VARIABLE_SIZE	(1 << 1) // Buffer length may be changed (growth requires alloc)
+#define BPF_FUSE_MUST_ALLOCATE	(1 << 2) // Buffer must be reallocated before allowing writes
+
+// Flags set by helper function
+#define BPF_FUSE_MODIFIED	(1 << 3) // The helper function allowed writes to the buffer
+#define BPF_FUSE_ALLOCATED	(1 << 4) // The helper function allocated the buffer
+
 bool bpf_helper_changes_one_pkt_data(void *func);
 
 #endif /* _BPF_FUSE_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..80c7f7d69794 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -79,6 +79,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
+#ifdef CONFIG_FUSE_BPF
+BPF_PROG_TYPE(BPF_PROG_TYPE_FUSE, fuse,
+	      struct __bpf_fuse_args, struct bpf_fuse_args)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59a217ca2dfd..ac81763f002b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_FUSE,
 };
 
 enum bpf_attach_type {
@@ -6848,4 +6849,34 @@ struct bpf_core_relo {
 	enum bpf_core_relo_kind kind;
 };
 
+struct __bpf_fuse_arg {
+	__u64 value;
+	__u64 end_offset;
+	__u32 size;
+	__u32 max_size;
+};
+
+struct __bpf_fuse_args {
+	__u64 nodeid;
+	__u32 opcode;
+	__u32 error_in;
+	__u32 in_numargs;
+	__u32 out_numargs;
+	__u32 flags;
+	struct __bpf_fuse_arg in_args[3];
+	struct __bpf_fuse_arg out_args[2];
+};
+
+/* Return Codes for Fuse BPF programs */
+#define BPF_FUSE_CONTINUE		0
+#define BPF_FUSE_USER			1
+#define BPF_FUSE_USER_PREFILTER		2
+#define BPF_FUSE_POSTFILTER		3
+#define BPF_FUSE_USER_POSTFILTER	4
+
+/* Op Code Filter values for BPF Programs */
+#define FUSE_OPCODE_FILTER	0x0ffff
+#define FUSE_PREFILTER		0x10000
+#define FUSE_POSTFILTER		0x20000
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d6ccee961891..8c80c146e69b 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -572,6 +572,17 @@ struct fuse_entry_out {
 	struct fuse_attr attr;
 };
 
+#define FUSE_ACTION_KEEP	0
+#define FUSE_ACTION_REMOVE	1
+#define FUSE_ACTION_REPLACE	2
+
+struct fuse_entry_bpf_out {
+	uint64_t	backing_action;
+	uint64_t	backing_fd;
+	uint64_t	bpf_action;
+	uint64_t	bpf_fd;
+};
+
 struct fuse_forget_in {
 	uint64_t	nlookup;
 };
@@ -870,7 +881,7 @@ struct fuse_in_header {
 	uint32_t	uid;
 	uint32_t	gid;
 	uint32_t	pid;
-	uint32_t	padding;
+	uint32_t	error_in;
 };
 
 struct fuse_out_header {
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 04/26] fuse-bpf: Add BPF supporting functions
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (2 preceding siblings ...)
  2022-09-26 23:17 ` [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 05/26] fs: Generic function to convert iocb to rw flags Daniel Rosenberg
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This adds support for verifying fuse-bpf programs. These programs are
not permitted to modify their contexts unless they request write access
via fuse_get_writeable_in/fuse_get_writeable_out. Each helper returns a
pointer either to the preexisting buffer or to a newly allocated one
that replaces it. The caller of the bpf program is responsible for
freeing these allocations, and is notified of them via the flags set by
the helper.
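
As a rough illustration of the buffer handling described above, here is a
user-space sketch that mirrors the decision logic of bpf_fuse_get_writeable()
from this patch. It is a model only, not the kernel code: struct model_arg and
model_get_writeable() are hypothetical names, calloc()/free() stand in for
kzalloc()/kfree(), and the values of BPF_FUSE_IMMUTABLE,
BPF_FUSE_VARIABLE_SIZE and BPF_FUSE_MUST_ALLOCATE are assumed because they are
defined in an earlier patch of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* MODIFIED and ALLOCATED match the patch; the first three values are assumed. */
#define BPF_FUSE_IMMUTABLE	(1 << 0) /* assumed value */
#define BPF_FUSE_VARIABLE_SIZE	(1 << 1) /* assumed value */
#define BPF_FUSE_MUST_ALLOCATE	(1 << 2) /* assumed value */
#define BPF_FUSE_MODIFIED	(1 << 3)
#define BPF_FUSE_ALLOCATED	(1 << 4)

/* Hypothetical user-space stand-in for struct bpf_fuse_arg. */
struct model_arg {
	void *value;
	void *end_offset;
	uint32_t size;
	uint32_t max_size;
	uint32_t flags;
};

static void *model_get_writeable(struct model_arg *arg, uint64_t size, int copy)
{
	uint64_t cap = (uint64_t)((char *)arg->end_offset - (char *)arg->value);
	void *buf;

	if (arg->flags & BPF_FUSE_IMMUTABLE)
		return NULL;

	/* Reuse the buffer in place when the request fits and no fresh
	 * allocation is required (or one has already been made). */
	if (size <= cap && (!(arg->flags & BPF_FUSE_MUST_ALLOCATE) ||
			    (arg->flags & BPF_FUSE_ALLOCATED))) {
		if (arg->flags & BPF_FUSE_VARIABLE_SIZE)
			arg->size = (uint32_t)size;
		arg->flags |= BPF_FUSE_MODIFIED;
		return arg->value;
	}

	/* Variable-sized buffers may grow up to max_size; fixed-size
	 * buffers keep their original size. */
	if (arg->flags & BPF_FUSE_VARIABLE_SIZE) {
		if (size > arg->max_size)
			return NULL;
	} else {
		if (size > arg->size)
			return NULL;
		size = arg->size;
	}

	buf = calloc(1, size);
	if (!buf)
		return NULL;

	if (copy)
		memcpy(buf, arg->value, arg->size > size ? size : arg->size);
	else
		arg->size = (uint32_t)size;

	/* Only buffers that this function allocated earlier are freed. */
	if (arg->flags & BPF_FUSE_ALLOCATED)
		free(arg->value);
	arg->value = buf;
	arg->end_offset = (char *)buf + size;
	arg->flags |= BPF_FUSE_ALLOCATED | BPF_FUSE_MODIFIED;
	return buf;
}
```

In this model, a caller that sees BPF_FUSE_ALLOCATED set after the program
runs knows that arg->value now points at memory it must free, which is the
notification mechanism the description refers to.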

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 include/linux/bpf.h      |   2 +
 include/linux/bpf_fuse.h |   1 +
 include/uapi/linux/bpf.h |   2 +
 kernel/bpf/Makefile      |   4 +
 kernel/bpf/bpf_fuse.c    | 342 +++++++++++++++++++++++++++++++++++++++
 kernel/bpf/btf.c         |   1 +
 kernel/bpf/verifier.c    |   1 +
 7 files changed, 353 insertions(+)
 create mode 100644 kernel/bpf/bpf_fuse.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4e6bfcfd8fea..749e65c438dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2412,6 +2412,8 @@ extern const struct bpf_func_proto bpf_loop_proto;
 extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
 extern const struct bpf_func_proto bpf_set_retval_proto;
 extern const struct bpf_func_proto bpf_get_retval_proto;
+extern const struct bpf_func_proto bpf_fuse_get_writeable_in_proto;
+extern const struct bpf_func_proto bpf_fuse_get_writeable_out_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/linux/bpf_fuse.h b/include/linux/bpf_fuse.h
index 9d22205c9ae0..91b60d4e78b1 100644
--- a/include/linux/bpf_fuse.h
+++ b/include/linux/bpf_fuse.h
@@ -56,6 +56,7 @@ struct bpf_fuse_args {
 #define BPF_FUSE_MODIFIED	(1 << 3) // The helper function allowed writes to the buffer
 #define BPF_FUSE_ALLOCATED	(1 << 4) // The helper function allocated the buffer
 
+extern void *bpf_fuse_get_writeable(struct bpf_fuse_arg *arg, u64 size, bool copy);
 bool bpf_helper_changes_one_pkt_data(void *func);
 
 #endif /* _BPF_FUSE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ac81763f002b..8218b9ea4313 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5542,6 +5542,8 @@ union bpf_attr {
 	FN(tcp_raw_gen_syncookie_ipv6),	\
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
+	FN(fuse_get_writeable_in),	\
+	FN(fuse_get_writeable_out),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 057ba8e01e70..717212bb8282 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -40,3 +40,7 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+ifeq ($(CONFIG_FUSE_BPF),y)
+obj-$(CONFIG_BPF_SYSCALL) += bpf_fuse.o
+endif
diff --git a/kernel/bpf/bpf_fuse.c b/kernel/bpf/bpf_fuse.c
new file mode 100644
index 000000000000..cc5c9b7fc361
--- /dev/null
+++ b/kernel/bpf/bpf_fuse.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2021 Google LLC
+
+#include <linux/filter.h>
+#include <linux/bpf_fuse.h>
+
+static const struct bpf_func_proto *
+fuse_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
+
+	case BPF_FUNC_get_current_uid_gid:
+		return &bpf_get_current_uid_gid_proto;
+
+	case BPF_FUNC_get_current_pid_tgid:
+		return &bpf_get_current_pid_tgid_proto;
+
+	case BPF_FUNC_map_lookup_elem:
+		return &bpf_map_lookup_elem_proto;
+
+	case BPF_FUNC_map_update_elem:
+		return &bpf_map_update_elem_proto;
+
+	case BPF_FUNC_fuse_get_writeable_in:
+		return &bpf_fuse_get_writeable_in_proto;
+
+	case BPF_FUNC_fuse_get_writeable_out:
+		return &bpf_fuse_get_writeable_out_proto;
+
+	default:
+		pr_debug("Invalid fuse bpf func %d\n", func_id);
+		return NULL;
+	}
+}
+
+static bool fuse_arg_valid_access(int off, int start, int size, struct bpf_insn_access_aux *info)
+{
+	int arg_off = (off - start) % sizeof(struct __bpf_fuse_arg);
+	int arg_start = off - arg_off;
+
+	switch (arg_off) {
+	case bpf_ctx_range(struct __bpf_fuse_arg, value):
+	case offsetof(struct __bpf_fuse_arg, end_offset):
+		if (size != sizeof(__u64))
+			return false;
+		break;
+
+	case offsetof(struct __bpf_fuse_arg, max_size):
+	case offsetof(struct __bpf_fuse_arg, size):
+		if (size != sizeof(__u32))
+			return false;
+		break;
+
+	}
+
+	switch (arg_off) {
+	case bpf_ctx_range(struct __bpf_fuse_arg, value):
+		info->reg_type = PTR_TO_PACKET;
+		info->data_id = arg_start;
+		return true;
+
+	case offsetof(struct __bpf_fuse_arg, end_offset):
+		info->reg_type = PTR_TO_PACKET_END;
+		info->data_id = arg_start;
+		return true;
+
+	case offsetof(struct __bpf_fuse_arg, max_size):
+	case offsetof(struct __bpf_fuse_arg, size):
+		info->reg_type = SCALAR_VALUE;
+		return true;
+	}
+	return false;
+}
+
+static bool fuse_prog_is_valid_access(int off, int size,
+				enum bpf_access_type type,
+				const struct bpf_prog *prog,
+				struct bpf_insn_access_aux *info)
+{
+	if (off < 0 || off > offsetofend(struct bpf_fuse_args, out_args))
+		return false;
+
+	/* No fields should be written directly. Writable buffers are requested via helper function.
+	 * The size field is set by the helper. If bpf programs need to shrink the size, we may
+	 * revisit this...
+	 */
+	if (type == BPF_WRITE)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct __bpf_fuse_args, nodeid):
+		info->reg_type = SCALAR_VALUE;
+		if (size == sizeof(__u64))
+			return true;
+		break;
+	case bpf_ctx_range(struct __bpf_fuse_args, opcode):
+	case bpf_ctx_range(struct __bpf_fuse_args, error_in):
+	case bpf_ctx_range(struct __bpf_fuse_args, in_numargs):
+	case bpf_ctx_range(struct __bpf_fuse_args, out_numargs):
+	case bpf_ctx_range(struct __bpf_fuse_args, flags):
+		info->reg_type = SCALAR_VALUE;
+		if (size == sizeof(__u32))
+			return true;
+		break;
+	case bpf_ctx_range_till(struct __bpf_fuse_args, in_args[0], in_args[2]):
+		if (fuse_arg_valid_access(off, offsetof(struct __bpf_fuse_args, in_args[0]),
+					  size, info))
+			return true;
+		break;
+	case bpf_ctx_range_till(struct __bpf_fuse_args, out_args[0], out_args[1]):
+		if (fuse_arg_valid_access(off, offsetof(struct __bpf_fuse_args, out_args[0]),
+					  size, info))
+			return true;
+		break;
+	}
+
+	return false;
+}
+
+static struct bpf_insn *fuse_arg_convert_access(int off, int start, int converted_start,
+						const struct bpf_insn *si, struct bpf_insn *insn)
+{
+	int arg_off = (off - start) % sizeof(struct __bpf_fuse_arg);
+	int arg_num = (off - start) / sizeof(struct __bpf_fuse_arg);
+	int arg_start = converted_start + arg_num * sizeof(struct bpf_fuse_arg);
+
+	switch (arg_off) {
+	case offsetof(struct __bpf_fuse_arg, value):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_arg, value),
+				      si->dst_reg, si->src_reg,
+				      arg_start + offsetof(struct bpf_fuse_arg, value));
+		break;
+
+	case offsetof(struct __bpf_fuse_arg, end_offset):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_arg, end_offset),
+				      si->dst_reg, si->src_reg,
+				      arg_start + offsetof(struct bpf_fuse_arg, end_offset));
+		break;
+
+	case offsetof(struct __bpf_fuse_arg, size):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_arg, size),
+				      si->dst_reg, si->src_reg,
+				      arg_start + offsetof(struct bpf_fuse_arg, size));
+		break;
+
+	case offsetof(struct __bpf_fuse_arg, max_size):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_arg, max_size),
+				      si->dst_reg, si->src_reg,
+				      arg_start + offsetof(struct bpf_fuse_arg, max_size));
+		break;
+	}
+	return insn;
+}
+
+static u32 fuse_prog_convert_ctx_access(enum bpf_access_type type,
+		     const struct bpf_insn *si,
+		     struct bpf_insn *insn_buf,
+		     struct bpf_prog *prog,
+		     u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct __bpf_fuse_args, nodeid):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, nodeid),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, nodeid));
+		break;
+
+	case offsetof(struct __bpf_fuse_args, opcode):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, opcode),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, opcode));
+		break;
+
+	case offsetof(struct __bpf_fuse_args, error_in):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, error_in),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, error_in));
+		break;
+
+	case offsetof(struct __bpf_fuse_args, in_numargs):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, in_numargs),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, in_numargs));
+		break;
+
+	case offsetof(struct __bpf_fuse_args, out_numargs):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, out_numargs),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, out_numargs));
+		break;
+
+	case offsetof(struct __bpf_fuse_args, flags):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_fuse_args, flags),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_fuse_args, flags));
+		break;
+
+	case bpf_ctx_range_till(struct __bpf_fuse_args, in_args[0], in_args[2]):
+		insn = fuse_arg_convert_access(si->off,
+					       offsetof(struct __bpf_fuse_args, in_args[0]),
+					       offsetof(struct bpf_fuse_args, in_args[0]),
+					       si, insn);
+		break;
+
+	case bpf_ctx_range_till(struct __bpf_fuse_args, out_args[0], out_args[1]):
+		insn = fuse_arg_convert_access(si->off,
+					       offsetof(struct __bpf_fuse_args, out_args[0]),
+					       offsetof(struct bpf_fuse_args, out_args[0]),
+					       si, insn);
+		break;
+
+	}
+
+	return insn - insn_buf;
+}
+
+static int fuse_prog_get_prologue(struct bpf_insn *insn_buf,
+				   bool direct_write,
+				   const struct bpf_prog *prog)
+{
+	return 0;
+}
+
+static int buff_size(struct bpf_fuse_arg *arg)
+{
+	return ((char *)arg->end_offset - (char *)arg->value);
+}
+
+void *bpf_fuse_get_writeable(struct bpf_fuse_arg *arg, u64 size, bool copy)
+{
+	void *writeable_val;
+
+	if (arg->flags & BPF_FUSE_IMMUTABLE)
+		return 0;
+
+	if (size <= buff_size(arg) &&
+			(!(arg->flags & BPF_FUSE_MUST_ALLOCATE) ||
+			  (arg->flags & BPF_FUSE_ALLOCATED))) {
+		if (arg->flags & BPF_FUSE_VARIABLE_SIZE)
+			arg->size = size;
+		arg->flags |= BPF_FUSE_MODIFIED;
+		return arg->value;
+	}
+	/* Variable sized arrays must stay below max size. If the buffer must be fixed size,
+	 * don't change the allocated size. Verifier will enforce requested size for accesses
+	 */
+	if (arg->flags & BPF_FUSE_VARIABLE_SIZE) {
+		if (size > arg->max_size)
+			return 0;
+	} else {
+		if (size > arg->size)
+			return 0;
+		size = arg->size;
+	}
+
+	if (size != arg->size && size > arg->max_size)
+		return 0;
+	writeable_val = kzalloc(size, GFP_KERNEL);
+	if (!writeable_val)
+		return 0;
+
+	/* If we're copying the buffer, assume the same amount is used. If that isn't the case,
+	 * caller must change size. Otherwise, assume entirety of new buffer is used.
+	 */
+	if (copy)
+		memcpy(writeable_val, arg->value, (arg->size > size) ? size : arg->size);
+	else
+		arg->size = size;
+
+	if (arg->flags & BPF_FUSE_ALLOCATED)
+		kfree(arg->value);
+	arg->value = writeable_val;
+	arg->end_offset = (char *)writeable_val + size;
+
+	arg->flags |= BPF_FUSE_ALLOCATED | BPF_FUSE_MODIFIED;
+
+	return arg->value;
+}
+EXPORT_SYMBOL(bpf_fuse_get_writeable);
+
+BPF_CALL_5(bpf_fuse_get_writeable_in, struct bpf_fuse_args *, ctx, u32, index, void *, value,
+		u64, size, bool, copy)
+{
+	if (ctx->in_args[index].value != value)
+		return 0;
+	return (unsigned long) bpf_fuse_get_writeable(&ctx->in_args[index], size, copy);
+}
+
+BPF_CALL_5(bpf_fuse_get_writeable_out, struct bpf_fuse_args *, ctx, u32, index, void *, value,
+		u64, size, bool, copy)
+{
+	if (ctx->out_args[index].value != value)
+		return 0;
+	return (unsigned long) bpf_fuse_get_writeable(&ctx->out_args[index], size, copy);
+}
+
+bool bpf_helper_changes_one_pkt_data(void *func)
+{
+	if (func == bpf_fuse_get_writeable_in || func == bpf_fuse_get_writeable_out)
+		return true;
+	return false;
+}
+
+const struct bpf_func_proto bpf_fuse_get_writeable_in_proto = {
+	.func		= bpf_fuse_get_writeable_in,
+	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_PACKET,
+	.arg4_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
+	.arg5_type	= ARG_ANYTHING,
+	.gpl_only	= false,
+	.pkt_access	= true,
+};
+
+const struct bpf_func_proto bpf_fuse_get_writeable_out_proto = {
+	.func		= bpf_fuse_get_writeable_out,
+	.ret_type	= RET_PTR_TO_ALLOC_MEM_OR_NULL,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_PACKET,
+	.arg4_type	= ARG_CONST_ALLOC_SIZE_OR_ZERO,
+	.arg5_type	= ARG_ANYTHING,
+	.gpl_only	= false,
+	.pkt_access	= true,
+};
+
+
+const struct bpf_verifier_ops fuse_verifier_ops = {
+	.get_func_proto  = fuse_prog_func_proto,
+	.is_valid_access = fuse_prog_is_valid_access,
+	.convert_ctx_access = fuse_prog_convert_ctx_access,
+	.gen_prologue = fuse_prog_get_prologue,
+};
+
+const struct bpf_prog_ops fuse_prog_ops = {
+};
+
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 7e64447659f3..97f4a0889f2b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -24,6 +24,7 @@
 #include <linux/bsearch.h>
 #include <linux/kobject.h>
 #include <linux/sysfs.h>
+#include <linux/bpf_fuse.h>
 #include <net/sock.h>
 #include "../tools/lib/bpf/relo_core.h"
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2884650904fe..e076677f63be 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3874,6 +3874,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	case BPF_PROG_TYPE_SK_REUSEPORT:
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 	case BPF_PROG_TYPE_CGROUP_SKB:
+	case BPF_PROG_TYPE_FUSE:
 		if (t == BPF_WRITE)
 			return false;
 		fallthrough;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 05/26] fs: Generic function to convert iocb to rw flags
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (3 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 04/26] fuse-bpf: Add BPF supporting functions Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 06/26] bpf: Export bpf_prog_fops Daniel Rosenberg
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team, Alessio Balsini

From: Alessio Balsini <balsini@google.com>

OverlayFS implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.

Reduce the OverlayFS code by making the flag conversion function generic
and reusable.
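
To illustrate why a plain mask is sufficient, the user-space sketch below
compares the removed bit-by-bit OverlayFS conversion with the new mask-based
helper. The RWF_* values are taken from include/uapi/linux/fs.h; defining the
IOCB_* flags as plain aliases and rwf_t as int is a simplification made for
this example, and old_ovl_iocb_to_rwf() is just a local copy of the removed
function.

```c
#include <assert.h>

/* RWF_* values as defined in include/uapi/linux/fs.h. */
#define RWF_HIPRI	0x00000001
#define RWF_DSYNC	0x00000002
#define RWF_SYNC	0x00000004
#define RWF_NOWAIT	0x00000008

/* Simplification: each IOCB_* flag shares its bit with the matching RWF_* flag. */
#define IOCB_HIPRI	RWF_HIPRI
#define IOCB_DSYNC	RWF_DSYNC
#define IOCB_SYNC	RWF_SYNC
#define IOCB_NOWAIT	RWF_NOWAIT

#define OVL_IOCB_MASK (IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)

typedef int rwf_t; /* stand-in for the kernel's __kernel_rwf_t */

/* Local copy of the conversion removed from fs/overlayfs/file.c. */
static rwf_t old_ovl_iocb_to_rwf(int ifl)
{
	rwf_t flags = 0;

	if (ifl & IOCB_NOWAIT)
		flags |= RWF_NOWAIT;
	if (ifl & IOCB_HIPRI)
		flags |= RWF_HIPRI;
	if (ifl & IOCB_DSYNC)
		flags |= RWF_DSYNC;
	if (ifl & IOCB_SYNC)
		flags |= RWF_SYNC;

	return flags;
}

/* Generic helper added to include/linux/fs.h by this patch. */
static inline rwf_t iocb_to_rw_flags(int ifl, int iocb_mask)
{
	return ifl & iocb_mask;
}
```

Because the bits coincide, both functions agree for every input, including
inputs that carry bits outside the mask (those are simply dropped).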

Signed-off-by: Alessio Balsini <balsini@android.com>
---
 fs/overlayfs/file.c | 23 +++++------------------
 include/linux/fs.h  |  5 +++++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index daff601b5c41..c9df01577052 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -15,6 +15,8 @@
 #include <linux/fs.h>
 #include "overlayfs.h"
 
+#define OVL_IOCB_MASK (IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
 struct ovl_aio_req {
 	struct kiocb iocb;
 	refcount_t ref;
@@ -240,22 +242,6 @@ static void ovl_file_accessed(struct file *file)
 	touch_atime(&file->f_path);
 }
 
-static rwf_t ovl_iocb_to_rwf(int ifl)
-{
-	rwf_t flags = 0;
-
-	if (ifl & IOCB_NOWAIT)
-		flags |= RWF_NOWAIT;
-	if (ifl & IOCB_HIPRI)
-		flags |= RWF_HIPRI;
-	if (ifl & IOCB_DSYNC)
-		flags |= RWF_DSYNC;
-	if (ifl & IOCB_SYNC)
-		flags |= RWF_SYNC;
-
-	return flags;
-}
-
 static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
 {
 	if (refcount_dec_and_test(&aio_req->ref)) {
@@ -315,7 +301,8 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
 	if (is_sync_kiocb(iocb)) {
 		ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
-				    ovl_iocb_to_rwf(iocb->ki_flags));
+				    iocb_to_rw_flags(iocb->ki_flags,
+						     OVL_IOCB_MASK));
 	} else {
 		struct ovl_aio_req *aio_req;
 
@@ -379,7 +366,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	if (is_sync_kiocb(iocb)) {
 		file_start_write(real.file);
 		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
-				     ovl_iocb_to_rwf(ifl));
+				     iocb_to_rw_flags(ifl, OVL_IOCB_MASK));
 		file_end_write(real.file);
 		/* Update size */
 		ovl_copyattr(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..c1d49675092e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3420,6 +3420,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 	return 0;
 }
 
+static inline rwf_t iocb_to_rw_flags(int ifl, int iocb_mask)
+{
+	return ifl & iocb_mask;
+}
+
 static inline ino_t parent_ino(struct dentry *dentry)
 {
 	ino_t res;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 06/26] bpf: Export bpf_prog_fops
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (4 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 05/26] fs: Generic function to convert iocb to rw flags Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 07/26] fuse-bpf: Prepare for fuse-bpf patch Daniel Rosenberg
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Fuse-bpf requires access to bpf_prog_fops to confirm that the fd it was
given is in fact a bpf program.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 27760627370d..2000b6029e6a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2160,6 +2160,7 @@ const struct file_operations bpf_prog_fops = {
 	.read		= bpf_dummy_read,
 	.write		= bpf_dummy_write,
 };
+EXPORT_SYMBOL_GPL(bpf_prog_fops);
 
 int bpf_prog_new_fd(struct bpf_prog *prog)
 {
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 07/26] fuse-bpf: Prepare for fuse-bpf patch
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (5 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 06/26] bpf: Export bpf_prog_fops Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 08/26] fuse: Add fuse-bpf, a stacked fs extension for FUSE Daniel Rosenberg
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This moves some functions and structs around to make the following patch
easier to read.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/dir.c    | 30 ------------------------------
 fs/fuse/fuse_i.h | 35 +++++++++++++++++++++++++++++++++++
 fs/fuse/inode.c  | 46 +++++++++++++++++++++++-----------------------
 3 files changed, 58 insertions(+), 53 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index b585b04e815e..74e13af039f1 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -46,10 +46,6 @@ static inline u64 fuse_dentry_time(const struct dentry *entry)
 }
 
 #else
-union fuse_dentry {
-	u64 time;
-	struct rcu_head rcu;
-};
 
 static inline void __fuse_dentry_settime(struct dentry *dentry, u64 time)
 {
@@ -83,27 +79,6 @@ static void fuse_dentry_settime(struct dentry *dentry, u64 time)
 	__fuse_dentry_settime(dentry, time);
 }
 
-/*
- * FUSE caches dentries and attributes with separate timeout.  The
- * time in jiffies until the dentry/attributes are valid is stored in
- * dentry->d_fsdata and fuse_inode->i_time respectively.
- */
-
-/*
- * Calculate the time in jiffies until a dentry/attributes are valid
- */
-static u64 time_to_jiffies(u64 sec, u32 nsec)
-{
-	if (sec || nsec) {
-		struct timespec64 ts = {
-			sec,
-			min_t(u32, nsec, NSEC_PER_SEC - 1)
-		};
-
-		return get_jiffies_64() + timespec64_to_jiffies(&ts);
-	} else
-		return 0;
-}
 
 /*
  * Set dentry and possibly attribute timeouts from the lookup/mk*
@@ -115,11 +90,6 @@ void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o)
 		time_to_jiffies(o->entry_valid, o->entry_valid_nsec));
 }
 
-static u64 attr_timeout(struct fuse_attr_out *o)
-{
-	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
-}
-
 u64 entry_attr_timeout(struct fuse_entry_out *o)
 {
 	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 488b460e046f..054b96c3e061 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -63,6 +63,14 @@ struct fuse_forget_link {
 	struct fuse_forget_link *next;
 };
 
+/** FUSE specific dentry data */
+#if BITS_PER_LONG < 64
+union fuse_dentry {
+	u64 time;
+	struct rcu_head rcu;
+};
+#endif
+
 /** FUSE inode */
 struct fuse_inode {
 	/** Inode data */
@@ -1316,4 +1324,31 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
+/*
+ * FUSE caches dentries and attributes with separate timeout.  The
+ * time in jiffies until the dentry/attributes are valid is stored in
+ * dentry->d_fsdata and fuse_inode->i_time respectively.
+ */
+
+/*
+ * Calculate the time in jiffies until a dentry/attributes are valid
+ */
+static inline u64 time_to_jiffies(u64 sec, u32 nsec)
+{
+	if (sec || nsec) {
+		struct timespec64 ts = {
+			sec,
+			min_t(u32, nsec, NSEC_PER_SEC - 1)
+		};
+
+		return get_jiffies_64() + timespec64_to_jiffies(&ts);
+	} else
+		return 0;
+}
+
+static inline u64 attr_timeout(struct fuse_attr_out *o)
+{
+	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
+}
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 6b3beda16c1b..036a14e917e1 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -162,6 +162,28 @@ static ino_t fuse_squash_ino(u64 ino64)
 	return ino;
 }
 
+static void fuse_fill_attr_from_inode(struct fuse_attr *attr,
+				      const struct inode *inode)
+{
+	*attr = (struct fuse_attr){
+		.ino		= inode->i_ino,
+		.size		= inode->i_size,
+		.blocks		= inode->i_blocks,
+		.atime		= inode->i_atime.tv_sec,
+		.mtime		= inode->i_mtime.tv_sec,
+		.ctime		= inode->i_ctime.tv_sec,
+		.atimensec	= inode->i_atime.tv_nsec,
+		.mtimensec	= inode->i_mtime.tv_nsec,
+		.ctimensec	= inode->i_ctime.tv_nsec,
+		.mode		= inode->i_mode,
+		.nlink		= inode->i_nlink,
+		.uid		= inode->i_uid.val,
+		.gid		= inode->i_gid.val,
+		.rdev		= inode->i_rdev,
+		.blksize	= 1u << inode->i_blkbits,
+	};
+}
+
 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 				   u64 attr_valid, u32 cache_mask)
 {
@@ -1386,28 +1408,6 @@ void fuse_dev_free(struct fuse_dev *fud)
 }
 EXPORT_SYMBOL_GPL(fuse_dev_free);
 
-static void fuse_fill_attr_from_inode(struct fuse_attr *attr,
-				      const struct fuse_inode *fi)
-{
-	*attr = (struct fuse_attr){
-		.ino		= fi->inode.i_ino,
-		.size		= fi->inode.i_size,
-		.blocks		= fi->inode.i_blocks,
-		.atime		= fi->inode.i_atime.tv_sec,
-		.mtime		= fi->inode.i_mtime.tv_sec,
-		.ctime		= fi->inode.i_ctime.tv_sec,
-		.atimensec	= fi->inode.i_atime.tv_nsec,
-		.mtimensec	= fi->inode.i_mtime.tv_nsec,
-		.ctimensec	= fi->inode.i_ctime.tv_nsec,
-		.mode		= fi->inode.i_mode,
-		.nlink		= fi->inode.i_nlink,
-		.uid		= fi->inode.i_uid.val,
-		.gid		= fi->inode.i_gid.val,
-		.rdev		= fi->inode.i_rdev,
-		.blksize	= 1u << fi->inode.i_blkbits,
-	};
-}
-
 static void fuse_sb_defaults(struct super_block *sb)
 {
 	sb->s_magic = FUSE_SUPER_MAGIC;
@@ -1451,7 +1451,7 @@ static int fuse_fill_super_submount(struct super_block *sb,
 	if (parent_sb->s_subtype && !sb->s_subtype)
 		return -ENOMEM;
 
-	fuse_fill_attr_from_inode(&root_attr, parent_fi);
+	fuse_fill_attr_from_inode(&root_attr, &parent_fi->inode);
 	root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
 	/*
 	 * This inode is just a duplicate, so it is not looked up and
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 08/26] fuse: Add fuse-bpf, a stacked fs extension for FUSE
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (6 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 07/26] fuse-bpf: Prepare for fuse-bpf patch Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 09/26] fuse-bpf: Don't support export_operations Daniel Rosenberg
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Fuse-bpf provides a short-circuit path for Fuse implementations that act
as a stacked filesystem. Operations that need no changes are passed
directly to the backing filesystem. Small adjustments can be handled by
bpf prefilters or postfilters, with the option to fall back to userspace
as needed.

Fuse implementations may supply backing node information, as well as bpf
programs, via an optional add-on to the lookup structure.
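
As a sketch of what filling in that add-on might look like on the daemon
side, the example below populates a struct fuse_entry_bpf_out as defined in
the uapi patch of this series. The struct and the FUSE_ACTION_* values are
copied from that patch; fill_entry_bpf_out() and its convention that a
negative fd means "leave the action at FUSE_ACTION_KEEP" are hypothetical,
introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copied from the include/uapi/linux/fuse.h changes in this series. */
#define FUSE_ACTION_KEEP	0
#define FUSE_ACTION_REMOVE	1
#define FUSE_ACTION_REPLACE	2

struct fuse_entry_bpf_out {
	uint64_t	backing_action;
	uint64_t	backing_fd;
	uint64_t	bpf_action;
	uint64_t	bpf_fd;
};

/* Hypothetical helper: request a replacement only for the fds that are
 * non-negative; everything else stays at FUSE_ACTION_KEEP. */
static void fill_entry_bpf_out(struct fuse_entry_bpf_out *out,
			       int backing_fd, int bpf_fd)
{
	memset(out, 0, sizeof(*out)); /* both actions start as FUSE_ACTION_KEEP */

	if (backing_fd >= 0) {
		out->backing_action = FUSE_ACTION_REPLACE;
		out->backing_fd = (uint64_t)backing_fd;
	}
	if (bpf_fd >= 0) {
		out->bpf_action = FUSE_ACTION_REPLACE;
		out->bpf_fd = (uint64_t)bpf_fd;
	}
}
```

On the kernel side, parse_fuse_entry_bpf() in this patch only resolves an fd
when the corresponding action is FUSE_ACTION_REPLACE, so fields left at zero
are not dereferenced.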

This has been split over the next set of patches for readability.
Clusters of fuse ops have been split into their own patches, as well as
the actual bpf calls and userspace calls for filters.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
---
 fs/fuse/Kconfig          |  10 ++
 fs/fuse/Makefile         |   1 +
 fs/fuse/backing.c        | 328 +++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev.c            |  31 +++-
 fs/fuse/dir.c            | 218 +++++++++++++++++++++-----
 fs/fuse/file.c           |  25 ++-
 fs/fuse/fuse_i.h         | 185 +++++++++++++++++++++-
 fs/fuse/inode.c          | 178 ++++++++++++++++++---
 fs/fuse/ioctl.c          |   2 +-
 include/linux/bpf_fuse.h |   1 +
 10 files changed, 913 insertions(+), 66 deletions(-)
 create mode 100644 fs/fuse/backing.c

diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 038ed0b9aaa5..e4fba7e60d03 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -52,3 +52,13 @@ config FUSE_DAX
 
 	  If you want to allow mounting a Virtio Filesystem with the "dax"
 	  option, answer Y.
+
+config FUSE_BPF
+	bool "Adds BPF to fuse"
+	depends on FUSE_FS
+	depends on BPF
+	help
+	  Extends FUSE by adding BPF to prefilter calls and potentially pass to a
+	  backing file system
+
+	  If you want to use FUSE as a stacked filesystem with bpf, answer Y
\ No newline at end of file
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35c058d..a0853c439db2 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -9,5 +9,6 @@ obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
 
 fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
 fuse-$(CONFIG_FUSE_DAX) += dax.o
+fuse-$(CONFIG_FUSE_BPF) += backing.o
 
 virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
new file mode 100644
index 000000000000..51088701e7ad
--- /dev/null
+++ b/fs/fuse/backing.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE-BPF: Filesystem in Userspace with BPF
+ * Copyright (c) 2021 Google LLC
+ */
+
+#include "fuse_i.h"
+
+#include <linux/fdtable.h>
+#include <linux/filter.h>
+#include <linux/fs_stack.h>
+#include <linux/namei.h>
+#include <linux/bpf_fuse.h>
+
+struct bpf_prog *fuse_get_bpf_prog(struct file *file)
+{
+	struct bpf_prog *bpf_prog = ERR_PTR(-EINVAL);
+
+	if (!file || IS_ERR(file))
+		return bpf_prog;
+
+	if (file->f_op != &bpf_prog_fops)
+		return bpf_prog;
+
+	bpf_prog = file->private_data;
+	if (bpf_prog->type == BPF_PROG_TYPE_FUSE)
+		bpf_prog_inc(bpf_prog);
+	else
+		bpf_prog = ERR_PTR(-EINVAL);
+
+	return bpf_prog;
+}
+
+void fuse_get_backing_path(struct file *file, struct path *path)
+{
+	path_get(&file->f_path);
+	*path = file->f_path;
+}
+
+int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb)
+{
+	struct fuse_entry_bpf_out *febo = &feb->out;
+	struct bpf_prog *bpf;
+	struct file *file;
+	int err = 0;
+
+	if (febo->backing_action == FUSE_ACTION_REPLACE) {
+		file = fget(febo->backing_fd);
+		if (!file) {
+			err = -EBADF;
+			goto out_err;
+		}
+		fuse_get_backing_path(file, &feb->backing_path);
+		fput(file);
+	}
+	if (febo->bpf_action == FUSE_ACTION_REPLACE) {
+		file = fget(febo->bpf_fd);
+		if (!file) {
+			err = -EBADF;
+			goto out_put;
+		}
+		bpf = fuse_get_bpf_prog(file);
+		if (IS_ERR(bpf)) {
+			err = PTR_ERR(bpf);
+			goto out_fput;
+		}
+		feb->bpf = bpf;
+		fput(file);
+	}
+
+	return 0;
+out_fput:
+	fput(file);
+out_put:
+	path_put_init(&feb->backing_path);
+out_err:
+	return err;
+}
+
+/*******************************************************************************
+ * Directory operations after here                                             *
+ ******************************************************************************/
+
+int fuse_lookup_initialize_in(struct bpf_fuse_args *fa, struct fuse_lookup_io *fli,
+			      struct inode *dir, struct dentry *entry, unsigned int flags)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(dir)->nodeid,
+		.opcode = FUSE_LOOKUP,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_lookup_initialize_out(struct bpf_fuse_args *fa, struct fuse_lookup_io *fli,
+			       struct inode *dir, struct dentry *entry, unsigned int flags)
+{
+	fa->out_numargs = 2;
+	fa->flags = FUSE_BPF_OUT_ARGVAR | FUSE_BPF_IS_LOOKUP;
+	fa->out_args[0] = (struct bpf_fuse_arg) {
+		.size = sizeof(fli->feo),
+		.value = &fli->feo,
+	};
+	fa->out_args[1] = (struct bpf_fuse_arg) {
+		.size = sizeof(fli->feb.out),
+		.value = &fli->feb.out,
+	};
+
+	return 0;
+}
+
+int fuse_lookup_backing(struct bpf_fuse_args *fa, struct dentry **out, struct inode *dir,
+			struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fuse_entry = get_fuse_dentry(entry);
+	struct fuse_dentry *dir_fuse_entry = get_fuse_dentry(entry->d_parent);
+	struct dentry *dir_backing_entry = dir_fuse_entry->backing_path.dentry;
+	struct inode *dir_backing_inode = dir_backing_entry->d_inode;
+	struct dentry *backing_entry;
+	const char *name;
+	int len;
+
+	/* TODO this will not handle lookups over mount points */
+	inode_lock_nested(dir_backing_inode, I_MUTEX_PARENT);
+	if (fa->in_args[0].flags & BPF_FUSE_MODIFIED) {
+		name = (char *)fa->in_args[0].value;
+		len = strnlen(name, fa->in_args[0].size);
+	} else {
+		name = entry->d_name.name;
+		len = entry->d_name.len;
+	}
+	backing_entry = lookup_one_len(name, dir_backing_entry, len);
+	inode_unlock(dir_backing_inode);
+
+	if (IS_ERR(backing_entry))
+		return PTR_ERR(backing_entry);
+
+	fuse_entry->backing_path = (struct path) {
+		.dentry = backing_entry,
+		.mnt = dir_fuse_entry->backing_path.mnt,
+	};
+
+	mntget(fuse_entry->backing_path.mnt);
+	return 0;
+}
+
+int fuse_handle_backing(struct fuse_entry_bpf *feb, struct path *backing_path)
+{
+	switch (feb->out.backing_action) {
+	case FUSE_ACTION_KEEP:
+		/* backing inode/path are added in fuse_lookup_backing */
+		break;
+
+	case FUSE_ACTION_REMOVE:
+		path_put_init(backing_path);
+		break;
+
+	case FUSE_ACTION_REPLACE: {
+		if (!feb->backing_path.dentry)
+			return -EINVAL;
+
+		path_put(backing_path);
+		*backing_path = feb->backing_path;
+		feb->backing_path.dentry = NULL;
+		feb->backing_path.mnt = NULL;
+
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int fuse_handle_bpf_prog(struct fuse_entry_bpf *feb, struct inode *parent,
+			 struct bpf_prog **bpf)
+{
+	struct fuse_inode *pi;
+
+	// No parent was provided, but the requested action is KEEP:
+	// leave the bpf program untouched in this case
+	if (feb->out.bpf_action == FUSE_ACTION_KEEP && !parent)
+		goto out;
+
+	if (*bpf) {
+		bpf_prog_put(*bpf);
+		*bpf = NULL;
+	}
+
+	switch (feb->out.bpf_action) {
+	case FUSE_ACTION_KEEP:
+		pi = get_fuse_inode(parent);
+		*bpf = pi->bpf;
+		if (*bpf)
+			bpf_prog_inc(*bpf);
+		break;
+
+	case FUSE_ACTION_REMOVE:
+		break;
+
+	case FUSE_ACTION_REPLACE: {
+		struct bpf_prog *bpf_prog = feb->bpf;
+
+		if (IS_ERR(bpf_prog))
+			return PTR_ERR(bpf_prog);
+
+		*bpf = bpf_prog;
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+out:
+	return 0;
+}
+
+int fuse_lookup_finalize(struct bpf_fuse_args *fa, struct dentry **out,
+			 struct inode *dir, struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fd;
+	struct dentry *backing_dentry;
+	struct inode *inode, *backing_inode;
+	struct inode *d_inode = entry->d_inode;
+	struct fuse_entry_out *feo = fa->out_args[0].value;
+	struct fuse_entry_bpf_out *febo = fa->out_args[1].value;
+	struct fuse_entry_bpf *feb = container_of(febo, struct fuse_entry_bpf, out);
+	int error = -1;
+	u64 target_nodeid = 0;
+
+	fd = get_fuse_dentry(entry);
+	if (!fd)
+		return -EIO;
+	error = fuse_handle_backing(feb, &fd->backing_path);
+	if (error)
+		return error;
+	backing_dentry = fd->backing_path.dentry;
+	if (!backing_dentry)
+		return -ENOENT;
+	backing_inode = backing_dentry->d_inode;
+	if (!backing_inode) {
+		*out = NULL;
+		return 0;
+	}
+
+	if (d_inode)
+		target_nodeid = get_fuse_inode(d_inode)->nodeid;
+
+	inode = fuse_iget_backing(dir->i_sb, target_nodeid, backing_inode);
+
+	if (IS_ERR(inode))
+		return PTR_ERR(inode);
+
+	error = fuse_handle_bpf_prog(feb, dir, &get_fuse_inode(inode)->bpf);
+	if (error)
+		return error;
+
+	get_fuse_inode(inode)->nodeid = feo->nodeid;
+
+	*out = d_splice_alias(inode, entry);
+	return 0;
+}
+
+int fuse_revalidate_backing(struct dentry *entry, unsigned int flags)
+{
+	struct fuse_dentry *fuse_dentry = get_fuse_dentry(entry);
+	struct dentry *backing_entry = fuse_dentry->backing_path.dentry;
+
+	spin_lock(&backing_entry->d_lock);
+	if (d_unhashed(backing_entry)) {
+		spin_unlock(&backing_entry->d_lock);
+		return 0;
+	}
+	spin_unlock(&backing_entry->d_lock);
+
+	if (unlikely(backing_entry->d_flags & DCACHE_OP_REVALIDATE))
+		return backing_entry->d_op->d_revalidate(backing_entry, flags);
+	return 1;
+}
+
+int fuse_access_initialize_in(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
+			      struct inode *inode, int mask)
+{
+	*fai = (struct fuse_access_in) {
+		.mask = mask,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_ACCESS,
+		.nodeid = get_node_id(inode),
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fai),
+		.in_args[0].value = fai,
+	};
+
+	return 0;
+}
+
+int fuse_access_initialize_out(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
+			       struct inode *inode, int mask)
+{
+	return 0;
+}
+
+int fuse_access_backing(struct bpf_fuse_args *fa, int *out, struct inode *inode, int mask)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	const struct fuse_access_in *fai = fa->in_args[0].value;
+
+	*out = inode_permission(&init_user_ns, fi->backing_inode, fai->mask);
+	return 0;
+}
+
+int fuse_access_finalize(struct bpf_fuse_args *fa, int *out, struct inode *inode, int mask)
+{
+	return 0;
+}
+
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 51897427a534..626dbbf92874 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -238,6 +238,11 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget,
 {
 	struct fuse_iqueue *fiq = &fc->iq;
 
+	if (nodeid == 0) {
+		kfree(forget);
+		return;
+	}
+
 	forget->forget_one.nodeid = nodeid;
 	forget->forget_one.nlookup = nlookup;
 
@@ -475,6 +480,7 @@ static void fuse_args_to_req(struct fuse_req *req, struct fuse_args *args)
 {
 	req->in.h.opcode = args->opcode;
 	req->in.h.nodeid = args->nodeid;
+	req->in.h.error_in = args->error_in;
 	req->args = args;
 	if (args->end)
 		__set_bit(FR_ASYNC, &req->flags);
@@ -1005,10 +1011,27 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size)
 	return 0;
 }
 
+/* Copy the fuse-bpf lookup args and verify them */
+static int fuse_copy_lookup(struct fuse_copy_state *cs, void *val, unsigned size)
+{
+	struct fuse_entry_bpf_out *febo = (struct fuse_entry_bpf_out *)val;
+	struct fuse_entry_bpf *feb = container_of(febo, struct fuse_entry_bpf, out);
+	int err;
+
+	if (size && size != sizeof(*febo))
+		return -EINVAL;
+	err = fuse_copy_one(cs, val, size);
+	if (err)
+		return err;
+	if (size)
+		err = parse_fuse_entry_bpf(feb);
+	return err;
+}
+
 /* Copy request arguments to/from userspace buffer */
 static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 			  unsigned argpages, struct fuse_arg *args,
-			  int zeroing)
+			  int zeroing, unsigned is_lookup)
 {
 	int err = 0;
 	unsigned i;
@@ -1017,6 +1040,8 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
 		struct fuse_arg *arg = &args[i];
 		if (i == numargs - 1 && argpages)
 			err = fuse_copy_pages(cs, arg->size, zeroing);
+		else if (i == numargs - 1 && is_lookup)
+			err = fuse_copy_lookup(cs, arg->value, arg->size);
 		else
 			err = fuse_copy_one(cs, arg->value, arg->size);
 	}
@@ -1294,7 +1319,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	err = fuse_copy_one(cs, &req->in.h, sizeof(req->in.h));
 	if (!err)
 		err = fuse_copy_args(cs, args->in_numargs, args->in_pages,
-				     (struct fuse_arg *) args->in_args, 0);
+				     (struct fuse_arg *) args->in_args, 0, 0);
 	fuse_copy_finish(cs);
 	spin_lock(&fpq->lock);
 	clear_bit(FR_LOCKED, &req->flags);
@@ -1833,7 +1858,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args,
 		lastarg->size -= diffsize;
 	}
 	return fuse_copy_args(cs, args->out_numargs, args->out_pages,
-			      args->out_args, args->page_zeroing);
+			      args->out_args, args->page_zeroing, args->is_lookup);
 }
 
 /*
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 74e13af039f1..daaf3576fab9 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -10,6 +10,7 @@
 
 #include <linux/pagemap.h>
 #include <linux/file.h>
+#include <linux/filter.h>
 #include <linux/fs_context.h>
 #include <linux/moduleparam.h>
 #include <linux/sched.h>
@@ -34,7 +35,7 @@ static void fuse_advise_use_readdirplus(struct inode *dir)
 	set_bit(FUSE_I_ADVISE_RDPLUS, &fi->state);
 }
 
-#if BITS_PER_LONG >= 64
+#if BITS_PER_LONG >= 64 && !defined(CONFIG_FUSE_BPF)
 static inline void __fuse_dentry_settime(struct dentry *entry, u64 time)
 {
 	entry->d_fsdata = (void *) time;
@@ -49,12 +50,12 @@ static inline u64 fuse_dentry_time(const struct dentry *entry)
 
 static inline void __fuse_dentry_settime(struct dentry *dentry, u64 time)
 {
-	((union fuse_dentry *) dentry->d_fsdata)->time = time;
+	((struct fuse_dentry *) dentry->d_fsdata)->time = time;
 }
 
 static inline u64 fuse_dentry_time(const struct dentry *entry)
 {
-	return ((union fuse_dentry *) entry->d_fsdata)->time;
+	return ((struct fuse_dentry *) entry->d_fsdata)->time;
 }
 #endif
 
@@ -79,6 +80,17 @@ static void fuse_dentry_settime(struct dentry *dentry, u64 time)
 	__fuse_dentry_settime(dentry, time);
 }
 
+void fuse_init_dentry_root(struct dentry *root, struct file *backing_dir)
+{
+#ifdef CONFIG_FUSE_BPF
+	struct fuse_dentry *fuse_dentry = root->d_fsdata;
+
+	if (backing_dir) {
+		fuse_dentry->backing_path = backing_dir->f_path;
+		path_get(&fuse_dentry->backing_path);
+	}
+#endif
+}
 
 /*
  * Set dentry and possibly attribute timeouts from the lookup/mk*
@@ -150,7 +162,8 @@ static void fuse_invalidate_entry(struct dentry *entry)
 
 static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 			     u64 nodeid, const struct qstr *name,
-			     struct fuse_entry_out *outarg)
+			     struct fuse_entry_out *outarg,
+			     struct fuse_entry_bpf_out *bpf_outarg)
 {
 	memset(outarg, 0, sizeof(struct fuse_entry_out));
 	args->opcode = FUSE_LOOKUP;
@@ -158,11 +171,51 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
 	args->in_numargs = 1;
 	args->in_args[0].size = name->len + 1;
 	args->in_args[0].value = name->name;
-	args->out_numargs = 1;
+	args->out_argvar = true;
+	args->out_numargs = 2;
 	args->out_args[0].size = sizeof(struct fuse_entry_out);
 	args->out_args[0].value = outarg;
+	args->out_args[1].size = sizeof(struct fuse_entry_bpf_out);
+	args->out_args[1].value = bpf_outarg;
+	args->is_lookup = 1;
 }
 
+#ifdef CONFIG_FUSE_BPF
+static bool backing_data_changed(struct fuse_inode *fi, struct dentry *entry,
+				 struct fuse_entry_bpf *bpf_arg)
+{
+	struct path new_backing_path;
+	struct inode *new_backing_inode;
+	struct bpf_prog *bpf = NULL;
+	int err;
+	bool ret = true;
+
+	if (!entry)
+		return false;
+
+	get_fuse_backing_path(entry, &new_backing_path);
+
+	err = fuse_handle_backing(bpf_arg, &new_backing_path);
+	new_backing_inode = d_inode(new_backing_path.dentry);
+
+	if (err)
+		goto put_inode;
+
+	err = fuse_handle_bpf_prog(bpf_arg, entry->d_parent->d_inode, &bpf);
+	if (err)
+		goto put_bpf;
+
+	ret = (bpf != fi->bpf || fi->backing_inode != new_backing_inode ||
+			!path_equal(&get_fuse_dentry(entry)->backing_path, &new_backing_path));
+put_bpf:
+	if (bpf)
+		bpf_prog_put(bpf);
+put_inode:
+	path_put(&new_backing_path);
+	return ret;
+}
+#endif
+
 /*
  * Check whether the dentry is still valid
  *
@@ -183,9 +236,23 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	inode = d_inode_rcu(entry);
 	if (inode && fuse_is_bad(inode))
 		goto invalid;
-	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
+
+#ifdef CONFIG_FUSE_BPF
+	/* TODO: Do we need bpf support for revalidate?
+	 * If the lower filesystem says the entry is invalid, FUSE probably shouldn't
+	 * try to fix that without going through the normal lookup path...
+	 */
+	if (get_fuse_dentry(entry)->backing_path.dentry) {
+		ret = fuse_revalidate_backing(entry, flags);
+		if (ret <= 0) {
+			goto out;
+		}
+	}
+#endif
+	if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
 		 (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) {
 		struct fuse_entry_out outarg;
+		struct fuse_entry_bpf bpf_arg;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;
 		u64 attr_version;
@@ -197,27 +264,44 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		ret = -ECHILD;
 		if (flags & LOOKUP_RCU)
 			goto out;
-
 		fm = get_fuse_mount(inode);
 
+		parent = dget_parent(entry);
+
+#ifdef CONFIG_FUSE_BPF
+		/* TODO: Once we're handling timeouts for backing inodes, do a
+		 * bpf based lookup_revalidate here.
+		 */
+		if (get_fuse_inode(parent->d_inode)->backing_inode) {
+			dput(parent);
+			ret = 1;
+			goto out;
+		}
+#endif
 		forget = fuse_alloc_forget();
 		ret = -ENOMEM;
-		if (!forget)
+		if (!forget) {
+			dput(parent);
 			goto out;
+		}
 
 		attr_version = fuse_get_attr_version(fm->fc);
 
-		parent = dget_parent(entry);
 		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
-				 &entry->d_name, &outarg);
+				 &entry->d_name, &outarg, &bpf_arg.out);
 		ret = fuse_simple_request(fm, &args);
 		dput(parent);
+
 		/* Zero nodeid is same as -ENOENT */
 		if (!ret && !outarg.nodeid)
 			ret = -ENOENT;
-		if (!ret) {
+		if (!ret || ret == sizeof(bpf_arg.out)) {
 			fi = get_fuse_inode(inode);
 			if (outarg.nodeid != get_node_id(inode) ||
+#ifdef CONFIG_FUSE_BPF
+			    (ret == sizeof(bpf_arg.out) &&
+					    backing_data_changed(fi, entry, &bpf_arg)) ||
+#endif
 			    (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
 				fuse_queue_forget(fm->fc, forget,
 						  outarg.nodeid, 1);
@@ -259,17 +343,20 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	goto out;
 }
 
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 static int fuse_dentry_init(struct dentry *dentry)
 {
-	dentry->d_fsdata = kzalloc(sizeof(union fuse_dentry),
+	dentry->d_fsdata = kzalloc(sizeof(struct fuse_dentry),
 				   GFP_KERNEL_ACCOUNT | __GFP_RECLAIMABLE);
 
 	return dentry->d_fsdata ? 0 : -ENOMEM;
 }
 static void fuse_dentry_release(struct dentry *dentry)
 {
-	union fuse_dentry *fd = dentry->d_fsdata;
+	struct fuse_dentry *fd = dentry->d_fsdata;
+
+	if (fd && fd->backing_path.dentry)
+		path_put(&fd->backing_path);
 
 	kfree_rcu(fd, rcu);
 }
@@ -310,7 +397,7 @@ static struct vfsmount *fuse_dentry_automount(struct path *path)
 const struct dentry_operations fuse_dentry_operations = {
 	.d_revalidate	= fuse_dentry_revalidate,
 	.d_delete	= fuse_dentry_delete,
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 	.d_init		= fuse_dentry_init,
 	.d_release	= fuse_dentry_release,
 #endif
@@ -318,7 +405,7 @@ const struct dentry_operations fuse_dentry_operations = {
 };
 
 const struct dentry_operations fuse_root_dentry_operations = {
-#if BITS_PER_LONG < 64
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
 	.d_init		= fuse_dentry_init,
 	.d_release	= fuse_dentry_release,
 #endif
@@ -336,11 +423,13 @@ bool fuse_invalid_attr(struct fuse_attr *attr)
 		attr->size > LLONG_MAX;
 }
 
-int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
-		     struct fuse_entry_out *outarg, struct inode **inode)
+int fuse_lookup_name(struct super_block *sb, u64 nodeid,
+		     const struct qstr *name, struct fuse_entry_out *outarg,
+		     struct dentry *entry, struct inode **inode)
 {
 	struct fuse_mount *fm = get_fuse_mount_super(sb);
 	FUSE_ARGS(args);
+	struct fuse_entry_bpf bpf_arg = { 0 };
 	struct fuse_forget_link *forget;
 	u64 attr_version;
 	int err;
@@ -358,23 +447,60 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 
 	attr_version = fuse_get_attr_version(fm->fc);
 
-	fuse_lookup_init(fm->fc, &args, nodeid, name, outarg);
+	fuse_lookup_init(fm->fc, &args, nodeid, name, outarg, &bpf_arg.out);
 	err = fuse_simple_request(fm, &args);
-	/* Zero nodeid is same as -ENOENT, but with valid timeout */
-	if (err || !outarg->nodeid)
-		goto out_put_forget;
 
-	err = -EIO;
-	if (!outarg->nodeid)
-		goto out_put_forget;
-	if (fuse_invalid_attr(&outarg->attr))
-		goto out_put_forget;
-
-	*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
-			   &outarg->attr, entry_attr_timeout(outarg),
-			   attr_version);
+#ifdef CONFIG_FUSE_BPF
+	if (err == sizeof(bpf_arg.out)) {
+		/* TODO Make sure this handles invalid handles */
+		struct path *backing_path;
+		struct inode *backing_inode;
+
+		err = -ENOENT;
+		if (!entry)
+			goto out_queue_forget;
+
+		err = -EINVAL;
+		backing_path = &bpf_arg.backing_path;
+		if (!backing_path->dentry)
+			goto out_queue_forget;
+
+		err = fuse_handle_backing(&bpf_arg,
+				&get_fuse_dentry(entry)->backing_path);
+		if (err)
+			goto out_queue_forget;
+
+		backing_inode = d_inode(get_fuse_dentry(entry)->backing_path.dentry);
+		*inode = fuse_iget_backing(sb, outarg->nodeid, backing_inode);
+		if (!*inode)
+			goto out_queue_forget;
+
+		err = fuse_handle_bpf_prog(&bpf_arg, NULL, &get_fuse_inode(*inode)->bpf);
+		if (err)
+			goto out;
+	} else
+#endif
+	{
+		/* Zero nodeid is same as -ENOENT, but with valid timeout */
+		if (err || !outarg->nodeid)
+			goto out_put_forget;
+
+		err = -EIO;
+		if (!outarg->nodeid)
+			goto out_put_forget;
+		if (fuse_invalid_attr(&outarg->attr))
+			goto out_put_forget;
+
+		*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
+				   &outarg->attr, entry_attr_timeout(outarg),
+				   attr_version);
+	}
+
 	err = -ENOMEM;
-	if (!*inode) {
+#ifdef CONFIG_FUSE_BPF
+out_queue_forget:
+#endif
+	if (!*inode && outarg->nodeid) {
 		fuse_queue_forget(fm->fc, forget, outarg->nodeid, 1);
 		goto out;
 	}
@@ -396,12 +522,20 @@ static struct dentry *fuse_lookup(struct inode *dir, struct dentry *entry,
 	bool outarg_valid = true;
 	bool locked;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(dir, struct fuse_lookup_io, newent,
+			       fuse_lookup_initialize_in, fuse_lookup_initialize_out,
+			       fuse_lookup_backing, fuse_lookup_finalize,
+			       dir, entry, flags))
+		return newent;
+#endif
+
 	if (fuse_is_bad(dir))
 		return ERR_PTR(-EIO);
 
 	locked = fuse_lock_inode(dir);
 	err = fuse_lookup_name(dir->i_sb, get_node_id(dir), &entry->d_name,
-			       &outarg, &inode);
+			       &outarg, entry, &inode);
 	fuse_unlock_inode(dir, locked);
 	if (err == -ENOENT) {
 		outarg_valid = false;
@@ -1230,6 +1364,13 @@ static int fuse_access(struct inode *inode, int mask)
 	struct fuse_access_in inarg;
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_access_in, err,
+			       fuse_access_initialize_in, fuse_access_initialize_out,
+			       fuse_access_backing, fuse_access_finalize, inode, mask))
+		return err;
+#endif
+
 	BUG_ON(mask & MAY_NOT_BLOCK);
 
 	if (fm->fc->no_access)
@@ -1278,6 +1419,7 @@ static int fuse_permission(struct user_namespace *mnt_userns,
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	bool refreshed = false;
 	int err = 0;
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	if (fuse_is_bad(inode))
 		return -EIO;
@@ -1285,12 +1427,18 @@ static int fuse_permission(struct user_namespace *mnt_userns,
 	if (!fuse_allow_current_process(fc))
 		return -EACCES;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_access_in, err,
+			       fuse_access_initialize_in, fuse_access_initialize_out,
+			       fuse_access_backing, fuse_access_finalize, inode, mask))
+		return err;
+#endif
+
 	/*
 	 * If attributes are needed, refresh them before proceeding
 	 */
 	if (fc->default_permissions ||
 	    ((mask & MAY_EXEC) && S_ISREG(inode->i_mode))) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
 		u32 perm_mask = STATX_MODE | STATX_UID | STATX_GID;
 
 		if (perm_mask & READ_ONCE(fi->inval_mask) ||
@@ -1467,7 +1615,7 @@ static long fuse_dir_compat_ioctl(struct file *file, unsigned int cmd,
 				 FUSE_IOCTL_COMPAT | FUSE_IOCTL_DIR);
 }
 
-static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
+static inline bool update_mtime(unsigned int ivalid, bool trust_local_mtime)
 {
 	/* Always update if mtime is explicitly set  */
 	if (ivalid & ATTR_MTIME_SET)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1a3afd469e3a..4fa2ebc068f0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -8,6 +8,7 @@
 
 #include "fuse_i.h"
 
+#include <linux/filter.h>
 #include <linux/pagemap.h>
 #include <linux/slab.h>
 #include <linux/kernel.h>
@@ -125,13 +126,18 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 }
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir)
+				 unsigned int open_flags, bool isdir, struct file *file)
 {
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
 
-	ff = fuse_file_alloc(fm);
+	if (file && file->private_data) {
+		ff = file->private_data;
+		file->private_data = NULL;
+	} else {
+		ff = fuse_file_alloc(fm);
+	}
 	if (!ff)
 		return ERR_PTR(-ENOMEM);
 
@@ -169,7 +175,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
 		 bool isdir)
 {
-	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
+	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir, file);
 
 	if (!IS_ERR(ff))
 		file->private_data = ff;
@@ -1873,6 +1879,19 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 	 */
 	WARN_ON(wbc->for_reclaim);
 
+	/*
+	 * TODO: fully understand why this is necessary.
+	 *
+	 * With fuse-bpf, fsstress fails if rename is enabled without this check.
+	 *
+	 * We get writes here on directory inodes, which do not have an
+	 * initialized file list, so we crash.
+	 *
+	 * The open question is why we get those writes at all.
+	 */
+	if (!S_ISREG(inode->i_mode))
+		return 0;
+
 	ff = __fuse_write_file_get(fi);
 	err = fuse_flush_times(inode, ff);
 	if (ff)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 054b96c3e061..30ddc298fb27 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -13,9 +13,12 @@
 # define pr_fmt(fmt) "fuse: " fmt
 #endif
 
+#include <linux/filter.h>
 #include <linux/fuse.h>
 #include <linux/fs.h>
 #include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/statfs.h>
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
@@ -31,6 +34,8 @@
 #include <linux/pid_namespace.h>
 #include <linux/refcount.h>
 #include <linux/user_namespace.h>
+#include <linux/bpf_fuse.h>
+#include <linux/magic.h>
 
 /** Default max number of pages that can be used in a single read request */
 #define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
@@ -64,11 +69,35 @@ struct fuse_forget_link {
 };
 
 /** FUSE specific dentry data */
-#if BITS_PER_LONG < 64
-union fuse_dentry {
-	u64 time;
-	struct rcu_head rcu;
+#if BITS_PER_LONG < 64 || defined(CONFIG_FUSE_BPF)
+struct fuse_dentry {
+	union {
+		u64 time;
+		struct rcu_head rcu;
+	};
+	struct path backing_path;
 };
+
+static inline struct fuse_dentry *get_fuse_dentry(const struct dentry *entry)
+{
+	return entry->d_fsdata;
+}
+#endif
+
+#ifdef CONFIG_FUSE_BPF
+static inline void get_fuse_backing_path(const struct dentry *d,
+					  struct path *path)
+{
+	struct fuse_dentry *di = get_fuse_dentry(d);
+
+	if (!di) {
+		*path = (struct path) { .mnt = NULL, .dentry = NULL };
+		return;
+	}
+
+	*path = di->backing_path;
+	path_get(path);
+}
 #endif
 
 /** FUSE inode */
@@ -76,6 +105,20 @@ struct fuse_inode {
 	/** Inode data */
 	struct inode inode;
 
+#ifdef CONFIG_FUSE_BPF
+	/**
+	 * Backing inode, if this inode is from a backing file system.
+	 * If this is set, nodeid is 0.
+	 */
+	struct inode *backing_inode;
+
+	/**
+	 * bpf_prog, run on all operations to determine whether to pass through
+	 * or handle in place
+	 */
+	struct bpf_prog *bpf;
+#endif
+
 	/** Unique ID, which identifies the inode between userspace
 	 * and kernel */
 	u64 nodeid;
@@ -226,6 +269,14 @@ struct fuse_file {
 
 	} readdir;
 
+#ifdef CONFIG_FUSE_BPF
+	/**
+	 * Backing file when in bpf mode.
+	 * TODO: Reconcile with passthrough file.
+	 */
+	struct file *backing_file;
+#endif
+
 	/** RB node to be linked on fuse_conn->polled_files */
 	struct rb_node polled_node;
 
@@ -257,6 +308,7 @@ struct fuse_page_desc {
 struct fuse_args {
 	uint64_t nodeid;
 	uint32_t opcode;
+	uint32_t error_in;
 	unsigned short in_numargs;
 	unsigned short out_numargs;
 	bool force:1;
@@ -269,6 +321,7 @@ struct fuse_args {
 	bool page_zeroing:1;
 	bool page_replace:1;
 	bool may_block:1;
+	bool is_lookup:1;
 	struct fuse_in_arg in_args[3];
 	struct fuse_arg out_args[2];
 	void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error);
@@ -522,6 +575,8 @@ struct fuse_fs_context {
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
+	struct bpf_prog *root_bpf;
+	struct file *root_dir;
 
 	/* DAX device, may be NULL */
 	struct dax_device *dax_dev;
@@ -962,12 +1017,16 @@ extern const struct dentry_operations fuse_root_dentry_operations;
 /**
  * Get a filled in inode
  */
+struct inode *fuse_iget_backing(struct super_block *sb,
+				u64 nodeid,
+				struct inode *backing_inode);
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
 			u64 attr_valid, u64 attr_version);
 
 int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
-		     struct fuse_entry_out *outarg, struct inode **inode);
+		     struct fuse_entry_out *outarg,
+		     struct dentry *entry, struct inode **inode);
 
 /**
  * Send FORGET command
@@ -1112,6 +1171,7 @@ void fuse_invalidate_entry_cache(struct dentry *entry);
 void fuse_invalidate_atime(struct inode *inode);
 
 u64 entry_attr_timeout(struct fuse_entry_out *o);
+void fuse_init_dentry_root(struct dentry *root, struct file *backing_dir);
 void fuse_change_entry_timeout(struct dentry *entry, struct fuse_entry_out *o);
 
 /**
@@ -1320,10 +1380,58 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
 /* file.c */
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir);
+				 unsigned int open_flags, bool isdir,
+				 struct file *file);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
+/* backing.c */
+
+struct bpf_prog *fuse_get_bpf_prog(struct file *file);
+void fuse_get_backing_path(struct file *file, struct path *path);
+
+/*
+ * Dummy io passed to fuse_bpf_backing when io operation needs no scratch space
+ */
+struct fuse_dummy_io {
+	int unused;
+};
+
+struct fuse_entry_bpf {
+	struct fuse_entry_bpf_out out;
+
+	struct path backing_path;
+	struct bpf_prog *bpf;
+};
+
+int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb);
+
+struct fuse_lookup_io {
+	struct fuse_entry_out feo;
+	struct fuse_entry_bpf feb;
+};
+
+int fuse_handle_backing(struct fuse_entry_bpf *feb, struct path *backing_path);
+int fuse_handle_bpf_prog(struct fuse_entry_bpf *feb, struct inode *parent,
+			 struct bpf_prog **bpf);
+
+int fuse_lookup_initialize_in(struct bpf_fuse_args *fa, struct fuse_lookup_io *feo,
+			      struct inode *dir, struct dentry *entry, unsigned int flags);
+int fuse_lookup_initialize_out(struct bpf_fuse_args *fa, struct fuse_lookup_io *feo,
+			       struct inode *dir, struct dentry *entry, unsigned int flags);
+int fuse_lookup_backing(struct bpf_fuse_args *fa, struct dentry **out, struct inode *dir,
+			struct dentry *entry, unsigned int flags);
+int fuse_lookup_finalize(struct bpf_fuse_args *fa, struct dentry **out,
+			 struct inode *dir, struct dentry *entry, unsigned int flags);
+int fuse_revalidate_backing(struct dentry *entry, unsigned int flags);
+
+int fuse_access_initialize_in(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
+			      struct inode *inode, int mask);
+int fuse_access_initialize_out(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
+			       struct inode *inode, int mask);
+int fuse_access_backing(struct bpf_fuse_args *fa, int *out, struct inode *inode, int mask);
+int fuse_access_finalize(struct bpf_fuse_args *fa, int *out, struct inode *inode, int mask);
+
 /*
  * FUSE caches dentries and attributes with separate timeout.  The
  * time in jiffies until the dentry/attributes are valid is stored in
@@ -1351,4 +1459,69 @@ static inline u64 attr_timeout(struct fuse_attr_out *o)
 	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
 }
 
+#ifdef CONFIG_FUSE_BPF
+/*
+ * Statement expression that wraps the backing filter logic.
+ * struct inode *inode: inode with bpf and backing inode
+ * typedef io: (typically complex) type whose components fuse_args can point to.
+ *	An instance of this type is created locally and passed to initialize
+ * int initialize_in(struct bpf_fuse_args *fa, io *in_out, args...): function that
+ *	sets up the input side of fa and io based on args
+ * int initialize_out(struct bpf_fuse_args *fa, io *in_out, args...): function that
+ *	sets up the output side of fa and io based on args
+ * int backing(struct bpf_fuse_args *fa, out_type *out, args...): function that
+ *	actually performs the backing io operation
+ * int finalize(struct bpf_fuse_args *fa, out_type *out, args...): function that
+ *	performs any final work needed to commit the backing io
+ */
+#define fuse_bpf_backing(inode, io, out, initialize_in, initialize_out,	\
+			 backing, finalize, args...)			\
+({									\
+	struct fuse_inode *fuse_inode = get_fuse_inode(inode);		\
+	struct bpf_fuse_args fa = { 0 };				\
+	bool initialized = false;					\
+	bool handled = false;						\
+	ssize_t res;							\
+	io feo = { 0 };							\
+	int error = 0;							\
+									\
+	do {								\
+		if (!fuse_inode || !fuse_inode->backing_inode)		\
+			break;						\
+									\
+		handled = true;						\
+		error = initialize_in(&fa, &feo, args);			\
+		if (error)						\
+			break;						\
+									\
+		error = initialize_out(&fa, &feo, args);		\
+		if (error)						\
+			break;						\
+									\
+		initialized = true;					\
+									\
+		error = backing(&fa, &out, args);			\
+		if (error < 0)						\
+			fa.error_in = error;				\
+									\
+	} while (false);						\
+									\
+	if (initialized && handled) {					\
+		res = finalize(&fa, &out, args);			\
+		if (res)						\
+			error = res;					\
+	}								\
+									\
+	out = error ? _Generic((out),					\
+			default :					\
+				error,					\
+			struct dentry * :				\
+				ERR_PTR(error),				\
+			const char * :					\
+				ERR_PTR(error)				\
+			) : (out);					\
+	handled;							\
+})
+#endif /* CONFIG_FUSE_BPF */
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 036a14e917e1..ca65199b38cb 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -78,6 +78,10 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 
 	fi->i_time = 0;
 	fi->inval_mask = 0;
+#ifdef CONFIG_FUSE_BPF
+	fi->backing_inode = NULL;
+	fi->bpf = NULL;
+#endif
 	fi->nodeid = 0;
 	fi->nlookup = 0;
 	fi->attr_version = 0;
@@ -120,6 +124,13 @@ static void fuse_evict_inode(struct inode *inode)
 	/* Will write inode on close/munmap and in all other dirtiers */
 	WARN_ON(inode->i_state & I_DIRTY_INODE);
 
+#ifdef CONFIG_FUSE_BPF
+	iput(fi->backing_inode);
+	if (fi->bpf)
+		bpf_prog_put(fi->bpf);
+	fi->bpf = NULL;
+#endif
+
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 	if (inode->i_sb->s_flags & SB_ACTIVE) {
@@ -351,28 +362,105 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	else if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode) ||
 		 S_ISFIFO(inode->i_mode) || S_ISSOCK(inode->i_mode)) {
 		fuse_init_common(inode);
-		init_special_inode(inode, inode->i_mode,
-				   new_decode_dev(attr->rdev));
+		init_special_inode(inode, inode->i_mode, attr->rdev);
 	} else
 		BUG();
 }
 
+struct fuse_inode_identifier {
+	u64 nodeid;
+	struct inode *backing_inode;
+};
+
 static int fuse_inode_eq(struct inode *inode, void *_nodeidp)
 {
-	u64 nodeid = *(u64 *) _nodeidp;
-	if (get_node_id(inode) == nodeid)
-		return 1;
-	else
-		return 0;
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	return fii->nodeid == fi->nodeid;
+}
+
+static int fuse_inode_backing_eq(struct inode *inode, void *_nodeidp)
+{
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	return fii->nodeid == fi->nodeid
+#ifdef CONFIG_FUSE_BPF
+		&& fii->backing_inode == fi->backing_inode
+#endif
+		;
 }
 
 static int fuse_inode_set(struct inode *inode, void *_nodeidp)
 {
-	u64 nodeid = *(u64 *) _nodeidp;
-	get_fuse_inode(inode)->nodeid = nodeid;
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->nodeid = fii->nodeid;
+
+	return 0;
+}
+
+static int fuse_inode_backing_set(struct inode *inode, void *_nodeidp)
+{
+	struct fuse_inode_identifier *fii =
+		(struct fuse_inode_identifier *) _nodeidp;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->nodeid = fii->nodeid;
+#ifdef CONFIG_FUSE_BPF
+	BUG_ON(fi->backing_inode != NULL);
+	fi->backing_inode = fii->backing_inode;
+	if (fi->backing_inode)
+		ihold(fi->backing_inode);
+#endif
+
 	return 0;
 }
 
+struct inode *fuse_iget_backing(struct super_block *sb, u64 nodeid,
+				struct inode *backing_inode)
+{
+	struct inode *inode;
+	struct fuse_inode *fi;
+	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+		.backing_inode = backing_inode,
+	};
+	struct fuse_attr attr;
+	unsigned long hash = (unsigned long) backing_inode;
+
+	if (nodeid)
+		hash = nodeid;
+
+	fuse_fill_attr_from_inode(&attr, backing_inode);
+	inode = iget5_locked(sb, hash, fuse_inode_backing_eq,
+			     fuse_inode_backing_set, &fii);
+	if (!inode)
+		return NULL;
+
+	if ((inode->i_state & I_NEW)) {
+		inode->i_flags |= S_NOATIME;
+		if (!fc->writeback_cache)
+			inode->i_flags |= S_NOCMTIME;
+		fuse_init_common(inode);
+		unlock_new_inode(inode);
+	}
+
+	fi = get_fuse_inode(inode);
+	fuse_init_inode(inode, &attr);
+	spin_lock(&fi->lock);
+	fi->nlookup++;
+	spin_unlock(&fi->lock);
+
+	return inode;
+}
+
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
 			u64 attr_valid, u64 attr_version)
@@ -380,6 +468,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	struct inode *inode;
 	struct fuse_inode *fi;
 	struct fuse_conn *fc = get_fuse_conn_super(sb);
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+	};
 
 	/*
 	 * Auto mount points get their node id from the submount root, which is
@@ -401,7 +492,7 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	}
 
 retry:
-	inode = iget5_locked(sb, nodeid, fuse_inode_eq, fuse_inode_set, &nodeid);
+	inode = iget5_locked(sb, nodeid, fuse_inode_eq, fuse_inode_set, &fii);
 	if (!inode)
 		return NULL;
 
@@ -433,13 +524,16 @@ struct inode *fuse_ilookup(struct fuse_conn *fc, u64 nodeid,
 {
 	struct fuse_mount *fm_iter;
 	struct inode *inode;
+	struct fuse_inode_identifier fii = {
+		.nodeid = nodeid,
+	};
 
 	WARN_ON(!rwsem_is_locked(&fc->killsb));
 	list_for_each_entry(fm_iter, &fc->mounts, fc_entry) {
 		if (!fm_iter->sb)
 			continue;
 
-		inode = ilookup5(fm_iter->sb, nodeid, fuse_inode_eq, &nodeid);
+		inode = ilookup5(fm_iter->sb, nodeid, fuse_inode_eq, &fii);
 		if (inode) {
 			if (fm)
 				*fm = fm_iter;
@@ -669,6 +763,8 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_ROOT_BPF,
+	OPT_ROOT_DIR,
 	OPT_ERR
 };
 
@@ -683,6 +779,8 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("root_bpf",		OPT_ROOT_BPF),
+	fsparam_u32	("root_dir",		OPT_ROOT_DIR),
 	{}
 };
 
@@ -766,6 +864,21 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_ROOT_BPF:
+		ctx->root_bpf = bpf_prog_get_type_dev(result.uint_32,
+						BPF_PROG_TYPE_FUSE, false);
+		if (IS_ERR(ctx->root_bpf)) {
+			ctx->root_bpf = NULL;
+			return invalfc(fsc, "Unable to open bpf program");
+		}
+		break;
+
+	case OPT_ROOT_DIR:
+		ctx->root_dir = fget(result.uint_32);
+		if (!ctx->root_dir)
+			return invalfc(fsc, "Unable to open root directory");
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -778,6 +891,10 @@ static void fuse_free_fsc(struct fs_context *fsc)
 	struct fuse_fs_context *ctx = fsc->fs_private;
 
 	if (ctx) {
+		if (ctx->root_dir)
+			fput(ctx->root_dir);
+		if (ctx->root_bpf)
+			bpf_prog_put(ctx->root_bpf);
 		kfree(ctx->subtype);
 		kfree(ctx);
 	}
@@ -905,15 +1022,34 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc)
 }
 EXPORT_SYMBOL_GPL(fuse_conn_get);
 
-static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned mode)
+static struct inode *fuse_get_root_inode(struct super_block *sb,
+					 unsigned int mode,
+					 struct bpf_prog *root_bpf,
+					 struct file *backing_fd)
 {
 	struct fuse_attr attr;
-	memset(&attr, 0, sizeof(attr));
+	struct inode *inode;
 
+	memset(&attr, 0, sizeof(attr));
 	attr.mode = mode;
 	attr.ino = FUSE_ROOT_ID;
 	attr.nlink = 1;
-	return fuse_iget(sb, 1, 0, &attr, 0, 0);
+	inode = fuse_iget(sb, 1, 0, &attr, 0, 0);
+	if (!inode)
+		return NULL;
+
+#ifdef CONFIG_FUSE_BPF
+	get_fuse_inode(inode)->bpf = root_bpf;
+	if (root_bpf)
+		bpf_prog_inc(root_bpf);
+
+	if (backing_fd) {
+		get_fuse_inode(inode)->backing_inode = backing_fd->f_inode;
+		ihold(backing_fd->f_inode);
+	}
+#endif
+
+	return inode;
 }
 
 struct fuse_inode_handle {
@@ -928,11 +1064,14 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
 	struct inode *inode;
 	struct dentry *entry;
 	int err = -ESTALE;
+	struct fuse_inode_identifier fii = {
+		.nodeid = handle->nodeid,
+	};
 
 	if (handle->nodeid == 0)
 		goto out_err;
 
-	inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &handle->nodeid);
+	inode = ilookup5(sb, handle->nodeid, fuse_inode_eq, &fii);
 	if (!inode) {
 		struct fuse_entry_out outarg;
 		const struct qstr name = QSTR_INIT(".", 1);
@@ -941,7 +1080,7 @@ static struct dentry *fuse_get_dentry(struct super_block *sb,
 			goto out_err;
 
 		err = fuse_lookup_name(sb, handle->nodeid, &name, &outarg,
-				       &inode);
+				       NULL, &inode);
 		if (err && err != -ENOENT)
 			goto out_err;
 		if (err || !inode) {
@@ -1035,13 +1174,14 @@ static struct dentry *fuse_get_parent(struct dentry *child)
 	struct inode *inode;
 	struct dentry *parent;
 	struct fuse_entry_out outarg;
+	const struct qstr name = QSTR_INIT("..", 2);
 	int err;
 
 	if (!fc->export_support)
 		return ERR_PTR(-ESTALE);
 
 	err = fuse_lookup_name(child_inode->i_sb, get_node_id(child_inode),
-			       &dotdot_name, &outarg, &inode);
+			       &name, &outarg, NULL, &inode);
 	if (err) {
 		if (err == -ENOENT)
 			return ERR_PTR(-ESTALE);
@@ -1580,11 +1720,13 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->no_force_umount = ctx->no_force_umount;
 
 	err = -ENOMEM;
-	root = fuse_get_root_inode(sb, ctx->rootmode);
+	root = fuse_get_root_inode(sb, ctx->rootmode, ctx->root_bpf,
+				   ctx->root_dir);
 	sb->s_d_op = &fuse_root_dentry_operations;
 	root_dentry = d_make_root(root);
 	if (!root_dentry)
 		goto err_dev_free;
+	fuse_init_dentry_root(root_dentry, ctx->root_dir);
 	/* Root dentry doesn't have .d_revalidate */
 	sb->s_d_op = &fuse_dentry_operations;
 
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 61d8afcb10a3..8bc8d50917e2 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -422,7 +422,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
 	if (!S_ISREG(inode->i_mode) && !isdir)
 		return ERR_PTR(-ENOTTY);
 
-	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir);
+	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir, NULL);
 }
 
 static void fuse_priv_ioctl_cleanup(struct inode *inode, struct fuse_file *ff)
diff --git a/include/linux/bpf_fuse.h b/include/linux/bpf_fuse.h
index 91b60d4e78b1..ef5c8fdaffee 100644
--- a/include/linux/bpf_fuse.h
+++ b/include/linux/bpf_fuse.h
@@ -31,6 +31,7 @@ struct bpf_fuse_arg {
 
 #define FUSE_BPF_FORCE (1 << 0)
 #define FUSE_BPF_OUT_ARGVAR (1 << 6)
+#define FUSE_BPF_IS_LOOKUP (1 << 11)
 
 struct bpf_fuse_args {
 	uint64_t nodeid;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 09/26] fuse-bpf: Don't support export_operations
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (7 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 08/26] fuse: Add fuse-bpf, a stacked fs extension for FUSE Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 10/26] fuse-bpf: Partially add mapping support Daniel Rosenberg
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

We may choose to support export operations in the future, but doing so
poses some challenges. To create a disconnected dentry/inode, we would
need to encode the mountpoint and bpf program into the file_handle,
which means we'd need a stable representation of both. This also won't
hold up in cases where the bpf program is not stateless. One possibility
is registering bpf programs and mounts in a specific order, so they can
be assigned consistent ids that we can use in the file_handle. We can
defer to the lower filesystem for the lower inode's representation in
the file_handle.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/inode.c | 8 ++++++++
 1 file changed, 8 insertions(+)
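
For readers unfamiliar with the handle layout, the encoding that the new
check guards can be sketched in userspace roughly as follows. This is a
minimal model under stated assumptions, not the kernel code:
toy_encode_fh(), toy_decode_nodeid(), the TOY_* constants and the
has_backing flag are hypothetical names introduced for illustration. Only
the three-word layout (nodeid high word, nodeid low word, generation) and
the early bail-out for a zero nodeid with a backing inode come from the
hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FILEID_INVALID 0xff	/* stand-in for the kernel's FILEID_INVALID */
#define TOY_FILEID_OK      0x81	/* hypothetical "encoded" type, illustration only */

/*
 * Pack a 64-bit nodeid and a generation into three 32-bit words.
 * An inode with no nodeid that exists only through its backing inode
 * cannot be represented, so it is rejected with *max_len set to 0.
 */
static int toy_encode_fh(uint64_t nodeid, uint32_t generation,
			 bool has_backing, uint32_t *fh, int *max_len)
{
	assert(fh && max_len);

	if (!nodeid && has_backing) {
		*max_len = 0;
		return TOY_FILEID_INVALID;
	}

	/* Caller's buffer is too small: report the size that is needed. */
	if (*max_len < 3) {
		*max_len = 3;
		return TOY_FILEID_INVALID;
	}

	fh[0] = (uint32_t)(nodeid >> 32);
	fh[1] = (uint32_t)(nodeid & 0xffffffff);
	fh[2] = generation;
	*max_len = 3;
	return TOY_FILEID_OK;
}

/* Reassemble the nodeid from the two words written by toy_encode_fh(). */
static uint64_t toy_decode_nodeid(const uint32_t *fh)
{
	return ((uint64_t)fh[0] << 32) | fh[1];
}
```

A handle built from a nonzero nodeid round-trips through
toy_decode_nodeid(). The backing-only case has nothing stable to put in
fh[0] and fh[1], which is the gap the commit message describes.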

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ca65199b38cb..290eae750282 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1122,6 +1122,14 @@ static int fuse_encode_fh(struct inode *inode, u32 *fh, int *max_len,
 	nodeid = get_fuse_inode(inode)->nodeid;
 	generation = inode->i_generation;
 
+#ifdef CONFIG_FUSE_BPF
+	/* TODO: Does it make sense to support this in some cases? */
+	if (!nodeid && get_fuse_inode(inode)->backing_inode) {
+		*max_len = 0;
+		return FILEID_INVALID;
+	}
+#endif
+
 	fh[0] = (u32)(nodeid >> 32);
 	fh[1] = (u32)(nodeid & 0xffffffff);
 	fh[2] = generation;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 10/26] fuse-bpf: Partially add mapping support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (8 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 09/26] fuse-bpf: Don't support export_operations Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 11/26] fuse-bpf: Add lseek support Daniel Rosenberg
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This adds a backing (pure passthrough) implementation for mmap, but no
bpf filter counterpart yet.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 37 +++++++++++++++++++++++++++++++++++++
 fs/fuse/file.c    |  6 ++++++
 fs/fuse/fuse_i.h  |  3 +++
 3 files changed, 46 insertions(+)
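
The reference handling in fuse_backing_mmap() is easy to misread, so here
is a small userspace model of just that part. It is a sketch under
assumptions, not kernel code: toy_file, toy_vma, toy_get_file(),
toy_fput() and the mmap_result field (standing in for the return value of
call_mmap()) are all hypothetical. The order of operations mirrors the
function added in this patch.

```c
#include <assert.h>

struct toy_file {
	int refs;		/* models the struct file reference count */
	int mmap_result;	/* what the lower ->mmap would return */
};

struct toy_vma {
	struct toy_file *vm_file;
};

static struct toy_file *toy_get_file(struct toy_file *f)
{
	f->refs++;
	return f;
}

static void toy_fput(struct toy_file *f)
{
	assert(f->refs > 0);
	f->refs--;
}

/*
 * Point the vma at the backing file instead of the fuse file.
 * On success the vma's reference moves from 'file' to 'backing';
 * on failure the extra reference on 'backing' is dropped again.
 */
static int toy_backing_mmap(struct toy_file *file, struct toy_file *backing,
			    struct toy_vma *vma)
{
	int ret;

	if (file != vma->vm_file)
		return -5;	/* -EIO */

	vma->vm_file = toy_get_file(backing);
	ret = backing->mmap_result;	/* stands in for call_mmap() */

	if (ret)
		toy_fput(backing);
	else
		toy_fput(file);

	return ret;
}
```

In the success case the fuse file loses the reference the vma used to
hold and the backing file gains one. In the failure case both counts end
up where they started.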

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 51088701e7ad..fa8805e24061 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -77,6 +77,43 @@ int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb)
 	return err;
 }
 
+ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	int ret;
+	struct fuse_file *ff = file->private_data;
+	struct inode *fuse_inode = file_inode(file);
+	struct file *backing_file = ff->backing_file;
+	struct inode *backing_inode = file_inode(backing_file);
+
+	if (!backing_file->f_op->mmap)
+		return -ENODEV;
+
+	if (WARN_ON(file != vma->vm_file))
+		return -EIO;
+
+	vma->vm_file = get_file(backing_file);
+
+	ret = call_mmap(vma->vm_file, vma);
+
+	if (ret)
+		fput(backing_file);
+	else
+		fput(file);
+
+	if (file->f_flags & O_NOATIME)
+		return ret;
+
+	if ((!timespec64_equal(&fuse_inode->i_mtime, &backing_inode->i_mtime) ||
+	     !timespec64_equal(&fuse_inode->i_ctime,
+			       &backing_inode->i_ctime))) {
+		fuse_inode->i_mtime = backing_inode->i_mtime;
+		fuse_inode->i_ctime = backing_inode->i_ctime;
+	}
+	touch_atime(&file->f_path);
+
+	return ret;
+}
+
 /*******************************************************************************
  * Directory operations after here                                             *
  ******************************************************************************/
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4fa2ebc068f0..138890eae07c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2452,6 +2452,12 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+#ifdef CONFIG_FUSE_BPF
+	/* TODO - this is simply passthrough, not a proper BPF filter */
+	if (ff->backing_file)
+		return fuse_backing_mmap(file, vma);
+#endif
+
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/* Can't provide the coherency needed for MAP_SHARED */
 		if (vma->vm_flags & VM_MAYSHARE)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 30ddc298fb27..a9653f71c145 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1404,8 +1404,11 @@ struct fuse_entry_bpf {
 	struct bpf_prog *bpf;
 };
 
+
 int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb);
 
+ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma);
+
 struct fuse_lookup_io {
 	struct fuse_entry_out feo;
 	struct fuse_entry_bpf feb;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 11/26] fuse-bpf: Add lseek support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (9 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 10/26] fuse-bpf: Partially add mapping support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 12/26] fuse-bpf: Add support for fallocate Daniel Rosenberg
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/file.c    |  8 +++++
 fs/fuse/fuse_i.h  | 15 +++++++++-
 3 files changed, 96 insertions(+), 1 deletion(-)
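
This is the first caller of fuse_bpf_backing() in the series, so a
userspace model of the phases may help when reading the diff. The sketch
below is an assumption-laden illustration, not the kernel implementation:
every toy_* name and TOY_* constant is hypothetical, the bpf filter steps
are left out, and toy_backing() does its own offset arithmetic where the
real code calls vfs_llseek() on the backing file. What it keeps from the
macro is the control flow: initialize_in, initialize_out, backing,
finalize, then storing any error into the caller's out value.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_SEEK_SET, TOY_SEEK_CUR, TOY_SEEK_END };
#define TOY_EINVAL 22

struct toy_file { long long pos, size; bool has_backing; };
struct toy_lseek_io { long long in_offset; int in_whence; long long out_offset; };
struct toy_args { int error_in; struct toy_lseek_io *io; };

static int toy_initialize_in(struct toy_args *fa, struct toy_lseek_io *io,
			     long long offset, int whence)
{
	io->in_offset = offset;
	io->in_whence = whence;
	*fa = (struct toy_args) { .io = io };
	return 0;
}

static int toy_initialize_out(struct toy_args *fa, struct toy_lseek_io *io)
{
	(void)fa;
	io->out_offset = 0;
	return 0;
}

/* Performs the seek against the (modelled) backing file. */
static int toy_backing(struct toy_args *fa, long long *out, struct toy_file *file)
{
	long long base, pos;

	switch (fa->io->in_whence) {
	case TOY_SEEK_SET: base = 0; break;
	case TOY_SEEK_CUR: base = file->pos; break;
	case TOY_SEEK_END: base = file->size; break;
	default: return -TOY_EINVAL;
	}
	pos = base + fa->io->in_offset;
	if (pos < 0)
		return -TOY_EINVAL;
	fa->io->out_offset = pos;
	*out = pos;
	return 0;
}

/* Commits the new position only if the backing step did not fail. */
static int toy_finalize(struct toy_args *fa, long long *out, struct toy_file *file)
{
	if (!fa->error_in)
		file->pos = fa->io->out_offset;
	*out = fa->io->out_offset;
	return 0;
}

/* Returns true when the backing path handled the call; result is in *out. */
static bool toy_lseek(struct toy_file *file, long long offset, int whence,
		      long long *out)
{
	struct toy_args fa = { 0 };
	struct toy_lseek_io io = { 0 };
	int error, res;

	assert(file && out);
	if (!file->has_backing)
		return false;	/* caller falls through to the normal path */

	error = toy_initialize_in(&fa, &io, offset, whence);
	if (!error)
		error = toy_initialize_out(&fa, &io);
	if (!error) {
		error = toy_backing(&fa, out, file);
		if (error < 0)
			fa.error_in = error;
		res = toy_finalize(&fa, out, file);
		if (res)
			error = res;
	}
	if (error)
		*out = error;
	return true;
}
```

As with the hunk in fs/fuse/file.c, the caller only looks at the output
value when the helper reports that it handled the operation. A file
without a backing inode takes the existing fuse_file_llseek() path.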

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index fa8805e24061..97e92c633cfd 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -77,6 +77,80 @@ int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb)
 	return err;
 }
 
+int fuse_lseek_initialize_in(struct bpf_fuse_args *fa, struct fuse_lseek_io *flio,
+			     struct file *file, loff_t offset, int whence)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	flio->fli = (struct fuse_lseek_in) {
+		.fh = fuse_file->fh,
+		.offset = offset,
+		.whence = whence,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(file->f_inode),
+		.opcode = FUSE_LSEEK,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(flio->fli),
+		.in_args[0].value = &flio->fli,
+	};
+
+	return 0;
+}
+
+int fuse_lseek_initialize_out(struct bpf_fuse_args *fa, struct fuse_lseek_io *flio,
+			      struct file *file, loff_t offset, int whence)
+{
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(flio->flo);
+	fa->out_args[0].value = &flio->flo;
+
+	return 0;
+}
+
+int fuse_lseek_backing(struct bpf_fuse_args *fa, loff_t *out,
+		       struct file *file, loff_t offset, int whence)
+{
+	const struct fuse_lseek_in *fli = fa->in_args[0].value;
+	struct fuse_lseek_out *flo = fa->out_args[0].value;
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+
+	/* TODO: Handle changing of the file handle */
+	if (offset == 0) {
+		if (whence == SEEK_CUR) {
+			flo->offset = file->f_pos;
+			*out = flo->offset;
+			return 0;
+		}
+
+		if (whence == SEEK_SET) {
+			flo->offset = vfs_setpos(file, 0, 0);
+			*out = flo->offset;
+			return 0;
+		}
+	}
+
+	inode_lock(file->f_inode);
+	backing_file->f_pos = file->f_pos;
+	*out = vfs_llseek(backing_file, fli->offset, fli->whence);
+	flo->offset = *out;
+	inode_unlock(file->f_inode);
+	return 0;
+}
+
+int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out,
+			struct file *file, loff_t offset, int whence)
+{
+	struct fuse_lseek_out *flo = fa->out_args[0].value;
+
+	if (!fa->error_in)
+		file->f_pos = flo->offset;
+	*out = flo->offset;
+	return 0;
+}
+
 ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	int ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 138890eae07c..dd4485261cc7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2703,6 +2703,14 @@ static loff_t fuse_file_llseek(struct file *file, loff_t offset, int whence)
 {
 	loff_t retval;
 	struct inode *inode = file_inode(file);
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_lseek_io, retval,
+			       fuse_lseek_initialize_in, fuse_lseek_initialize_out,
+			       fuse_lseek_backing,
+			       fuse_lseek_finalize,
+			       file, offset, whence))
+		return retval;
+#endif
 
 	switch (whence) {
 	case SEEK_SET:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a9653f71c145..fc3e8adf0422 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1404,9 +1404,22 @@ struct fuse_entry_bpf {
 	struct bpf_prog *bpf;
 };
 
-
 int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb);
 
+struct fuse_lseek_io {
+	struct fuse_lseek_in fli;
+	struct fuse_lseek_out flo;
+};
+
+int fuse_lseek_initialize_in(struct bpf_fuse_args *fa, struct fuse_lseek_io *fli,
+			     struct file *file, loff_t offset, int whence);
+int fuse_lseek_initialize_out(struct bpf_fuse_args *fa, struct fuse_lseek_io *fli,
+			      struct file *file, loff_t offset, int whence);
+int fuse_lseek_backing(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
+		       loff_t offset, int whence);
+int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
+			loff_t offset, int whence);
+
 ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma);
 
 struct fuse_lookup_io {
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 12/26] fuse-bpf: Add support for fallocate
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (10 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 11/26] fuse-bpf: Add lseek support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-27 22:07   ` Dave Chinner
  2022-09-26 23:18 ` [PATCH 13/26] fuse-bpf: Support file/dir open/close Daniel Rosenberg
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/file.c    | 10 ++++++++++
 fs/fuse/fuse_i.h  | 11 +++++++++++
 3 files changed, 69 insertions(+)
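
The fallocate hook passes `err` as the `out` argument of
fuse_bpf_backing(), while other callers pass a loff_t or a pointer. The
macro copes with that through a _Generic selection on the type of `out`.
The fragment below is a standalone C11 sketch of that one idea, written
under assumptions: TOY_STORE_RESULT, toy_err_ptr() and toy_ptr_err() are
hypothetical userspace stand-ins (for the tail of the macro, ERR_PTR()
and PTR_ERR() respectively), and only the `const char *` pointer case is
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a negative errno value in a pointer, in the style of ERR_PTR(). */
static inline void *toy_err_ptr(long error)
{
	assert(error < 0 && error >= -4095);	/* errno range */
	return (void *)(intptr_t)error;
}

/* Recover the errno value from such a pointer, in the style of PTR_ERR(). */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * If 'error' is nonzero, overwrite 'out' with it, converted to whatever
 * representation suits the type of 'out'. Otherwise leave 'out' alone.
 * Integer outs take the error directly; pointer outs get an error pointer.
 */
#define TOY_STORE_RESULT(out, error)					\
	((out) = (error) ? _Generic((out),				\
			default :					\
				(error),				\
			const char * :					\
				toy_err_ptr(error)			\
			) : (out))
```

With an `int err`, TOY_STORE_RESULT(err, -28) leaves -28 in err, and a
`const char *` out ends up holding an error pointer that toy_ptr_err()
decodes again. This is why fuse_file_fallocate() can simply return err
once the macro reports the call as handled.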

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 97e92c633cfd..95c60d6d7597 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -188,6 +188,54 @@ ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
 	return ret;
 }
 
+int fuse_file_fallocate_initialize_in(struct bpf_fuse_args *fa,
+				      struct fuse_fallocate_in *ffi,
+				      struct file *file, int mode, loff_t offset, loff_t length)
+{
+	struct fuse_file *ff = file->private_data;
+
+	*ffi = (struct fuse_fallocate_in) {
+		.fh = ff->fh,
+		.offset = offset,
+		.length = length,
+		.mode = mode,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_FALLOCATE,
+		.nodeid = ff->nodeid,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+	};
+
+	return 0;
+}
+
+int fuse_file_fallocate_initialize_out(struct bpf_fuse_args *fa,
+				       struct fuse_fallocate_in *ffi,
+				       struct file *file, int mode, loff_t offset, loff_t length)
+{
+	return 0;
+}
+
+int fuse_file_fallocate_backing(struct bpf_fuse_args *fa, int *out,
+				struct file *file, int mode, loff_t offset, loff_t length)
+{
+	const struct fuse_fallocate_in *ffi = fa->in_args[0].value;
+	struct fuse_file *ff = file->private_data;
+
+	*out = vfs_fallocate(ff->backing_file, ffi->mode, ffi->offset,
+			     ffi->length);
+	return 0;
+}
+
+int fuse_file_fallocate_finalize(struct bpf_fuse_args *fa, int *out,
+				 struct file *file, int mode, loff_t offset, loff_t length)
+{
+	return 0;
+}
+
 /*******************************************************************************
  * Directory operations after here                                             *
  ******************************************************************************/
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index dd4485261cc7..ef6f6b0b3b59 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3002,6 +3002,16 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 
 	bool block_faults = FUSE_IS_DAX(inode) && lock_inode;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_fallocate_in, err,
+			       fuse_file_fallocate_initialize_in,
+			       fuse_file_fallocate_initialize_out,
+			       fuse_file_fallocate_backing,
+			       fuse_file_fallocate_finalize,
+			       file, mode, offset, length))
+		return err;
+#endif
+
 	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
 		     FALLOC_FL_ZERO_RANGE))
 		return -EOPNOTSUPP;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index fc3e8adf0422..0e4996766c6c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1422,6 +1422,17 @@ int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out, struct file *file
 
 ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma);
 
+int fuse_file_fallocate_initialize_in(struct bpf_fuse_args *fa,
+				      struct fuse_fallocate_in *ffi,
+				      struct file *file, int mode, loff_t offset, loff_t length);
+int fuse_file_fallocate_initialize_out(struct bpf_fuse_args *fa,
+				       struct fuse_fallocate_in *ffi,
+				       struct file *file, int mode, loff_t offset, loff_t length);
+int fuse_file_fallocate_backing(struct bpf_fuse_args *fa, int *out,
+				struct file *file, int mode, loff_t offset, loff_t length);
+int fuse_file_fallocate_finalize(struct bpf_fuse_args *fa, int *out,
+				 struct file *file, int mode, loff_t offset, loff_t length);
+
 struct fuse_lookup_io {
 	struct fuse_entry_out feo;
 	struct fuse_entry_bpf feb;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 13/26] fuse-bpf: Support file/dir open/close
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (11 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 12/26] fuse-bpf: Add support for fallocate Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 14/26] fuse-bpf: Support mknod/unlink/mkdir/rmdir Daniel Rosenberg
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |  21 +++
 fs/fuse/file.c    |  21 +++
 fs/fuse/fuse_i.h  |  48 +++++++
 4 files changed, 408 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 95c60d6d7597..1a2a89ddd535 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -77,6 +77,324 @@ int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb)
 	return err;
 }
 
+int fuse_open_initialize_in(struct bpf_fuse_args *fa, struct fuse_open_io *foio,
+			    struct inode *inode, struct file *file, bool isdir)
+{
+	foio->foi = (struct fuse_open_in) {
+		.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY),
+	};
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(foio->foi),
+			.value = &foio->foi,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_open_initialize_out(struct bpf_fuse_args *fa, struct fuse_open_io *foio,
+			     struct inode *inode, struct file *file, bool isdir)
+{
+	foio->foo = (struct fuse_open_out) { 0 };
+
+	fa->out_numargs = 1;
+	fa->out_args[0] = (struct bpf_fuse_arg) {
+		.size = sizeof(foio->foo),
+		.value = &foio->foo,
+	};
+
+	return 0;
+}
+
+int fuse_open_backing(struct bpf_fuse_args *fa, int *out,
+		      struct inode *inode, struct file *file, bool isdir)
+{
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	const struct fuse_open_in *foi = fa->in_args[0].value;
+	struct fuse_file *ff;
+	int mask;
+	struct fuse_dentry *fd = get_fuse_dentry(file->f_path.dentry);
+	struct file *backing_file;
+
+	ff = fuse_file_alloc(fm);
+	if (!ff)
+		return -ENOMEM;
+	file->private_data = ff;
+
+	switch (foi->flags & O_ACCMODE) {
+	case O_RDONLY:
+		mask = MAY_READ;
+		break;
+
+	case O_WRONLY:
+		mask = MAY_WRITE;
+		break;
+
+	case O_RDWR:
+		mask = MAY_READ | MAY_WRITE;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	*out = inode_permission(&init_user_ns,
+				  get_fuse_inode(inode)->backing_inode, mask);
+	if (*out)
+		return *out;
+
+	backing_file =
+		dentry_open(&fd->backing_path, foi->flags, current_cred());
+
+	if (IS_ERR(backing_file)) {
+		fuse_file_free(ff);
+		file->private_data = NULL;
+		return PTR_ERR(backing_file);
+	}
+	ff->backing_file = backing_file;
+
+	*out = 0;
+	return 0;
+}
+
+int fuse_open_finalize(struct bpf_fuse_args *fa, int *out,
+		       struct inode *inode, struct file *file, bool isdir)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_open_out *foo = fa->out_args[0].value;
+
+	if (ff)
+		ff->fh = foo->fh;
+	return 0;
+}
+
+int fuse_create_open_initialize_in(struct bpf_fuse_args *fa, struct fuse_create_open_io *fcoio,
+				   struct inode *dir, struct dentry *entry,
+				   struct file *file, unsigned int flags, umode_t mode)
+{
+	fcoio->fci = (struct fuse_create_in) {
+		.flags = file->f_flags & ~(O_CREAT | O_EXCL | O_NOCTTY),
+		.mode = mode,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_CREATE,
+		.in_numargs = 2,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(fcoio->fci),
+			.value = &fcoio->fci,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value =  (void *) entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_create_open_initialize_out(struct bpf_fuse_args *fa, struct fuse_create_open_io *fcoio,
+				    struct inode *dir, struct dentry *entry,
+				    struct file *file, unsigned int flags, umode_t mode)
+{
+	fcoio->feo = (struct fuse_entry_out) { 0 };
+	fcoio->foo = (struct fuse_open_out) { 0 };
+
+	fa->out_numargs = 2;
+	fa->out_args[0] = (struct bpf_fuse_arg) {
+		.size = sizeof(fcoio->feo),
+		.value = &fcoio->feo,
+	};
+	fa->out_args[1] = (struct bpf_fuse_arg) {
+		.size = sizeof(fcoio->foo),
+		.value = &fcoio->foo,
+	};
+
+	return 0;
+}
+
+static int fuse_open_file_backing(struct inode *inode, struct file *file)
+{
+	struct fuse_mount *fm = get_fuse_mount(inode);
+	struct dentry *entry = file->f_path.dentry;
+	struct fuse_dentry *fuse_dentry = get_fuse_dentry(entry);
+	struct fuse_file *fuse_file;
+	struct file *backing_file;
+
+	fuse_file = fuse_file_alloc(fm);
+	if (!fuse_file)
+		return -ENOMEM;
+	file->private_data = fuse_file;
+
+	backing_file = dentry_open(&fuse_dentry->backing_path, file->f_flags,
+				   current_cred());
+	if (IS_ERR(backing_file)) {
+		fuse_file_free(fuse_file);
+		file->private_data = NULL;
+		return PTR_ERR(backing_file);
+	}
+	fuse_file->backing_file = backing_file;
+
+	return 0;
+}
+
+int fuse_create_open_backing(struct bpf_fuse_args *fa, int *out,
+			     struct inode *dir, struct dentry *entry,
+			     struct file *file, unsigned int flags, umode_t mode)
+{
+	struct fuse_inode *dir_fuse_inode = get_fuse_inode(dir);
+	struct path backing_path;
+	struct inode *inode = NULL;
+	struct dentry *backing_parent;
+	struct dentry *newent;
+	const struct fuse_create_in *fci = fa->in_args[0].value;
+
+	if (!dir_fuse_inode)
+		return -EIO;
+
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	if (IS_ERR(backing_path.dentry))
+		return PTR_ERR(backing_path.dentry);
+
+	if (d_really_is_positive(backing_path.dentry)) {
+		*out = -EIO;
+		goto out;
+	}
+
+	backing_parent = dget_parent(backing_path.dentry);
+	inode_lock_nested(d_inode(backing_parent), I_MUTEX_PARENT);
+	*out = vfs_create(&init_user_ns, d_inode(backing_parent),
+			backing_path.dentry, fci->mode, true);
+	inode_unlock(d_inode(backing_parent));
+	dput(backing_parent);
+	if (*out)
+		goto out;
+
+	inode = fuse_iget_backing(dir->i_sb, 0, backing_path.dentry->d_inode);
+	if (IS_ERR(inode)) {
+		*out = PTR_ERR(inode);
+		goto out;
+	}
+
+	if (get_fuse_inode(inode)->bpf)
+		bpf_prog_put(get_fuse_inode(inode)->bpf);
+	get_fuse_inode(inode)->bpf = dir_fuse_inode->bpf;
+	if (get_fuse_inode(inode)->bpf)
+		bpf_prog_inc(dir_fuse_inode->bpf);
+
+	newent = d_splice_alias(inode, entry);
+	if (IS_ERR(newent)) {
+		*out = PTR_ERR(newent);
+		goto out;
+	}
+
+	entry = newent ? newent : entry;
+	*out = finish_open(file, entry, fuse_open_file_backing);
+
+out:
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_create_open_finalize(struct bpf_fuse_args *fa, int *out,
+			      struct inode *dir, struct dentry *entry,
+			      struct file *file, unsigned int flags, umode_t mode)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_inode *fi = get_fuse_inode(file->f_inode);
+	struct fuse_entry_out *feo = fa->out_args[0].value;
+	struct fuse_open_out *foo = fa->out_args[1].value;
+
+	if (fi)
+		fi->nodeid = feo->nodeid;
+	if (ff)
+		ff->fh = foo->fh;
+	return 0;
+}
+
+int fuse_release_initialize_in(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
+			       struct inode *inode, struct file *file)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	/* Always put the backing file, regardless of what bpf/userspace returns */
+	fput(fuse_file->backing_file);
+
+	*fri = (struct fuse_release_in) {
+		.fh = ((struct fuse_file *)(file->private_data))->fh,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = FUSE_RELEASE,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fri),
+		.in_args[0].value = fri,
+	};
+
+	return 0;
+}
+
+int fuse_release_initialize_out(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
+				struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+int fuse_releasedir_initialize_in(struct bpf_fuse_args *fa,
+				  struct fuse_release_in *fri,
+				  struct inode *inode, struct file *file)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	/* Always put the backing file, regardless of what bpf/userspace returns */
+	fput(fuse_file->backing_file);
+
+	*fri = (struct fuse_release_in) {
+		.fh = ((struct fuse_file *)(file->private_data))->fh,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(inode)->nodeid,
+		.opcode = FUSE_RELEASEDIR,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*fri),
+		.in_args[0].value = fri,
+	};
+
+	return 0;
+}
+
+int fuse_releasedir_initialize_out(struct bpf_fuse_args *fa,
+				   struct fuse_release_in *fri,
+				   struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+int fuse_release_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+int fuse_release_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct inode *inode, struct file *file)
+{
+	fuse_file_free(file->private_data);
+	*out = 0;
+	return 0;
+}
+
 int fuse_lseek_initialize_in(struct bpf_fuse_args *fa, struct fuse_lseek_io *flio,
 			     struct file *file, loff_t offset, int whence)
 {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index daaf3576fab9..a89690662b3b 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -652,6 +652,18 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	/* Userspace expects S_IFREG in create mode */
 	BUG_ON((mode & S_IFMT) != S_IFREG);
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		if (fuse_bpf_backing(dir, struct fuse_create_open_io, err,
+				       fuse_create_open_initialize_in,
+				       fuse_create_open_initialize_out,
+				       fuse_create_open_backing,
+				       fuse_create_open_finalize,
+				       dir, entry, file, flags, mode))
+			return err;
+	}
+#endif
+
 	forget = fuse_alloc_forget();
 	err = -ENOMEM;
 	if (!forget)
@@ -1562,6 +1574,15 @@ static int fuse_dir_open(struct inode *inode, struct file *file)
 
 static int fuse_dir_release(struct inode *inode, struct file *file)
 {
+#ifdef CONFIG_FUSE_BPF
+	int err = 0;
+
+	if (fuse_bpf_backing(inode, struct fuse_release_in, err,
+		       fuse_releasedir_initialize_in, fuse_releasedir_initialize_out,
+		       fuse_release_backing, fuse_release_finalize, inode, file))
+		return err;
+#endif
+
 	fuse_release_common(file, true);
 
 	return 0;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ef6f6b0b3b59..7feb73274c3e 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -241,6 +241,17 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 	if (err)
 		return err;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		if (fuse_bpf_backing(inode, struct fuse_open_io, err,
+				       fuse_open_initialize_in, fuse_open_initialize_out,
+				       fuse_open_backing,
+				       fuse_open_finalize,
+				       inode, file, isdir))
+			return err;
+	}
+#endif
+
 	if (is_wb_truncate || dax_truncate)
 		inode_lock(inode);
 
@@ -350,6 +361,16 @@ static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
+#ifdef CONFIG_FUSE_BPF
+	int err;
+
+	if (fuse_bpf_backing(inode, struct fuse_release_in, err,
+		       fuse_release_initialize_in, fuse_release_initialize_out,
+		       fuse_release_backing, fuse_release_finalize,
+		       inode, file))
+		return err;
+#endif
+
 	/*
 	 * Dirty pages might remain despite write_inode_now() call from
 	 * fuse_flush() due to writes racing with the close.
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0e4996766c6c..f36a00e30c3f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1406,6 +1406,54 @@ struct fuse_entry_bpf {
 
 int parse_fuse_entry_bpf(struct fuse_entry_bpf *feb);
 
+struct fuse_open_io {
+	struct fuse_open_in foi;
+	struct fuse_open_out foo;
+};
+
+int fuse_open_initialize_in(struct bpf_fuse_args *fa, struct fuse_open_io *foi,
+			    struct inode *inode, struct file *file, bool isdir);
+int fuse_open_initialize_out(struct bpf_fuse_args *fa, struct fuse_open_io *foi,
+			     struct inode *inode, struct file *file, bool isdir);
+int fuse_open_backing(struct bpf_fuse_args *fa, int *out,
+		      struct inode *inode, struct file *file, bool isdir);
+int fuse_open_finalize(struct bpf_fuse_args *fa, int *out,
+		       struct inode *inode, struct file *file, bool isdir);
+
+struct fuse_create_open_io {
+	struct fuse_create_in fci;
+	struct fuse_entry_out feo;
+	struct fuse_open_out foo;
+};
+
+int fuse_create_open_initialize_in(struct bpf_fuse_args *fa, struct fuse_create_open_io *fcoio,
+				   struct inode *dir, struct dentry *entry,
+				   struct file *file, unsigned int flags, umode_t mode);
+int fuse_create_open_initialize_out(struct bpf_fuse_args *fa, struct fuse_create_open_io *fcoio,
+				    struct inode *dir, struct dentry *entry,
+				    struct file *file, unsigned int flags, umode_t mode);
+int fuse_create_open_backing(struct bpf_fuse_args *fa, int *out,
+			     struct inode *dir, struct dentry *entry,
+			     struct file *file, unsigned int flags, umode_t mode);
+int fuse_create_open_finalize(struct bpf_fuse_args *fa, int *out,
+			      struct inode *dir, struct dentry *entry,
+			      struct file *file, unsigned int flags, umode_t mode);
+
+int fuse_release_initialize_in(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
+			       struct inode *inode, struct file *file);
+int fuse_release_initialize_out(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
+				struct inode *inode, struct file *file);
+int fuse_releasedir_initialize_in(struct bpf_fuse_args *fa,
+				  struct fuse_release_in *fri,
+				  struct inode *inode, struct file *file);
+int fuse_releasedir_initialize_out(struct bpf_fuse_args *fa,
+				   struct fuse_release_in *fri,
+				   struct inode *inode, struct file *file);
+int fuse_release_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *inode, struct file *file);
+int fuse_release_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct inode *inode, struct file *file);
+
 struct fuse_lseek_io {
 	struct fuse_lseek_in fli;
 	struct fuse_lseek_out flo;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 14/26] fuse-bpf: Support mknod/unlink/mkdir/rmdir
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (12 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 13/26] fuse-bpf: Support file/dir open/close Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 15/26] fuse-bpf: Add support for read/write iter Daniel Rosenberg
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
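Note on the mode handling in fuse_mknod_backing() and fuse_mkdir_backing():
both take the mode from the request and strip the caller's umask only when
the backing directory does not use POSIX ACLs. The userspace sketch below
restates that rule. backing_create_mode() and its backing_has_posix_acl
parameter are hypothetical names standing in for the
IS_POSIXACL(backing_inode) check; this is an illustration under those
assumptions, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Userspace sketch of the mode computation used before vfs_mknod() and
 * vfs_mkdir() in this patch. backing_has_posix_acl is a hypothetical
 * stand-in for IS_POSIXACL(backing_inode).
 */
mode_t backing_create_mode(mode_t requested, mode_t umask_bits,
			   bool backing_has_posix_acl)
{
	mode_t mode = requested;

	/*
	 * Without POSIX ACLs the umask is applied here. With them, the
	 * assumption is that the lower filesystem applies its default ACL
	 * (or the umask) itself, so the mode is passed through untouched.
	 */
	if (!backing_has_posix_acl)
		mode &= ~umask_bits;

	return mode;
}
```

For instance, a 0666 request with umask 0022 becomes 0644 when the backing
directory has no POSIX ACL support and stays 0666 when it does.
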
 fs/fuse/backing.c | 271 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |  40 +++++++
 fs/fuse/fuse_i.h  |  35 ++++++
 3 files changed, 346 insertions(+)
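
Note on rdev: fuse_mknod_initialize_in() stores new_encode_dev(rdev) in the
request and fuse_mknod_backing() converts it back with new_decode_dev()
before calling vfs_mknod(). The sketch below is a userspace restatement of
that round trip as I understand the helpers in include/linux/kdev_t.h (20
minor bits, low 8 minor bits kept in the low byte of the encoded value).
The sketch_* names are hypothetical and the layout should be read as an
assumption to check against the header, not as a quote from it.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1)

/* In-kernel style dev_t: major in the high bits, 20-bit minor below. */
uint32_t sketch_mkdev(uint32_t major, uint32_t minor)
{
	return (major << SKETCH_MINORBITS) | minor;
}

/* Wire format: minor[7:0] | major << 8 | minor[19:8] << 20. */
uint32_t sketch_new_encode_dev(uint32_t dev)
{
	uint32_t major = dev >> SKETCH_MINORBITS;
	uint32_t minor = dev & SKETCH_MINORMASK;

	return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Inverse of sketch_new_encode_dev(). */
uint32_t sketch_new_decode_dev(uint32_t enc)
{
	uint32_t major = (enc & 0xfff00) >> 8;
	uint32_t minor = (enc & 0xff) | ((enc >> 12) & 0xfff00);

	return sketch_mkdev(major, minor);
}
```

The point of the round trip is that the value seen by a filter or by
userspace is the wire encoding, while vfs_mknod() receives a regular dev_t.
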

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 1a2a89ddd535..1fe61177cdfb 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -765,6 +765,277 @@ int fuse_revalidate_backing(struct dentry *entry, unsigned int flags)
 	return 1;
 }
 
+int fuse_mknod_initialize_in(struct bpf_fuse_args *fa, struct fuse_mknod_in *fmi,
+			     struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	*fmi = (struct fuse_mknod_in) {
+		.mode = mode,
+		.rdev = new_encode_dev(rdev),
+		.umask = current_umask(),
+	};
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_MKNOD,
+		.in_numargs = 2,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(*fmi),
+			.value = fmi,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value = (void *)entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_mknod_initialize_out(struct bpf_fuse_args *fa, struct fuse_mknod_in *fmi,
+			      struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	return 0;
+}
+
+int fuse_mknod_backing(struct bpf_fuse_args *fa, int *out,
+		       struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	const struct fuse_mknod_in *fmi = fa->in_args[0].value;
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path;
+	struct inode *inode = NULL;
+
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	mode = fmi->mode;
+	if (!IS_POSIXACL(backing_inode))
+		mode &= ~fmi->umask;
+	*out = vfs_mknod(&init_user_ns, backing_inode, backing_path.dentry, mode,
+			new_decode_dev(fmi->rdev));
+	inode_unlock(backing_inode);
+	if (*out)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+	    unlikely(d_unhashed(backing_path.dentry))) {
+		*out = -EINVAL;
+		/*
+		 * TODO: overlayfs handles this case by calling
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		*out = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_mknod_finalize(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev)
+{
+	return 0;
+}
+
+int fuse_mkdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_mkdir_in *fmi,
+			     struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	*fmi = (struct fuse_mkdir_in) {
+		.mode = mode,
+		.umask = current_umask(),
+	};
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_MKDIR,
+		.in_numargs = 2,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(*fmi),
+			.value = fmi,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value = (void *)entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_mkdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_mkdir_in *fmi,
+			      struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	return 0;
+}
+
+int fuse_mkdir_backing(struct bpf_fuse_args *fa, int *out,
+		       struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	const struct fuse_mkdir_in *fmi = fa->in_args[0].value;
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path;
+	struct inode *inode = NULL;
+	struct dentry *d;
+
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	mode = fmi->mode;
+	if (!IS_POSIXACL(backing_inode))
+		mode &= ~fmi->umask;
+	*out = vfs_mkdir(&init_user_ns, backing_inode, backing_path.dentry,
+			mode);
+	if (*out)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+	    unlikely(d_unhashed(backing_path.dentry))) {
+		d = lookup_one_len(entry->d_name.name,
+				   backing_path.dentry->d_parent,
+				   entry->d_name.len);
+		if (IS_ERR(d)) {
+			*out = PTR_ERR(d);
+			goto out;
+		}
+		dput(backing_path.dentry);
+		backing_path.dentry = d;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		*out = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	inode_unlock(backing_inode);
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_mkdir_finalize(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry, umode_t mode)
+{
+	return 0;
+}
+
+int fuse_rmdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			     struct inode *dir, struct dentry *entry)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_RMDIR,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value = (void *)entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rmdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			      struct inode *dir, struct dentry *entry)
+{
+	return 0;
+}
+
+int fuse_rmdir_backing(struct bpf_fuse_args *fa, int *out,
+		       struct inode *dir, struct dentry *entry)
+{
+	struct path backing_path;
+	struct dentry *backing_parent_dentry;
+	struct inode *backing_inode;
+
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	backing_parent_dentry = dget_parent(backing_path.dentry);
+	backing_inode = d_inode(backing_parent_dentry);
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	*out = vfs_rmdir(&init_user_ns, backing_inode, backing_path.dentry);
+	inode_unlock(backing_inode);
+
+	dput(backing_parent_dentry);
+	if (!*out)
+		d_drop(entry);
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_rmdir_finalize(struct bpf_fuse_args *fa, int *out, struct inode *dir, struct dentry *entry)
+{
+	return 0;
+}
+
+int fuse_unlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			      struct inode *dir, struct dentry *entry)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_UNLINK,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value = (void *)entry->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_unlink_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			       struct inode *dir, struct dentry *entry)
+{
+	return 0;
+}
+
+int fuse_unlink_backing(struct bpf_fuse_args *fa, int *out, struct inode *dir, struct dentry *entry)
+{
+	struct path backing_path;
+	struct dentry *backing_parent_dentry;
+	struct inode *backing_inode;
+
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	/* TODO: decide whether to re-verify like overlayfs or take the inode from d_parent */
+	backing_parent_dentry = dget_parent(backing_path.dentry);
+	backing_inode = d_inode(backing_parent_dentry);
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	*out = vfs_unlink(&init_user_ns, backing_inode, backing_path.dentry,
+			 NULL);
+	inode_unlock(backing_inode);
+
+	dput(backing_parent_dentry);
+	if (!*out)
+		d_drop(entry);
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct inode *dir, struct dentry *entry)
+{
+	return 0;
+}
+
 int fuse_access_initialize_in(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
 			      struct inode *inode, int mask)
 {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index a89690662b3b..d8237b7a23f2 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -896,6 +896,16 @@ static int fuse_mknod(struct user_namespace *mnt_userns, struct inode *dir,
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	int err;
+
+	if (fuse_bpf_backing(dir, struct fuse_mknod_in, err,
+			fuse_mknod_initialize_in, fuse_mknod_initialize_out,
+			fuse_mknod_backing, fuse_mknod_finalize,
+			dir, entry, mode, rdev))
+		return err;
+#endif
+
 	if (!fm->fc->dont_mask)
 		mode &= ~current_umask();
 
@@ -925,6 +935,16 @@ static int fuse_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
 	struct fuse_mount *fm = get_fuse_mount(dir);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	int err;
+
+	if (fuse_bpf_backing(dir, struct fuse_mkdir_in, err,
+			fuse_mkdir_initialize_in, fuse_mkdir_initialize_out,
+			fuse_mkdir_backing, fuse_mkdir_finalize,
+			dir, entry, mode))
+		return err;
+#endif
+
 	if (!fm->fc->dont_mask)
 		mode &= ~current_umask();
 
@@ -1010,6 +1030,16 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 	if (fuse_is_bad(dir))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		if (fuse_bpf_backing(dir, struct fuse_dummy_io, err,
+					fuse_unlink_initialize_in, fuse_unlink_initialize_out,
+					fuse_unlink_backing, fuse_unlink_finalize,
+					dir, entry))
+			return err;
+	}
+#endif
+
 	args.opcode = FUSE_UNLINK;
 	args.nodeid = get_node_id(dir);
 	args.in_numargs = 1;
@@ -1033,6 +1063,16 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 	if (fuse_is_bad(dir))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		if (fuse_bpf_backing(dir, struct fuse_dummy_io, err,
+					fuse_rmdir_initialize_in, fuse_rmdir_initialize_out,
+					fuse_rmdir_backing, fuse_rmdir_finalize,
+					dir, entry))
+			return err;
+	}
+#endif
+
 	args.opcode = FUSE_RMDIR;
 	args.nodeid = get_node_id(dir);
 	args.in_numargs = 1;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f36a00e30c3f..9d6c9cc68268 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1439,6 +1439,41 @@ int fuse_create_open_finalize(struct bpf_fuse_args *fa, int *out,
 			      struct inode *dir, struct dentry *entry,
 			      struct file *file, unsigned int flags, umode_t mode);
 
+int fuse_mknod_initialize_in(struct bpf_fuse_args *fa, struct fuse_mknod_in *fmi,
+			     struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+int fuse_mknod_initialize_out(struct bpf_fuse_args *fa, struct fuse_mknod_in *fmi,
+			      struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+int fuse_mknod_backing(struct bpf_fuse_args *fa, int *out,
+		       struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+int fuse_mknod_finalize(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry, umode_t mode, dev_t rdev);
+
+int fuse_mkdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_mkdir_in *fmi,
+			     struct inode *dir, struct dentry *entry, umode_t mode);
+int fuse_mkdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_mkdir_in *fmi,
+			      struct inode *dir, struct dentry *entry, umode_t mode);
+int fuse_mkdir_backing(struct bpf_fuse_args *fa, int *out,
+		       struct inode *dir, struct dentry *entry, umode_t mode);
+int fuse_mkdir_finalize(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry, umode_t mode);
+
+int fuse_rmdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			     struct inode *dir, struct dentry *entry);
+int fuse_rmdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			      struct inode *dir, struct dentry *entry);
+int fuse_rmdir_backing(struct bpf_fuse_args *fa, int *out, struct inode *dir, struct dentry *entry);
+int fuse_rmdir_finalize(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry);
+
+int fuse_unlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			      struct inode *dir, struct dentry *entry);
+int fuse_unlink_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+			       struct inode *dir, struct dentry *entry);
+int fuse_unlink_backing(struct bpf_fuse_args *fa, int *out,
+			struct inode *dir, struct dentry *entry);
+int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct inode *dir, struct dentry *entry);
+
 int fuse_release_initialize_in(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
 			       struct inode *inode, struct file *file);
 int fuse_release_initialize_out(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 15/26] fuse-bpf: Add support for read/write iter
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (13 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 14/26] fuse-bpf: Support mknod/unlink/mkdir/rmdir Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-10-01  6:53   ` Amir Goldstein
  2022-09-26 23:18 ` [PATCH 16/26] fuse-bpf: support FUSE_READDIR Daniel Rosenberg
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
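Note on the hook layout: as in the earlier patches, the read and write
paths pass four callbacks (initialize_in, initialize_out, backing,
finalize) to fuse_bpf_backing() and return early when it reports the
request as handled. The macro itself is defined elsewhere in the series, so
the userspace sketch below only models the contract visible at the call
sites. All sketch_* and demo_* names are hypothetical, the argument lists
are simplified, and the placement of the BPF filters and the handling of an
initialization error are my assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct bpf_fuse_args; only the opcode is modelled. */
struct sketch_args {
	int opcode;
};

/* The four per-operation callbacks, with the per-op arguments dropped. */
struct sketch_ops {
	int (*initialize_in)(struct sketch_args *fa, void *io);
	int (*initialize_out)(struct sketch_args *fa, void *io);
	int (*backing)(struct sketch_args *fa, long *out);
	int (*finalize)(struct sketch_args *fa, long *out);
};

/*
 * Assumed contract: return true when the backing path handled the request
 * (the caller then returns *out), false when the caller should continue
 * with the regular FUSE request. An initialization error is reported as
 * handled here; the real macro may behave differently.
 */
bool sketch_bpf_backing(bool has_backing, const struct sketch_ops *ops,
			void *io, long *out)
{
	struct sketch_args fa = { 0 };
	int err;

	if (!has_backing)
		return false;

	err = ops->initialize_in(&fa, io);
	if (!err)
		err = ops->initialize_out(&fa, io);
	if (err) {
		*out = err;
		return true;
	}

	/* A BPF pre-filter would presumably run here. */
	ops->backing(&fa, out);
	/* A BPF post-filter would presumably run here. */
	ops->finalize(&fa, out);
	return true;
}

/* Demo callbacks that record the order in which they run. */
static char sketch_trace[8];

static void sketch_note(char c)
{
	size_t len = strlen(sketch_trace);

	if (len < sizeof(sketch_trace) - 1)
		sketch_trace[len] = c;
}

int demo_in(struct sketch_args *fa, void *io)
{
	(void)io;
	fa->opcode = 1;
	sketch_note('i');
	return 0;
}

int demo_out(struct sketch_args *fa, void *io)
{
	(void)fa;
	(void)io;
	sketch_note('o');
	return 0;
}

int demo_backing(struct sketch_args *fa, long *out)
{
	(void)fa;
	*out = 0;
	sketch_note('b');
	return 0;
}

int demo_finalize(struct sketch_args *fa, long *out)
{
	(void)fa;
	(void)out;
	sketch_note('f');
	return 0;
}

const struct sketch_ops demo_ops = {
	.initialize_in = demo_in,
	.initialize_out = demo_out,
	.backing = demo_backing,
	.finalize = demo_finalize,
};
```

With demo_ops, a call with has_backing set runs the callbacks in the order
in, out, backing, finalize and returns true. With it clear, nothing runs and
the caller would continue to the ordinary request path.
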
 fs/fuse/backing.c | 291 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/control.c |   2 +-
 fs/fuse/file.c    |  28 +++++
 fs/fuse/fuse_i.h  |  42 ++++++-
 fs/fuse/inode.c   |  13 +++
 5 files changed, 374 insertions(+), 2 deletions(-)
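
Note on the aio request lifetime: the async branches set the reference
count of struct fuse_bpf_aio_req to 2, drop one reference right after
submitting, and leave the other to the completion path, which runs either
inline (result other than -EIOCBQUEUED) or later from
fuse_bpf_aio_rw_complete(). The userspace sketch below models only that
counting. The sketch_* names and the freed hook are hypothetical, and a
plain int replaces refcount_t, so it says nothing about concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_aio_req {
	int ref;	/* the patch uses refcount_t; a plain int is enough here */
	bool *freed;	/* hypothetical hook so a caller can observe the free */
};

/* Drop one reference and free the request when the last one goes away. */
void sketch_aio_put(struct sketch_aio_req *req)
{
	if (--req->ref == 0) {
		*req->freed = true;
		free(req);
	}
}

/*
 * Models the async branch: returns the request while a completion is
 * still pending (the -EIOCBQUEUED case), or NULL once it has been freed
 * or could not be allocated.
 */
struct sketch_aio_req *sketch_submit(bool queued, bool *freed)
{
	struct sketch_aio_req *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;
	req->freed = freed;
	req->ref = 2;		/* one for the submitter, one for completion */

	/* ... the I/O would be issued here ... */

	sketch_aio_put(req);	/* submitter's reference */
	if (!queued) {
		sketch_aio_put(req);	/* finished inline: drop the second one */
		return NULL;
	}
	return req;
}
```

Holding two references means the order of the submitter's put and the
completion's put does not matter: whichever runs last frees the request.
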

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 1fe61177cdfb..cf4ad9f4fe10 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -12,6 +12,47 @@
 #include <linux/namei.h>
 #include <linux/bpf_fuse.h>
 
+#define FUSE_BPF_IOCB_MASK (IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
+
+struct fuse_bpf_aio_req {
+	struct kiocb iocb;
+	refcount_t ref;
+	struct kiocb *iocb_orig;
+};
+
+static struct kmem_cache *fuse_bpf_aio_request_cachep;
+
+static void fuse_file_accessed(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst_inode;
+	struct inode *src_inode;
+
+	if (dst_file->f_flags & O_NOATIME)
+		return;
+
+	dst_inode = file_inode(dst_file);
+	src_inode = file_inode(src_file);
+
+	if ((!timespec64_equal(&dst_inode->i_mtime, &src_inode->i_mtime) ||
+	     !timespec64_equal(&dst_inode->i_ctime, &src_inode->i_ctime))) {
+		dst_inode->i_mtime = src_inode->i_mtime;
+		dst_inode->i_ctime = src_inode->i_ctime;
+	}
+
+	touch_atime(&dst_file->f_path);
+}
+
+static void fuse_copyattr(struct file *dst_file, struct file *src_file)
+{
+	struct inode *dst = file_inode(dst_file);
+	struct inode *src = file_inode(src_file);
+
+	dst->i_atime = src->i_atime;
+	dst->i_mtime = src->i_mtime;
+	dst->i_ctime = src->i_ctime;
+	i_size_write(dst, i_size_read(src));
+}
+
 struct bpf_prog *fuse_get_bpf_prog(struct file *file)
 {
 	struct bpf_prog *bpf_prog = ERR_PTR(-EINVAL);
@@ -469,6 +510,241 @@ int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out,
 	return 0;
 }
 
+static inline void fuse_bpf_aio_put(struct fuse_bpf_aio_req *aio_req)
+{
+	if (refcount_dec_and_test(&aio_req->ref))
+		kmem_cache_free(fuse_bpf_aio_request_cachep, aio_req);
+}
+
+static void fuse_bpf_aio_cleanup_handler(struct fuse_bpf_aio_req *aio_req)
+{
+	struct kiocb *iocb = &aio_req->iocb;
+	struct kiocb *iocb_orig = aio_req->iocb_orig;
+
+	if (iocb->ki_flags & IOCB_WRITE) {
+		__sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
+				      SB_FREEZE_WRITE);
+		file_end_write(iocb->ki_filp);
+		fuse_copyattr(iocb_orig->ki_filp, iocb->ki_filp);
+	}
+	iocb_orig->ki_pos = iocb->ki_pos;
+	fuse_bpf_aio_put(aio_req);
+}
+
+static void fuse_bpf_aio_rw_complete(struct kiocb *iocb, long res)
+{
+	struct fuse_bpf_aio_req *aio_req =
+		container_of(iocb, struct fuse_bpf_aio_req, iocb);
+	struct kiocb *iocb_orig = aio_req->iocb_orig;
+
+	fuse_bpf_aio_cleanup_handler(aio_req);
+	iocb_orig->ki_complete(iocb_orig, res);
+}
+
+int fuse_file_read_iter_initialize_in(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
+				      struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+
+	fri->fri = (struct fuse_read_in) {
+		.fh = ff->fh,
+		.offset = iocb->ki_pos,
+		.size = to->count,
+	};
+
+	/* TODO we can't assume 'to' is a kvec */
+	/* TODO we also can't assume the vector has only one component */
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_READ,
+		.nodeid = ff->nodeid,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fri->fri),
+		.in_args[0].value = &fri->fri,
+		/*
+		 * TODO Design this properly.
+		 * Possible approach: do not pass buf to bpf
+		 * If going to userland, do a deep copy
+		 * For extra credit, do that to/from the vector, rather than
+		 * making an extra copy in the kernel
+		 */
+	};
+
+	return 0;
+}
+
+int fuse_file_read_iter_initialize_out(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
+				       struct kiocb *iocb, struct iov_iter *to)
+{
+	fri->frio = (struct fuse_read_iter_out) {
+		.ret = fri->fri.size,
+	};
+
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(fri->frio);
+	fa->out_args[0].value = &fri->frio;
+
+	return 0;
+}
+
+int fuse_file_read_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
+				struct kiocb *iocb, struct iov_iter *to)
+{
+	struct fuse_read_iter_out *frio = fa->out_args[0].value;
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+
+	if (!iov_iter_count(to))
+		return 0;
+
+	if ((iocb->ki_flags & IOCB_DIRECT) &&
+	    (!ff->backing_file->f_mapping->a_ops ||
+	     !ff->backing_file->f_mapping->a_ops->direct_IO))
+		return -EINVAL;
+
+	/* TODO This just plain ignores any change to fuse_read_in */
+	if (is_sync_kiocb(iocb)) {
+		*out = vfs_iter_read(ff->backing_file, to, &iocb->ki_pos,
+				iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
+	} else {
+		struct fuse_bpf_aio_req *aio_req;
+
+		*out = -ENOMEM;
+		aio_req = kmem_cache_zalloc(fuse_bpf_aio_request_cachep, GFP_KERNEL);
+		if (!aio_req)
+			goto out;
+
+		aio_req->iocb_orig = iocb;
+		kiocb_clone(&aio_req->iocb, iocb, ff->backing_file);
+		aio_req->iocb.ki_complete = fuse_bpf_aio_rw_complete;
+		refcount_set(&aio_req->ref, 2);
+		*out = vfs_iocb_iter_read(ff->backing_file, &aio_req->iocb, to);
+		fuse_bpf_aio_put(aio_req);
+		if (*out != -EIOCBQUEUED)
+			fuse_bpf_aio_cleanup_handler(aio_req);
+	}
+
+	frio->ret = *out;
+
+	/* TODO Need to point value at the buffer for post-modification */
+
+out:
+	fuse_file_accessed(file, ff->backing_file);
+
+	return *out;
+}
+
+int fuse_file_read_iter_finalize(struct bpf_fuse_args *fa, ssize_t *out,
+				 struct kiocb *iocb, struct iov_iter *to)
+{
+	struct fuse_read_iter_out *frio = fa->out_args[0].value;
+
+	*out = frio->ret;
+
+	return 0;
+}
+
+int fuse_file_write_iter_initialize_in(struct bpf_fuse_args *fa,
+				       struct fuse_file_write_iter_io *fwio,
+				       struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+
+	*fwio = (struct fuse_file_write_iter_io) {
+		.fwi.fh = ff->fh,
+		.fwi.offset = iocb->ki_pos,
+		.fwi.size = from->count,
+	};
+
+	/* TODO we can't assume 'from' is a kvec */
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_WRITE,
+		.nodeid = ff->nodeid,
+		.in_numargs = 2,
+		.in_args[0].size = sizeof(fwio->fwi),
+		.in_args[0].value = &fwio->fwi,
+		.in_args[1].size = fwio->fwi.size,
+		.in_args[1].value = from->kvec->iov_base,
+	};
+
+	return 0;
+}
+
+int fuse_file_write_iter_initialize_out(struct bpf_fuse_args *fa,
+					struct fuse_file_write_iter_io *fwio,
+					struct kiocb *iocb, struct iov_iter *from)
+{
+	/* TODO we can't assume 'from' is a kvec */
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(fwio->fwio);
+	fa->out_args[0].value = &fwio->fwio;
+
+	return 0;
+}
+
+int fuse_file_write_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
+				 struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	struct fuse_write_iter_out *fwio = fa->out_args[0].value;
+
+	if (!iov_iter_count(from))
+		return 0;
+
+	/* TODO This just plain ignores any change to fuse_write_in */
+	/* TODO uint32_t is narrower than ssize_t on 64-bit; check for truncation */
+	inode_lock(file_inode(file));
+
+	fuse_copyattr(file, ff->backing_file);
+
+	if (is_sync_kiocb(iocb)) {
+		file_start_write(ff->backing_file);
+		*out = vfs_iter_write(ff->backing_file, from, &iocb->ki_pos,
+					   iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
+		file_end_write(ff->backing_file);
+
+		/* Must reflect change in size of backing file to upper file */
+		if (*out > 0)
+			fuse_copyattr(file, ff->backing_file);
+	} else {
+		struct fuse_bpf_aio_req *aio_req;
+
+		*out = -ENOMEM;
+		aio_req = kmem_cache_zalloc(fuse_bpf_aio_request_cachep, GFP_KERNEL);
+		if (!aio_req)
+			goto out;
+
+		file_start_write(ff->backing_file);
+		__sb_writers_release(file_inode(ff->backing_file)->i_sb, SB_FREEZE_WRITE);
+		aio_req->iocb_orig = iocb;
+		kiocb_clone(&aio_req->iocb, iocb, ff->backing_file);
+		aio_req->iocb.ki_complete = fuse_bpf_aio_rw_complete;
+		refcount_set(&aio_req->ref, 2);
+		*out = vfs_iocb_iter_write(ff->backing_file, &aio_req->iocb, from);
+		fuse_bpf_aio_put(aio_req);
+		if (*out != -EIOCBQUEUED)
+			fuse_bpf_aio_cleanup_handler(aio_req);
+	}
+
+out:
+	inode_unlock(file_inode(file));
+	fwio->ret = *out;
+	if (*out < 0)
+		return *out;
+	return 0;
+}
+
+int fuse_file_write_iter_finalize(struct bpf_fuse_args *fa, ssize_t *out,
+				  struct kiocb *iocb, struct iov_iter *from)
+{
+	struct fuse_write_iter_out *fwio = fa->out_args[0].value;
+
+	*out = fwio->ret;
+	return 0;
+}
+
 ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	int ret;
@@ -1074,3 +1350,18 @@ int fuse_access_finalize(struct bpf_fuse_args *fa, int *out, struct inode *inode
 	return 0;
 }
 
+int __init fuse_bpf_init(void)
+{
+	fuse_bpf_aio_request_cachep = kmem_cache_create("fuse_bpf_aio_req",
+						   sizeof(struct fuse_bpf_aio_req),
+						   0, SLAB_HWCACHE_ALIGN, NULL);
+	if (!fuse_bpf_aio_request_cachep)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void __exit fuse_bpf_cleanup(void)
+{
+	kmem_cache_destroy(fuse_bpf_aio_request_cachep);
+}
diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index 247ef4f76761..685552453751 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -378,7 +378,7 @@ int __init fuse_ctl_init(void)
 	return register_filesystem(&fuse_ctl_fs_type);
 }
 
-void __exit fuse_ctl_cleanup(void)
+void fuse_ctl_cleanup(void)
 {
 	unregister_filesystem(&fuse_ctl_fs_type);
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7feb73274c3e..443f1af8a431 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1625,6 +1625,20 @@ static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_read_iter(iocb, to);
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		ssize_t ret;
+
+		if (fuse_bpf_backing(inode, struct fuse_file_read_iter_io, ret,
+				       fuse_file_read_iter_initialize_in,
+				       fuse_file_read_iter_initialize_out,
+				       fuse_file_read_iter_backing,
+				       fuse_file_read_iter_finalize,
+				       iocb, to))
+			return ret;
+	}
+#endif
+
 	if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_read_iter(iocb, to);
 	else
@@ -1643,6 +1657,20 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (FUSE_IS_DAX(inode))
 		return fuse_dax_write_iter(iocb, from);
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		ssize_t ret = 0;
+
+		if (fuse_bpf_backing(inode, struct fuse_file_write_iter_io, ret,
+				       fuse_file_write_iter_initialize_in,
+				       fuse_file_write_iter_initialize_out,
+				       fuse_file_write_iter_backing,
+				       fuse_file_write_iter_finalize,
+				       iocb, from))
+			return ret;
+	}
+#endif
+
 	if (!(ff->open_flags & FOPEN_DIRECT_IO))
 		return fuse_cache_write_iter(iocb, from);
 	else
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9d6c9cc68268..f427a7bb367c 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1135,7 +1135,7 @@ int fuse_dev_init(void);
 void fuse_dev_cleanup(void);
 
 int fuse_ctl_init(void);
-void __exit fuse_ctl_cleanup(void);
+void fuse_ctl_cleanup(void);
 
 /**
  * Simple request sending that does request allocation and freeing
@@ -1503,6 +1503,43 @@ int fuse_lseek_backing(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 			loff_t offset, int whence);
 
+struct fuse_read_iter_out {
+	uint64_t ret;
+};
+struct fuse_file_read_iter_io {
+	struct fuse_read_in fri;
+	struct fuse_read_iter_out frio;
+};
+
+int fuse_file_read_iter_initialize_in(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
+				      struct kiocb *iocb, struct iov_iter *to);
+int fuse_file_read_iter_initialize_out(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
+				       struct kiocb *iocb, struct iov_iter *to);
+int fuse_file_read_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
+				struct kiocb *iocb, struct iov_iter *to);
+int fuse_file_read_iter_finalize(struct bpf_fuse_args *fa, ssize_t *out,
+				 struct kiocb *iocb, struct iov_iter *to);
+
+struct fuse_write_iter_out {
+	uint64_t ret;
+};
+struct fuse_file_write_iter_io {
+	struct fuse_write_in fwi;
+	struct fuse_write_out fwo;
+	struct fuse_write_iter_out fwio;
+};
+
+int fuse_file_write_iter_initialize_in(struct bpf_fuse_args *fa,
+				       struct fuse_file_write_iter_io *fwio,
+				       struct kiocb *iocb, struct iov_iter *from);
+int fuse_file_write_iter_initialize_out(struct bpf_fuse_args *fa,
+					struct fuse_file_write_iter_io *fwio,
+					struct kiocb *iocb, struct iov_iter *from);
+int fuse_file_write_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
+				 struct kiocb *iocb, struct iov_iter *from);
+int fuse_file_write_iter_finalize(struct bpf_fuse_args *fa, ssize_t *out,
+				  struct kiocb *iocb, struct iov_iter *from);
+
 ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma);
 
 int fuse_file_fallocate_initialize_in(struct bpf_fuse_args *fa,
@@ -1570,6 +1607,9 @@ static inline u64 attr_timeout(struct fuse_attr_out *o)
 }
 
 #ifdef CONFIG_FUSE_BPF
+int __init fuse_bpf_init(void);
+void __exit fuse_bpf_cleanup(void);
+
 /*
  * expression statement to wrap the backing filter logic
  * struct inode *inode: inode with bpf and backing inode
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 290eae750282..c96cfcbfd96a 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -2117,11 +2117,21 @@ static int __init fuse_init(void)
 	if (res)
 		goto err_sysfs_cleanup;
 
+#ifdef CONFIG_FUSE_BPF
+	res = fuse_bpf_init();
+	if (res)
+		goto err_ctl_cleanup;
+#endif
+
 	sanitize_global_limit(&max_user_bgreq);
 	sanitize_global_limit(&max_user_congthresh);
 
 	return 0;
 
+#ifdef CONFIG_FUSE_BPF
+ err_ctl_cleanup:
+	fuse_ctl_cleanup();
+#endif
  err_sysfs_cleanup:
 	fuse_sysfs_cleanup();
  err_dev_cleanup:
@@ -2139,6 +2149,9 @@ static void __exit fuse_exit(void)
 	fuse_ctl_cleanup();
 	fuse_sysfs_cleanup();
 	fuse_fs_cleanup();
+#ifdef CONFIG_FUSE_BPF
+	fuse_bpf_cleanup();
+#endif
 	fuse_dev_cleanup();
 }
 
-- 
2.37.3.998.g577e59143f-goog


* [PATCH 16/26] fuse-bpf: support FUSE_READDIR
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (14 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 15/26] fuse-bpf: Add support for read/write iter Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 17/26] fuse-bpf: Add support for sync operations Daniel Rosenberg
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
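Note for readers: filldir() packs fuse_dirent records into a single page
during the backing call, and parse_dirfile() walks them back out in
finalize. The following is a minimal userspace sketch of that buffer
format, not the kernel code. Everything prefixed demo_ or DEMO_ is
hypothetical and exists only for this example; the 24-byte header and the
8-byte record alignment are assumed to mirror struct fuse_dirent and
FUSE_DIRENT_SIZE(), and DEMO_BUF_SIZE stands in for PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUF_SIZE 4096u	/* stands in for PAGE_SIZE */
#define DEMO_NAME_MAX 255u	/* assumed per-name limit for this sketch */

/* Hypothetical fixed header; the name bytes follow it directly. */
struct demo_dirent_hdr {
	uint64_t ino;
	uint64_t off;
	uint32_t namelen;
	uint32_t type;
};

/* Record size: header plus name, rounded up to a multiple of 8. */
static size_t demo_reclen(size_t namelen)
{
	return (sizeof(struct demo_dirent_hdr) + namelen + 7u) & ~(size_t)7u;
}

/* Pack side: append one record at *used, or return -1 if it does not fit. */
static int demo_pack(uint8_t *buf, size_t *used, const char *name,
		     uint64_t ino, uint64_t off, uint32_t type)
{
	struct demo_dirent_hdr hdr = { ino, off, (uint32_t)strlen(name), type };
	size_t reclen = demo_reclen(hdr.namelen);

	if (!hdr.namelen || hdr.namelen > DEMO_NAME_MAX)
		return -1;
	if (*used + reclen > DEMO_BUF_SIZE)
		return -1;
	memset(buf + *used, 0, reclen);	/* zero the alignment padding */
	memcpy(buf + *used, &hdr, sizeof(hdr));
	memcpy(buf + *used + sizeof(hdr), name, hdr.namelen);
	*used += reclen;
	return 0;
}

/* Parse side: count well-formed records; return -1 on a malformed one. */
static int demo_parse(const uint8_t *buf, size_t nbytes, uint64_t *last_off)
{
	int count = 0;

	while (nbytes >= sizeof(struct demo_dirent_hdr)) {
		struct demo_dirent_hdr hdr;
		size_t reclen;

		memcpy(&hdr, buf, sizeof(hdr));
		if (!hdr.namelen || hdr.namelen > DEMO_NAME_MAX)
			return -1;
		reclen = demo_reclen(hdr.namelen);
		if (reclen > nbytes)
			break;		/* truncated tail: stop quietly */
		if (memchr(buf + sizeof(hdr), '/', hdr.namelen))
			return -1;	/* names must not contain '/' */
		*last_off = hdr.off;
		count++;
		buf += reclen;
		nbytes -= reclen;
	}
	return count;
}
```

The sketch copies headers with memcpy() instead of casting the buffer so
that it makes no alignment assumptions about the caller's storage; the
kernel code can cast because it owns a page-aligned buffer.
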
 fs/fuse/backing.c         | 162 ++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h          |  18 +++++
 fs/fuse/readdir.c         |  22 ++++++
 include/uapi/linux/fuse.h |   6 ++
 4 files changed, 208 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index cf4ad9f4fe10..a31199064dc7 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -1312,6 +1312,168 @@ int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
 	return 0;
 }
 
+int fuse_readdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_file *ff = file->private_data;
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = ff->nodeid,
+		.opcode = FUSE_READDIR,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(frio->fri),
+			.value = &frio->fri,
+		},
+	};
+
+	frio->fri = (struct fuse_read_in) {
+		.fh = ff->fh,
+		.offset = ctx->pos,
+		.size = PAGE_SIZE,
+	};
+
+	*force_again = false;
+	*allow_force = true;
+	return 0;
+}
+
+int fuse_readdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
+				struct file *file, struct dir_context *ctx,
+				bool *force_again, bool *allow_force, bool is_continued)
+{
+	u8 *page = (u8 *)__get_free_page(GFP_KERNEL);
+
+	if (!page)
+		return -ENOMEM;
+
+	fa->flags = FUSE_BPF_OUT_ARGVAR;
+	fa->out_numargs = 2;
+	fa->out_args[0] = (struct bpf_fuse_arg) {
+		.size = sizeof(frio->fro),
+		.value = &frio->fro,
+	};
+	fa->out_args[1] = (struct bpf_fuse_arg) {
+		.size = PAGE_SIZE,
+		.max_size = PAGE_SIZE,
+		.flags = BPF_FUSE_VARIABLE_SIZE,
+		.value = page,
+	};
+	frio->fro = (struct fuse_read_out) {
+		.again = 0,
+		.offset = 0,
+	};
+
+	return 0;
+}
+
+struct extfuse_ctx {
+	struct dir_context ctx;
+	u8 *addr;
+	size_t offset;
+};
+
+static int filldir(struct dir_context *ctx, const char *name, int namelen,
+		   loff_t offset, u64 ino, unsigned int d_type)
+{
+	struct extfuse_ctx *ec = container_of(ctx, struct extfuse_ctx, ctx);
+	struct fuse_dirent *fd = (struct fuse_dirent *)(ec->addr + ec->offset);
+
+	if (ec->offset + sizeof(struct fuse_dirent) + namelen > PAGE_SIZE)
+		return -ENOMEM;
+
+	*fd = (struct fuse_dirent) {
+		.ino = ino,
+		.off = offset,
+		.namelen = namelen,
+		.type = d_type,
+	};
+
+	memcpy(fd->name, name, namelen);
+	ec->offset += FUSE_DIRENT_SIZE(fd);
+
+	return 0;
+}
+
+static int parse_dirfile(char *buf, size_t nbytes, struct dir_context *ctx)
+{
+	while (nbytes >= FUSE_NAME_OFFSET) {
+		struct fuse_dirent *dirent = (struct fuse_dirent *) buf;
+		size_t reclen = FUSE_DIRENT_SIZE(dirent);
+
+		if (!dirent->namelen || dirent->namelen > FUSE_NAME_MAX)
+			return -EIO;
+		if (reclen > nbytes)
+			break;
+		if (memchr(dirent->name, '/', dirent->namelen) != NULL)
+			return -EIO;
+
+		ctx->pos = dirent->off;
+		if (!dir_emit(ctx, dirent->name, dirent->namelen, dirent->ino,
+				dirent->type))
+			break;
+
+		buf += reclen;
+		nbytes -= reclen;
+	}
+
+	return 0;
+}
+
+
+int fuse_readdir_backing(struct bpf_fuse_args *fa, int *out,
+			 struct file *file, struct dir_context *ctx,
+			 bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_dir = ff->backing_file;
+	struct fuse_read_out *fro = fa->out_args[0].value;
+	struct extfuse_ctx ec;
+
+	ec = (struct extfuse_ctx) {
+		.ctx.actor = filldir,
+		.ctx.pos = ctx->pos,
+		.addr = fa->out_args[1].value,
+	};
+
+	if (!ec.addr)
+		return -ENOMEM;
+
+	if (!is_continued)
+		backing_dir->f_pos = file->f_pos;
+
+	*out = iterate_dir(backing_dir, &ec.ctx);
+	if (ec.offset == 0)
+		*allow_force = false;
+	fa->out_args[1].size = ec.offset;
+
+	fro->offset = ec.ctx.pos;
+	fro->again = false;
+
+	return *out;
+}
+
+int fuse_readdir_finalize(struct bpf_fuse_args *fa, int *out,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued)
+{
+	struct fuse_read_out *fro = fa->out_args[0].value;
+	struct fuse_file *ff = file->private_data;
+	struct file *backing_dir = ff->backing_file;
+
+	*out = parse_dirfile(fa->out_args[1].value, fa->out_args[1].size, ctx);
+	*force_again = !!fro->again;
+	if (*force_again && !*allow_force)
+		*out = -EINVAL;
+
+	ctx->pos = fro->offset;
+	backing_dir->f_pos = fro->offset;
+
+	free_page((unsigned long)fa->out_args[1].value);
+	return *out;
+}
+
 int fuse_access_initialize_in(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
 			      struct inode *inode, int mask)
 {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f427a7bb367c..8780a50be244 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1572,6 +1572,24 @@ int fuse_lookup_finalize(struct bpf_fuse_args *fa, struct dentry **out,
 			 struct inode *dir, struct dentry *entry, unsigned int flags);
 int fuse_revalidate_backing(struct dentry *entry, unsigned int flags);
 
+struct fuse_read_io {
+	struct fuse_read_in fri;
+	struct fuse_read_out fro;
+};
+
+int fuse_readdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
+			       struct file *file, struct dir_context *ctx,
+			       bool *force_again, bool *allow_force, bool is_continued);
+int fuse_readdir_initialize_out(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
+				struct file *file, struct dir_context *ctx,
+				bool *force_again, bool *allow_force, bool is_continued);
+int fuse_readdir_backing(struct bpf_fuse_args *fa, int *out,
+			 struct file *file, struct dir_context *ctx,
+			 bool *force_again, bool *allow_force, bool is_continued);
+int fuse_readdir_finalize(struct bpf_fuse_args *fa, int *out,
+			    struct file *file, struct dir_context *ctx,
+			    bool *force_again, bool *allow_force, bool is_continued);
+
 int fuse_access_initialize_in(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
 			      struct inode *inode, int mask);
 int fuse_access_initialize_out(struct bpf_fuse_args *fa, struct fuse_access_in *fai,
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index b4e565711045..07da8570e337 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -20,6 +20,8 @@ static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
 
 	if (!fc->do_readdirplus)
 		return false;
+	if (fi->nodeid == 0)
+		return false;
 	if (!fc->readdirplus_auto)
 		return true;
 	if (test_and_clear_bit(FUSE_I_ADVISE_RDPLUS, &fi->state))
@@ -571,6 +573,26 @@ int fuse_readdir(struct file *file, struct dir_context *ctx)
 	struct inode *inode = file_inode(file);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	bool bpf_ret = false;
+	bool allow_force;
+	bool force_again = false;
+	bool is_continued = false;
+
+again:
+	bpf_ret = fuse_bpf_backing(inode, struct fuse_read_io, err,
+			       fuse_readdir_initialize_in, fuse_readdir_initialize_out,
+			       fuse_readdir_backing, fuse_readdir_finalize,
+			       file, ctx, &force_again, &allow_force, is_continued);
+	if (force_again && err >= 0) {
+		is_continued = true;
+		goto again;
+	}
+
+	if (bpf_ret)
+		return err;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 8c80c146e69b..b7736cb4bdaf 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -701,6 +701,12 @@ struct fuse_read_in {
 	uint32_t	padding;
 };
 
+struct fuse_read_out {
+	uint64_t	offset;
+	uint32_t	again;
+	uint32_t	padding;
+};
+
 #define FUSE_COMPAT_WRITE_IN_SIZE 24
 
 struct fuse_write_in {
-- 
2.37.3.998.g577e59143f-goog


* [PATCH 17/26] fuse-bpf: Add support for sync operations
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (15 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 16/26] fuse-bpf: support FUSE_READDIR Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 18/26] fuse-bpf: Add Rename support Daniel Rosenberg
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
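Note for readers: fuse_fsync_backing() derives the datasync mode from the
in-argument block rather than from the caller's datasync parameter, so a
filter that edits fsync_flags changes what the backing file sees. The
sketch below models that in plain userspace C. The demo_ names are
hypothetical, and the point at which a filter runs (between initialize_in
and backing) is an assumption based on the cover letter's description of
pre-filtering, since the body of fuse_bpf_backing() is not part of this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FSYNC_FDATASYNC (1u << 0)	/* mirrors FUSE_FSYNC_FDATASYNC */

/* Hypothetical stand-in for struct fuse_fsync_in. */
struct demo_fsync_in {
	uint64_t fh;
	uint32_t fsync_flags;
	uint32_t padding;
};

/* A pre-filter may rewrite the in-arguments before the backing call. */
typedef void (*demo_filter_fn)(struct demo_fsync_in *in);

/* Step 1: translate the caller's arguments into the request block. */
static void demo_initialize_in(struct demo_fsync_in *in, uint64_t fh,
			       int datasync)
{
	*in = (struct demo_fsync_in) {
		.fh = fh,
		.fsync_flags = datasync ? DEMO_FSYNC_FDATASYNC : 0,
	};
}

/* Step 3: act on the (possibly filtered) block, not the original args. */
static int demo_backing(const struct demo_fsync_in *in, int *datasync_used)
{
	*datasync_used = (in->fsync_flags & DEMO_FSYNC_FDATASYNC) ? 1 : 0;
	return 0;	/* a real implementation would sync the file here */
}

/* Example filter: always request a full sync. */
static void demo_force_full_sync(struct demo_fsync_in *in)
{
	in->fsync_flags &= ~DEMO_FSYNC_FDATASYNC;
}

/* Driver: initialize, optionally filter (step 2), then call backing. */
static int demo_fsync(uint64_t fh, int datasync, demo_filter_fn filter,
		      int *datasync_used)
{
	struct demo_fsync_in in;

	demo_initialize_in(&in, fh, datasync);
	if (filter)
		filter(&in);
	return demo_backing(&in, datasync_used);
}
```

Calling demo_fsync(fh, 1, demo_force_full_sync, &mode) leaves mode at 0
even though the caller asked for a data-only sync, which is the behaviour
the new_datasync computation in fuse_fsync_backing() appears designed to
allow.
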
 fs/fuse/backing.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |   8 ++++
 fs/fuse/file.c    |  17 +++++++
 fs/fuse/fuse_i.h  |  21 +++++++++
 4 files changed, 163 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index a31199064dc7..4fd7442c94a1 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -436,6 +436,49 @@ int fuse_release_finalize(struct bpf_fuse_args *fa, int *out,
 	return 0;
 }
 
+int fuse_flush_initialize_in(struct bpf_fuse_args *fa, struct fuse_flush_in *ffi,
+			     struct file *file, fl_owner_t id)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_flush_in) {
+		.fh = fuse_file->fh,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(file->f_inode),
+		.opcode = FUSE_FLUSH,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_flush_initialize_out(struct bpf_fuse_args *fa, struct fuse_flush_in *ffi,
+			      struct file *file, fl_owner_t id)
+{
+	return 0;
+}
+
+int fuse_flush_backing(struct bpf_fuse_args *fa, int *out, struct file *file, fl_owner_t id)
+{
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+
+	*out = 0;
+	if (backing_file->f_op->flush)
+		*out = backing_file->f_op->flush(backing_file, id);
+	return *out;
+}
+
+int fuse_flush_finalize(struct bpf_fuse_args *fa, int *out, struct file *file, fl_owner_t id)
+{
+	return 0;
+}
+
 int fuse_lseek_initialize_in(struct bpf_fuse_args *fa, struct fuse_lseek_io *flio,
 			     struct file *file, loff_t offset, int whence)
 {
@@ -510,6 +553,80 @@ int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out,
 	return 0;
 }
 
+int fuse_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+			     struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_fsync_in) {
+		.fh = fuse_file->fh,
+		.fsync_flags = datasync ? FUSE_FSYNC_FDATASYNC : 0,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(file->f_inode)->nodeid,
+		.opcode = FUSE_FSYNC,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+			      struct file *file, loff_t start, loff_t end, int datasync)
+{
+	return 0;
+}
+
+int fuse_fsync_backing(struct bpf_fuse_args *fa, int *out,
+		       struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+	struct file *backing_file = fuse_file->backing_file;
+	const struct fuse_fsync_in *ffi = fa->in_args[0].value;
+	int new_datasync = (ffi->fsync_flags & FUSE_FSYNC_FDATASYNC) ? 1 : 0;
+
+	*out = vfs_fsync(backing_file, new_datasync);
+	return 0;
+}
+
+int fuse_fsync_finalize(struct bpf_fuse_args *fa, int *out,
+			struct file *file, loff_t start, loff_t end, int datasync)
+{
+	return 0;
+}
+
+int fuse_dir_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+				 struct file *file, loff_t start, loff_t end, int datasync)
+{
+	struct fuse_file *fuse_file = file->private_data;
+
+	*ffi = (struct fuse_fsync_in) {
+		.fh = fuse_file->fh,
+		.fsync_flags = datasync ? FUSE_FSYNC_FDATASYNC : 0,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(file->f_inode)->nodeid,
+		.opcode = FUSE_FSYNCDIR,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(*ffi),
+		.in_args[0].value = ffi,
+		.flags = FUSE_BPF_FORCE,
+	};
+
+	return 0;
+}
+
+int fuse_dir_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+				  struct file *file, loff_t start, loff_t end, int datasync)
+{
+	return 0;
+}
+
 static inline void fuse_bpf_aio_put(struct fuse_bpf_aio_req *aio_req)
 {
 	if (refcount_dec_and_test(&aio_req->ref))
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d8237b7a23f2..f159b9a6d305 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1638,6 +1638,14 @@ static int fuse_dir_fsync(struct file *file, loff_t start, loff_t end,
 	if (fuse_is_bad(inode))
 		return -EIO;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_fsync_in, err,
+			fuse_dir_fsync_initialize_in, fuse_dir_fsync_initialize_out,
+			fuse_fsync_backing, fuse_fsync_finalize,
+			file, start, end, datasync))
+		return err;
+#endif
+
 	if (fc->no_fsyncdir)
 		return 0;
 
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 443f1af8a431..fc8f8e3a06b3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -513,6 +513,15 @@ static int fuse_flush(struct file *file, fl_owner_t id)
 	FUSE_ARGS(args);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(file->f_inode, struct fuse_flush_in, err,
+			       fuse_flush_initialize_in, fuse_flush_initialize_out,
+			       fuse_flush_backing,
+			       fuse_flush_finalize,
+			       file, id))
+		return err;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -588,6 +597,14 @@ static int fuse_fsync(struct file *file, loff_t start, loff_t end,
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	int err;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_fsync_in, err,
+			       fuse_fsync_initialize_in, fuse_fsync_initialize_out,
+			       fuse_fsync_backing, fuse_fsync_finalize,
+			       file, start, end, datasync))
+		return err;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 8780a50be244..db769dd0a2e4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1489,6 +1489,14 @@ int fuse_release_backing(struct bpf_fuse_args *fa, int *out,
 int fuse_release_finalize(struct bpf_fuse_args *fa, int *out,
 			    struct inode *inode, struct file *file);
 
+int fuse_flush_initialize_in(struct bpf_fuse_args *fa, struct fuse_flush_in *ffi,
+			     struct file *file, fl_owner_t id);
+int fuse_flush_initialize_out(struct bpf_fuse_args *fa, struct fuse_flush_in *ffi,
+			      struct file *file, fl_owner_t id);
+int fuse_flush_backing(struct bpf_fuse_args *fa, int *out, struct file *file, fl_owner_t id);
+int fuse_flush_finalize(struct bpf_fuse_args *fa, int *out,
+			struct file *file, fl_owner_t id);
+
 struct fuse_lseek_io {
 	struct fuse_lseek_in fli;
 	struct fuse_lseek_out flo;
@@ -1503,6 +1511,19 @@ int fuse_lseek_backing(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 			loff_t offset, int whence);
 
+int fuse_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+			     struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+			      struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_fsync_backing(struct bpf_fuse_args *fa, int *out,
+		       struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_fsync_finalize(struct bpf_fuse_args *fa, int *out,
+			struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_dir_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+				 struct file *file, loff_t start, loff_t end, int datasync);
+int fuse_dir_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
+				  struct file *file, loff_t start, loff_t end, int datasync);
+
 struct fuse_read_iter_out {
 	uint64_t ret;
 };
-- 
2.37.3.998.g577e59143f-goog


* [PATCH 18/26] fuse-bpf: Add Rename support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (16 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 17/26] fuse-bpf: Add support for sync operations Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 19/26] fuse-bpf: Add attr support Daniel Rosenberg
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
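Note for readers: fuse_rename_backing_common() acquires its resources in a
fixed order (old path, new path, parent references, rename lock) and
unwinds them through a ladder of labels. The userspace sketch below uses
counters in place of the real references and lock to show the invariant
such a ladder has to hold: once the lock is taken, every exit passes
through the unlock step. All names are hypothetical and none of this is
kernel API.

```c
#include <assert.h>
#include <errno.h>

/* Counters standing in for path references, parent dentries and the lock. */
struct demo_state {
	int paths_held;
	int parents_held;
	int locked;
};

/*
 * Acquire in order, release in reverse order. Each failure jumps to the
 * label that undoes exactly what has been acquired so far.
 */
static int demo_rename(struct demo_state *s, int have_new_path,
		       int same_mount, int trap_hit, int rename_err)
{
	int err = 0;

	s->paths_held++;		/* old backing path */
	if (!have_new_path) {
		err = -EXDEV;
		goto put_old_path;
	}
	s->paths_held++;		/* new backing path */
	if (!same_mount) {
		err = -EXDEV;
		goto put_new_path;
	}
	s->parents_held += 2;		/* both parent directories */
	s->locked = 1;			/* rename lock is now held */
	if (trap_hit) {
		err = -EINVAL;
		goto unlock;		/* must not skip the unlock step */
	}
	err = rename_err;		/* result of the rename itself */
unlock:
	s->locked = 0;
	s->parents_held -= 2;
put_new_path:
	s->paths_held--;
put_old_path:
	s->paths_held--;
	return err;
}
```

Whatever path is taken, all three counters should be back at zero when
demo_rename() returns; checking that for each early exit is a cheap way to
review a ladder like this one.
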
 fs/fuse/backing.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |  19 +++++
 fs/fuse/fuse_i.h  |  30 +++++++
 3 files changed, 246 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 4fd7442c94a1..f4ab92dc8099 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -1374,6 +1374,203 @@ int fuse_rmdir_finalize(struct bpf_fuse_args *fa, int *out, struct inode *dir, s
 	return 0;
 }
 
+static int fuse_rename_backing_common(struct inode *olddir,
+				      struct dentry *oldent,
+				      struct inode *newdir,
+				      struct dentry *newent, unsigned int flags)
+{
+	int err = 0;
+	struct path old_backing_path;
+	struct path new_backing_path;
+	struct dentry *old_backing_dir_dentry;
+	struct dentry *old_backing_dentry;
+	struct dentry *new_backing_dir_dentry;
+	struct dentry *new_backing_dentry;
+	struct dentry *trap = NULL;
+	struct inode *target_inode;
+	struct renamedata rd;
+
+	/* TODO: handle filter changes to anything other than the flags */
+	get_fuse_backing_path(oldent, &old_backing_path);
+	if (!old_backing_path.dentry)
+		return -EBADF;
+	get_fuse_backing_path(newent, &new_backing_path);
+	if (!new_backing_path.dentry) {
+		/*
+		 * TODO A file being moved from a backing path to another
+		 * backing path which is not yet instrumented with FUSE-BPF.
+		 * This may be slow and should be substituted with something
+		 * more clever.
+		 */
+		err = -EXDEV;
+		goto put_old_path;
+	}
+	if (new_backing_path.mnt != old_backing_path.mnt) {
+		err = -EXDEV;
+		goto put_new_path;
+	}
+	old_backing_dentry = old_backing_path.dentry;
+	new_backing_dentry = new_backing_path.dentry;
+	old_backing_dir_dentry = dget_parent(old_backing_dentry);
+	new_backing_dir_dentry = dget_parent(new_backing_dentry);
+	target_inode = d_inode(newent);
+
+	trap = lock_rename(old_backing_dir_dentry, new_backing_dir_dentry);
+	if (trap == old_backing_dentry) {
+		err = -EINVAL;
+		goto unlock;
+	}
+	if (trap == new_backing_dentry) {
+		err = -ENOTEMPTY;
+		goto unlock;
+	}
+
+	rd = (struct renamedata) {
+		.old_mnt_userns = &init_user_ns,
+		.old_dir = d_inode(old_backing_dir_dentry),
+		.old_dentry = old_backing_dentry,
+		.new_mnt_userns = &init_user_ns,
+		.new_dir = d_inode(new_backing_dir_dentry),
+		.new_dentry = new_backing_dentry,
+		.flags = flags,
+	};
+	err = vfs_rename(&rd);
+	if (err)
+		goto unlock;
+	if (target_inode)
+		fsstack_copy_attr_all(target_inode,
+				get_fuse_inode(target_inode)->backing_inode);
+	fsstack_copy_attr_all(d_inode(oldent), d_inode(old_backing_dentry));
+unlock:
+	unlock_rename(old_backing_dir_dentry, new_backing_dir_dentry);
+	/* drop the parent references taken by dget_parent() above */
+	dput(new_backing_dir_dentry);
+	dput(old_backing_dir_dentry);
+put_new_path:
+	path_put(&new_backing_path);
+put_old_path:
+	path_put(&old_backing_path);
+	return err;
+}
+
+int fuse_rename2_initialize_in(struct bpf_fuse_args *fa, struct fuse_rename2_in *fri,
+			       struct inode *olddir, struct dentry *oldent,
+			       struct inode *newdir, struct dentry *newent,
+			       unsigned int flags)
+{
+	*fri = (struct fuse_rename2_in) {
+		.newdir = get_node_id(newdir),
+		.flags = flags,
+	};
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(olddir),
+		.opcode = FUSE_RENAME2,
+		.in_numargs = 3,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(*fri),
+			.value = fri,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = oldent->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value =  (void *) oldent->d_name.name,
+		},
+		.in_args[2] = (struct bpf_fuse_arg) {
+			.size = newent->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value =  (void *) newent->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rename2_initialize_out(struct bpf_fuse_args *fa, struct fuse_rename2_in *fri,
+				struct inode *olddir, struct dentry *oldent,
+				struct inode *newdir, struct dentry *newent,
+				unsigned int flags)
+{
+	return 0;
+}
+
+int fuse_rename2_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent,
+			 unsigned int flags)
+{
+	const struct fuse_rename2_in *fri = fa->in_args[0].value;
+
+	/* TODO: deal with changing dirs/ents */
+	*out = fuse_rename_backing_common(olddir, oldent, newdir, newent,
+					  fri->flags);
+	return *out;
+}
+
+int fuse_rename2_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct inode *olddir, struct dentry *oldent,
+			  struct inode *newdir, struct dentry *newent,
+			  unsigned int flags)
+{
+	return 0;
+}
+
+int fuse_rename_initialize_in(struct bpf_fuse_args *fa, struct fuse_rename_in *fri,
+			      struct inode *olddir, struct dentry *oldent,
+			      struct inode *newdir, struct dentry *newent)
+{
+	*fri = (struct fuse_rename_in) {
+		.newdir = get_node_id(newdir),
+	};
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(olddir),
+		.opcode = FUSE_RENAME,
+		.in_numargs = 3,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(*fri),
+			.value = fri,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = oldent->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value =  (void *) oldent->d_name.name,
+		},
+		.in_args[2] = (struct bpf_fuse_arg) {
+			.size = newent->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value =  (void *) newent->d_name.name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_rename_initialize_out(struct bpf_fuse_args *fa, struct fuse_rename_in *fri,
+			       struct inode *olddir, struct dentry *oldent,
+			       struct inode *newdir, struct dentry *newent)
+{
+	return 0;
+}
+
+int fuse_rename_backing(struct bpf_fuse_args *fa, int *out,
+			struct inode *olddir, struct dentry *oldent,
+			struct inode *newdir, struct dentry *newent)
+{
+	/* TODO: deal with changing dirs/ents */
+	*out = fuse_rename_backing_common(olddir, oldent, newdir, newent, 0);
+	return *out;
+}
+
+int fuse_rename_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent)
+{
+	return 0;
+}
+
 int fuse_unlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
 			      struct inode *dir, struct dentry *entry)
 {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index f159b9a6d305..7c9d8540668c 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1151,6 +1151,16 @@ static int fuse_rename2(struct user_namespace *mnt_userns, struct inode *olddir,
 		return -EINVAL;
 
 	if (flags) {
+#ifdef CONFIG_FUSE_BPF
+		if (fuse_bpf_backing(olddir, struct fuse_rename2_in, err,
+						fuse_rename2_initialize_in,
+						fuse_rename2_initialize_out, fuse_rename2_backing,
+						fuse_rename2_finalize,
+						olddir, oldent, newdir, newent, flags))
+			return err;
+#endif
+
+		/* TODO: how should this go with bpfs involved? */
 		if (fc->no_rename2 || fc->minor < 23)
 			return -EINVAL;
 
@@ -1162,6 +1172,15 @@ static int fuse_rename2(struct user_namespace *mnt_userns, struct inode *olddir,
 			err = -EINVAL;
 		}
 	} else {
+#ifdef CONFIG_FUSE_BPF
+		if (fuse_bpf_backing(olddir, struct fuse_rename_in, err,
+						fuse_rename_initialize_in,
+						fuse_rename_initialize_out, fuse_rename_backing,
+						fuse_rename_finalize,
+						olddir, oldent, newdir, newent))
+			return err;
+#endif
+
 		err = fuse_rename_common(olddir, oldent, newdir, newent, 0,
 					 FUSE_RENAME,
 					 sizeof(struct fuse_rename_in));
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index db769dd0a2e4..6c2f75ae9a5a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1465,6 +1465,36 @@ int fuse_rmdir_backing(struct bpf_fuse_args *fa, int *out, struct inode *dir, st
 int fuse_rmdir_finalize(struct bpf_fuse_args *fa, int *out,
 			struct inode *dir, struct dentry *entry);
 
+int fuse_rename2_initialize_in(struct bpf_fuse_args *fa, struct fuse_rename2_in *fri,
+			       struct inode *olddir, struct dentry *oldent,
+			       struct inode *newdir, struct dentry *newent,
+			       unsigned int flags);
+int fuse_rename2_initialize_out(struct bpf_fuse_args *fa, struct fuse_rename2_in *fri,
+				struct inode *olddir, struct dentry *oldent,
+				struct inode *newdir, struct dentry *newent,
+				unsigned int flags);
+int fuse_rename2_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent,
+			 unsigned int flags);
+int fuse_rename2_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct inode *olddir, struct dentry *oldent,
+			  struct inode *newdir, struct dentry *newent,
+			  unsigned int flags);
+
+int fuse_rename_initialize_in(struct bpf_fuse_args *fa, struct fuse_rename_in *fri,
+			      struct inode *olddir, struct dentry *oldent,
+			      struct inode *newdir, struct dentry *newent);
+int fuse_rename_initialize_out(struct bpf_fuse_args *fa, struct fuse_rename_in *fri,
+			       struct inode *olddir, struct dentry *oldent,
+			       struct inode *newdir, struct dentry *newent);
+int fuse_rename_backing(struct bpf_fuse_args *fa, int *out,
+			struct inode *olddir, struct dentry *oldent,
+			struct inode *newdir, struct dentry *newent);
+int fuse_rename_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct inode *olddir, struct dentry *oldent,
+			 struct inode *newdir, struct dentry *newent);
+
 int fuse_unlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *fmi,
 			      struct inode *dir, struct dentry *entry);
 int fuse_unlink_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *fmi,
-- 
2.37.3.998.g577e59143f-goog
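
Reviewer-style note, not part of the patch: the call sites in this patch hand
fuse_bpf_backing() four callbacks per opcode (initialize_in, initialize_out,
backing, finalize) and return early when it reports the request as handled.
The macro itself is defined earlier in the series and is not shown in this
message, so the following is only a simplified userspace model of that control
flow. The names demo_args, op_handlers, run_backing and the demo_* callbacks
are hypothetical, BPF pre/post filtering is left out, and the real macro may
propagate errors differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_fuse_args: just enough to show the flow. */
struct demo_args {
	int opcode;
	int in_value;
	int out_value;
};

/* Hypothetical per-opcode callback table mirroring the four hooks. */
struct op_handlers {
	int (*initialize_in)(struct demo_args *fa, int input);
	int (*initialize_out)(struct demo_args *fa);
	int (*backing)(struct demo_args *fa, int *out);
	int (*finalize)(struct demo_args *fa, int *out);
};

/*
 * Returns true when the backing path handled the request; *result then holds
 * the operation's result (or a negative error from one of the hooks).
 * Returns false without touching *result when there is no backing, so the
 * caller can fall through to the regular userspace-daemon path.
 */
static bool run_backing(bool has_backing, const struct op_handlers *ops,
			int input, int *result)
{
	struct demo_args fa = { 0 };
	int err;

	if (!has_backing)
		return false;

	err = ops->initialize_in(&fa, input);
	if (!err)
		err = ops->initialize_out(&fa);
	if (!err)
		err = ops->backing(&fa, result);
	if (!err)
		err = ops->finalize(&fa, result);
	if (err)
		*result = err;
	return true;
}

/* Demo hooks: the "operation" simply doubles its input. */
static int demo_initialize_in(struct demo_args *fa, int input)
{
	fa->opcode = 1;
	fa->in_value = input;
	return 0;
}

static int demo_initialize_out(struct demo_args *fa)
{
	fa->out_value = 0;
	return 0;
}

static int demo_backing(struct demo_args *fa, int *out)
{
	fa->out_value = fa->in_value * 2;
	*out = 0;
	return 0;
}

static int demo_finalize(struct demo_args *fa, int *out)
{
	/* Publish the value computed by the backing step. */
	*out = fa->out_value;
	return 0;
}

static const struct op_handlers demo_ops = {
	.initialize_in = demo_initialize_in,
	.initialize_out = demo_initialize_out,
	.backing = demo_backing,
	.finalize = demo_finalize,
};
```

A caller would use it the same way fuse_rename2() uses the macro: if
run_backing() returns true, return *result; otherwise continue with the
existing request path.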


* [PATCH 19/26] fuse-bpf: Add attr support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (17 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 18/26] fuse-bpf: Add Rename support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 20/26] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE Daniel Rosenberg
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 264 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |  84 +++++----------
 fs/fuse/fuse_i.h  | 141 +++++++++++++++++++++++++
 fs/fuse/inode.c   |  22 ++--
 4 files changed, 441 insertions(+), 70 deletions(-)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index f4ab92dc8099..13075eddeb7e 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -1626,6 +1626,270 @@ int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
 	return 0;
 }
 
+int fuse_getattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_getattr_io *fgio,
+			       const struct dentry *entry, struct kstat *stat,
+			       u32 request_mask, unsigned int flags)
+{
+	fgio->fgi = (struct fuse_getattr_in) {
+		.getattr_flags = flags,
+		.fh = -1, /* TODO is this OK? */
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(entry->d_inode),
+		.opcode = FUSE_GETATTR,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(fgio->fgi),
+			.value = &fgio->fgi,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_getattr_initialize_out(struct bpf_fuse_args *fa, struct fuse_getattr_io *fgio,
+				const struct dentry *entry, struct kstat *stat,
+				u32 request_mask, unsigned int flags)
+{
+	fgio->fao = (struct fuse_attr_out) { 0 };
+
+	fa->out_numargs = 1;
+	fa->out_args[0] = (struct bpf_fuse_arg) {
+		.size = sizeof(fgio->fao),
+		.value = &fgio->fao,
+	};
+
+	return 0;
+}
+
+static void fuse_stat_to_attr(struct fuse_conn *fc, struct inode *inode,
+			      struct kstat *stat, struct fuse_attr *attr)
+{
+	unsigned int blkbits;
+
+	/* see the comment in fuse_change_attributes() */
+	if (fc->writeback_cache && S_ISREG(inode->i_mode)) {
+		stat->size = i_size_read(inode);
+		stat->mtime.tv_sec = inode->i_mtime.tv_sec;
+		stat->mtime.tv_nsec = inode->i_mtime.tv_nsec;
+		stat->ctime.tv_sec = inode->i_ctime.tv_sec;
+		stat->ctime.tv_nsec = inode->i_ctime.tv_nsec;
+	}
+
+	attr->ino = stat->ino;
+	attr->mode = (inode->i_mode & S_IFMT) | (stat->mode & 07777);
+	attr->nlink = stat->nlink;
+	attr->uid = from_kuid(fc->user_ns, stat->uid);
+	attr->gid = from_kgid(fc->user_ns, stat->gid);
+	attr->atime = stat->atime.tv_sec;
+	attr->atimensec = stat->atime.tv_nsec;
+	attr->mtime = stat->mtime.tv_sec;
+	attr->mtimensec = stat->mtime.tv_nsec;
+	attr->ctime = stat->ctime.tv_sec;
+	attr->ctimensec = stat->ctime.tv_nsec;
+	attr->size = stat->size;
+	attr->blocks = stat->blocks;
+
+	if (stat->blksize != 0)
+		blkbits = ilog2(stat->blksize);
+	else
+		blkbits = inode->i_sb->s_blocksize_bits;
+
+	attr->blksize = 1 << blkbits;
+}
+
+int fuse_getattr_backing(struct bpf_fuse_args *fa, int *out,
+			 const struct dentry *entry, struct kstat *stat,
+			 u32 request_mask, unsigned int flags)
+{
+	struct path *backing_path = &get_fuse_dentry(entry)->backing_path;
+	struct inode *backing_inode = backing_path->dentry->d_inode;
+	struct fuse_attr_out *fao = fa->out_args[0].value;
+	struct kstat tmp;
+
+	if (!stat)
+		stat = &tmp;
+
+	*out = vfs_getattr(backing_path, stat, request_mask, flags);
+
+	if (!*out)
+		fuse_stat_to_attr(get_fuse_conn(entry->d_inode), backing_inode,
+				  stat, &fao->attr);
+
+	return 0;
+}
+
+int fuse_getattr_finalize(struct bpf_fuse_args *fa, int *out,
+			  const struct dentry *entry, struct kstat *stat,
+			  u32 request_mask, unsigned int flags)
+{
+	struct fuse_attr_out *outarg = fa->out_args[0].value;
+	struct inode *inode = entry->d_inode;
+	u64 attr_version = fuse_get_attr_version(get_fuse_mount(inode)->fc);
+
+	/* TODO: Ensure this doesn't happen if we had an error getting attrs in
+	 * backing.
+	 */
+	*out = finalize_attr(inode, outarg, attr_version, stat);
+	return 0;
+}
+
+static void fattr_to_iattr(struct fuse_conn *fc,
+			   const struct fuse_setattr_in *arg,
+			   struct iattr *iattr)
+{
+	unsigned int fvalid = arg->valid;
+
+	if (fvalid & FATTR_MODE)
+		iattr->ia_valid |= ATTR_MODE, iattr->ia_mode = arg->mode;
+	if (fvalid & FATTR_UID) {
+		iattr->ia_valid |= ATTR_UID;
+		iattr->ia_uid = make_kuid(fc->user_ns, arg->uid);
+	}
+	if (fvalid & FATTR_GID) {
+		iattr->ia_valid |= ATTR_GID;
+		iattr->ia_gid = make_kgid(fc->user_ns, arg->gid);
+	}
+	if (fvalid & FATTR_SIZE)
+		iattr->ia_valid |= ATTR_SIZE, iattr->ia_size = arg->size;
+	if (fvalid & FATTR_ATIME) {
+		iattr->ia_valid |= ATTR_ATIME;
+		iattr->ia_atime.tv_sec = arg->atime;
+		iattr->ia_atime.tv_nsec = arg->atimensec;
+		if (!(fvalid & FATTR_ATIME_NOW))
+			iattr->ia_valid |= ATTR_ATIME_SET;
+	}
+	if (fvalid & FATTR_MTIME) {
+		iattr->ia_valid |= ATTR_MTIME;
+		iattr->ia_mtime.tv_sec = arg->mtime;
+		iattr->ia_mtime.tv_nsec = arg->mtimensec;
+		if (!(fvalid & FATTR_MTIME_NOW))
+			iattr->ia_valid |= ATTR_MTIME_SET;
+	}
+	if (fvalid & FATTR_CTIME) {
+		iattr->ia_valid |= ATTR_CTIME;
+		iattr->ia_ctime.tv_sec = arg->ctime;
+		iattr->ia_ctime.tv_nsec = arg->ctimensec;
+	}
+}
+
+int fuse_setattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_setattr_io *fsio,
+			       struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	struct fuse_conn *fc = get_fuse_conn(dentry->d_inode);
+
+	*fsio = (struct fuse_setattr_io) { 0 };
+	iattr_to_fattr(fc, attr, &fsio->fsi, true);
+
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_SETATTR,
+		.nodeid = get_node_id(dentry->d_inode),
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fsio->fsi),
+		.in_args[0].value = &fsio->fsi,
+	};
+
+	return 0;
+}
+
+int fuse_setattr_initialize_out(struct bpf_fuse_args *fa, struct fuse_setattr_io *fsio,
+				struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(fsio->fao);
+	fa->out_args[0].value = &fsio->fao;
+
+	return 0;
+}
+
+int fuse_setattr_backing(struct bpf_fuse_args *fa, int *out,
+			 struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	struct fuse_conn *fc = get_fuse_conn(dentry->d_inode);
+	const struct fuse_setattr_in *fsi = fa->in_args[0].value;
+	struct iattr new_attr = { 0 };
+	struct path *backing_path = &get_fuse_dentry(dentry)->backing_path;
+
+	fattr_to_iattr(fc, fsi, &new_attr);
+	/* TODO: Some info doesn't get saved by the attr->fattr->attr transition.
+	 * When we actually allow the bpf to change these, we may have to consider
+	 * the extra flags more, or pass more info into the bpf. Until then we can
+	 * keep everything except for ATTR_FILE, since we'd need a file on the
+	 * lower fs. For what it's worth, neither f2fs nor ext4 make use of that
+	 * even if it is present.
+	 */
+	new_attr.ia_valid = attr->ia_valid & ~ATTR_FILE;
+	inode_lock(d_inode(backing_path->dentry));
+	*out = notify_change(&init_user_ns, backing_path->dentry, &new_attr,
+			    NULL);
+	inode_unlock(d_inode(backing_path->dentry));
+
+	if (*out == 0 && (new_attr.ia_valid & ATTR_SIZE))
+		i_size_write(dentry->d_inode, new_attr.ia_size);
+	return 0;
+}
+
+int fuse_setattr_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct dentry *dentry, struct iattr *attr, struct file *file)
+{
+	return 0;
+}
+
+int fuse_statfs_initialize_in(struct bpf_fuse_args *fa, struct fuse_statfs_out *fso,
+			      struct dentry *dentry, struct kstatfs *buf)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(d_inode(dentry)),
+		.opcode = FUSE_STATFS,
+	};
+
+	return 0;
+}
+
+int fuse_statfs_initialize_out(struct bpf_fuse_args *fa, struct fuse_statfs_out *fso,
+			       struct dentry *dentry, struct kstatfs *buf)
+{
+	*fso = (struct fuse_statfs_out) { 0 };
+
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(*fso);
+	fa->out_args[0].value = fso;
+
+	return 0;
+}
+
+int fuse_statfs_backing(struct bpf_fuse_args *fa, int *out,
+			struct dentry *dentry, struct kstatfs *buf)
+{
+	struct path backing_path;
+	struct fuse_statfs_out *fso = fa->out_args[0].value;
+
+	*out = 0;
+	get_fuse_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+	*out = vfs_statfs(&backing_path, buf);
+	path_put(&backing_path);
+	buf->f_type = FUSE_SUPER_MAGIC;
+
+	/* TODO: Provide postfilter opportunity to modify */
+	if (!*out)
+		convert_statfs_to_fuse(&fso->st, buf);
+
+	return 0;
+}
+
+int fuse_statfs_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct dentry *dentry, struct kstatfs *buf)
+{
+	struct fuse_statfs_out *fso = fa->out_args[0].value;
+
+	if (!fa->error_in)
+		convert_fuse_statfs(buf, &fso->st);
+	return 0;
+}
+
 int fuse_readdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
 			    struct file *file, struct dir_context *ctx,
 			    bool *force_again, bool *allow_force, bool is_continued)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 7c9d8540668c..af1f715a405d 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1215,7 +1215,7 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	return err;
 }
 
-static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
+void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
 			  struct kstat *stat)
 {
 	unsigned int blkbits;
@@ -1292,6 +1292,7 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
 }
 
 static int fuse_update_get_attr(struct inode *inode, struct file *file,
+				const struct path *path,
 				struct kstat *stat, u32 request_mask,
 				unsigned int flags)
 {
@@ -1301,6 +1302,14 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 	u32 inval_mask = READ_ONCE(fi->inval_mask);
 	u32 cache_mask = fuse_get_cache_mask(inode);
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_getattr_io, err,
+			       fuse_getattr_initialize_in, fuse_getattr_initialize_out,
+			       fuse_getattr_backing, fuse_getattr_finalize,
+			       path->dentry, stat, request_mask, flags))
+		return err;
+#endif
+
 	if (flags & AT_STATX_FORCE_SYNC)
 		sync = true;
 	else if (flags & AT_STATX_DONT_SYNC)
@@ -1324,7 +1333,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 
 int fuse_update_attributes(struct inode *inode, struct file *file, u32 mask)
 {
-	return fuse_update_get_attr(inode, file, NULL, mask, 0);
+	return fuse_update_get_attr(inode, file, &file->f_path, NULL, mask, 0);
 }
 
 int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
@@ -1703,58 +1712,6 @@ static long fuse_dir_compat_ioctl(struct file *file, unsigned int cmd,
 				 FUSE_IOCTL_COMPAT | FUSE_IOCTL_DIR);
 }
 
-static inline bool update_mtime(unsigned int ivalid, bool trust_local_mtime)
-{
-	/* Always update if mtime is explicitly set  */
-	if (ivalid & ATTR_MTIME_SET)
-		return true;
-
-	/* Or if kernel i_mtime is the official one */
-	if (trust_local_mtime)
-		return true;
-
-	/* If it's an open(O_TRUNC) or an ftruncate(), don't update */
-	if ((ivalid & ATTR_SIZE) && (ivalid & (ATTR_OPEN | ATTR_FILE)))
-		return false;
-
-	/* In all other cases update */
-	return true;
-}
-
-static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
-			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
-{
-	unsigned ivalid = iattr->ia_valid;
-
-	if (ivalid & ATTR_MODE)
-		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
-	if (ivalid & ATTR_UID)
-		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
-	if (ivalid & ATTR_GID)
-		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
-	if (ivalid & ATTR_SIZE)
-		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
-	if (ivalid & ATTR_ATIME) {
-		arg->valid |= FATTR_ATIME;
-		arg->atime = iattr->ia_atime.tv_sec;
-		arg->atimensec = iattr->ia_atime.tv_nsec;
-		if (!(ivalid & ATTR_ATIME_SET))
-			arg->valid |= FATTR_ATIME_NOW;
-	}
-	if ((ivalid & ATTR_MTIME) && update_mtime(ivalid, trust_local_cmtime)) {
-		arg->valid |= FATTR_MTIME;
-		arg->mtime = iattr->ia_mtime.tv_sec;
-		arg->mtimensec = iattr->ia_mtime.tv_nsec;
-		if (!(ivalid & ATTR_MTIME_SET) && !trust_local_cmtime)
-			arg->valid |= FATTR_MTIME_NOW;
-	}
-	if ((ivalid & ATTR_CTIME) && trust_local_cmtime) {
-		arg->valid |= FATTR_CTIME;
-		arg->ctime = iattr->ia_ctime.tv_sec;
-		arg->ctimensec = iattr->ia_ctime.tv_nsec;
-	}
-}
-
 /*
  * Prevent concurrent writepages on inode
  *
@@ -1869,6 +1826,13 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	bool trust_local_cmtime = is_wb;
 	bool fault_blocked = false;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_setattr_io, err,
+			       fuse_setattr_initialize_in, fuse_setattr_initialize_out,
+			       fuse_setattr_backing, fuse_setattr_finalize, dentry, attr, file))
+		return err;
+#endif
+
 	if (!fc->default_permissions)
 		attr->ia_valid |= ATTR_FORCE;
 
@@ -2044,11 +2008,19 @@ static int fuse_setattr(struct user_namespace *mnt_userns, struct dentry *entry,
 		 * This should be done on write(), truncate() and chown().
 		 */
 		if (!fc->handle_killpriv && !fc->handle_killpriv_v2) {
+#ifdef CONFIG_FUSE_BPF
 			/*
 			 * ia_mode calculation may have used stale i_mode.
 			 * Refresh and recalculate.
 			 */
-			ret = fuse_do_getattr(inode, NULL, file);
+			if (!fuse_bpf_backing(inode, struct fuse_getattr_io, ret,
+					       fuse_getattr_initialize_in,
+					       fuse_getattr_initialize_out,
+					       fuse_getattr_backing,
+					       fuse_getattr_finalize,
+					       entry, NULL, 0, 0))
+#endif
+				ret = fuse_do_getattr(inode, NULL, file);
 			if (ret)
 				return ret;
 
@@ -2105,7 +2077,7 @@ static int fuse_getattr(struct user_namespace *mnt_userns,
 		return -EACCES;
 	}
 
-	return fuse_update_get_attr(inode, NULL, stat, request_mask, flags);
+	return fuse_update_get_attr(inode, NULL, path, stat, request_mask, flags);
 }
 
 static const struct inode_operations fuse_dir_inode_operations = {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6c2f75ae9a5a..f8eddcb24137 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1623,6 +1623,46 @@ int fuse_lookup_finalize(struct bpf_fuse_args *fa, struct dentry **out,
 			 struct inode *dir, struct dentry *entry, unsigned int flags);
 int fuse_revalidate_backing(struct dentry *entry, unsigned int flags);
 
+struct fuse_getattr_io {
+	struct fuse_getattr_in fgi;
+	struct fuse_attr_out fao;
+};
+int fuse_getattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_getattr_io *fgio,
+			       const struct dentry *entry, struct kstat *stat,
+			       u32 request_mask, unsigned int flags);
+int fuse_getattr_initialize_out(struct bpf_fuse_args *fa, struct fuse_getattr_io *fgio,
+				const struct dentry *entry, struct kstat *stat,
+				u32 request_mask, unsigned int flags);
+int fuse_getattr_backing(struct bpf_fuse_args *fa, int *out,
+			 const struct dentry *entry, struct kstat *stat,
+			 u32 request_mask, unsigned int flags);
+int fuse_getattr_finalize(struct bpf_fuse_args *fa, int *out,
+			  const struct dentry *entry, struct kstat *stat,
+			  u32 request_mask, unsigned int flags);
+
+struct fuse_setattr_io {
+	struct fuse_setattr_in fsi;
+	struct fuse_attr_out fao;
+};
+
+int fuse_setattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_setattr_io *fsio,
+			       struct dentry *dentry, struct iattr *attr, struct file *file);
+int fuse_setattr_initialize_out(struct bpf_fuse_args *fa, struct fuse_setattr_io *fsio,
+				struct dentry *dentry, struct iattr *attr, struct file *file);
+int fuse_setattr_backing(struct bpf_fuse_args *fa, int *out,
+			 struct dentry *dentry, struct iattr *attr, struct file *file);
+int fuse_setattr_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct dentry *dentry, struct iattr *attr, struct file *file);
+
+int fuse_statfs_initialize_in(struct bpf_fuse_args *fa, struct fuse_statfs_out *fso,
+			      struct dentry *dentry, struct kstatfs *buf);
+int fuse_statfs_initialize_out(struct bpf_fuse_args *fa, struct fuse_statfs_out *fso,
+			       struct dentry *dentry, struct kstatfs *buf);
+int fuse_statfs_backing(struct bpf_fuse_args *fa, int *out,
+			struct dentry *dentry, struct kstatfs *buf);
+int fuse_statfs_finalize(struct bpf_fuse_args *fa, int *out,
+			 struct dentry *dentry, struct kstatfs *buf);
+
 struct fuse_read_io {
 	struct fuse_read_in fri;
 	struct fuse_read_out fro;
@@ -1675,6 +1715,107 @@ static inline u64 attr_timeout(struct fuse_attr_out *o)
 	return time_to_jiffies(o->attr_valid, o->attr_valid_nsec);
 }
 
+static inline bool update_mtime(unsigned int ivalid, bool trust_local_mtime)
+{
+	/* Always update if mtime is explicitly set  */
+	if (ivalid & ATTR_MTIME_SET)
+		return true;
+
+	/* Or if kernel i_mtime is the official one */
+	if (trust_local_mtime)
+		return true;
+
+	/* If it's an open(O_TRUNC) or an ftruncate(), don't update */
+	if ((ivalid & ATTR_SIZE) && (ivalid & (ATTR_OPEN | ATTR_FILE)))
+		return false;
+
+	/* In all other cases update */
+	return true;
+}
+
+void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
+			  struct kstat *stat);
+
+static inline void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
+			   struct fuse_setattr_in *arg, bool trust_local_cmtime)
+{
+	unsigned int ivalid = iattr->ia_valid;
+
+	if (ivalid & ATTR_MODE)
+		arg->valid |= FATTR_MODE,   arg->mode = iattr->ia_mode;
+	if (ivalid & ATTR_UID)
+		arg->valid |= FATTR_UID,    arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
+	if (ivalid & ATTR_GID)
+		arg->valid |= FATTR_GID,    arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
+	if (ivalid & ATTR_SIZE)
+		arg->valid |= FATTR_SIZE,   arg->size = iattr->ia_size;
+	if (ivalid & ATTR_ATIME) {
+		arg->valid |= FATTR_ATIME;
+		arg->atime = iattr->ia_atime.tv_sec;
+		arg->atimensec = iattr->ia_atime.tv_nsec;
+		if (!(ivalid & ATTR_ATIME_SET))
+			arg->valid |= FATTR_ATIME_NOW;
+	}
+	if ((ivalid & ATTR_MTIME) && update_mtime(ivalid, trust_local_cmtime)) {
+		arg->valid |= FATTR_MTIME;
+		arg->mtime = iattr->ia_mtime.tv_sec;
+		arg->mtimensec = iattr->ia_mtime.tv_nsec;
+		if (!(ivalid & ATTR_MTIME_SET) && !trust_local_cmtime)
+			arg->valid |= FATTR_MTIME_NOW;
+	}
+	if ((ivalid & ATTR_CTIME) && trust_local_cmtime) {
+		arg->valid |= FATTR_CTIME;
+		arg->ctime = iattr->ia_ctime.tv_sec;
+		arg->ctimensec = iattr->ia_ctime.tv_nsec;
+	}
+}
+
+static inline int finalize_attr(struct inode *inode, struct fuse_attr_out *outarg,
+				u64 attr_version, struct kstat *stat)
+{
+	int err = 0;
+
+	if (fuse_invalid_attr(&outarg->attr) ||
+	    ((inode->i_mode ^ outarg->attr.mode) & S_IFMT)) {
+		fuse_make_bad(inode);
+		err = -EIO;
+	} else {
+		fuse_change_attributes(inode, &outarg->attr,
+				       attr_timeout(outarg),
+				       attr_version);
+		if (stat)
+			fuse_fillattr(inode, &outarg->attr, stat);
+	}
+	return err;
+}
+
+static inline void convert_statfs_to_fuse(struct fuse_kstatfs *attr, struct kstatfs *stbuf)
+{
+	attr->bsize   = stbuf->f_bsize;
+	attr->frsize  = stbuf->f_frsize;
+	attr->blocks  = stbuf->f_blocks;
+	attr->bfree   = stbuf->f_bfree;
+	attr->bavail  = stbuf->f_bavail;
+	attr->files   = stbuf->f_files;
+	attr->ffree   = stbuf->f_ffree;
+	attr->namelen = stbuf->f_namelen;
+	/* fsid is left zero */
+}
+
+static inline void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr)
+{
+	stbuf->f_type    = FUSE_SUPER_MAGIC;
+	stbuf->f_bsize   = attr->bsize;
+	stbuf->f_frsize  = attr->frsize;
+	stbuf->f_blocks  = attr->blocks;
+	stbuf->f_bfree   = attr->bfree;
+	stbuf->f_bavail  = attr->bavail;
+	stbuf->f_files   = attr->files;
+	stbuf->f_ffree   = attr->ffree;
+	stbuf->f_namelen = attr->namelen;
+	/* fsid is left zero */
+}
+
 #ifdef CONFIG_FUSE_BPF
 int __init fuse_bpf_init(void);
 void __exit fuse_bpf_cleanup(void);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index c96cfcbfd96a..d178c3eb445f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -620,20 +620,6 @@ static void fuse_send_destroy(struct fuse_mount *fm)
 	}
 }
 
-static void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatfs *attr)
-{
-	stbuf->f_type    = FUSE_SUPER_MAGIC;
-	stbuf->f_bsize   = attr->bsize;
-	stbuf->f_frsize  = attr->frsize;
-	stbuf->f_blocks  = attr->blocks;
-	stbuf->f_bfree   = attr->bfree;
-	stbuf->f_bavail  = attr->bavail;
-	stbuf->f_files   = attr->files;
-	stbuf->f_ffree   = attr->ffree;
-	stbuf->f_namelen = attr->namelen;
-	/* fsid is left zero */
-}
-
 static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -647,6 +633,14 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
 		return 0;
 	}
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(dentry->d_inode, struct fuse_statfs_out, err,
+			       fuse_statfs_initialize_in, fuse_statfs_initialize_out,
+			       fuse_statfs_backing, fuse_statfs_finalize,
+			       dentry, buf))
+		return err;
+#endif
+
 	memset(&outarg, 0, sizeof(outarg));
 	args.in_numargs = 0;
 	args.opcode = FUSE_STATFS;
-- 
2.37.3.998.g577e59143f-goog
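
Reviewer-style note, not part of the patch: each out_args[] entry carries a
.size that is meant to describe the buffer .value points at. When a hook
receives its payload through a pointer, as fuse_statfs_initialize_out() does
with fso, sizeof applied to the pointer name yields the pointer width rather
than the payload size. The sketch below shows the difference in userspace;
struct demo_statfs_out and struct demo_arg are hypothetical stand-ins and do
not reproduce the real fuse_statfs_out or bpf_fuse_arg layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload; larger than a pointer on common ABIs. */
struct demo_statfs_out {
	uint64_t blocks;
	uint64_t bfree;
	uint64_t bavail;
	uint64_t files;
	uint64_t ffree;
};

/* Hypothetical argument descriptor: a size plus a pointer to the buffer. */
struct demo_arg {
	size_t size;
	void *value;
};

/* Size taken from the pointer itself: covers only sizeof(void *) bytes. */
static void describe_by_pointer(struct demo_arg *arg, struct demo_statfs_out *fso)
{
	arg->size = sizeof(fso);
	arg->value = fso;
}

/* Size taken from the pointee: covers the whole payload. */
static void describe_by_pointee(struct demo_arg *arg, struct demo_statfs_out *fso)
{
	arg->size = sizeof(*fso);
	arg->value = fso;
}
```

Any consumer that copies or validates arg->size bytes would see a truncated
payload in the first form.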


* [PATCH 20/26] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (18 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 19/26] fuse-bpf: Add attr support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 21/26] fuse-bpf: Add xattr support Daniel Rosenberg
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/file.c    | 10 +++++++
 fs/fuse/fuse_i.h  | 24 +++++++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 13075eddeb7e..8fd5cbfdd4fa 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -553,6 +553,74 @@ int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out,
 	return 0;
 }
 
+int fuse_copy_file_range_initialize_in(struct bpf_fuse_args *fa,
+					struct fuse_copy_file_range_io *fcf,
+					struct file *file_in, loff_t pos_in, struct file *file_out,
+					loff_t pos_out, size_t len, unsigned int flags)
+{
+	struct fuse_file *fuse_file_in = file_in->private_data;
+	struct fuse_file *fuse_file_out = file_out->private_data;
+
+	fcf->fci = (struct fuse_copy_file_range_in) {
+		.fh_in = fuse_file_in->fh,
+		.off_in = pos_in,
+		.nodeid_out = fuse_file_out->nodeid,
+		.fh_out = fuse_file_out->fh,
+		.off_out = pos_out,
+		.len = len,
+		.flags = flags,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(file_in->f_inode),
+		.opcode = FUSE_COPY_FILE_RANGE,
+		.in_numargs = 1,
+		.in_args[0].size = sizeof(fcf->fci),
+		.in_args[0].value = &fcf->fci,
+	};
+
+	return 0;
+}
+
+int fuse_copy_file_range_initialize_out(struct bpf_fuse_args *fa,
+					struct fuse_copy_file_range_io *fcf,
+					struct file *file_in, loff_t pos_in, struct file *file_out,
+					loff_t pos_out, size_t len, unsigned int flags)
+{
+	fa->out_numargs = 1;
+	fa->out_args[0].size = sizeof(fcf->fwo);
+	fa->out_args[0].value = &fcf->fwo;
+
+	return 0;
+}
+
+int fuse_copy_file_range_backing(struct bpf_fuse_args *fa, ssize_t *out, struct file *file_in,
+				 loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len,
+				 unsigned int flags)
+{
+	const struct fuse_copy_file_range_in *fci = fa->in_args[0].value;
+	struct fuse_file *fuse_file_in = file_in->private_data;
+	struct file *backing_file_in = fuse_file_in->backing_file;
+	struct fuse_file *fuse_file_out = file_out->private_data;
+	struct file *backing_file_out = fuse_file_out->backing_file;
+
+	/* TODO: Handle changing of in/out files */
+	if (backing_file_out)
+		*out = vfs_copy_file_range(backing_file_in, fci->off_in, backing_file_out,
+					   fci->off_out, fci->len, fci->flags);
+	else
+		*out = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
+					       flags);
+	return 0;
+}
+
+int fuse_copy_file_range_finalize(struct bpf_fuse_args *fa, ssize_t *out, struct file *file_in,
+				  loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len,
+				  unsigned int flags)
+{
+	return 0;
+}
+
 int fuse_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
 			     struct file *file, loff_t start, loff_t end, int datasync)
 {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fc8f8e3a06b3..85aeb6ade085 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3180,6 +3180,16 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
 	bool is_unstable = (!fc->writeback_cache) &&
 			   ((pos_out + len) > inode_out->i_size);
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(file_in->f_inode, struct fuse_copy_file_range_io, err,
+			       fuse_copy_file_range_initialize_in,
+			       fuse_copy_file_range_initialize_out,
+			       fuse_copy_file_range_backing,
+			       fuse_copy_file_range_finalize,
+			       file_in, pos_in, file_out, pos_out, len, flags))
+		return err;
+#endif
+
 	if (fc->no_copy_file_range)
 		return -EOPNOTSUPP;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f8eddcb24137..370fe944387e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1541,6 +1541,30 @@ int fuse_lseek_backing(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out, struct file *file,
 			loff_t offset, int whence);
 
+struct fuse_copy_file_range_io {
+	struct fuse_copy_file_range_in fci;
+	struct fuse_write_out fwo;
+};
+
+int fuse_copy_file_range_initialize_in(struct bpf_fuse_args *fa,
+				       struct fuse_copy_file_range_io *fcf,
+				       struct file *file_in, loff_t pos_in,
+				       struct file *file_out, loff_t pos_out,
+				       size_t len, unsigned int flags);
+int fuse_copy_file_range_initialize_out(struct bpf_fuse_args *fa,
+					struct fuse_copy_file_range_io *fcf,
+					struct file *file_in, loff_t pos_in,
+					struct file *file_out, loff_t pos_out,
+					size_t len, unsigned int flags);
+int fuse_copy_file_range_backing(struct bpf_fuse_args *fa, ssize_t *out,
+				 struct file *file_in, loff_t pos_in,
+				 struct file *file_out, loff_t pos_out,
+				 size_t len, unsigned int flags);
+int fuse_copy_file_range_finalize(struct bpf_fuse_args *fa, ssize_t *out,
+				  struct file *file_in, loff_t pos_in,
+				  struct file *file_out, loff_t pos_out,
+				  size_t len, unsigned int flags);
+
 int fuse_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
 			     struct file *file, loff_t start, loff_t end, int datasync);
 int fuse_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
-- 
2.37.3.998.g577e59143f-goog
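
Reviewer-style note, not part of the patch: when a backing file is present,
fuse_copy_file_range_backing() takes offsets, length and flags from
fa->in_args[0] (fci) rather than from its own parameters, so whatever a
prefilter left in that struct is what reaches vfs_copy_file_range(). The model
below illustrates that data flow in userspace. The names demo_copy_in,
clamp_prefilter and demo_backing_copy are hypothetical, and the "copy" is a
plain memmove() over byte arrays; which fields a BPF program may actually
rewrite is not stated in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request struct, loosely modelled on fuse_copy_file_range_in. */
struct demo_copy_in {
	size_t off_in;
	size_t off_out;
	size_t len;
};

/* Hypothetical prefilter: caps the request length at max_len. */
static void clamp_prefilter(struct demo_copy_in *in, size_t max_len)
{
	if (in->len > max_len)
		in->len = max_len;
}

/*
 * Backing step: works from the (possibly rewritten) request struct, not from
 * the caller's original arguments. Returns the number of bytes copied.
 * The caller must make sure both buffers are large enough.
 */
static size_t demo_backing_copy(const struct demo_copy_in *in,
				const char *src, char *dst)
{
	memmove(dst + in->off_out, src + in->off_in, in->len);
	return in->len;
}
```

With a request for 6 bytes and a prefilter cap of 3, the backing step copies
only 3 bytes, even though the original caller asked for 6.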


* [PATCH 21/26] fuse-bpf: Add xattr support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (19 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 20/26] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 22/26] fuse-bpf: Add symlink/link support Daniel Rosenberg
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 257 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/fuse_i.h  |  55 ++++++++++
 fs/fuse/xattr.c   |  36 +++++++
 3 files changed, 348 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 8fd5cbfdd4fa..d8c86234f253 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -695,6 +695,263 @@ int fuse_dir_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in
 	return 0;
 }
 
+int fuse_getxattr_initialize_in(struct bpf_fuse_args *fa,
+				struct fuse_getxattr_io *fgio,
+				struct dentry *dentry, const char *name, void *value,
+				size_t size)
+{
+	*fgio = (struct fuse_getxattr_io) {
+		.fgi.size = size,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_GETXATTR,
+		.in_numargs = 2,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(fgio->fgi),
+			.value = &fgio->fgi,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = strlen(name) + 1,
+			.max_size = XATTR_NAME_MAX + 1,
+			.flags = BPF_FUSE_MUST_ALLOCATE | BPF_FUSE_VARIABLE_SIZE,
+			.value = (void *)name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_getxattr_initialize_out(struct bpf_fuse_args *fa,
+				 struct fuse_getxattr_io *fgio,
+				 struct dentry *dentry, const char *name, void *value,
+				 size_t size)
+{
+	fa->flags = size ? FUSE_BPF_OUT_ARGVAR : 0;
+	fa->out_numargs = 1;
+	if (size) {
+		fa->out_args[0].size = size;
+		fa->out_args[0].max_size = size;
+		fa->out_args[0].flags = BPF_FUSE_VARIABLE_SIZE;
+		fa->out_args[0].value = value;
+	} else {
+		fa->out_args[0].size = sizeof(fgio->fgo);
+		fa->out_args[0].value = &fgio->fgo;
+	}
+	return 0;
+}
+
+int fuse_getxattr_backing(struct bpf_fuse_args *fa, int *out,
+			  struct dentry *dentry, const char *name, void *value,
+			  size_t size)
+{
+	ssize_t ret = vfs_getxattr(&init_user_ns,
+				   get_fuse_dentry(dentry)->backing_path.dentry,
+				   fa->in_args[1].value, value, size);
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		fa->out_args[0].size = ret;
+	else
+		((struct fuse_getxattr_out *)fa->out_args[0].value)->size = ret;
+
+	return 0;
+}
+
+int fuse_getxattr_finalize(struct bpf_fuse_args *fa, int *out,
+			   struct dentry *dentry, const char *name, void *value,
+			   size_t size)
+{
+	struct fuse_getxattr_out *fgo;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR) {
+		*out = fa->out_args[0].size;
+		return 0;
+	}
+
+	fgo = fa->out_args[0].value;
+
+	*out = fgo->size;
+	return 0;
+}
+
+int fuse_listxattr_initialize_in(struct bpf_fuse_args *fa,
+				 struct fuse_getxattr_io *fgio,
+				 struct dentry *dentry, char *list, size_t size)
+{
+	*fgio = (struct fuse_getxattr_io) {
+		.fgi.size = size,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_LISTXATTR,
+		.in_numargs = 1,
+		.in_args[0] =
+			(struct bpf_fuse_arg) {
+				.size = sizeof(fgio->fgi),
+				.value = &fgio->fgi,
+			},
+	};
+
+	return 0;
+}
+
+int fuse_listxattr_initialize_out(struct bpf_fuse_args *fa,
+				  struct fuse_getxattr_io *fgio,
+				  struct dentry *dentry, char *list, size_t size)
+{
+	fa->out_numargs = 1;
+
+	if (size) {
+		fa->flags = FUSE_BPF_OUT_ARGVAR;
+		fa->out_args[0].size = size;
+		fa->out_args[0].max_size = size;
+		fa->out_args[0].flags = BPF_FUSE_VARIABLE_SIZE;
+		fa->out_args[0].value = (void *)list;
+	} else {
+		fa->out_args[0].size = sizeof(fgio->fgo);
+		fa->out_args[0].value = &fgio->fgo;
+	}
+	return 0;
+}
+
+int fuse_listxattr_backing(struct bpf_fuse_args *fa, ssize_t *out, struct dentry *dentry,
+			   char *list, size_t size)
+{
+	*out = vfs_listxattr(get_fuse_dentry(dentry)->backing_path.dentry, list, size);
+
+	if (*out < 0)
+		return *out;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR)
+		fa->out_args[0].size = *out;
+	else
+		((struct fuse_getxattr_out *)fa->out_args[0].value)->size = *out;
+
+	return 0;
+}
+
+int fuse_listxattr_finalize(struct bpf_fuse_args *fa, ssize_t *out, struct dentry *dentry,
+			    char *list, size_t size)
+{
+	struct fuse_getxattr_out *fgo;
+
+	if (fa->error_in)
+		return 0;
+
+	if (fa->flags & FUSE_BPF_OUT_ARGVAR) {
+		*out = fa->out_args[0].size;
+		return 0;
+	}
+
+	fgo = fa->out_args[0].value;
+	*out = fgo->size;
+	return 0;
+}
+
+int fuse_setxattr_initialize_in(struct bpf_fuse_args *fa,
+				struct fuse_setxattr_in *fsxi,
+				struct dentry *dentry, const char *name,
+				const void *value, size_t size, int flags)
+{
+	*fsxi = (struct fuse_setxattr_in) {
+		.size = size,
+		.flags = flags,
+	};
+
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_SETXATTR,
+		.in_numargs = 3,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = sizeof(*fsxi),
+			.value = fsxi,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = strlen(name) + 1,
+			.max_size = XATTR_NAME_MAX + 1,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) name,
+		},
+		.in_args[2] = (struct bpf_fuse_arg) {
+			.size = size,
+			.max_size = XATTR_SIZE_MAX,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) value,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_setxattr_initialize_out(struct bpf_fuse_args *fa,
+				 struct fuse_setxattr_in *fsxi,
+				 struct dentry *dentry, const char *name,
+				 const void *value, size_t size, int flags)
+{
+	return 0;
+}
+
+int fuse_setxattr_backing(struct bpf_fuse_args *fa, int *out, struct dentry *dentry,
+			  const char *name, const void *value, size_t size,
+			  int flags)
+{
+	*out = vfs_setxattr(&init_user_ns,
+			    get_fuse_dentry(dentry)->backing_path.dentry, name,
+			    (void *) value, size, flags);
+	return 0;
+}
+
+int fuse_setxattr_finalize(struct bpf_fuse_args *fa, int *out, struct dentry *dentry,
+			   const char *name, const void *value, size_t size,
+			   int flags)
+{
+	return 0;
+}
+
+int fuse_removexattr_initialize_in(struct bpf_fuse_args *fa,
+				   struct fuse_dummy_io *unused,
+				   struct dentry *dentry, const char *name)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_fuse_inode(dentry->d_inode)->nodeid,
+		.opcode = FUSE_REMOVEXATTR,
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = strlen(name) + 1,
+			.max_size = XATTR_NAME_MAX + 1,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) name,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_removexattr_initialize_out(struct bpf_fuse_args *fa,
+				    struct fuse_dummy_io *unused,
+				    struct dentry *dentry, const char *name)
+{
+	return 0;
+}
+
+int fuse_removexattr_backing(struct bpf_fuse_args *fa, int *out,
+			     struct dentry *dentry, const char *name)
+{
+	struct path *backing_path = &get_fuse_dentry(dentry)->backing_path;
+
+	/* TODO account for changes of the name by prefilter */
+	*out = vfs_removexattr(&init_user_ns, backing_path->dentry, name);
+	return 0;
+}
+
+int fuse_removexattr_finalize(struct bpf_fuse_args *fa, int *out,
+			      struct dentry *dentry, const char *name)
+{
+	return 0;
+}
+
 static inline void fuse_bpf_aio_put(struct fuse_bpf_aio_req *aio_req)
 {
 	if (refcount_dec_and_test(&aio_req->ref))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 370fe944387e..b313a45c7774 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1578,6 +1578,61 @@ int fuse_dir_fsync_initialize_in(struct bpf_fuse_args *fa, struct fuse_fsync_in
 int fuse_dir_fsync_initialize_out(struct bpf_fuse_args *fa, struct fuse_fsync_in *ffi,
 				  struct file *file, loff_t start, loff_t end, int datasync);
 
+struct fuse_getxattr_io {
+	struct fuse_getxattr_in fgi;
+	struct fuse_getxattr_out fgo;
+};
+
+int fuse_getxattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_getxattr_io *fgio,
+				struct dentry *dentry, const char *name, void *value,
+				size_t size);
+int fuse_getxattr_initialize_out(struct bpf_fuse_args *fa, struct fuse_getxattr_io *fgio,
+				 struct dentry *dentry, const char *name, void *value,
+				 size_t size);
+int fuse_getxattr_backing(struct bpf_fuse_args *fa, int *out,
+			  struct dentry *dentry, const char *name, void *value,
+			  size_t size);
+int fuse_getxattr_finalize(struct bpf_fuse_args *fa, int *out,
+			   struct dentry *dentry, const char *name, void *value,
+			   size_t size);
+
+int fuse_listxattr_initialize_in(struct bpf_fuse_args *fa,
+				 struct fuse_getxattr_io *fgio,
+				 struct dentry *dentry, char *list, size_t size);
+int fuse_listxattr_initialize_out(struct bpf_fuse_args *fa,
+				  struct fuse_getxattr_io *fgio,
+				  struct dentry *dentry, char *list, size_t size);
+int fuse_listxattr_backing(struct bpf_fuse_args *fa, ssize_t *out, struct dentry *dentry,
+			   char *list, size_t size);
+int fuse_listxattr_finalize(struct bpf_fuse_args *fa, ssize_t *out, struct dentry *dentry,
+			    char *list, size_t size);
+
+int fuse_setxattr_initialize_in(struct bpf_fuse_args *fa,
+				struct fuse_setxattr_in *fsxi,
+				struct dentry *dentry, const char *name,
+				const void *value, size_t size, int flags);
+int fuse_setxattr_initialize_out(struct bpf_fuse_args *fa,
+				 struct fuse_setxattr_in *fsxi,
+				 struct dentry *dentry, const char *name,
+				 const void *value, size_t size, int flags);
+int fuse_setxattr_backing(struct bpf_fuse_args *fa, int *out, struct dentry *dentry,
+			  const char *name, const void *value, size_t size,
+			  int flags);
+int fuse_setxattr_finalize(struct bpf_fuse_args *fa, int *out, struct dentry *dentry,
+			   const char *name, const void *value, size_t size,
+			   int flags);
+
+int fuse_removexattr_initialize_in(struct bpf_fuse_args *fa,
+				   struct fuse_dummy_io *unused,
+				   struct dentry *dentry, const char *name);
+int fuse_removexattr_initialize_out(struct bpf_fuse_args *fa,
+				    struct fuse_dummy_io *unused,
+				    struct dentry *dentry, const char *name);
+int fuse_removexattr_backing(struct bpf_fuse_args *fa, int *out,
+			     struct dentry *dentry, const char *name);
+int fuse_removexattr_finalize(struct bpf_fuse_args *fa, int *out,
+			      struct dentry *dentry, const char *name);
+
 struct fuse_read_iter_out {
 	uint64_t ret;
 };
diff --git a/fs/fuse/xattr.c b/fs/fuse/xattr.c
index 0d3e7177fce0..96728bd907ce 100644
--- a/fs/fuse/xattr.c
+++ b/fs/fuse/xattr.c
@@ -115,6 +115,14 @@ ssize_t fuse_listxattr(struct dentry *entry, char *list, size_t size)
 	struct fuse_getxattr_out outarg;
 	ssize_t ret;
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_getxattr_io, ret,
+			       fuse_listxattr_initialize_in, fuse_listxattr_initialize_out,
+			       fuse_listxattr_backing, fuse_listxattr_finalize,
+			       entry, list, size))
+		return ret;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -182,6 +190,16 @@ static int fuse_xattr_get(const struct xattr_handler *handler,
 			 struct dentry *dentry, struct inode *inode,
 			 const char *name, void *value, size_t size)
 {
+#ifdef CONFIG_FUSE_BPF
+	int err;
+
+	if (fuse_bpf_backing(inode, struct fuse_getxattr_io, err,
+			       fuse_getxattr_initialize_in, fuse_getxattr_initialize_out,
+			       fuse_getxattr_backing, fuse_getxattr_finalize,
+			       dentry, name, value, size))
+		return err;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
@@ -194,6 +212,24 @@ static int fuse_xattr_set(const struct xattr_handler *handler,
 			  const char *name, const void *value, size_t size,
 			  int flags)
 {
+#ifdef CONFIG_FUSE_BPF
+	int err;
+	bool handled;
+
+	if (value)
+		handled = fuse_bpf_backing(inode, struct fuse_setxattr_in, err,
+			       fuse_setxattr_initialize_in, fuse_setxattr_initialize_out,
+			       fuse_setxattr_backing, fuse_setxattr_finalize,
+			       dentry, name, value, size, flags);
+	else
+		handled = fuse_bpf_backing(inode, struct fuse_dummy_io, err,
+			       fuse_removexattr_initialize_in, fuse_removexattr_initialize_out,
+			       fuse_removexattr_backing, fuse_removexattr_finalize,
+			       dentry, name);
+	if (handled)
+		return err;
+#endif
+
 	if (fuse_is_bad(inode))
 		return -EIO;
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 22/26] fuse-bpf: Add symlink/link support
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (20 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 21/26] fuse-bpf: Add xattr support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 23/26] fuse-bpf: allow mounting with no userspace daemon Daniel Rosenberg
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/backing.c | 251 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dir.c     |  31 ++++++
 fs/fuse/fuse_i.h  |  33 ++++++
 3 files changed, 315 insertions(+)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index d8c86234f253..485b6f1e8503 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -1951,6 +1951,97 @@ int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
 	return 0;
 }
 
+int fuse_link_initialize_in(struct bpf_fuse_args *fa, struct fuse_link_in *fli,
+			    struct dentry *entry, struct inode *dir,
+			    struct dentry *newent)
+{
+	struct inode *src_inode = entry->d_inode;
+
+	*fli = (struct fuse_link_in) {
+		.oldnodeid = get_node_id(src_inode),
+	};
+
+	fa->opcode = FUSE_LINK;
+	fa->in_numargs = 2;
+	fa->in_args[0].size = sizeof(*fli);
+	fa->in_args[0].value = fli;
+	fa->in_args[1].size = newent->d_name.len + 1;
+	fa->in_args[1].max_size = NAME_MAX + 1;
+	fa->in_args[1].value = (void *) newent->d_name.name;
+	fa->in_args[1].flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE;
+
+	return 0;
+}
+
+int fuse_link_initialize_out(struct bpf_fuse_args *fa, struct fuse_link_in *fli,
+			     struct dentry *entry, struct inode *dir,
+			     struct dentry *newent)
+{
+	return 0;
+}
+
+int fuse_link_backing(struct bpf_fuse_args *fa, int *out, struct dentry *entry,
+		      struct inode *dir, struct dentry *newent)
+{
+	struct path backing_old_path;
+	struct path backing_new_path;
+	struct dentry *backing_dir_dentry;
+	struct inode *fuse_new_inode = NULL;
+	struct fuse_inode *fuse_dir_inode = get_fuse_inode(dir);
+	struct inode *backing_dir_inode = fuse_dir_inode->backing_inode;
+
+	*out = 0;
+	get_fuse_backing_path(entry, &backing_old_path);
+	if (!backing_old_path.dentry)
+		return -EBADF;
+
+	get_fuse_backing_path(newent, &backing_new_path);
+	if (!backing_new_path.dentry) {
+		*out = -EBADF;
+		goto err_dst_path;
+	}
+
+	backing_dir_dentry = dget_parent(backing_new_path.dentry);
+	backing_dir_inode = d_inode(backing_dir_dentry);
+
+	inode_lock_nested(backing_dir_inode, I_MUTEX_PARENT);
+	*out = vfs_link(backing_old_path.dentry, &init_user_ns,
+		       backing_dir_inode, backing_new_path.dentry, NULL);
+	inode_unlock(backing_dir_inode);
+	if (*out)
+		goto out;
+
+	if (d_really_is_negative(backing_new_path.dentry) ||
+	    unlikely(d_unhashed(backing_new_path.dentry))) {
+		*out = -EINVAL;
+		/*
+		 * TODO: overlayfs responds to this situation with a
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+
+	fuse_new_inode = fuse_iget_backing(dir->i_sb, fuse_dir_inode->nodeid, backing_dir_inode);
+	if (IS_ERR(fuse_new_inode)) {
+		*out = PTR_ERR(fuse_new_inode);
+		goto out;
+	}
+	d_instantiate(newent, fuse_new_inode);
+
+out:
+	dput(backing_dir_dentry);
+	path_put(&backing_new_path);
+err_dst_path:
+	path_put(&backing_old_path);
+	return *out;
+}
+
+int fuse_link_finalize(struct bpf_fuse_args *fa, int *out, struct dentry *entry,
+		       struct inode *dir, struct dentry *newent)
+{
+	return 0;
+}
+
 int fuse_getattr_initialize_in(struct bpf_fuse_args *fa, struct fuse_getattr_io *fgio,
 			       const struct dentry *entry, struct kstat *stat,
 			       u32 request_mask, unsigned int flags)
@@ -2215,6 +2306,166 @@ int fuse_statfs_finalize(struct bpf_fuse_args *fa, int *out,
 	return 0;
 }
 
+int fuse_get_link_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+				struct inode *inode, struct dentry *dentry,
+				struct delayed_call *callback)
+{
+	/*
+	 * TODO
+	 * If we want to handle changing these things, we'll need to copy
+	 * the lower fs's data into our own buffer, and provide our own callback
+	 * to free that buffer.
+	 *
+	 * The prefilter could change the name we're looking at;
+	 * the postfilter could change the name we return.
+	 *
+	 * We ought to only make that buffer if it's been requested, so leaving
+	 * this unimplemented for the moment
+	 */
+	*fa = (struct bpf_fuse_args) {
+		.opcode = FUSE_READLINK,
+		.nodeid = get_node_id(inode),
+		.in_numargs = 1,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = dentry->d_name.len + 1,
+			.max_size = NAME_MAX + 1,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) dentry->d_name.name,
+		},
+		/*
+		 * .out_argvar = 1,
+		 * .out_numargs = 1,
+		 * .out_args[0].size = ,
+		 * .out_args[0].value = ,
+		 */
+	};
+
+	return 0;
+}
+
+int fuse_get_link_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+				 struct inode *inode, struct dentry *dentry,
+				 struct delayed_call *callback)
+{
+	/*
+	 * .out_argvar = 1,
+	 * .out_numargs = 1,
+	 * .out_args[0].size = ,
+	 * .out_args[0].value = ,
+	 */
+
+	return 0;
+}
+
+int fuse_get_link_backing(struct bpf_fuse_args *fa, const char **out,
+			  struct inode *inode, struct dentry *dentry,
+			  struct delayed_call *callback)
+{
+	struct path backing_path;
+
+	if (!dentry) {
+		*out = ERR_PTR(-ECHILD);
+		return PTR_ERR(*out);
+	}
+
+	get_fuse_backing_path(dentry, &backing_path);
+	if (!backing_path.dentry) {
+		*out = ERR_PTR(-ECHILD);
+		return PTR_ERR(*out);
+	}
+
+	/*
+	 * TODO: If we want to do our own thing, copy the data and then call the
+	 * callback
+	 */
+	*out = vfs_get_link(backing_path.dentry, callback);
+
+	path_put(&backing_path);
+	return 0;
+}
+
+int fuse_get_link_finalize(struct bpf_fuse_args *fa, const char **out,
+			   struct inode *inode, struct dentry *dentry,
+			   struct delayed_call *callback)
+{
+	return 0;
+}
+
+int fuse_symlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+			       struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	*fa = (struct bpf_fuse_args) {
+		.nodeid = get_node_id(dir),
+		.opcode = FUSE_SYMLINK,
+		.in_numargs = 2,
+		.in_args[0] = (struct bpf_fuse_arg) {
+			.size = entry->d_name.len + 1,
+			.flags = BPF_FUSE_IMMUTABLE,
+			.value = (void *) entry->d_name.name,
+		},
+		.in_args[1] = (struct bpf_fuse_arg) {
+			.size = len,
+			.max_size = PATH_MAX,
+			.flags = BPF_FUSE_VARIABLE_SIZE | BPF_FUSE_MUST_ALLOCATE,
+			.value = (void *) link,
+		},
+	};
+
+	return 0;
+}
+
+int fuse_symlink_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+				struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	return 0;
+}
+
+int fuse_symlink_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	struct fuse_inode *fuse_inode = get_fuse_inode(dir);
+	struct inode *backing_inode = fuse_inode->backing_inode;
+	struct path backing_path;
+	struct inode *inode = NULL;
+
+	*out = 0;
+	/* TODO: Actually deal with changing the backing entry in symlink */
+	get_fuse_backing_path(entry, &backing_path);
+	if (!backing_path.dentry)
+		return -EBADF;
+
+	inode_lock_nested(backing_inode, I_MUTEX_PARENT);
+	*out = vfs_symlink(&init_user_ns, backing_inode, backing_path.dentry,
+			  link);
+	inode_unlock(backing_inode);
+	if (*out)
+		goto out;
+	if (d_really_is_negative(backing_path.dentry) ||
+	    unlikely(d_unhashed(backing_path.dentry))) {
+		*out = -EINVAL;
+		/*
+		 * TODO: overlayfs responds to this situation with a
+		 * lookup_one_len(). Should we do that too?
+		 */
+		goto out;
+	}
+	inode = fuse_iget_backing(dir->i_sb, fuse_inode->nodeid, backing_inode);
+	if (IS_ERR(inode)) {
+		*out = PTR_ERR(inode);
+		goto out;
+	}
+	d_instantiate(entry, inode);
+out:
+	path_put(&backing_path);
+	return *out;
+}
+
+int fuse_symlink_finalize(struct bpf_fuse_args *fa, int *out,
+			   struct inode *dir, struct dentry *entry, const char *link, int len)
+{
+	return 0;
+}
+
 int fuse_readdir_initialize_in(struct bpf_fuse_args *fa, struct fuse_read_io *frio,
 			    struct file *file, struct dir_context *ctx,
 			    bool *force_again, bool *allow_force, bool is_continued)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index af1f715a405d..a4fd1cb018be 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -967,6 +967,16 @@ static int fuse_symlink(struct user_namespace *mnt_userns, struct inode *dir,
 	unsigned len = strlen(link) + 1;
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	int err;
+
+	if (fuse_bpf_backing(dir, struct fuse_dummy_io, err,
+			fuse_symlink_initialize_in, fuse_symlink_initialize_out,
+			fuse_symlink_backing, fuse_symlink_finalize,
+			dir, entry, link, len))
+		return err;
+#endif
+
 	args.opcode = FUSE_SYMLINK;
 	args.in_numargs = 2;
 	args.in_args[0].size = entry->d_name.len + 1;
@@ -1198,6 +1208,14 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
 	struct fuse_mount *fm = get_fuse_mount(inode);
 	FUSE_ARGS(args);
 
+#ifdef CONFIG_FUSE_BPF
+	if (fuse_bpf_backing(inode, struct fuse_link_in, err,
+				fuse_link_initialize_in, fuse_link_initialize_out,
+				fuse_link_backing, fuse_link_finalize, entry,
+				newdir, newent))
+		return err;
+#endif
+
 	memset(&inarg, 0, sizeof(inarg));
 	inarg.oldnodeid = get_node_id(inode);
 	args.opcode = FUSE_LINK;
@@ -1609,6 +1627,19 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
 	if (fuse_is_bad(inode))
 		goto out_err;
 
+#ifdef CONFIG_FUSE_BPF
+	{
+		const char *out = NULL;
+
+		if (fuse_bpf_backing(inode, struct fuse_dummy_io, out,
+				       fuse_get_link_initialize_in, fuse_get_link_initialize_out,
+				       fuse_get_link_backing,
+				       fuse_get_link_finalize,
+				       inode, dentry, callback))
+			return out;
+	}
+#endif
+
 	if (fc->cache_symlinks)
 		return page_get_link(dentry, inode, callback);
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index b313a45c7774..cbfd56d669c7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1504,6 +1504,17 @@ int fuse_unlink_backing(struct bpf_fuse_args *fa, int *out,
 int fuse_unlink_finalize(struct bpf_fuse_args *fa, int *out,
 			 struct inode *dir, struct dentry *entry);
 
+int fuse_link_initialize_in(struct bpf_fuse_args *fa, struct fuse_link_in *fli,
+			    struct dentry *entry, struct inode *dir,
+			    struct dentry *newent);
+int fuse_link_initialize_out(struct bpf_fuse_args *fa, struct fuse_link_in *fli,
+			     struct dentry *entry, struct inode *dir,
+			     struct dentry *newent);
+int fuse_link_backing(struct bpf_fuse_args *fa, int *out, struct dentry *entry,
+		      struct inode *dir, struct dentry *newent);
+int fuse_link_finalize(struct bpf_fuse_args *fa, int *out, struct dentry *entry,
+		       struct inode *dir, struct dentry *newent);
+
 int fuse_release_initialize_in(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
 			       struct inode *inode, struct file *file);
 int fuse_release_initialize_out(struct bpf_fuse_args *fa, struct fuse_release_in *fri,
@@ -1742,6 +1753,28 @@ int fuse_statfs_backing(struct bpf_fuse_args *fa, int *out,
 int fuse_statfs_finalize(struct bpf_fuse_args *fa, int *out,
 			 struct dentry *dentry, struct kstatfs *buf);
 
+int fuse_get_link_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+				struct inode *inode, struct dentry *dentry,
+				struct delayed_call *callback);
+int fuse_get_link_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *dummy,
+				 struct inode *inode, struct dentry *dentry,
+				 struct delayed_call *callback);
+int fuse_get_link_backing(struct bpf_fuse_args *fa, const char **out,
+			  struct inode *inode, struct dentry *dentry,
+			  struct delayed_call *callback);
+int fuse_get_link_finalize(struct bpf_fuse_args *fa, const char **out,
+			   struct inode *inode, struct dentry *dentry,
+			   struct delayed_call *callback);
+
+int fuse_symlink_initialize_in(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+			       struct inode *dir, struct dentry *entry, const char *link, int len);
+int fuse_symlink_initialize_out(struct bpf_fuse_args *fa, struct fuse_dummy_io *unused,
+				struct inode *dir, struct dentry *entry, const char *link, int len);
+int fuse_symlink_backing(struct bpf_fuse_args *fa, int *out,
+			 struct inode *dir, struct dentry *entry, const char *link, int len);
+int fuse_symlink_finalize(struct bpf_fuse_args *fa, int *out,
+			  struct inode *dir, struct dentry *entry, const char *link, int len);
+
 struct fuse_read_io {
 	struct fuse_read_in fri;
 	struct fuse_read_out fro;
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 23/26] fuse-bpf: allow mounting with no userspace daemon
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (21 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 22/26] fuse-bpf: Add symlink/link support Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 24/26] fuse-bpf: Call bpf for pre/post filters Daniel Rosenberg
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

When using fuse-bpf in pure passthrough mode, we don't strictly need a
userspace daemon. Add a no_daemon mount option that mounts without one,
which allows simple testing of the backing operations.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/fuse_i.h |  4 ++++
 fs/fuse/inode.c  | 25 +++++++++++++++++++------
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index cbfd56d669c7..6fb5c7a1ff11 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -571,6 +571,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
+	bool no_daemon:1;
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
@@ -847,6 +848,9 @@ struct fuse_conn {
 	/* Does the filesystem support per inode DAX? */
 	unsigned int inode_dax:1;
 
+	/** BPF Only, no Daemon running */
+	unsigned int no_daemon:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d178c3eb445f..bc349102ce3b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -759,6 +759,7 @@ enum {
 	OPT_BLKSIZE,
 	OPT_ROOT_BPF,
 	OPT_ROOT_DIR,
+	OPT_NO_DAEMON,
 	OPT_ERR
 };
 
@@ -775,6 +776,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_string	("subtype",		OPT_SUBTYPE),
 	fsparam_u32	("root_bpf",		OPT_ROOT_BPF),
 	fsparam_u32	("root_dir",		OPT_ROOT_DIR),
+	fsparam_flag	("no_daemon",		OPT_NO_DAEMON),
 	{}
 };
 
@@ -873,6 +875,11 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 			return invalfc(fsc, "Unable to open root directory");
 		break;
 
+	case OPT_NO_DAEMON:
+		ctx->no_daemon = true;
+		ctx->fd_present = true;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1438,7 +1445,7 @@ void fuse_send_init(struct fuse_mount *fm)
 	ia->args.nocreds = true;
 	ia->args.end = process_init_reply;
 
-	if (fuse_simple_background(fm, &ia->args, GFP_KERNEL) != 0)
+	if (unlikely(fm->fc->no_daemon) || fuse_simple_background(fm, &ia->args, GFP_KERNEL) != 0)
 		process_init_reply(fm, &ia->args, -ENOTCONN);
 }
 EXPORT_SYMBOL_GPL(fuse_send_init);
@@ -1720,6 +1727,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;
+	fc->no_daemon = ctx->no_daemon;
 
 	err = -ENOMEM;
 	root = fuse_get_root_inode(sb, ctx->rootmode, ctx->root_bpf,
@@ -1767,7 +1775,7 @@ static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc)
 	struct fuse_fs_context *ctx = fsc->fs_private;
 	int err;
 
-	if (!ctx->file || !ctx->rootmode_present ||
+	if (!!ctx->file == ctx->no_daemon || !ctx->rootmode_present ||
 	    !ctx->user_id_present || !ctx->group_id_present)
 		return -EINVAL;
 
@@ -1775,10 +1783,12 @@ static int fuse_fill_super(struct super_block *sb, struct fs_context *fsc)
 	 * Require mount to happen from the same user namespace which
 	 * opened /dev/fuse to prevent potential attacks.
 	 */
-	if ((ctx->file->f_op != &fuse_dev_operations) ||
-	    (ctx->file->f_cred->user_ns != sb->s_user_ns))
-		return -EINVAL;
-	ctx->fudptr = &ctx->file->private_data;
+	if (ctx->file) {
+		if ((ctx->file->f_op != &fuse_dev_operations) ||
+		    (ctx->file->f_cred->user_ns != sb->s_user_ns))
+			return -EINVAL;
+		ctx->fudptr = &ctx->file->private_data;
+	}
 
 	err = fuse_fill_super_common(sb, ctx);
 	if (err)
@@ -1828,6 +1838,9 @@ static int fuse_get_tree(struct fs_context *fsc)
 
 	fsc->s_fs_info = fm;
 
+	if (ctx->no_daemon)
+		return get_tree_nodev(fsc, fuse_fill_super);
+
 	if (ctx->fd_present)
 		ctx->file = fget(ctx->fd);
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 24/26] fuse-bpf: Call bpf for pre/post filters
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (22 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 23/26] fuse-bpf: allow mounting with no userspace daemon Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-26 23:18 ` [PATCH 25/26] fuse-bpf: Add userspace " Daniel Rosenberg
  2022-09-28  6:41 ` [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Martin KaFai Lau
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

Call the BPF program attached to the inode before and after the backing
operation. This allows altering input or output parameters to fuse calls
that will be handled directly by the backing filesystems. The prefilter's
return value signals whether the entire operation should instead go
through regular fuse, or whether a postfilter call is needed.
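
The control flow this adds to fuse_bpf_backing() can be modelled in
plain C roughly as follows. This is a simplified sketch rather than the
macro itself: every sketch_* name and the enum values are stand-ins made
up for illustration (the real BPF_FUSE_* constants are defined elsewhere
in the series), and reporting a filter error as "handled in the kernel"
is an assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for BPF_FUSE_CONTINUE/USER/POSTFILTER. */
enum sketch_verdict {
	SKETCH_CONTINUE = 0,	/* run backing, skip the postfilter */
	SKETCH_USER = 1,	/* hand the request to regular fuse */
	SKETCH_POSTFILTER = 2,	/* run backing, then call the filter again */
};

struct sketch_trace {
	int pre_calls, backing_calls, post_calls;
	int pre_verdict;	/* what the model filter answers as prefilter */
};

/* Models the attached BPF program; 'post' selects the phase. */
static int sketch_filter(struct sketch_trace *t, bool post)
{
	if (post) {
		t->post_calls++;
		return 0;
	}
	t->pre_calls++;
	return t->pre_verdict;
}

/* Models the backing() callback, e.g. a vfs_*() call on the lower fs. */
static int sketch_backing(struct sketch_trace *t)
{
	t->backing_calls++;
	return 0;
}

/*
 * Returns true when the request was dealt with in the kernel (*error
 * holds the result), false when it must fall back to regular fuse.
 */
static bool sketch_dispatch(struct sketch_trace *t, bool have_filter,
			    int *error)
{
	/* no program attached behaves like an immediate CONTINUE */
	int next = have_filter ? sketch_filter(t, false) : SKETCH_CONTINUE;

	*error = 0;
	if (next < 0) {
		*error = next;	/* prefilter rejected the request */
		return true;
	}
	if (next == SKETCH_USER)
		return false;

	*error = sketch_backing(t);
	if (next == SKETCH_POSTFILTER) {
		int post = sketch_filter(t, true);

		if (post < 0)
			*error = post;
	}
	return true;
}
```

With pre_verdict set to SKETCH_POSTFILTER the model calls the filter,
the backing operation, and the filter again, in that order; with
SKETCH_USER it returns before touching the backing filesystem.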

Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Paul Lawrence <paullawrence@google.com>
---
 fs/fuse/fuse_i.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 6fb5c7a1ff11..07b50be2c6e4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1936,6 +1936,46 @@ static inline void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatf
 int __init fuse_bpf_init(void);
 void __exit fuse_bpf_cleanup(void);
 
+static inline void fuse_bpf_set_in_ends(struct bpf_fuse_args *fa)
+{
+	int i;
+
+	for (i = 0; i < FUSE_MAX_ARGS_IN; i++)
+		fa->in_args[i].end_offset = (void *)
+			((char *)fa->in_args[i].value
+			+ fa->in_args[i].size);
+}
+
+static inline void fuse_bpf_set_in_immutable(struct bpf_fuse_args *fa)
+{
+	int i;
+
+	for (i = 0; i < FUSE_MAX_ARGS_IN; i++)
+		fa->in_args[i].flags |= BPF_FUSE_IMMUTABLE;
+}
+
+static inline void fuse_bpf_set_out_ends(struct bpf_fuse_args *fa)
+{
+	int i;
+
+	for (i = 0; i < FUSE_MAX_ARGS_OUT; i++)
+		fa->out_args[i].end_offset = (void *)
+			((char *)fa->out_args[i].value
+			+ fa->out_args[i].size);
+}
+
+static inline void fuse_bpf_free_alloced(struct bpf_fuse_args *fa)
+{
+	int i;
+
+	for (i = 0; i < FUSE_MAX_ARGS_IN; i++)
+		if (fa->in_args[i].flags & BPF_FUSE_ALLOCATED)
+			kfree(fa->in_args[i].value);
+	for (i = 0; i < FUSE_MAX_ARGS_OUT; i++)
+		if (fa->out_args[i].flags & BPF_FUSE_ALLOCATED)
+			kfree(fa->out_args[i].value);
+}
+
 /*
  * expression statement to wrap the backing filter logic
  * struct inode *inode: inode with bpf and backing inode
@@ -1958,6 +1998,7 @@ void __exit fuse_bpf_cleanup(void);
 	bool initialized = false;					\
 	bool handled = false;						\
 	ssize_t res;							\
+	int bpf_next;							\
 	io feo = { 0 };							\
 	int error = 0;							\
 									\
@@ -1969,17 +2010,47 @@ void __exit fuse_bpf_cleanup(void);
 		error = initialize_in(&fa, &feo, args);			\
 		if (error)						\
 			break;						\
+		fuse_bpf_set_in_ends(&fa);				\
+									\
+		fa.opcode |= FUSE_PREFILTER;				\
+		bpf_next = fuse_inode->bpf ?				\
+			bpf_prog_run(fuse_inode->bpf, &fa) :		\
+			BPF_FUSE_CONTINUE;				\
+		if (bpf_next < 0) {					\
+			error = bpf_next;				\
+			break;						\
+		}							\
+									\
+		fuse_bpf_set_in_immutable(&fa);				\
 									\
 		error = initialize_out(&fa, &feo, args);		\
 		if (error)						\
 			break;						\
+		fuse_bpf_set_out_ends(&fa);				\
 									\
 		initialized = true;					\
+		if (bpf_next == BPF_FUSE_USER) {			\
+			handled = false;				\
+			break;						\
+		}							\
+									\
+		fa.opcode &= ~FUSE_PREFILTER;				\
 									\
 		error = backing(&fa, &out, args);			\
 		if (error < 0)						\
 			fa.error_in = error;				\
 									\
+		if (bpf_next == BPF_FUSE_CONTINUE)			\
+			break;						\
+									\
+		fa.opcode |= FUSE_POSTFILTER;				\
+		if (bpf_next == BPF_FUSE_POSTFILTER)			\
+			bpf_next = bpf_prog_run(fuse_inode->bpf, &fa);	\
+		if (bpf_next < 0) {					\
+			error = bpf_next;				\
+			break;						\
+		}							\
+									\
 	} while (false);						\
 									\
 	if (initialized && handled) {					\
@@ -1987,6 +2058,7 @@ void __exit fuse_bpf_cleanup(void);
 		if (res)						\
 			error = res;					\
 	}								\
+	fuse_bpf_free_alloced(&fa);					\
 									\
 	out = error ? _Generic((out),					\
 			default :					\
-- 
2.37.3.998.g577e59143f-goog
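
The control flow that patch 24 adds to the fuse_bpf_backing() macro is hard to follow through the line continuations. Below is a minimal userspace sketch of the same decision sequence. It assumes hypothetical verdict values (the real BPF_FUSE_* constants are defined elsewhere in the series) and uses a plain function pointer as a stand-in for bpf_prog_run(). It models only the ordering of prefilter, backing call and postfilter, not argument handling or finalization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values; the real BPF_FUSE_* constants may differ. */
enum filter_verdict {
	VERDICT_CONTINUE = 0,	/* run the backing op, skip the postfilter */
	VERDICT_USER = 1,	/* hand the whole op to the regular fuse path */
	VERDICT_POSTFILTER = 2,	/* run the backing op, then the postfilter */
};

/* Records which stages ran, so the ordering can be checked. */
struct op_trace {
	bool ran_backing;
	bool ran_postfilter;
	bool sent_to_daemon;
	int error;
};

/* Stand-in for the bpf program; 'post' is true on the postfilter call. */
typedef int (*filter_fn)(bool post);

struct op_trace run_filtered_op(filter_fn filter)
{
	struct op_trace t = { 0 };
	/* No program attached behaves like a prefilter returning CONTINUE. */
	int next = filter ? filter(false) : VERDICT_CONTINUE;

	if (next < 0) {			/* prefilter failed: abort the op */
		t.error = next;
		return t;
	}
	if (next == VERDICT_USER) {	/* fall back to regular fuse */
		t.sent_to_daemon = true;
		return t;
	}

	t.ran_backing = true;		/* call into the lower filesystem */
	if (next == VERDICT_CONTINUE)
		return t;

	if (next == VERDICT_POSTFILTER) {
		t.ran_postfilter = true;
		next = filter(true);
		if (next < 0)
			t.error = next;
	}
	return t;
}

/* Example filters used to exercise each branch. */
int want_postfilter(bool post)
{
	return post ? VERDICT_CONTINUE : VERDICT_POSTFILTER;
}

int send_to_user(bool post)
{
	(void)post;
	return VERDICT_USER;
}

int fail_prefilter(bool post)
{
	(void)post;
	return -5;
}
```

Passing NULL for the filter models an inode with no bpf program attached, which takes the pure passthrough route.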



* [PATCH 25/26] fuse-bpf: Add userspace pre/post filters
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (23 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 24/26] fuse-bpf: Call bpf for pre/post filters Daniel Rosenberg
@ 2022-09-26 23:18 ` Daniel Rosenberg
  2022-09-28  6:41 ` [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Martin KaFai Lau
  25 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-26 23:18 UTC (permalink / raw)
  To: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend
  Cc: Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Daniel Rosenberg, Paul Lawrence, Alessio Balsini, David Anderson,
	Sandeep Patil, linux-fsdevel, bpf, kernel-team

This allows fuse-bpf to call out to userspace to handle pre and post
filters. Any of the inputs may be changed by the prefilter, so we must
handle up to 3 outputs. For the postfilter, our inputs include the
output arguments, so we must handle up to 5 inputs.

As long as you don't request both pre-filter and post-filter in
userspace, we will end up doing fewer round trips to userspace.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
---
 fs/fuse/backing.c        | 70 ++++++++++++++++++++++++++++++++++++++++
 fs/fuse/dev.c            |  2 ++
 fs/fuse/fuse_i.h         | 42 ++++++++++++++++++++++--
 include/linux/bpf_fuse.h |  1 +
 4 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
index 485b6f1e8503..7a3b1fdb2c56 100644
--- a/fs/fuse/backing.c
+++ b/fs/fuse/backing.c
@@ -2681,3 +2681,73 @@ void __exit fuse_bpf_cleanup(void)
 {
 	kmem_cache_destroy(fuse_bpf_aio_request_cachep);
 }
+
+static ssize_t fuse_bpf_simple_request(struct fuse_mount *fm, struct bpf_fuse_args *fa,
+				       unsigned short in_numargs, unsigned short out_numargs,
+				       struct bpf_fuse_arg *out_arg_array, bool add_out_to_in)
+{
+	int i;
+	uint32_t max_size;
+	ssize_t res;
+
+	struct fuse_args args = {
+		.nodeid = fa->nodeid,
+		.opcode = fa->opcode,
+		.error_in = fa->error_in,
+		.in_numargs = in_numargs,
+		.out_numargs = out_numargs,
+		.force = !!(fa->flags & FUSE_BPF_FORCE),
+		.out_argvar = !!(fa->flags & FUSE_BPF_OUT_ARGVAR),
+		.is_lookup = !!(fa->flags & FUSE_BPF_IS_LOOKUP),
+	};
+
+	/* Set in args */
+	for (i = 0; i < fa->in_numargs; ++i)
+		args.in_args[i] = (struct fuse_in_arg) {
+			.size = fa->in_args[i].size,
+			.value = fa->in_args[i].value,
+		};
+	if (add_out_to_in) {
+		for (i = 0; i < fa->out_numargs; ++i)
+			args.in_args[fa->in_numargs + i] = (struct fuse_in_arg) {
+				.size = fa->out_args[i].size,
+				.value = fa->out_args[i].value,
+			};
+	}
+
+	/* All out args must be writeable */
+	for (i = 0; i < out_numargs; ++i) {
+		max_size = out_arg_array[i].max_size ?: out_arg_array[i].size;
+		if (!bpf_fuse_get_writeable(&out_arg_array[i], max_size, true))
+			return -ENOMEM;
+	}
+
+	/* Set out args */
+	for (i = 0; i < out_numargs; ++i)
+		args.out_args[i] = (struct fuse_arg) {
+			.size = out_arg_array[i].size,
+			.value = out_arg_array[i].value,
+		};
+
+	res = fuse_simple_request(fm, &args);
+
+	/* update used areas of buffers */
+	for (i = 0; i < out_numargs; ++i)
+		if (out_arg_array[i].flags & BPF_FUSE_VARIABLE_SIZE)
+			out_arg_array[i].size = args.out_args[i].size;
+	fa->ret = args.ret;
+
+	return res;
+}
+
+ssize_t fuse_prefilter_simple_request(struct fuse_mount *fm, struct bpf_fuse_args *fa)
+{
+	return fuse_bpf_simple_request(fm, fa, fa->in_numargs, fa->in_numargs,
+				       fa->in_args, false);
+}
+
+ssize_t fuse_postfilter_simple_request(struct fuse_mount *fm, struct bpf_fuse_args *fa)
+{
+	return fuse_bpf_simple_request(fm, fa, fa->in_numargs + fa->out_numargs, fa->out_numargs,
+				       fa->out_args, true);
+}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 626dbbf92874..765bc95bd560 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -520,6 +520,8 @@ ssize_t fuse_simple_request(struct fuse_mount *fm, struct fuse_args *args)
 		BUG_ON(args->out_numargs == 0);
 		ret = args->out_args[args->out_numargs - 1].size;
 	}
+	if (args->is_filter)
+		args->ret = req->out.h.error;
 	fuse_put_request(req);
 
 	return ret;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 07b50be2c6e4..a619c6eac6e5 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -305,6 +305,17 @@ struct fuse_page_desc {
 	unsigned int offset;
 };
 
+/* To deal with bpf pre and post filters in userspace calls, we must support
+ * passing the inputs and outputs as inputs, and we must have enough space in
+ * outputs to handle all of the inputs.
+ */
+#define FUSE_EXTENDED_MAX_ARGS_IN (FUSE_MAX_ARGS_IN + FUSE_MAX_ARGS_OUT)
+#if FUSE_MAX_ARGS_IN > FUSE_MAX_ARGS_OUT
+#define FUSE_EXTENDED_MAX_ARGS_OUT FUSE_MAX_ARGS_IN
+#else
+#define FUSE_EXTENDED_MAX_ARGS_OUT FUSE_MAX_ARGS_OUT
+#endif
+
 struct fuse_args {
 	uint64_t nodeid;
 	uint32_t opcode;
@@ -321,9 +332,11 @@ struct fuse_args {
 	bool page_zeroing:1;
 	bool page_replace:1;
 	bool may_block:1;
+	bool is_filter:1;
 	bool is_lookup:1;
-	struct fuse_in_arg in_args[3];
-	struct fuse_arg out_args[2];
+	uint32_t ret;
+	struct fuse_in_arg in_args[FUSE_EXTENDED_MAX_ARGS_IN];
+	struct fuse_arg out_args[FUSE_EXTENDED_MAX_ARGS_OUT];
 	void (*end)(struct fuse_mount *fm, struct fuse_args *args, int error);
 };
 
@@ -1936,6 +1949,9 @@ static inline void convert_fuse_statfs(struct kstatfs *stbuf, struct fuse_kstatf
 int __init fuse_bpf_init(void);
 void __exit fuse_bpf_cleanup(void);
 
+ssize_t fuse_prefilter_simple_request(struct fuse_mount *fm, struct bpf_fuse_args *args);
+ssize_t fuse_postfilter_simple_request(struct fuse_mount *fm, struct bpf_fuse_args *args);
+
 static inline void fuse_bpf_set_in_ends(struct bpf_fuse_args *fa)
 {
 	int i;
@@ -1994,9 +2010,11 @@ static inline void fuse_bpf_free_alloced(struct bpf_fuse_args *fa)
 			 backing, finalize, args...)			\
 ({									\
 	struct fuse_inode *fuse_inode = get_fuse_inode(inode);		\
+	struct fuse_mount *fm = get_fuse_mount(inode);			\
 	struct bpf_fuse_args fa = { 0 };				\
 	bool initialized = false;					\
 	bool handled = false;						\
+	bool locked;							\
 	ssize_t res;							\
 	int bpf_next;							\
 	io feo = { 0 };							\
@@ -2021,6 +2039,16 @@ static inline void fuse_bpf_free_alloced(struct bpf_fuse_args *fa)
 			break;						\
 		}							\
 									\
+		if (bpf_next == BPF_FUSE_USER_PREFILTER) {		\
+			locked = fuse_lock_inode(inode);		\
+			res = fuse_prefilter_simple_request(fm, &fa);	\
+			fuse_unlock_inode(inode, locked);		\
+			if (res < 0) {					\
+				error = res;				\
+				break;					\
+			}						\
+			bpf_next = fa.ret;				\
+		}							\
 		fuse_bpf_set_in_immutable(&fa);				\
 									\
 		error = initialize_out(&fa, &feo, args);		\
@@ -2051,6 +2079,16 @@ static inline void fuse_bpf_free_alloced(struct bpf_fuse_args *fa)
 			break;						\
 		}							\
 									\
+		if (!(bpf_next == BPF_FUSE_USER_POSTFILTER))		\
+			break;						\
+									\
+		locked = fuse_lock_inode(inode);			\
+		res = fuse_postfilter_simple_request(fm, &fa);		\
+		fuse_unlock_inode(inode, locked);			\
+		if (res < 0) {						\
+			error = res;					\
+			break;						\
+		}							\
 	} while (false);						\
 									\
 	if (initialized && handled) {					\
diff --git a/include/linux/bpf_fuse.h b/include/linux/bpf_fuse.h
index ef5c8fdaffee..2802ca71ddd1 100644
--- a/include/linux/bpf_fuse.h
+++ b/include/linux/bpf_fuse.h
@@ -40,6 +40,7 @@ struct bpf_fuse_args {
 	uint32_t in_numargs;
 	uint32_t out_numargs;
 	uint32_t flags;
+	uint32_t ret;
 	struct bpf_fuse_arg in_args[FUSE_MAX_ARGS_IN];
 	struct bpf_fuse_arg out_args[FUSE_MAX_ARGS_OUT];
 };
-- 
2.37.3.998.g577e59143f-goog
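
The array sizing in patch 25 follows from its commit message: a userspace prefilter may rewrite every input, so it needs as many outputs as there are inputs, and a userspace postfilter receives the inputs followed by the outputs. Below is a small sketch of that arithmetic and of the concatenation step. It assumes FUSE_MAX_ARGS_IN is 3 and FUSE_MAX_ARGS_OUT is 2 (inferred from the in_args[3] and out_args[2] arrays the patch replaces); the struct and function names are illustrative and not taken from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits, inferred from the arrays replaced in struct fuse_args. */
#define MAX_ARGS_IN 3
#define MAX_ARGS_OUT 2

/* Postfilter inputs are the original inputs plus the outputs. */
#define EXT_MAX_ARGS_IN (MAX_ARGS_IN + MAX_ARGS_OUT)
/* Prefilter outputs must be able to hold every (possibly rewritten) input. */
#define EXT_MAX_ARGS_OUT \
	(MAX_ARGS_IN > MAX_ARGS_OUT ? MAX_ARGS_IN : MAX_ARGS_OUT)

/* Simplified argument descriptor (hypothetical, for illustration only). */
struct arg {
	size_t size;
	const void *value;
};

/*
 * Build the input list for a userspace postfilter request: the original
 * inputs first, then the outputs of the backing call. Returns the number
 * of entries written to dst.
 */
size_t build_postfilter_inputs(const struct arg *in, size_t n_in,
			       const struct arg *out, size_t n_out,
			       struct arg dst[EXT_MAX_ARGS_IN])
{
	size_t i;

	assert(n_in <= MAX_ARGS_IN && n_out <= MAX_ARGS_OUT);
	for (i = 0; i < n_in; i++)
		dst[i] = in[i];
	for (i = 0; i < n_out; i++)
		dst[n_in + i] = out[i];
	return n_in + n_out;
}
```

With these assumed limits the extended sizes come out to 5 inputs and 3 outputs, matching the numbers quoted in the commit message.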



* Re: [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf
  2022-09-26 23:17 ` [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf Daniel Rosenberg
@ 2022-09-27 18:19   ` Miklos Szeredi
  2022-09-30 22:02     ` Paul Lawrence
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2022-09-27 18:19 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Jann Horn, Amir Goldstein

On Tue, 27 Sept 2022 at 01:18, Daniel Rosenberg <drosen@google.com> wrote:

> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d6ccee961891..8c80c146e69b 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -572,6 +572,17 @@ struct fuse_entry_out {
>         struct fuse_attr attr;
>  };
>
> +#define FUSE_ACTION_KEEP       0
> +#define FUSE_ACTION_REMOVE     1
> +#define FUSE_ACTION_REPLACE    2
> +
> +struct fuse_entry_bpf_out {
> +       uint64_t        backing_action;
> +       uint64_t        backing_fd;

This is a security issue.   See this post from Jann:

https://lore.kernel.org/all/CAG48ez17uXtjCTa7xpa=JWz3iBbNDQTKO2hvn6PAZtfW3kXgcA@mail.gmail.com/

The fuse-passthrough series solved this by pre-registering the
passthrough fd with an ioctl. Since this requires an explicit syscall on
the server side, the attack is thwarted.

It would be nice if this mechanism was agreed between these projects.

BTW, does fuse-bpf provide a superset of fuse-passthrough?  I mean
could fuse-bpf work with a NULL bpf program as a simple passthrough?

Thanks,
Miklos
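
The pre-registration idea mentioned above can be modelled roughly in userspace as follows. This is not the fuse-passthrough ioctl interface; the table, the function names and the consume-on-use behaviour are all assumptions made for illustration. The only point it shows is that a reply can refer to a backing file solely through an id that an earlier, explicit registration call handed out.

```c
#include <assert.h>

#define MAX_BACKING 8

/* Hypothetical per-connection table of registered backing fds. */
struct backing_table {
	int fds[MAX_BACKING];
	int used[MAX_BACKING];
};

/*
 * Explicit registration step (stands in for the ioctl).
 * Returns an opaque id, or -1 if the fd is invalid or the table is full.
 */
int backing_register(struct backing_table *t, int fd)
{
	int i;

	if (fd < 0)
		return -1;
	for (i = 0; i < MAX_BACKING; i++) {
		if (!t->used[i]) {
			t->used[i] = 1;
			t->fds[i] = fd;
			return i;
		}
	}
	return -1;
}

/*
 * Resolve an id carried in a reply. In this model the registration is
 * consumed on use, so an id cannot be replayed.
 * Returns the registered fd, or -1 for an unknown id.
 */
int backing_take(struct backing_table *t, int id)
{
	if (id < 0 || id >= MAX_BACKING || !t->used[id])
		return -1;
	t->used[id] = 0;
	return t->fds[id];
}
```

In this model a reply that names an id nobody registered is simply rejected, which is the property the explicit syscall is meant to provide.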


* Re: [PATCH 12/26] fuse-bpf: Add support for fallocate
  2022-09-26 23:18 ` [PATCH 12/26] fuse-bpf: Add support for fallocate Daniel Rosenberg
@ 2022-09-27 22:07   ` Dave Chinner
  2022-09-27 23:36     ` Daniel Rosenberg
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Chinner @ 2022-09-27 22:07 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Paul Lawrence, Alessio Balsini, David Anderson, Sandeep Patil,
	linux-fsdevel, bpf, kernel-team

On Mon, Sep 26, 2022 at 04:18:08PM -0700, Daniel Rosenberg wrote:
> Signed-off-by: Daniel Rosenberg <drosen@google.com>
> Signed-off-by: Paul Lawrence <paullawrence@google.com>
> ---
>  fs/fuse/backing.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/file.c    | 10 ++++++++++
>  fs/fuse/fuse_i.h  | 11 +++++++++++
>  3 files changed, 69 insertions(+)
> 
> diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
> index 97e92c633cfd..95c60d6d7597 100644
> --- a/fs/fuse/backing.c
> +++ b/fs/fuse/backing.c
> @@ -188,6 +188,54 @@ ssize_t fuse_backing_mmap(struct file *file, struct vm_area_struct *vma)
>  	return ret;
>  }
>  
> +int fuse_file_fallocate_initialize_in(struct bpf_fuse_args *fa,
> +				      struct fuse_fallocate_in *ffi,
> +				      struct file *file, int mode, loff_t offset, loff_t length)
> +{
> +	struct fuse_file *ff = file->private_data;
> +
> +	*ffi = (struct fuse_fallocate_in) {
> +		.fh = ff->fh,
> +		.offset = offset,
> +		.length = length,
> +		.mode = mode,
> +	};
> +
> +	*fa = (struct bpf_fuse_args) {
> +		.opcode = FUSE_FALLOCATE,
> +		.nodeid = ff->nodeid,
> +		.in_numargs = 1,
> +		.in_args[0].size = sizeof(*ffi),
> +		.in_args[0].value = ffi,
> +	};
> +
> +	return 0;
> +}
> +
> +int fuse_file_fallocate_initialize_out(struct bpf_fuse_args *fa,
> +				       struct fuse_fallocate_in *ffi,
> +				       struct file *file, int mode, loff_t offset, loff_t length)
> +{
> +	return 0;
> +}
> +
> +int fuse_file_fallocate_backing(struct bpf_fuse_args *fa, int *out,
> +				struct file *file, int mode, loff_t offset, loff_t length)
> +{
> +	const struct fuse_fallocate_in *ffi = fa->in_args[0].value;
> +	struct fuse_file *ff = file->private_data;
> +
> +	*out = vfs_fallocate(ff->backing_file, ffi->mode, ffi->offset,
> +			     ffi->length);
> +	return 0;
> +}
> +
> +int fuse_file_fallocate_finalize(struct bpf_fuse_args *fa, int *out,
> +				 struct file *file, int mode, loff_t offset, loff_t length)
> +{
> +	return 0;
> +}
> +
>  /*******************************************************************************
>   * Directory operations after here                                             *
>   ******************************************************************************/
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index dd4485261cc7..ef6f6b0b3b59 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -3002,6 +3002,16 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>  
>  	bool block_faults = FUSE_IS_DAX(inode) && lock_inode;
>  
> +#ifdef CONFIG_FUSE_BPF
> +	if (fuse_bpf_backing(inode, struct fuse_fallocate_in, err,
> +			       fuse_file_fallocate_initialize_in,
> +			       fuse_file_fallocate_initialize_out,
> +			       fuse_file_fallocate_backing,
> +			       fuse_file_fallocate_finalize,
> +			       file, mode, offset, length))
> +		return err;
> +#endif

As I browse through this series, I find this pattern unnecessarily
verbose and it exposes way too much of the filtering mechanism to
code that should not have to know anything about it.

Wouldn't it be better to code this as:

	error = fuse_filter_fallocate(file, mode, offset, length);
	if (error < 0)
		return error;


And then make this fuse_bpf_backing() call and all the indirect
functions it uses internal (i.e. static) in fs/fuse/backing.c?

That way the interface in fs/fuse/fuse_i.h can be much cleaner and
handle the #ifdef CONFIG_FUSE_BPF directly by:

#ifdef CONFIG_FUSE_BPF
....
int fuse_filter_fallocate(file, mode, offset, length);
....
#else /* !CONFIG_FUSE_BPF */
....
static inline int fuse_filter_fallocate(file, mode, offset, length)
{
	return 0;
}
....
#endif /* CONFIG_FUSE_BPF */

This seems much cleaner to me than exposing fuse_bpf_backing()
boilerplate all over the code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
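
A compilable sketch of the wrapper pattern suggested above, written as plain userspace C. All names here (CONFIG_DEMO_FILTER, demo_file, demo_filter_fallocate) are hypothetical stand-ins rather than code from the series. One deliberate difference from the outline in the mail: the wrapper returns a bool "handled" flag and passes the result through an out-parameter, which keeps "not handled" distinct from "handled successfully", as the original `if (fuse_bpf_backing(...)) return err;` call site does. Whether that is the right return convention for the real code is an open design choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Remove this define to build the "feature disabled" variant. */
#define CONFIG_DEMO_FILTER 1

/* Sentinel showing that the caller continued down the regular path. */
#define WENT_REGULAR_PATH 1

/* Hypothetical file object: has_backing says whether a filter applies. */
struct demo_file {
	bool has_backing;
	int backing_result;
};

#ifdef CONFIG_DEMO_FILTER
/*
 * Would live in the implementation file, with all helper callbacks static.
 * Returns true when the operation was handled; *err then holds the result.
 */
bool demo_filter_fallocate(struct demo_file *file, int mode,
			   long long offset, long long length, int *err)
{
	(void)mode;
	(void)offset;
	(void)length;
	if (!file->has_backing)
		return false;
	*err = file->backing_result;
	return true;
}
#else /* !CONFIG_DEMO_FILTER */
/* Stub for builds without the feature: never handles anything. */
static inline bool demo_filter_fallocate(struct demo_file *file, int mode,
					  long long offset, long long length,
					  int *err)
{
	(void)file; (void)mode; (void)offset; (void)length; (void)err;
	return false;
}
#endif /* CONFIG_DEMO_FILTER */

/* The call site needs no #ifdef and knows nothing about the filter. */
int demo_fallocate(struct demo_file *file, int mode,
		   long long offset, long long length)
{
	int err = 0;

	if (demo_filter_fallocate(file, mode, offset, length, &err))
		return err;
	return WENT_REGULAR_PATH;	/* stand-in for the normal request */
}
```

The call site shrinks to a single conditional, and the config check lives in one place next to the declaration.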


* Re: [PATCH 12/26] fuse-bpf: Add support for fallocate
  2022-09-27 22:07   ` Dave Chinner
@ 2022-09-27 23:36     ` Daniel Rosenberg
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-09-27 23:36 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Paul Lawrence, Alessio Balsini, David Anderson, Sandeep Patil,
	linux-fsdevel, bpf, kernel-team

On Tue, Sep 27, 2022 at 3:07 PM Dave Chinner <david@fromorbit.com> wrote:
>
> As I browse through this series, I find this pattern unnecessarily
> verbose and it exposes way too much of the filtering mechanism to
> code that should not have to know anything about it.
>
> Wouldn't it be better to code this as:
>
>         error = fuse_filter_fallocate(file, mode, offset, length);
>         if (error < 0)
>                 return error;
>
>
> And then make this fuse_bpf_backing() call and all the indirect
> functions it uses internal (i.e. static) in fs/fuse/backing.c?
>
> That way the interface in fs/fuse/fuse_i.h can be much cleaner and
> handle the #ifdef CONFIG_FUSE_BPF directly by:
>
> #ifdef CONFIG_FUSE_BPF
> ....
> int fuse_filter_fallocate(file, mode, offset, length);
> ....
> #else /* !CONFIG_FUSE_BPF */
> ....
> static inline fuse_filter_fallocate(file, mode, offset, length)
> {
>         return 0;
> }
> ....
> #endif /* CONFIG_FUSE_BPF */
>
> This seems much cleaner to me than exposing fuse_bpf_backing()
> boiler plate all over the code...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

Thanks for the suggestion, that'll help clean things up a bit. It's
quite nice to have fresh eyes looking over the code.

-Daniel


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
                   ` (24 preceding siblings ...)
  2022-09-26 23:18 ` [PATCH 25/26] fuse-bpf: Add userspace " Daniel Rosenberg
@ 2022-09-28  6:41 ` Martin KaFai Lau
  2022-09-28 12:31   ` Brian Foster
  2022-10-01  0:05   ` Daniel Rosenberg
  25 siblings, 2 replies; 38+ messages in thread
From: Martin KaFai Lau @ 2022-09-28  6:41 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On 9/26/22 4:17 PM, Daniel Rosenberg wrote:
> These patches extend FUSE to be able to act as a stacked filesystem. This
> allows pure passthrough, where the fuse file system simply reflects the lower
> filesystem, and also allows optional pre and post filtering in BPF and/or the
> userspace daemon as needed. This can dramatically reduce or even eliminate
> transitions to and from userspace.
> 
> Currently, we either set the backing file/bpf at mount time at the root level,
> or at lookup time, via an optional block added at the end of the lookup return
> call. The added lookup block contains an fd for the backing file/folder and bpf
> if necessary, or a signal to clear or inherit the parent values. We're looking
> into two options for extending this to mkdir/mknod/etc, as we currently only
> support setting the backing to a pre-existing file, although naturally you can
> create new ones. When we're doing a lookup for create, we could pass an
> fd for the parent dir and the name of the backing file we're creating. This has
> the benefit of avoiding an additional call to userspace, but requires hanging
> on to some data in a negative dentry where there is no elegant place to store it.
> Another option is adding the same block we added to lookup to the create type
> op codes. This keeps that code more uniform, but means userspace must implement
> that logic in more areas.
> 
> As is, the patches definitely need some work before they're ready. We still
> need to go through and ensure we respect changed filter values/disallow changes
> that don't make sense. We aren't currently calling mnt_want_write for the lower
> calls where appropriate, and we don't have an override_creds layer either. We
> also plan to add to our read/write iter filters to allow for more interesting
> use cases. There are also probably some node id inconsistencies. For nodes that
> will be completely passthrough, we give an id of 0.
> 
> For the BPF verification side, we have currently set things set up in the old
> style, with a new bpf program type and helper functions. From LPC, my
> understanding is that newer bpf additions are done in a new style, so I imagine
> much of that will need to be redone as well, but hopefully these patches get
> across what our needs there are.
> 
> For testing, we've provided the selftest code we have been using. We also have
> a mode to run with no userspace daemon in a pure passthrough mode that I have
> been running xfstests over to get some coverage on the backing operation code.
> I had to modify mounts/unmounts to get that running, along with some other
> small touch ups. The most notable failure I currently see there is in
> generic/126, which I suspect is likely related to override_creds.
> 

Interesting idea.

Some comments on review logistics:
- The set is too long, and some of the individual patches are far too long to
  review as a single patch.  Keep in mind that not all of us here are experts
  in both fuse and bpf.  Making the set easier to review will help at the
  beginning.  Some ideas:

   - Only implement a few ops in the initial revision. From quickly browsing
     the set, it looks like it implements the 'struct file_operations
     fuse_file_operations'? Maybe the first few revisions can start with just
     a few of the ops.

   - Please make the patches apply cleanly to the bpf-next tree. For example,
     in patch 3, where does 18e2ec5bf453 come from? I cannot find it in
     either the bpf-next or the linux-next tree.

   - Because the set does not apply cleanly to an upstream tree, and it is a
     big set, I have no idea when bpf_prog_run() is called in patch 24: the
     diff context is fuse_bpf_cleanup, which is apparently not where the bpf
     prog is run.

Some high level comments on the set:
- Instead of adding bpf helpers, consider using kfuncs. You can take a look at
  the recent HID patchset v10 or the recent nf conntrack bpf set.

- Instead of expressing it as packet data, the recent dynptr is a better way
  to handle a mem blob.

- iiuc, the idea is to allow a bpf prog to optionally handle the 'struct
  file_operations' without going back to the user daemon? Have you looked at
  struct_ops, which seems to be a better fit here?  If the bpf prog does not
  know how to handle an operation (or file?), it can call fuse_file_llseek
  (for example) as a kfunc to handle the request.

- The test SEC("test_trace") seems to be mostly a synthetic test for checking
  correctness.  Is there a test that shows a more real-life use case, or have
  I missed something in patch 26?

- Please use the skel to load the program.  The loader in patch 26 is pretty
  hard to read.

- I assume the main objective is performance, by not going back to the user
  daemon?  Do you have performance numbers?
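
The struct_ops suggestion above can be pictured with a plain C analogy: a table of optional operation callbacks, where an unset slot falls back to a default implementation and a set slot may itself call the default (the role a kfunc such as fuse_file_llseek would play). This is only an analogy in userspace C; it does not use the BPF struct_ops API, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Table of optional overrides; a NULL slot means "use the default". */
struct demo_file_ops {
	long long (*llseek)(long long pos, long long offset);
	int (*fsync)(void);
};

/* Default implementations (stand-ins for the existing fuse handlers). */
long long default_llseek(long long pos, long long offset)
{
	return pos + offset;
}

int default_fsync(void)
{
	return 0;
}

/*
 * An override that adds policy and then reuses the default, the way a
 * program might call an exported kernel function for the common case.
 */
long long clamped_llseek(long long pos, long long offset)
{
	long long res = default_llseek(pos, offset);

	return res < 0 ? 0 : res;
}

/* Dispatchers: prefer the override, otherwise fall back to the default. */
long long dispatch_llseek(const struct demo_file_ops *ops,
			  long long pos, long long offset)
{
	if (ops && ops->llseek)
		return ops->llseek(pos, offset);
	return default_llseek(pos, offset);
}

int dispatch_fsync(const struct demo_file_ops *ops)
{
	if (ops && ops->fsync)
		return ops->fsync();
	return default_fsync();
}
```

Only the operations a program cares about need a slot filled in; everything else keeps the existing behaviour.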


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-09-28  6:41 ` [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Martin KaFai Lau
@ 2022-09-28 12:31   ` Brian Foster
  2022-10-01  0:47     ` Daniel Rosenberg
  2022-10-01  0:05   ` Daniel Rosenberg
  1 sibling, 1 reply; 38+ messages in thread
From: Brian Foster @ 2022-09-28 12:31 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Daniel Rosenberg, Andrii Nakryiko, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On Tue, Sep 27, 2022 at 11:41:50PM -0700, Martin KaFai Lau wrote:
> On 9/26/22 4:17 PM, Daniel Rosenberg wrote:
> > These patches extend FUSE to be able to act as a stacked filesystem. This
> > allows pure passthrough, where the fuse file system simply reflects the lower
> > filesystem, and also allows optional pre and post filtering in BPF and/or the
> > userspace daemon as needed. This can dramatically reduce or even eliminate
> > transitions to and from userspace.
> > 
> > Currently, we either set the backing file/bpf at mount time at the root level,
> > or at lookup time, via an optional block added at the end of the lookup return
> > call. The added lookup block contains an fd for the backing file/folder and bpf
> > if necessary, or a signal to clear or inherit the parent values. We're looking
> > into two options for extending this to mkdir/mknod/etc, as we currently only
> > support setting the backing to a pre-existing file, although naturally you can
> > create new ones. When we're doing a lookup for create, we could pass an
> > fd for the parent dir and the name of the backing file we're creating. This has
> > the benefit of avoiding an additional call to userspace, but requires hanging
> > on to some data in a negative dentry where there is no elegant place to store it.
> > Another option is adding the same block we added to lookup to the create type
> > op codes. This keeps that code more uniform, but means userspace must implement
> > that logic in more areas.
> > 
> > As is, the patches definitely need some work before they're ready. We still
> > need to go through and ensure we respect changed filter values/disallow changes
> > that don't make sense. We aren't currently calling mnt_want_write for the lower
> > calls where appropriate, and we don't have an override_creds layer either. We
> > also plan to add to our read/write iter filters to allow for more interesting
> > use cases. There are also probably some node id inconsistencies. For nodes that
> > will be completely passthrough, we give an id of 0.
> > 
> > For the BPF verification side, we have currently set things set up in the old
> > style, with a new bpf program type and helper functions. From LPC, my
> > understanding is that newer bpf additions are done in a new style, so I imagine
> > much of that will need to be redone as well, but hopefully these patches get
> > across what our needs there are.
> > 
> > For testing, we've provided the selftest code we have been using. We also have
> > a mode to run with no userspace daemon in a pure passthrough mode that I have
> > been running xfstests over to get some coverage on the backing operation code.
> > I had to modify mounts/unmounts to get that running, along with some other
> > small touch ups. The most notable failure I currently see there is in
> > generic/126, which I suspect is likely related to override_creds.
> > 
> 
> Interesting idea.
> 
> Some comments on review logistics:
> - The set is too long and some of the individual patches are way too long
> for one single patch to review.  Keep in mind that not all of us here are
> experts in both fuse and bpf.  Making it easier to review first will help at
> the beginning.  Some ideas:
> 
>   - Only implement a few ops in the initial revision. From quickly browsing
> the set, it is implementing the 'struct file_operations
> fuse_file_operations'? Maybe the first few revisions can start with a few of
> the ops first.
> 

I had a similar thought when poking through this. A related question I
had is how much of a functional dependency does the core passthrough
mechanism have on bpf? If bpf is optional for filtering purposes and
isn't absolutely necessary to set up a basic form of passthrough, I
think review would be made easier by splitting off those core bits from
the bpf components so each part is easier to review by people who know
them best. For example, introduce all the fuse enhancements, hooks and
cleanups to set up a passthrough to start the series, then plumb in the
bpf filtering magic on top. Hm?

FWIW, if this is an RFC/prototype and you want more efficient review
cycles, another idea to take that a step further could be to start with
read-only support (or maybe even just directory walking?).

BTW if the bpf bits are optional, how might one mount a fuse/no
daemon/passthrough filesystem from userspace? Is that possible with this
series as is?

Something more on the fuse side: it looks like we introduce a pattern
where bits of generic request completion processing can end up
duplicated between the shortcut (i.e. _backing()/_finalize()) handlers
and the traditional post request code, because the shortcuts basically
bypass the entire rest of the codepath. For example, something like
create_new_entry() is currently reused for several inode creation
operations. With passthrough mode, it looks like some of that code (i.e.
vfs dentry fixups) is split off from create_new_entry() into each
individual backing mode handler.

It looks like much of the lower level request processing code was
refactored into the fuse_iqueue to support things like virtiofs. Have
you looked into whether that abstraction can be reused or enhanced to
support bpf filtering, direct passthrough calls, etc.? Or perhaps
whether more of the higher level code could be refactored in a similar
way to encourage more reuse and avoid branching off every fs operation
into a special passthrough codepath?

Brian

>   - Please make the patches that can be applied to the bpf-next tree
> cleanly. For example, in patch 3, where is 18e2ec5bf453 coming from? I
> cannot find it in bpf-next and linux-next tree.
>   - Without applying it to an upstream tree cleanly, in a big set like this,
> I have no idea when bpf_prog_run() is called in patch 24 because the diff
> context is in fuse_bpf_cleanup and apparently it is not where the bpf prog
> is run.
> 
> Some high level comments on the set:
> - Instead of adding bpf helpers, you should consider kfunc instead. You can
> take a look at the recent HID patchset v10 or the recent nf conntrack bpf
> set.
> 
> - Instead of expressing as packet data, using the recent dynptr is a better
> way to go for handling a mem blob.
> 
> - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
> file_operations' without going back to the user daemon? Have you looked at
> struct_ops which seems to be a better fit here?  If the bpf prog does not
> know how to handle an operation (or file?), it can call fuse_file_llseek
> (for example) as a kfunc to handle the request.
> 
> - The test SEC("test_trace") seems mostly a synthetic test for checking
> correctness.  Does it have a test that shows a more real life use case? or I
> have missed things in patch 26?
> 
> - Please use the skel to load the program.  It is pretty hard to read the
> loader in patch 26.
> 
> - I assume the main objective is for performance by not going back to the
> user daemon?  Do you have performance number?
> 



* Re: [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf
  2022-09-27 18:19   ` Miklos Szeredi
@ 2022-09-30 22:02     ` Paul Lawrence
  2022-10-01  7:47       ` Amir Goldstein
  0 siblings, 1 reply; 38+ messages in thread
From: Paul Lawrence @ 2022-09-30 22:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Daniel Rosenberg, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Jann Horn, Amir Goldstein

On Tue, Sep 27, 2022 at 11:19 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 27 Sept 2022 at 01:18, Daniel Rosenberg <drosen@google.com> wrote:
>
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index d6ccee961891..8c80c146e69b 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -572,6 +572,17 @@ struct fuse_entry_out {
> >         struct fuse_attr attr;
> >  };
> >
> > +#define FUSE_ACTION_KEEP       0
> > +#define FUSE_ACTION_REMOVE     1
> > +#define FUSE_ACTION_REPLACE    2
> > +
> > +struct fuse_entry_bpf_out {
> > +       uint64_t        backing_action;
> > +       uint64_t        backing_fd;
>
> This is a security issue.   See this post from Jann:
>
> https://lore.kernel.org/all/CAG48ez17uXtjCTa7xpa=JWz3iBbNDQTKO2hvn6PAZtfW3kXgcA@mail.gmail.com/
>
> The fuse-passthrough series solved this by pre-registering the
> passthrough fd with an ioctl. Since this requires an explicit syscall on
> the server side the attack is thwarted.
>
> It would be nice if this mechanism was agreed between these projects.
>
> BTW, does fuse-bpf provide a superset of fuse-passthrough?  I mean
> could fuse-bpf work with a NULL bpf program as a simple passthrough?
>
> Thanks,
> Miklos

To deal with the easy part first: yes, fuse-bpf can take a null bpf program, and
if you install that on files, it should behave exactly like fuse passthrough.

Our intent is that all accesses to the backing files go through the normal
vfs layer checks, so even once a backing file is installed, it can only be
accessed if the client already has sufficient rights. However, the same
statement seems to be true for the fuse passthrough code so I assume
that is not sufficient. I would be interested in further understanding the
remaining security issue (or is it defense in depth?) We understand that
the solution in fuse passthrough was to change the response to a fuse open
to be an ioctl? This would seem straightforward in fuse-bpf as well if it is
needed, though of course it would be in the lookup.

Thank you for reminding us of this,

Paul


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-09-28  6:41 ` [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Martin KaFai Lau
  2022-09-28 12:31   ` Brian Foster
@ 2022-10-01  0:05   ` Daniel Rosenberg
  2022-10-01  0:24     ` Alexei Starovoitov
  2022-10-06  1:58     ` Martin KaFai Lau
  1 sibling, 2 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-10-01  0:05 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On Tue, Sep 27, 2022 at 11:41 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> Interesting idea.
>
> Some comments on review logistics:
> - The set is too long and some of the individual patches are way too long for
> one single patch to review.  Keep in mind that not all of us here are experts in
> both fuse and bpf.  Making it easier to review first will help at the beginning.
>   Some ideas:
>
>    - Only implement a few ops in the initial revision. From quickly browsing the
> set, it is implementing the 'struct file_operations fuse_file_operations'?
> Maybe the first few revisions can start with a few of the ops first.
>

I've split it up a fair bit already, do you mean just sending a subset
of them at a time? I think the current splitting roughly allows for
that. Patch 1-4 and 5 deal with bpf/verifier code which isn't used
until patch 24. I can reorder/split up the opcodes arbitrarily.
Putting the op codes that implement file passthrough first makes
sense. The code is much easier to test when all/most are present,
since then I can just use patch 23 to mount without a daemon and run
xfs tests on them. At least initially I felt the whole stack was
useful to give the full picture.

>    - Please make the patches that can be applied to the bpf-next tree cleanly.
> For example, in patch 3, where is 18e2ec5bf453 coming from? I cannot find it in
> bpf-next and linux-next tree.
>    - Without applying it to an upstream tree cleanly, in a big set like this, I
> have no idea when bpf_prog_run() is called in patch 24 because the diff context
> is in fuse_bpf_cleanup and apparently it is not where the bpf prog is run.
>

Currently this is based off of
bf682942cd26ce9cd5e87f73ae099b383041e782 in
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
I would have rebased on top of bpf-next, except that from my
conversations at plumbers, I figured that the set up would need to
change significantly, and that effort would be wasted. My goal
including them here was to give more of a sense of what our needs are,
and be a starting point for working out what we really ought to be
using.

> Some high level comments on the set:
> - Instead of adding bpf helpers, you should consider kfunc instead. You can take
> a look at the recent HID patchset v10 or the recent nf conntrack bpf set.
>
> - Instead of expressing as packet data, using the recent dynptr is a better way
> to go for handling a mem blob.
>

I'll look into those, I remember them coming up at LPC. My current use
of packets/buffers does seem to abuse their intended meaning a bit.

> - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
> file_operations' without going back to the user daemon? Have you looked at
> struct_ops which seems to be a better fit here?  If the bpf prog does not know
> how to handle an operation (or file?), it can call fuse_file_llseek (for
> example) as a kfunc to handle the request.
>

I wasn't aware of struct_ops. It looks like that may work for us
instead of making a new prog type. I'll definitely look into that.
I'll likely sign up for the bpf office hours next week.

> - The test SEC("test_trace") seems mostly a synthetic test for checking
> correctness.  Does it have a test that shows a more real life use case? or I
> have missed things in patch 26?
>

Patch 26 is pretty much all synthetic tests. A lot of them are just
ensuring that we even call in to the bpf program, and some limited
testing that changing the filters has the expected results. We
mentioned a few more concrete use cases in the LPC talk. One of those
is folder hiding. In Android we've had an issue with leaking the
existence of some apps to other apps. We needed to hide certain
directories from apps where they do have permissions to traverse the
directory. Attempting to access a folder without permissions would
result in EPERM, revealing the existence of that folder. Since the
application doesn't have permission to create arbitrary folders at
that level, we can hide it by using fuse-bpf to change the EPERM into
an ENOENT, and then filter readdir to remove disallowed entries. You
can see something like that in bpf_test_redact_readdir. We also have
some file level redaction. If an app doesn't have location
permissions, but does have file permissions, they could just read
picture metadata to get location information. We could have bpf
redirect reads that might contain location data to the daemon, while
passing through other parts.
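
As a rough illustration of the two post-filters described above (turning
EPERM into ENOENT, and dropping disallowed names from a readdir result),
here is a plain userspace C sketch. It is not bpf and not code from the
patch set; struct entry, name_allowed(), redact_errno() and
redact_readdir() are names made up for this example, and the "policy" is
reduced to hiding a single name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one directory entry from the backing fs. */
struct entry {
	const char *name;
};

/* Hypothetical policy: the caller may see every name except `hidden`. */
static int name_allowed(const char *name, const char *hidden)
{
	return strcmp(name, hidden) != 0;
}

/* Post-filter for an error result: report "no such entry" instead of
 * "permission denied", and leave every other result untouched. */
static int redact_errno(int backing_err)
{
	return backing_err == -EPERM ? -ENOENT : backing_err;
}

/* Post-filter for readdir: compact the array in place, dropping hidden
 * names. Returns the number of entries that remain. */
static size_t redact_readdir(struct entry *ents, size_t n, const char *hidden)
{
	size_t kept = 0;

	assert(ents != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (name_allowed(ents[i].name, hidden))
			ents[kept++] = ents[i];
	return kept;
}
```

Both filters have to agree on the policy: if readdir hides a name but
the error path still returns EPERM for it, the existence of the folder
leaks anyway.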

> - Please use the skel to load the program.  It is pretty hard to read the loader
> in patch 26.

Yeah, patch 26 is not in great shape currently. I included it mostly
as something that exercises the code, and contains some example bpf
programs. Any suggestions on setting up the tests better are
appreciated.

>
> - I assume the main objective is for performance by not going back to the user
> daemon?  Do you have performance number?
>

I don't have any on hand from the current version. It's a little
tricky to know what numbers are relevant here since the numbers will
change greatly depending on what you do with it. In pure passthrough
without a bpf program, we were seeing performance pretty comparable to
the lower filesystem. Depending on how large of a bpf program we use
we were seeing pretty different slowdowns from that, though at least
some of that was from having a large switch statement.

Do you have any suggestions on what to test here? My thoughts would be
comparing lower fs performance to pure passthrough, to maybe the
example ll_passthrough in libfuse, but that doesn't really show any
bpf impact.

-Daniel


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-10-01  0:05   ` Daniel Rosenberg
@ 2022-10-01  0:24     ` Alexei Starovoitov
  2022-10-06  1:58     ` Martin KaFai Lau
  1 sibling, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2022-10-01  0:24 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, Linux-Fsdevel,
	bpf, Android Kernel Team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On Fri, Sep 30, 2022 at 5:05 PM Daniel Rosenberg <drosen@google.com> wrote:
>
> >    - Please make the patches that can be applied to the bpf-next tree cleanly.
> > For example, in patch 3, where is 18e2ec5bf453 coming from? I cannot find it in
> > bpf-next and linux-next tree.
> >    - Without applying it to an upstream tree cleanly, in a big set like this, I
> > have no idea when bpf_prog_run() is called in patch 24 because the diff context
> > is in fuse_bpf_cleanup and apparently it is not where the bpf prog is run.
> >
>
> Currently this is based off of
> bf682942cd26ce9cd5e87f73ae099b383041e782 in
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> I would have rebased on top of bpf-next, except that from my
> conversations at plumbers, I figured that the set up would need to
> change significantly, and that effort would be wasted. My goal
> including them here was to give more of a sense of what our needs are,
> and be a starting point for working out what we really ought to be
> using.

It was a good idea to send it early :)

>
> > Some high level comments on the set:
> > - Instead of adding bpf helpers, you should consider kfunc instead. You can take
> > a look at the recent HID patchset v10 or the recent nf conntrack bpf set.
> >
> > - Instead of expressing as packet data, using the recent dynptr is a better way
> > to go for handling a mem blob.
> >
>
> I'll look into those, I remember them coming up at LPC. My current use
> of packets/buffers does seem to abuse their intended meaning a bit.

This 'abuse' is sorta, kinda, ok-ish. We can accept that,
but once you convert to kfunc interface you might realize that
"packet" abstraction is not necessary here and there are
cleaner alternatives. Have you looked at dynptr?

> > - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
> > file_operations' without going back to the user daemon? Have you looked at
> > struct_ops which seems to be a better fit here?  If the bpf prog does not know
> > how to handle an operation (or file?), it can call fuse_file_llseek (for
> > example) as a kfunc to handle the request.
> >
>
> I wasn't aware of struct_ops. It looks like that may work for us
> instead of making a new prog type. I'll definitely look into that.
> I'll likely sign up for the bpf office hours next week.

I have to second everything that Martin suggested.

To reiterate his points in different words:
. patch 26 with printk debug only gives very low
confidence that the presented api towards bpf programs
will be usable.
The patch series gotta have a production worthy bpf program
that actually does things you want it to do.

. please use kfunc mechanism similar to the way hid-bpf is doing.
If individual funcs are not enough and you need to attach
a set of bpf programs all at once then use struct_ops.
kfuncs are preferred if you don't need atomicity of a set of progs.
If it's fine to attach progs one at a time to different nop==empty
functions then just use a set of nop funcs and call back into
the kernel with kfuncs.
That would be easier to rip out when api turns out to
be insufficient or extensions are necessary.
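
The shape being suggested (optional per-operation hooks that can defer
to a default handler) can be sketched in plain C. This is only an
analogy in userspace, not bpf, struct_ops, or kfunc code: struct fs_ops,
default_llseek(), dispatch_llseek() and clamped_llseek() are
hypothetical names, and default_llseek() merely stands in for the role
a kernel handler such as fuse_file_llseek would play.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation table; a NULL slot means the program
 * does not handle that operation. */
struct fs_ops {
	long (*llseek)(long pos, long off);
};

/* Stand-in for the default handler the hook can call back into. */
static long default_llseek(long pos, long off)
{
	return pos + off;
}

/* Dispatch: use the override if one is attached, else the default. */
static long dispatch_llseek(const struct fs_ops *ops, long pos, long off)
{
	assert(ops != NULL);
	if (ops->llseek)
		return ops->llseek(pos, off);
	return default_llseek(pos, off);
}

/* Example override: defer to the default, then clamp negative
 * results to 0. */
static long clamped_llseek(long pos, long off)
{
	long r = default_llseek(pos, off);

	return r < 0 ? 0 : r;
}
```

An empty table behaves exactly like the default path, which is the
property that makes such hooks easy to add or rip out later.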

> > - The test SEC("test_trace") seems mostly a synthetic test for checking
> > correctness.  Does it have a test that shows a more real life use case? or I
> > have missed things in patch 26?
> >
>
> Patch 26 is pretty much all synthetic tests. A lot of them are just
> ensuring that we even call in to the bpf program, and some limited
> testing that changing the filters has the expected results. We
> mentioned a few more concrete use cases in the LPC talk. One of those
> is folder hiding. In Android we've had an issue with leaking the
> existence of some apps to other apps. We needed to hide certain
> directories from apps where they do have permissions to traverse the
> directory. Attempting to access a folder without permissions would
> result in EPERM, revealing the existence of that folder. Since the
> application doesn't have permission to create arbitrary folders at
> that level, we can hide it by using fuse-bpf to change the EPERM into
> an ENOENT, and then filter readdir to remove disallowed entries. You
> can see something like that in bpf_test_redact_readdir. We also have
> some file level redaction. If an app doesn't have location
> permissions, but does have file permissions, they could just read
> picture metadata to get location information. We could have bpf
> redirect reads that might contain location data to the daemon, while
> passing through other parts.

The described use cases sound useful.
Just present them as real bpf progs.

My main concern, though, is with patches 7-25.
I don't understand file systems, but it looks like the patches
bring features into fuse from other file systems.
In many ways it looks like overlayfs.
Maybe just add bpf hooks to overlayfs?
Why fuse at all?
It might look like a bunch of nop functions sprinkled around overlayfs
code.
If you need to talk to user space from bpf prog you'll have
ringbuf and user_ringbuf to stream data from bpf prog to user
and back.


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-09-28 12:31   ` Brian Foster
@ 2022-10-01  0:47     ` Daniel Rosenberg
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Rosenberg @ 2022-10-01  0:47 UTC (permalink / raw)
  To: Brian Foster
  Cc: Martin KaFai Lau, Andrii Nakryiko, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On Wed, Sep 28, 2022 at 5:31 AM Brian Foster <bfoster@redhat.com> wrote:
>
> I had a similar thought when poking through this. A related question I
> had is how much of a functional dependency does the core passthrough
> mechanism have on bpf? If bpf is optional for filtering purposes and
> isn't absolutely necessary to set up a basic form of passthrough, I
> think review would be made easier by splitting off those core bits from
> the bpf components so each part is easier to review by people who know
> them best. For example, introduce all the fuse enhancements, hooks and
> cleanups to set up a passthrough to start the series, then plumb in the
> bpf filtering magic on top. Hm?
>

The passthrough code has no dependency on the bpf functionality. I can
reorder these patches to not have any bpf changes until patch 24. I'll
probably change the order like I described in my previous email. The
patches do become a lot more useful once the pre/post filters enter
the mix though.

> BTW if the bpf bits are optional, how might one mount a fuse/no
> daemon/passthrough filesystem from userspace? Is that possible with this
> series as is?
>
This is provided by patch 23. You can mount with the "no_daemon"
option. Anywhere FUSE attempts to call the daemon will end up with an
error, since the daemon is not connected. If you pair this with
"root_dir=[fd]" and optionally "root_bpf=[fd]", you can run in a
daemon-less passthrough mode. It's a bit less exciting though, since
at that point you're kind of doing a bind mount with extra steps.
Useful for testing though, and in theory you may be able to implement
most of a daemon in bpf.
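
A minimal sketch of assembling the option string for such a mount,
assuming the options are passed comma-separated in the mount data and
that the fds are written as decimal numbers. The helper name
build_no_daemon_opts() is made up for this example, and any other
options a FUSE mount may require are left out.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the option string for a daemon-less passthrough mount.
 * root_bpf_fd < 0 means "no bpf program" (pure passthrough).
 * Returns the string length, or -1 if the buffer is too small.
 */
static int build_no_daemon_opts(char *buf, size_t len,
				int root_dir_fd, int root_bpf_fd)
{
	int n;

	assert(buf != NULL);
	if (root_bpf_fd >= 0)
		n = snprintf(buf, len, "no_daemon,root_dir=%d,root_bpf=%d",
			     root_dir_fd, root_bpf_fd);
	else
		n = snprintf(buf, len, "no_daemon,root_dir=%d", root_dir_fd);

	/* snprintf reports the untruncated length; reject truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting string would then be handed to mount(2) as the data
argument by a process that still holds both fds open.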

> Something more on the fuse side.. it looks like we introduce a pattern
> where bits of generic request completion processing can end up
> duplicated between the shortcut (i.e.  _backing()/_finalize()) handlers
> and the traditional post request code, because the shortcuts basically
> bypass the entire rest of the codepath. For example, something like
> create_new_entry() is currently reused for several inode creation
> operations. With passthrough mode, it looks like some of that code (i.e.
> vfs dentry fixups) is split off from create_new_entry() into each
> individual backing mode handler.
>
> It looks like much of the lower level request processing code was
> refactored into the fuse_iqueue to support things like virtiofs. Have
> you looked into whether that abstraction can be reused or enhanced to
> support bpf filtering, direct passthrough calls, etc.? Or perhaps
> whether more of the higher level code could be refactored in a similar
> way to encourage more reuse and avoid branching off every fs operation
> into a special passthrough codepath?
>
> Brian
>

The largest opportunity for reducing duplicate code would probably be
trying to unify the backing calls between overlayfs and our work here.
In places where you need to do more work than directly calling the
relevant vfs calls we probably could factor out some common helpers. I
haven't looked too much into that yet since I want to see where the
fuse-bpf code ends up before I try to commit to that. I've thought
about unifying some of the code around node creation in the backing
implementations, but haven't gotten around to it yet. We definitely
need to branch off for every operation though, since fuse otherwise
has no concept of the backing filesystem. We do have some more work to
do to ensure there is a clean handoff between regular fuse and
fuse-bpf. The goal is to be able to handle just the parts you need to
in the daemon, while the rest can be passed through if you're acting
as a stacked filesystem. There are some oddities around things fuse
does for efficiency that fuse-bpf doesn't need to do. For instance, if
you're using passthrough for getattr, you don't really need to do a
readdir_plus, since you don't have to worry about all the extra daemon
requests.

-Daniel


* Re: [PATCH 15/26] fuse-bpf: Add support for read/write iter
  2022-09-26 23:18 ` [PATCH 15/26] fuse-bpf: Add support for read/write iter Daniel Rosenberg
@ 2022-10-01  6:53   ` Amir Goldstein
  0 siblings, 0 replies; 38+ messages in thread
From: Amir Goldstein @ 2022-10-01  6:53 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Miklos Szeredi, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Paul Lawrence, Alessio Balsini, David Anderson, Sandeep Patil,
	linux-fsdevel, bpf, kernel-team

On Tue, Sep 27, 2022 at 2:46 AM Daniel Rosenberg <drosen@google.com> wrote:
>
> Signed-off-by: Daniel Rosenberg <drosen@google.com>
> Signed-off-by: Paul Lawrence <paullawrence@google.com>
> ---
>  fs/fuse/backing.c | 291 ++++++++++++++++++++++++++++++++++++++++++++++
>  fs/fuse/control.c |   2 +-
>  fs/fuse/file.c    |  28 +++++
>  fs/fuse/fuse_i.h  |  42 ++++++-
>  fs/fuse/inode.c   |  13 +++
>  5 files changed, 374 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fuse/backing.c b/fs/fuse/backing.c
> index 1fe61177cdfb..cf4ad9f4fe10 100644
> --- a/fs/fuse/backing.c
> +++ b/fs/fuse/backing.c
> @@ -12,6 +12,47 @@
>  #include <linux/namei.h>
>  #include <linux/bpf_fuse.h>
>
> +#define FUSE_BPF_IOCB_MASK (IOCB_APPEND | IOCB_DSYNC | IOCB_HIPRI | IOCB_NOWAIT | IOCB_SYNC)
> +
> +struct fuse_bpf_aio_req {
> +       struct kiocb iocb;
> +       refcount_t ref;
> +       struct kiocb *iocb_orig;
> +};
> +
> +static struct kmem_cache *fuse_bpf_aio_request_cachep;
> +
> +static void fuse_file_accessed(struct file *dst_file, struct file *src_file)
> +{
> +       struct inode *dst_inode;
> +       struct inode *src_inode;
> +
> +       if (dst_file->f_flags & O_NOATIME)
> +               return;
> +
> +       dst_inode = file_inode(dst_file);
> +       src_inode = file_inode(src_file);
> +
> +       if ((!timespec64_equal(&dst_inode->i_mtime, &src_inode->i_mtime) ||
> +            !timespec64_equal(&dst_inode->i_ctime, &src_inode->i_ctime))) {
> +               dst_inode->i_mtime = src_inode->i_mtime;
> +               dst_inode->i_ctime = src_inode->i_ctime;
> +       }
> +
> +       touch_atime(&dst_file->f_path);
> +}
> +
> +static void fuse_copyattr(struct file *dst_file, struct file *src_file)
> +{
> +       struct inode *dst = file_inode(dst_file);
> +       struct inode *src = file_inode(src_file);
> +
> +       dst->i_atime = src->i_atime;
> +       dst->i_mtime = src->i_mtime;
> +       dst->i_ctime = src->i_ctime;
> +       i_size_write(dst, i_size_read(src));
> +}
> +
>  struct bpf_prog *fuse_get_bpf_prog(struct file *file)
>  {
>         struct bpf_prog *bpf_prog = ERR_PTR(-EINVAL);
> @@ -469,6 +510,241 @@ int fuse_lseek_finalize(struct bpf_fuse_args *fa, loff_t *out,
>         return 0;
>  }
>
> +static inline void fuse_bpf_aio_put(struct fuse_bpf_aio_req *aio_req)
> +{
> +       if (refcount_dec_and_test(&aio_req->ref))
> +               kmem_cache_free(fuse_bpf_aio_request_cachep, aio_req);
> +}
> +
> +static void fuse_bpf_aio_cleanup_handler(struct fuse_bpf_aio_req *aio_req)
> +{
> +       struct kiocb *iocb = &aio_req->iocb;
> +       struct kiocb *iocb_orig = aio_req->iocb_orig;
> +
> +       if (iocb->ki_flags & IOCB_WRITE) {
> +               __sb_writers_acquired(file_inode(iocb->ki_filp)->i_sb,
> +                                     SB_FREEZE_WRITE);
> +               file_end_write(iocb->ki_filp);
> +               fuse_copyattr(iocb_orig->ki_filp, iocb->ki_filp);
> +       }
> +       iocb_orig->ki_pos = iocb->ki_pos;
> +       fuse_bpf_aio_put(aio_req);
> +}
> +
> +static void fuse_bpf_aio_rw_complete(struct kiocb *iocb, long res)
> +{
> +       struct fuse_bpf_aio_req *aio_req =
> +               container_of(iocb, struct fuse_bpf_aio_req, iocb);
> +       struct kiocb *iocb_orig = aio_req->iocb_orig;
> +
> +       fuse_bpf_aio_cleanup_handler(aio_req);
> +       iocb_orig->ki_complete(iocb_orig, res);
> +}
> +
> +int fuse_file_read_iter_initialize_in(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
> +                                     struct kiocb *iocb, struct iov_iter *to)
> +{
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
> +
> +       fri->fri = (struct fuse_read_in) {
> +               .fh = ff->fh,
> +               .offset = iocb->ki_pos,
> +               .size = to->count,
> +       };
> +
> +       /* TODO we can't assume 'to' is a kvec */
> +       /* TODO we also can't assume the vector has only one component */
> +       *fa = (struct bpf_fuse_args) {
> +               .opcode = FUSE_READ,
> +               .nodeid = ff->nodeid,
> +               .in_numargs = 1,
> +               .in_args[0].size = sizeof(fri->fri),
> +               .in_args[0].value = &fri->fri,
> +               /*
> +                * TODO Design this properly.
> +                * Possible approach: do not pass buf to bpf
> +                * If going to userland, do a deep copy
> +                * For extra credit, do that to/from the vector, rather than
> +                * making an extra copy in the kernel
> +                */
> +       };
> +
> +       return 0;
> +}
> +
> +int fuse_file_read_iter_initialize_out(struct bpf_fuse_args *fa, struct fuse_file_read_iter_io *fri,
> +                                      struct kiocb *iocb, struct iov_iter *to)
> +{
> +       fri->frio = (struct fuse_read_iter_out) {
> +               .ret = fri->fri.size,
> +       };
> +
> +       fa->out_numargs = 1;
> +       fa->out_args[0].size = sizeof(fri->frio);
> +       fa->out_args[0].value = &fri->frio;
> +
> +       return 0;
> +}
> +
> +int fuse_file_read_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
> +                               struct kiocb *iocb, struct iov_iter *to)
> +{
> +       struct fuse_read_iter_out *frio = fa->out_args[0].value;
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
> +
> +       if (!iov_iter_count(to))
> +               return 0;
> +
> +       if ((iocb->ki_flags & IOCB_DIRECT) &&
> +           (!ff->backing_file->f_mapping->a_ops ||
> +            !ff->backing_file->f_mapping->a_ops->direct_IO))
> +               return -EINVAL;
> +
> +       /* TODO This just plain ignores any change to fuse_read_in */
> +       if (is_sync_kiocb(iocb)) {
> +               *out = vfs_iter_read(ff->backing_file, to, &iocb->ki_pos,
> +                               iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
> +       } else {
> +               struct fuse_bpf_aio_req *aio_req;
> +
> +               *out = -ENOMEM;
> +               aio_req = kmem_cache_zalloc(fuse_bpf_aio_request_cachep, GFP_KERNEL);
> +               if (!aio_req)
> +                       goto out;
> +
> +               aio_req->iocb_orig = iocb;
> +               kiocb_clone(&aio_req->iocb, iocb, ff->backing_file);
> +               aio_req->iocb.ki_complete = fuse_bpf_aio_rw_complete;
> +               refcount_set(&aio_req->ref, 2);
> +               *out = vfs_iocb_iter_read(ff->backing_file, &aio_req->iocb, to);
> +               fuse_bpf_aio_put(aio_req);
> +               if (*out != -EIOCBQUEUED)
> +                       fuse_bpf_aio_cleanup_handler(aio_req);
> +       }
> +
> +       frio->ret = *out;
> +
> +       /* TODO Need to point value at the buffer for post-modification */
> +
> +out:
> +       fuse_file_accessed(file, ff->backing_file);
> +
> +       return *out;
> +}
> +
> +int fuse_file_read_iter_finalize(struct bpf_fuse_args *fa, ssize_t *out,
> +                                struct kiocb *iocb, struct iov_iter *to)
> +{
> +       struct fuse_read_iter_out *frio = fa->out_args[0].value;
> +
> +       *out = frio->ret;
> +
> +       return 0;
> +}
> +
> +int fuse_file_write_iter_initialize_in(struct bpf_fuse_args *fa,
> +                                      struct fuse_file_write_iter_io *fwio,
> +                                      struct kiocb *iocb, struct iov_iter *from)
> +{
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
> +
> +       *fwio = (struct fuse_file_write_iter_io) {
> +               .fwi.fh = ff->fh,
> +               .fwi.offset = iocb->ki_pos,
> +               .fwi.size = from->count,
> +       };
> +
> +       /* TODO we can't assume 'from' is a kvec */
> +       *fa = (struct bpf_fuse_args) {
> +               .opcode = FUSE_WRITE,
> +               .nodeid = ff->nodeid,
> +               .in_numargs = 2,
> +               .in_args[0].size = sizeof(fwio->fwi),
> +               .in_args[0].value = &fwio->fwi,
> +               .in_args[1].size = fwio->fwi.size,
> +               .in_args[1].value = from->kvec->iov_base,
> +       };
> +
> +       return 0;
> +}
> +
> +int fuse_file_write_iter_initialize_out(struct bpf_fuse_args *fa,
> +                                       struct fuse_file_write_iter_io *fwio,
> +                                       struct kiocb *iocb, struct iov_iter *from)
> +{
> +       /* TODO we can't assume 'from' is a kvec */
> +       fa->out_numargs = 1;
> +       fa->out_args[0].size = sizeof(fwio->fwio);
> +       fa->out_args[0].value = &fwio->fwio;
> +
> +       return 0;
> +}
> +
> +int fuse_file_write_iter_backing(struct bpf_fuse_args *fa, ssize_t *out,
> +                                struct kiocb *iocb, struct iov_iter *from)
> +{
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
> +       struct fuse_write_iter_out *fwio = fa->out_args[0].value;
> +
> +       if (!iov_iter_count(from))
> +               return 0;
> +
> +       /* TODO This just plain ignores any change to fuse_write_in */
> +       /* TODO uint32_t seems smaller than ssize_t.... right? */
> +       inode_lock(file_inode(file));
> +
> +       fuse_copyattr(file, ff->backing_file);
> +
> +       if (is_sync_kiocb(iocb)) {
> +               file_start_write(ff->backing_file);
> +               *out = vfs_iter_write(ff->backing_file, from, &iocb->ki_pos,
> +                                          iocb_to_rw_flags(iocb->ki_flags, FUSE_BPF_IOCB_MASK));
> +               file_end_write(ff->backing_file);
> +
> +               /* Must reflect change in size of backing file to upper file */
> +               if (*out > 0)
> +                       fuse_copyattr(file, ff->backing_file);

Regarding attribute cache, things can get a tad more complicated
when the inode is not purely passthrough.

To put things in context, the reason that ovl_copyattr() is correct
in ovl_write_iter() is because the overlayfs inode is purely passthrough
to the backing inode, that is, all operations are passthrough.
The only incident when backing inode changes (copy up) takes
care of copying inode attributes.

This is not the case with FUSE passthrough.
Not in Alessio's FUSE_PASSTHROUGH patches and not in
this proposal.

With FUSE passthrough, every inode is hybrid, some operations
can be served from the backing inode and some operations served
from the server.

My FUSE passthrough branch [1] has two fixes on top of Alessio's patches
to fix two issues regarding attribute caches (size and times).

[1] https://github.com/amir73il/linux/commits/linux-5.10.y-fuse-passthrough

This problem is not unique to FUSE.
It also exists for attribute caches of NFS clients and the NFS protocol
has several techniques to deal with this problem, but it is certainly not
a trivial issue.

As a matter of fact, I found the problems fixed in my branch above
by running the nfstest_posix [2] tests on the FUSE passthrough code.

[2] https://wiki.linux-nfs.org/wiki/index.php/NFStest#nfstest_posix_-_POSIX_file_system_level_access_tests

The easiest way out is to declare one of the copies (backing or remote)
the authoritative copy w.r.t. attributes (like noac nfs mount option).
Declaring the remote attribute copy authoritative (as FUSE usually does)
has performance implications.

I guess if FUSE-bpf attaches a backing inode to FUSE inode on lookup
then the option of making the backing inode attributes authoritative
(like in overlayfs) is valid, but I think this needs to be spelled out.
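
To spell out the difference between a cached attribute copy and an
authoritative backing copy, here is a small userspace C model. It is
only a sketch: struct attrs, struct hybrid_inode and the three helpers
are hypothetical names for this example, not FUSE or overlayfs code,
and the attribute set is trimmed to size and mtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down attribute set. */
struct attrs {
	long long size;
	long long mtime;
};

/* Hypothetical hybrid inode: a cached copy plus the backing attrs. */
struct hybrid_inode {
	struct attrs cached;         /* what the upper inode remembers */
	const struct attrs *backing; /* attrs of the attached backing inode */
};

/* Cache-based getattr: only correct if every change to the backing
 * inode was followed by a refresh of the cache. */
static struct attrs getattr_cached(const struct hybrid_inode *ino)
{
	assert(ino != NULL);
	return ino->cached;
}

/* Backing-authoritative getattr: always answer from the backing inode. */
static struct attrs getattr_backing(const struct hybrid_inode *ino)
{
	assert(ino != NULL && ino->backing != NULL);
	return *ino->backing;
}

/* What a copyattr-style helper does after a passthrough write. */
static void copyattr(struct hybrid_inode *ino)
{
	assert(ino != NULL && ino->backing != NULL);
	ino->cached = *ino->backing;
}
```

If the backing attributes change through any path that does not call
copyattr(), getattr_cached() returns stale values while
getattr_backing() does not, which is the hybrid-inode problem in
miniature.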

Thanks,
Amir.


* Re: [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf
  2022-09-30 22:02     ` Paul Lawrence
@ 2022-10-01  7:47       ` Amir Goldstein
  0 siblings, 0 replies; 38+ messages in thread
From: Amir Goldstein @ 2022-10-01  7:47 UTC (permalink / raw)
  To: Paul Lawrence
  Cc: Miklos Szeredi, Daniel Rosenberg, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Alessio Balsini,
	David Anderson, Sandeep Patil, linux-fsdevel, bpf, kernel-team,
	Jann Horn

On Sat, Oct 1, 2022 at 1:03 AM Paul Lawrence <paullawrence@google.com> wrote:
>
> On Tue, Sep 27, 2022 at 11:19 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Tue, 27 Sept 2022 at 01:18, Daniel Rosenberg <drosen@google.com> wrote:
> >
> > > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > > index d6ccee961891..8c80c146e69b 100644
> > > --- a/include/uapi/linux/fuse.h
> > > +++ b/include/uapi/linux/fuse.h
> > > @@ -572,6 +572,17 @@ struct fuse_entry_out {
> > >         struct fuse_attr attr;
> > >  };
> > >
> > > +#define FUSE_ACTION_KEEP       0
> > > +#define FUSE_ACTION_REMOVE     1
> > > +#define FUSE_ACTION_REPLACE    2
> > > +
> > > +struct fuse_entry_bpf_out {
> > > +       uint64_t        backing_action;
> > > +       uint64_t        backing_fd;
> >
> > This is a security issue.   See this post from Jann:
> >
> > https://lore.kernel.org/all/CAG48ez17uXtjCTa7xpa=JWz3iBbNDQTKO2hvn6PAZtfW3kXgcA@mail.gmail.com/
> >
> > The fuse-passthrough series solved this by pre-registering the
> > passthrough fd with an ioctl. Since this requires an explicit syscall on
> > the server side the attack is thwarted.
> >
> > It would be nice if this mechanism was agreed between these projects.
> >
> > BTW, does fuse-bpf provide a superset of fuse-passthrough?  I mean
> > could fuse-bpf work with a NULL bpf program as a simple passthrough?
> >
> > Thanks,
> > Miklos
>
> To deal with the easy part. Yes, fuse-bpf can take a null bpf program, and
> if you install that on files, it should behave exactly like bpf passthrough.
>
> Our intent is that all accesses to the backing files go through the normal
> vfs layer checks, so even once a backing file is installed, it can only be
> accessed if the client already has sufficient rights. However, the same
> statement seems to be true for the fuse passthrough code so I assume
> that is not sufficient. I would be interested in further understanding the
> remaining security issue (or is it defense in depth?) We understand that
> the solution in fuse passthrough was to change the response to a fuse open
> to be an ioctl? This would seem straightforward in fuse-bpf as well if it is
> needed, though of course it would be in the lookup.

Not only in lookup.
In lookup userspace can install an O_PATH fd for backing_path,
but userspace will also need to install readable/writeable fds
as backing_file to be used by open to match the open mode.
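
As an illustrative sketch only (not code from the series), the difference
between the two kinds of fd can be shown with plain open(2) calls. The helper
names backing_path_fd() and backing_file_fd() are hypothetical, and how the
fds would be handed to the kernel is deliberately left out, since that
interface is what is being discussed here.

```c
#define _GNU_SOURCE		/* O_PATH is a Linux/GNU extension */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: the kind of fd a server could install at lookup
 * time. O_PATH only names a location in the tree; read(2) and write(2)
 * on such an fd fail with EBADF.
 */
static int backing_path_fd(const char *path)
{
	assert(path != NULL);
	return open(path, O_PATH | O_CLOEXEC);
}

/*
 * Hypothetical helper: the kind of fd a server could install at open
 * time. Only the access mode bits (O_RDONLY, O_WRONLY, O_RDWR) of the
 * client's open flags are carried over, so the backing file is opened
 * with the same access mode the client asked for.
 */
static int backing_file_fd(const char *path, int client_flags)
{
	assert(path != NULL);
	return open(path, (client_flags & O_ACCMODE) | O_CLOEXEC);
}
```

Both helpers return -1 with errno set on failure, exactly as open(2) does.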

When talking about the server-less passthrough mode, some security
model, such as the overlayfs mounter creds model, will need to be
employed, although in the private case of one-to-one passthrough
I guess using the caller creds should be good enough, as long as
the security model is spelled out and the implementation is audited.

Thanks,
Amir.


* Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE
  2022-10-01  0:05   ` Daniel Rosenberg
  2022-10-01  0:24     ` Alexei Starovoitov
@ 2022-10-06  1:58     ` Martin KaFai Lau
  1 sibling, 0 replies; 38+ messages in thread
From: Martin KaFai Lau @ 2022-10-06  1:58 UTC (permalink / raw)
  To: Daniel Rosenberg
  Cc: Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Paul Lawrence,
	Alessio Balsini, David Anderson, Sandeep Patil, linux-fsdevel,
	bpf, kernel-team, Miklos Szeredi, Alexei Starovoitov,
	Daniel Borkmann, John Fastabend

On 9/30/22 5:05 PM, Daniel Rosenberg wrote:
> On Tue, Sep 27, 2022 at 11:41 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> Interesting idea.
>>
>> Some comments on review logistics:
>> - The set is too long and some of the individual patches are way too long for
>> one single patch to review.  Keep in mind that not all of us here are experts in
>> both fuse and bpf.  Making it easier to review first will help at the beginning.
>>    Some ideas:
>>
>>     - Only implement a few ops in the initial revision. From quickly browsing the
>> set, it is implementing the 'struct file_operations fuse_file_operations'?
>> Maybe the first few revisions can start with a few of the ops first.
>>
> 
> I've split it up a fair bit already, do you mean just sending a subset
> of them at a time? I think the current splitting roughly allows for
> that. Patch 1-4 and 5 deal with bpf/verifier code which isn't used
> until patch 24. I can reorder/split up the opcodes arbitrarily.


> Putting the op codes that implement file passthrough first makes
> sense. The code is much easier to test when all/most are present,
> since then I can just use patch 23 to mount without a daemon and run
> xfs tests on them. At least initially I felt the whole stack was
> useful to give the full picture.

I don't mind having all op codes in each re-spin as long as it applies
cleanly to bpf-next, where the bpf implementation part will eventually land.
Patch 26 has to be split up though.  It is a few thousand lines in one patch.

I was just thinking of doing only a few op codes, e.g. the few Android use
cases you have mentioned.  My feeling is that other op codes should not be very
different in terms of the bpf side implementation (or is that not true?).  Once
the patch set gets enough traction, start adding more op codes in later
revisions.  That will likely help you re-spin faster and save you time as well.


>> - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
>> file_operations' without going back to the user daemon? Have you looked at
>> struct_ops which seems to be a better fit here?  If the bpf prog does not know
>> how to handle an operation (or file?), it can call fuse_file_llseek (for
>> example) as a kfunc to handle the request.
>>
> 
> I wasn't aware of struct_ops. It looks like that may work for us
> instead of making a new prog type. I'll definitely look into that.
> I'll likely sign up for the bpf office hours next week.

You can take a look at tools/testing/selftests/bpf/progs/bpf_cubic.c.
It implements a whole TCP congestion control algorithm in bpf. In particular,
the bpf prog implements the kernel 'struct tcp_congestion_ops'.  That selftest
example is pretty much a direct copy of the kernel net/ipv4/tcp_cubic.c.  Also,
in BPF_STRUCT_OPS(bpf_cubic_undo_cwnd, ...), it directly calls the kfunc
tcp_reno_undo_cwnd() when the bpf prog does not need to do anything different
from the kernel's tcp_reno_undo_cwnd().  Look at how it is marked as __ksym in
bpf_cubic.c
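
To make the shape of that pattern concrete, here is a plain userspace C
analogy. It is not BPF code and not the real 'struct tcp_congestion_ops':
every toy_* name is made up for illustration. It only shows an ops table of
function pointers where one callback is custom and another simply delegates to
an existing default, which is the role tcp_reno_undo_cwnd() plays for
bpf_cubic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a kernel ops table made up of function pointers. */
struct toy_cong_ops {
	unsigned int (*undo_cwnd)(unsigned int cwnd, unsigned int prior_cwnd);
	unsigned int (*ssthresh)(unsigned int cwnd);
};

/* Plays the role of an existing default that implementations may reuse. */
static unsigned int toy_reno_undo_cwnd(unsigned int cwnd,
				       unsigned int prior_cwnd)
{
	return cwnd > prior_cwnd ? cwnd : prior_cwnd;
}

/* Custom callback: halve the window, but never go below 2. */
static unsigned int toy_custom_ssthresh(unsigned int cwnd)
{
	unsigned int half = cwnd / 2;

	return half > 2 ? half : 2;
}

/* Nothing special needed here, so delegate to the default. */
static unsigned int toy_custom_undo_cwnd(unsigned int cwnd,
					 unsigned int prior_cwnd)
{
	return toy_reno_undo_cwnd(cwnd, prior_cwnd);
}

static const struct toy_cong_ops toy_custom_ops = {
	.undo_cwnd = toy_custom_undo_cwnd,
	.ssthresh  = toy_custom_ssthresh,
};

/* Caller side: dispatch through the table without knowing who fills it. */
static unsigned int toy_undo(const struct toy_cong_ops *ops,
			     unsigned int cwnd, unsigned int prior_cwnd)
{
	assert(ops != NULL && ops->undo_cwnd != NULL);
	return ops->undo_cwnd(cwnd, prior_cwnd);
}
```

In the real selftest the table is registered with the kernel through
struct_ops and the default is reached as a kfunc; the analogy above only
mirrors the structure, not the mechanism.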

However, echoing Alexei's earlier reply, struct_ops is a good fit when it needs
to implement a well defined 'struct xyz_operations' that consists entirely of
function pointers.  Skimming the set again, it seems like it is mostly trying
to intercept the fuse_simple_request() call?




Thread overview: 38+ messages
2022-09-26 23:17 [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Daniel Rosenberg
2022-09-26 23:17 ` [PATCH 01/26] bpf: verifier: Allow for multiple packets Daniel Rosenberg
2022-09-26 23:17 ` [PATCH 02/26] bpf: verifier: Allow single packet invalidation Daniel Rosenberg
2022-09-26 23:17 ` [PATCH 03/26] fuse-bpf: Update uapi for fuse-bpf Daniel Rosenberg
2022-09-27 18:19   ` Miklos Szeredi
2022-09-30 22:02     ` Paul Lawrence
2022-10-01  7:47       ` Amir Goldstein
2022-09-26 23:18 ` [PATCH 04/26] fuse-bpf: Add BPF supporting functions Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 05/26] fs: Generic function to convert iocb to rw flags Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 06/26] bpf: Export bpf_prog_fops Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 07/26] fuse-bpf: Prepare for fuse-bpf patch Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 08/26] fuse: Add fuse-bpf, a stacked fs extension for FUSE Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 09/26] fuse-bpf: Don't support export_operations Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 10/26] fuse-bpf: Partially add mapping support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 11/26] fuse-bpf: Add lseek support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 12/26] fuse-bpf: Add support for fallocate Daniel Rosenberg
2022-09-27 22:07   ` Dave Chinner
2022-09-27 23:36     ` Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 13/26] fuse-bpf: Support file/dir open/close Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 14/26] fuse-bpf: Support mknod/unlink/mkdir/rmdir Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 15/26] fuse-bpf: Add support for read/write iter Daniel Rosenberg
2022-10-01  6:53   ` Amir Goldstein
2022-09-26 23:18 ` [PATCH 16/26] fuse-bpf: support FUSE_READDIR Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 17/26] fuse-bpf: Add support for sync operations Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 18/26] fuse-bpf: Add Rename support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 19/26] fuse-bpf: Add attr support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 20/26] fuse-bpf: Add support for FUSE_COPY_FILE_RANGE Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 21/26] fuse-bpf: Add xattr support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 22/26] fuse-bpf: Add symlink/link support Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 23/26] fuse-bpf: allow mounting with no userspace daemon Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 24/26] fuse-bpf: Call bpf for pre/post filters Daniel Rosenberg
2022-09-26 23:18 ` [PATCH 25/26] fuse-bpf: Add userspace " Daniel Rosenberg
2022-09-28  6:41 ` [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE Martin KaFai Lau
2022-09-28 12:31   ` Brian Foster
2022-10-01  0:47     ` Daniel Rosenberg
2022-10-01  0:05   ` Daniel Rosenberg
2022-10-01  0:24     ` Alexei Starovoitov
2022-10-06  1:58     ` Martin KaFai Lau
