* [PATCH bpf-next v1 0/8] Introduce load-acquire and store-release BPF instructions
From: Peilin Ye @ 2025-01-25  2:16 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Hi all!

This patchset adds kernel support for BPF load-acquire and store-release
instructions (for background, please see [1]), including core/verifier,
arm64 JIT compiler, and Documentation/ changes.  Support for x86-64 and
riscv64 is also planned.  The corresponding LLVM changes can be
found at:
  https://github.com/llvm/llvm-project/pull/108636

Following RFC 9669, 7.3. Adding Instructions [2], this patchset adds two
conformance groups for the new instructions:

  * atomic32v2: includes "atomic32", plus the new 8-, 16- and 32-bit
                atomic load-acquire and store-release instructions

  * atomic64v2: includes "atomic64" and "atomic32v2", plus the new
                64-bit atomic load-acquire and store-release
                instructions

See patch 8/8 for details; please let me know if you believe the new
instructions should be grouped differently.

RFC v1: https://lore.kernel.org/all/cover.1734742802.git.yepeilin@google.com

This patchset has been reorganized based on comments and suggestions
from Xu and Eduard.  Notable changes since RFC v1:

  o 1-2/8: minor verifier.c refactoring patches
  o   3/8: core/verifier changes
         * (Eduard) handle load-acquire properly in backtrack_insn()
         * (Eduard) avoid skipping checks (e.g.,
                    bpf_jit_supports_insn()) for load-acquires
         * track the value stored by store-releases, just like how
           non-atomic STX instructions are handled [3]
         * (Eduard) add missing link in commit message
         * (Eduard) always print 'r' for disasm.c changes
  o   4/8: arm64/insn: avoid treating load_acq/store_rel as
           load_ex/store_ex
  o   5/8: arm64/insn: add load_acq/store_rel
         * (Xu) include Should-Be-One (SBO) bits in "mask" and "value",
                to avoid setting fixed bits during runtime (JIT-compile
                time)
  o   6/8: arm64 JIT compiler changes
         * (Xu) use emit_a64_add_i() for "pointer + offset" to optimize
                code emission
  o   7/8: selftests
         * (Eduard) avoid adding new tests to the 'test_verifier' runner
         * add more tests, e.g., checking mark_precise logic
  o   8/8: instruction-set.rst changes

Please refer to individual kernel patches (and LLVM commits) for
details.  Any feedback would be much appreciated!

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
[2] https://www.rfc-editor.org/rfc/rfc9669.html#section-7.3
[3] Specifically, for store-releases, make sure we do that
    check_mem_access(..., BPF_WRITE, ...) call with @value_regno equal to
    'src_reg' instead of -1.

Thanks,
Peilin Ye (8):
  bpf/verifier: Factor out atomic_ptr_type_ok()
  bpf/verifier: Factor out check_atomic_rmw()
  bpf: Introduce load-acquire and store-release instructions
  arm64: insn: Add BIT(23) to {load,store}_ex's mask
  arm64: insn: Add load-acquire and store-release instructions
  bpf, arm64: Support load-acquire and store-release instructions
  selftests/bpf: Add selftests for load-acquire and store-release
    instructions
  bpf, docs: Update instruction-set.rst for load-acquire and
    store-release instructions

 .../bpf/standardization/instruction-set.rst   | 114 ++++++++++--
 arch/arm64/include/asm/insn.h                 |  12 +-
 arch/arm64/lib/insn.c                         |  28 +++
 arch/arm64/net/bpf_jit.h                      |  20 +++
 arch/arm64/net/bpf_jit_comp.c                 |  92 +++++++++-
 include/linux/filter.h                        |   2 +
 include/uapi/linux/bpf.h                      |  13 ++
 kernel/bpf/core.c                             |  41 ++++-
 kernel/bpf/disasm.c                           |  12 ++
 kernel/bpf/verifier.c                         | 165 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |  13 ++
 .../selftests/bpf/prog_tests/arena_atomics.c  |  61 ++++++-
 .../selftests/bpf/prog_tests/atomics.c        |  57 +++++-
 .../selftests/bpf/prog_tests/verifier.c       |   4 +
 .../selftests/bpf/progs/arena_atomics.c       |  62 ++++++-
 tools/testing/selftests/bpf/progs/atomics.c   |  62 ++++++-
 .../bpf/progs/verifier_load_acquire.c         |  92 ++++++++++
 .../selftests/bpf/progs/verifier_precision.c  |  39 +++++
 .../bpf/progs/verifier_store_release.c        | 153 ++++++++++++++++
 19 files changed, 988 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_load_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_store_release.c

-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 1/8] bpf/verifier: Factor out atomic_ptr_type_ok()
From: Peilin Ye @ 2025-01-25  2:17 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Factor out atomic_ptr_type_ok() as a helper function to be used later.

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 kernel/bpf/verifier.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9971c03adfd5..0935b72fe716 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6116,6 +6116,26 @@ static bool is_arena_reg(struct bpf_verifier_env *env, int regno)
 	return reg->type == PTR_TO_ARENA;
 }
 
+/* Return false if @regno contains a pointer whose type isn't supported for
+ * atomic instruction @insn.
+ */
+static bool atomic_ptr_type_ok(struct bpf_verifier_env *env, int regno,
+			       struct bpf_insn *insn)
+{
+	if (is_ctx_reg(env, regno))
+		return false;
+	if (is_pkt_reg(env, regno))
+		return false;
+	if (is_flow_key_reg(env, regno))
+		return false;
+	if (is_sk_reg(env, regno))
+		return false;
+	if (is_arena_reg(env, regno))
+		return bpf_jit_supports_insn(insn, true);
+
+	return true;
+}
+
 static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
 #ifdef CONFIG_NET
 	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
@@ -7572,11 +7592,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 		return -EACCES;
 	}
 
-	if (is_ctx_reg(env, insn->dst_reg) ||
-	    is_pkt_reg(env, insn->dst_reg) ||
-	    is_flow_key_reg(env, insn->dst_reg) ||
-	    is_sk_reg(env, insn->dst_reg) ||
-	    (is_arena_reg(env, insn->dst_reg) && !bpf_jit_supports_insn(insn, true))) {
+	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
 		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
 			insn->dst_reg,
 			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 2/8] bpf/verifier: Factor out check_atomic_rmw()
From: Peilin Ye @ 2025-01-25  2:18 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Currently, check_atomic() only handles atomic read-modify-write (RMW)
instructions.  Since we are planning to introduce other types of atomic
instructions (i.e., atomic load/store), extract the existing RMW
handling logic into its own function named check_atomic_rmw().

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 kernel/bpf/verifier.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0935b72fe716..2b3caa7549af 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7536,28 +7536,12 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
 			     bool allow_trust_mismatch);
 
-static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+static int check_atomic_rmw(struct bpf_verifier_env *env, int insn_idx,
+			    struct bpf_insn *insn)
 {
 	int load_reg;
 	int err;
 
-	switch (insn->imm) {
-	case BPF_ADD:
-	case BPF_ADD | BPF_FETCH:
-	case BPF_AND:
-	case BPF_AND | BPF_FETCH:
-	case BPF_OR:
-	case BPF_OR | BPF_FETCH:
-	case BPF_XOR:
-	case BPF_XOR | BPF_FETCH:
-	case BPF_XCHG:
-	case BPF_CMPXCHG:
-		break;
-	default:
-		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
-		return -EINVAL;
-	}
-
 	if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
 		verbose(env, "invalid atomic operand size\n");
 		return -EINVAL;
@@ -7641,6 +7625,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 	return 0;
 }
 
+static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+{
+	switch (insn->imm) {
+	case BPF_ADD:
+	case BPF_ADD | BPF_FETCH:
+	case BPF_AND:
+	case BPF_AND | BPF_FETCH:
+	case BPF_OR:
+	case BPF_OR | BPF_FETCH:
+	case BPF_XOR:
+	case BPF_XOR | BPF_FETCH:
+	case BPF_XCHG:
+	case BPF_CMPXCHG:
+		return check_atomic_rmw(env, insn_idx, insn);
+	default:
+		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
+		return -EINVAL;
+	}
+}
+
 /* When register 'regno' is used to read the stack (either directly or through
  * a helper function) make sure that it's within stack boundary and, depending
  * on the access type and privileges, that all elements of the stack are
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
From: Peilin Ye @ 2025-01-25  2:18 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Introduce BPF instructions with load-acquire and store-release
semantics, as discussed in [1].  The following new flags are defined:

  BPF_ATOMIC_LOAD         0x10
  BPF_ATOMIC_STORE        0x20
  BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)

  BPF_RELAXED        0x0
  BPF_ACQUIRE        0x1
  BPF_RELEASE        0x2
  BPF_ACQ_REL        0x3
  BPF_SEQ_CST        0x4

  BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
  BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x11).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
the ARM64 LDARH instruction and friends do.

As an example, consider the following 64-bit load-acquire BPF
instruction:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000011): BPF_LOAD_ACQ

Similarly, a 16-bit BPF store-release:

  cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000022): BPF_STORE_REL
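
As a minimal sketch, the 64-bit load-acquire above could also be built
with the existing BPF_ATOMIC_OP() macro from include/linux/filter.h:

  /* .code = BPF_STX | BPF_DW | BPF_ATOMIC (0xdb), .imm = BPF_LOAD_ACQ */
  BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_1, 0)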

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 include/linux/filter.h         |  2 +
 include/uapi/linux/bpf.h       | 13 +++++
 kernel/bpf/core.c              | 41 +++++++++++++-
 kernel/bpf/disasm.c            | 12 +++++
 kernel/bpf/verifier.c          | 99 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 13 +++++
 6 files changed, 175 insertions(+), 5 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a3ea46281595..e36812a5b01f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -364,6 +364,8 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
  *   BPF_XOR | BPF_FETCH      src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
  *   BPF_XCHG                 src_reg = atomic_xchg(dst_reg + off16, src_reg)
  *   BPF_CMPXCHG              r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
+ *   BPF_LOAD_ACQ             dst_reg = smp_load_acquire(src_reg + off16)
+ *   BPF_STORE_REL            smp_store_release(dst_reg + off16, src_reg)
  */
 
 #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)			\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2acf9b336371..4a20a125eb46 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -51,6 +51,19 @@
 #define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
 #define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
 
+#define BPF_ATOMIC_LOAD		0x10
+#define BPF_ATOMIC_STORE	0x20
+#define BPF_ATOMIC_TYPE(imm)	((imm) & 0xf0)
+
+#define BPF_RELAXED	0x00
+#define BPF_ACQUIRE	0x01
+#define BPF_RELEASE	0x02
+#define BPF_ACQ_REL	0x03
+#define BPF_SEQ_CST	0x04
+
+#define BPF_LOAD_ACQ	(BPF_ATOMIC_LOAD | BPF_ACQUIRE)		/* load-acquire */
+#define BPF_STORE_REL	(BPF_ATOMIC_STORE | BPF_RELEASE)	/* store-release */
+
 enum bpf_cond_pseudo_jmp {
 	BPF_MAY_GOTO = 0,
 };
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index da729cbbaeb9..ab082ab9d535 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1663,14 +1663,17 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
 	INSN_3(JMP, JSET, K),			\
 	INSN_2(JMP, JA),			\
 	INSN_2(JMP32, JA),			\
+	/* Atomic operations. */		\
+	INSN_3(STX, ATOMIC, B),			\
+	INSN_3(STX, ATOMIC, H),			\
+	INSN_3(STX, ATOMIC, W),			\
+	INSN_3(STX, ATOMIC, DW),		\
 	/* Store instructions. */		\
 	/*   Register based. */			\
 	INSN_3(STX, MEM,  B),			\
 	INSN_3(STX, MEM,  H),			\
 	INSN_3(STX, MEM,  W),			\
 	INSN_3(STX, MEM,  DW),			\
-	INSN_3(STX, ATOMIC, W),			\
-	INSN_3(STX, ATOMIC, DW),		\
 	/*   Immediate based. */		\
 	INSN_3(ST, MEM, B),			\
 	INSN_3(ST, MEM, H),			\
@@ -2169,6 +2172,8 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 
 	STX_ATOMIC_DW:
 	STX_ATOMIC_W:
+	STX_ATOMIC_H:
+	STX_ATOMIC_B:
 		switch (IMM) {
 		ATOMIC_ALU_OP(BPF_ADD, add)
 		ATOMIC_ALU_OP(BPF_AND, and)
@@ -2196,6 +2201,38 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 					(atomic64_t *)(unsigned long) (DST + insn->off),
 					(u64) BPF_R0, (u64) SRC);
 			break;
+		case BPF_LOAD_ACQ:
+			switch (BPF_SIZE(insn->code)) {
+#define LOAD_ACQUIRE(SIZEOP, SIZE)				\
+			case BPF_##SIZEOP:			\
+				DST = (SIZE)smp_load_acquire(	\
+					(SIZE *)(unsigned long)(SRC + insn->off));	\
+				break;
+			LOAD_ACQUIRE(B,   u8)
+			LOAD_ACQUIRE(H,  u16)
+			LOAD_ACQUIRE(W,  u32)
+			LOAD_ACQUIRE(DW, u64)
+#undef LOAD_ACQUIRE
+			default:
+				goto default_label;
+			}
+			break;
+		case BPF_STORE_REL:
+			switch (BPF_SIZE(insn->code)) {
+#define STORE_RELEASE(SIZEOP, SIZE)			\
+			case BPF_##SIZEOP:		\
+				smp_store_release(	\
+					(SIZE *)(unsigned long)(DST + insn->off), (SIZE)SRC);	\
+				break;
+			STORE_RELEASE(B,   u8)
+			STORE_RELEASE(H,  u16)
+			STORE_RELEASE(W,  u32)
+			STORE_RELEASE(DW, u64)
+#undef STORE_RELEASE
+			default:
+				goto default_label;
+			}
+			break;
 
 		default:
 			goto default_label;
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 309c4aa1b026..974d172d6735 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -267,6 +267,18 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 				BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
 				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
 				insn->dst_reg, insn->off, insn->src_reg);
+		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+			   insn->imm == BPF_LOAD_ACQ) {
+			verbose(cbs->private_data, "(%02x) r%d = load_acquire((%s *)(r%d %+d))\n",
+				insn->code, insn->dst_reg,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->src_reg, insn->off);
+		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+			   insn->imm == BPF_STORE_REL) {
+			verbose(cbs->private_data, "(%02x) store_release((%s *)(r%d %+d), r%d)\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->dst_reg, insn->off, insn->src_reg);
 		} else {
 			verbose(cbs->private_data, "BUG_%02x\n", insn->code);
 		}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2b3caa7549af..e44abe22aac9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -579,6 +579,13 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 	       insn->imm == BPF_CMPXCHG;
 }
 
+static bool is_atomic_load_insn(const struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_STX &&
+	       BPF_MODE(insn->code) == BPF_ATOMIC &&
+	       BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD;
+}
+
 static int __get_spi(s32 off)
 {
 	return (-off - 1) / BPF_REG_SIZE;
@@ -3481,7 +3488,7 @@ static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	}
 
 	if (class == BPF_STX) {
-		/* BPF_STX (including atomic variants) has multiple source
+		/* BPF_STX (including atomic variants) has one or more source
 		 * operands, one of which is a ptr. Check whether the caller is
 		 * asking about it.
 		 */
@@ -4095,7 +4102,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			   * dreg still needs precision before this insn
 			   */
 		}
-	} else if (class == BPF_LDX) {
+	} else if (class == BPF_LDX || is_atomic_load_insn(insn)) {
 		if (!bt_is_reg_set(bt, dreg))
 			return 0;
 		bt_clear_reg(bt, dreg);
@@ -7625,6 +7632,86 @@ static int check_atomic_rmw(struct bpf_verifier_env *env, int insn_idx,
 	return 0;
 }
 
+static int check_atomic_load(struct bpf_verifier_env *env, int insn_idx,
+			     struct bpf_insn *insn)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	int err;
+
+	err = check_reg_arg(env, insn->src_reg, SRC_OP);
+	if (err)
+		return err;
+
+	err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
+	if (err)
+		return err;
+
+	if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
+		verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
+			insn->src_reg,
+			reg_type_str(env, reg_state(env, insn->src_reg)->type));
+		return -EACCES;
+	}
+
+	if (is_arena_reg(env, insn->src_reg)) {
+		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
+		if (err)
+			return err;
+	}
+
+	/* Check whether we can read the memory. */
+	err = check_mem_access(env, insn_idx, insn->src_reg, insn->off,
+			       BPF_SIZE(insn->code), BPF_READ, insn->dst_reg,
+			       true, false);
+	if (err)
+		return err;
+
+	err = reg_bounds_sanity_check(env, &regs[insn->dst_reg], "atomic_load");
+	if (err)
+		return err;
+	return 0;
+}
+
+static int check_atomic_store(struct bpf_verifier_env *env, int insn_idx,
+			      struct bpf_insn *insn)
+{
+	int err;
+
+	err = check_reg_arg(env, insn->src_reg, SRC_OP);
+	if (err)
+		return err;
+
+	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
+	if (err)
+		return err;
+
+	if (is_pointer_value(env, insn->src_reg)) {
+		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
+		return -EACCES;
+	}
+
+	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
+		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
+			insn->dst_reg,
+			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
+		return -EACCES;
+	}
+
+	if (is_arena_reg(env, insn->dst_reg)) {
+		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
+		if (err)
+			return err;
+	}
+
+	/* Check whether we can write into the memory. */
+	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+			       BPF_SIZE(insn->code), BPF_WRITE, insn->src_reg,
+			       true, false);
+	if (err)
+		return err;
+	return 0;
+}
+
 static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
 {
 	switch (insn->imm) {
@@ -7639,6 +7726,10 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 	case BPF_XCHG:
 	case BPF_CMPXCHG:
 		return check_atomic_rmw(env, insn_idx, insn);
+	case BPF_LOAD_ACQ:
+		return check_atomic_load(env, insn_idx, insn);
+	case BPF_STORE_REL:
+		return check_atomic_store(env, insn_idx, insn);
 	default:
 		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
 		return -EINVAL;
@@ -20420,7 +20511,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			   insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
 			   insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
 			type = BPF_WRITE;
-		} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
+		} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_B) ||
+			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_H) ||
+			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
 			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_DW)) &&
 			   env->insn_aux_data[i + delta].ptr_type == PTR_TO_ARENA) {
 			insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2acf9b336371..4a20a125eb46 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -51,6 +51,19 @@
 #define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
 #define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
 
+#define BPF_ATOMIC_LOAD		0x10
+#define BPF_ATOMIC_STORE	0x20
+#define BPF_ATOMIC_TYPE(imm)	((imm) & 0xf0)
+
+#define BPF_RELAXED	0x00
+#define BPF_ACQUIRE	0x01
+#define BPF_RELEASE	0x02
+#define BPF_ACQ_REL	0x03
+#define BPF_SEQ_CST	0x04
+
+#define BPF_LOAD_ACQ	(BPF_ATOMIC_LOAD | BPF_ACQUIRE)		/* load-acquire */
+#define BPF_STORE_REL	(BPF_ATOMIC_STORE | BPF_RELEASE)	/* store-release */
+
 enum bpf_cond_pseudo_jmp {
 	BPF_MAY_GOTO = 0,
 };
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 4/8] arm64: insn: Add BIT(23) to {load,store}_ex's mask
From: Peilin Ye @ 2025-01-25  2:18 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

We are planning to add load-acquire (LDAR{,B,H}) and store-release
(STLR{,B,H}) instructions to insn.{c,h}; add BIT(23) to the masks of
load_ex and store_ex to prevent aarch64_insn_is_{load,store}_ex() from
returning false positives for load-acquire and store-release
instructions.

Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
           ID032224),

  * C6.2.228 LDXR
  * C6.2.165 LDAXR
  * C6.2.161 LDAR
  * C6.2.393 STXR
  * C6.2.360 STLXR
  * C6.2.353 STLR
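
As a quick sanity check, consider "ldar x0, [x1]" (0xc8dffc20, per
C6.2.161; a sketch):

  /* old mask: LDAR wrongly matches load_ex's {mask, value} pair */
  (0xc8dffc20 & 0x3F400000) == 0x08400000	/* true: false positive */
  /* new mask, with BIT(23) added: LDAR no longer matches */
  (0xc8dffc20 & 0x3FC00000) == 0x08400000	/* false */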

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/include/asm/insn.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index e390c432f546..2d8316b3abaf 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -351,8 +351,8 @@ __AARCH64_INSN_FUNCS(ldr_imm,	0x3FC00000, 0x39400000)
 __AARCH64_INSN_FUNCS(ldr_lit,	0xBF000000, 0x18000000)
 __AARCH64_INSN_FUNCS(ldrsw_lit,	0xFF000000, 0x98000000)
 __AARCH64_INSN_FUNCS(exclusive,	0x3F800000, 0x08000000)
-__AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
-__AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)
+__AARCH64_INSN_FUNCS(load_ex,	0x3FC00000, 0x08400000)
+__AARCH64_INSN_FUNCS(store_ex,	0x3FC00000, 0x08000000)
 __AARCH64_INSN_FUNCS(mops,	0x3B200C00, 0x19000400)
 __AARCH64_INSN_FUNCS(stp,	0x7FC00000, 0x29000000)
 __AARCH64_INSN_FUNCS(ldp,	0x7FC00000, 0x29400000)
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 5/8] arm64: insn: Add load-acquire and store-release instructions
From: Peilin Ye @ 2025-01-25  2:19 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Add load-acquire ("load_acq", LDAR{,B,H}) and store-release
("store_rel", STLR{,B,H}) instructions.  Breakdown of encoding:

                                size        L   (Rs)  o0 (Rt2) Rn    Rt
             mask (0x3fdffc00): 00 111111 1 1 0 11111 1  11111 00000 00000
  value, load_acq (0x08dffc00): 00 001000 1 1 0 11111 1  11111 00000 00000
 value, store_rel (0x089ffc00): 00 001000 1 0 0 11111 1  11111 00000 00000

As suggested by Xu [1], include all Should-Be-One (SBO) bits ("Rs" and
"Rt2" fields) in the "mask" and "value" numbers.

It is worth noting that we are adding the "no offset" variant of STLR
instead of the "pre-index" variant, which has a different encoding.

Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
           ID032224),

  * C6.2.161 LDAR
  * C6.2.353 STLR

[1] https://lore.kernel.org/bpf/4e6641ce-3f1e-4251-8daf-4dd4b77d08c4@huaweicloud.com/
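
For instance, generating "ldar w0, [x1]" with the new helper would look
like the following (a sketch; the arm64 JIT compiler wraps this in
A64_LDAR*()/A64_STLR*() macros in a later patch):

  u32 insn = aarch64_insn_gen_load_acq_store_rel(AARCH64_INSN_REG_0,
						 AARCH64_INSN_REG_1,
						 AARCH64_INSN_SIZE_32,
						 AARCH64_INSN_LDST_LOAD_ACQ);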

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/include/asm/insn.h |  8 ++++++++
 arch/arm64/lib/insn.c         | 28 ++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 2d8316b3abaf..39577f1d079a 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -188,8 +188,10 @@ enum aarch64_insn_ldst_type {
 	AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
 	AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
 	AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
+	AARCH64_INSN_LDST_LOAD_ACQ,
 	AARCH64_INSN_LDST_LOAD_EX,
 	AARCH64_INSN_LDST_LOAD_ACQ_EX,
+	AARCH64_INSN_LDST_STORE_REL,
 	AARCH64_INSN_LDST_STORE_EX,
 	AARCH64_INSN_LDST_STORE_REL_EX,
 	AARCH64_INSN_LDST_SIGNED_LOAD_IMM_OFFSET,
@@ -351,6 +353,8 @@ __AARCH64_INSN_FUNCS(ldr_imm,	0x3FC00000, 0x39400000)
 __AARCH64_INSN_FUNCS(ldr_lit,	0xBF000000, 0x18000000)
 __AARCH64_INSN_FUNCS(ldrsw_lit,	0xFF000000, 0x98000000)
 __AARCH64_INSN_FUNCS(exclusive,	0x3F800000, 0x08000000)
+__AARCH64_INSN_FUNCS(load_acq,  0x3FDFFC00, 0x08DFFC00)
+__AARCH64_INSN_FUNCS(store_rel, 0x3FDFFC00, 0x089FFC00)
 __AARCH64_INSN_FUNCS(load_ex,	0x3FC00000, 0x08400000)
 __AARCH64_INSN_FUNCS(store_ex,	0x3FC00000, 0x08000000)
 __AARCH64_INSN_FUNCS(mops,	0x3B200C00, 0x19000400)
@@ -602,6 +606,10 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
 				     int offset,
 				     enum aarch64_insn_variant variant,
 				     enum aarch64_insn_ldst_type type);
+u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
+					enum aarch64_insn_register base,
+					enum aarch64_insn_size_type size,
+					enum aarch64_insn_ldst_type type);
 u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
 				   enum aarch64_insn_register base,
 				   enum aarch64_insn_register state,
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index b008a9b46a7f..f8b83f4d9171 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -540,6 +540,34 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
 					     offset >> shift);
 }
 
+u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
+					enum aarch64_insn_register base,
+					enum aarch64_insn_size_type size,
+					enum aarch64_insn_ldst_type type)
+{
+	u32 insn;
+
+	switch (type) {
+	case AARCH64_INSN_LDST_LOAD_ACQ:
+		insn = aarch64_insn_get_load_acq_value();
+		break;
+	case AARCH64_INSN_LDST_STORE_REL:
+		insn = aarch64_insn_get_store_rel_value();
+		break;
+	default:
+		pr_err("%s: unknown load-acquire/store-release encoding %d\n", __func__, type);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	insn = aarch64_insn_encode_ldst_size(size, insn);
+
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
+					    reg);
+
+	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
+					    base);
+}
+
 u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
 				   enum aarch64_insn_register base,
 				   enum aarch64_insn_register state,
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 6/8] bpf, arm64: Support load-acquire and store-release instructions
From: Peilin Ye @ 2025-01-25  2:19 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Support BPF load-acquire (BPF_LOAD_ACQ) and store-release
(BPF_STORE_REL) instructions in the arm64 JIT compiler.  For example:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000011): BPF_LOAD_ACQ

The JIT compiler would emit an LDAR instruction for the above, e.g.:

  ldar  x7, [x0]

Similarly, consider the following 16-bit store-release:

  cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000022): BPF_STORE_REL (BPF_ATOMIC_STORE | BPF_RELEASE)

An STLRH instruction would be emitted, e.g.:

  stlrh  w1, [x0]

For a complete mapping:

  load-acquire     8-bit  LDARB
 (BPF_LOAD_ACQ)   16-bit  LDARH
                  32-bit  LDAR (32-bit)
                  64-bit  LDAR (64-bit)
  store-release    8-bit  STLRB
 (BPF_STORE_REL)  16-bit  STLRH
                  32-bit  STLR (32-bit)
                  64-bit  STLR (64-bit)

Arena accesses are supported.
bpf_jit_supports_insn(..., /*in_arena=*/true) always returns true for
BPF_LOAD_ACQ and BPF_STORE_REL instructions, as they don't depend on
ARM64_HAS_LSE_ATOMICS.
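
For instance, for an arena store-release with a non-zero offset, the
emitted sequence would look roughly like this (a sketch; x10 is
TMP_REG_1 and x28 holds ARENA_VM_START):

  add   x10, x0, #off      // tmp = dst + off, via emit_a64_add_i()
  add   x10, x10, x28      // tmp += arena base (BPF_PROBE_ATOMIC only)
  stlr  x1, [x10]          // store-release src into [tmp]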

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/net/bpf_jit.h      | 20 ++++++++
 arch/arm64/net/bpf_jit_comp.c | 92 ++++++++++++++++++++++++++++++++++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index b22ab2f97a30..a3b0e693a125 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -119,6 +119,26 @@
 	aarch64_insn_gen_load_store_ex(Rt, Rn, Rs, A64_SIZE(sf), \
 				       AARCH64_INSN_LDST_STORE_REL_EX)
 
+/* Load-acquire & store-release */
+#define A64_LDAR(Rt, Rn, size)  \
+	aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
+					    AARCH64_INSN_LDST_LOAD_ACQ)
+#define A64_STLR(Rt, Rn, size)  \
+	aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
+					    AARCH64_INSN_LDST_STORE_REL)
+
+/* Rt = [Rn] (load acquire) */
+#define A64_LDARB(Wt, Xn)	A64_LDAR(Wt, Xn, 8)
+#define A64_LDARH(Wt, Xn)	A64_LDAR(Wt, Xn, 16)
+#define A64_LDAR32(Wt, Xn)	A64_LDAR(Wt, Xn, 32)
+#define A64_LDAR64(Xt, Xn)	A64_LDAR(Xt, Xn, 64)
+
+/* [Rn] = Rt (store release) */
+#define A64_STLRB(Wt, Xn)	A64_STLR(Wt, Xn, 8)
+#define A64_STLRH(Wt, Xn)	A64_STLR(Wt, Xn, 16)
+#define A64_STLR32(Wt, Xn)	A64_STLR(Wt, Xn, 32)
+#define A64_STLR64(Xt, Xn)	A64_STLR(Xt, Xn, 64)
+
 /*
  * LSE atomics
  *
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8446848edddb..488cbe094551 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -647,6 +647,87 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	return 0;
 }
 
+static inline bool is_atomic_load_store(const s32 imm)
+{
+	const s32 type = BPF_ATOMIC_TYPE(imm);
+
+	return type == BPF_ATOMIC_LOAD || type == BPF_ATOMIC_STORE;
+}
+
+static int emit_atomic_load_store(const struct bpf_insn *insn, struct jit_ctx *ctx)
+{
+	const s32 type = BPF_ATOMIC_TYPE(insn->imm);
+	const s16 off = insn->off;
+	const u8 code = insn->code;
+	const bool arena = BPF_MODE(code) == BPF_PROBE_ATOMIC;
+	const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
+	const u8 dst = bpf2a64[insn->dst_reg];
+	const u8 src = bpf2a64[insn->src_reg];
+	const u8 tmp = bpf2a64[TMP_REG_1];
+	u8 reg;
+
+	switch (type) {
+	case BPF_ATOMIC_LOAD:
+		reg = src;
+		break;
+	case BPF_ATOMIC_STORE:
+		reg = dst;
+		break;
+	default:
+		pr_err_once("unknown atomic load/store op type %02x\n", type);
+		return -EINVAL;
+	}
+
+	if (off) {
+		emit_a64_add_i(1, tmp, reg, tmp, off, ctx);
+		reg = tmp;
+	}
+	if (arena) {
+		emit(A64_ADD(1, tmp, reg, arena_vm_base), ctx);
+		reg = tmp;
+	}
+
+	switch (insn->imm) {
+	case BPF_LOAD_ACQ:
+		switch (BPF_SIZE(code)) {
+		case BPF_B:
+			emit(A64_LDARB(dst, reg), ctx);
+			break;
+		case BPF_H:
+			emit(A64_LDARH(dst, reg), ctx);
+			break;
+		case BPF_W:
+			emit(A64_LDAR32(dst, reg), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_LDAR64(dst, reg), ctx);
+			break;
+		}
+		break;
+	case BPF_STORE_REL:
+		switch (BPF_SIZE(code)) {
+		case BPF_B:
+			emit(A64_STLRB(src, reg), ctx);
+			break;
+		case BPF_H:
+			emit(A64_STLRH(src, reg), ctx);
+			break;
+		case BPF_W:
+			emit(A64_STLR32(src, reg), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_STLR64(src, reg), ctx);
+			break;
+		}
+		break;
+	default:
+		pr_err_once("unknown atomic load/store op code %02x\n", insn->imm);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_ARM64_LSE_ATOMICS
 static int emit_lse_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
 {
@@ -1641,11 +1722,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			return ret;
 		break;
 
+	case BPF_STX | BPF_ATOMIC | BPF_B:
+	case BPF_STX | BPF_ATOMIC | BPF_H:
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
+	case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
+	case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
 	case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
 	case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
-		if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
+		if (is_atomic_load_store(insn->imm))
+			ret = emit_atomic_load_store(insn, ctx);
+		else if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
 			ret = emit_lse_atomic(insn, ctx);
 		else
 			ret = emit_ll_sc_atomic(insn, ctx);
@@ -2669,7 +2756,8 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
 	switch (insn->code) {
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
-		if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
+		if (!is_atomic_load_store(insn->imm) &&
+		    !cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
 			return false;
 	}
 	return true;
-- 
2.48.1.262.g85cc9f2d1e-goog

* [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
From: Peilin Ye @ 2025-01-25  2:19 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Add several ./test_progs tests:

  - atomics/load_acquire
  - atomics/store_release
  - arena_atomics/load_acquire
  - arena_atomics/store_release
  - verifier_load_acquire/*
  - verifier_store_release/*
  - verifier_precision/bpf_load_acquire
  - verifier_precision/bpf_store_release

The last two tests are added to check if backtrack_insn() handles the
new instructions correctly.

The last test additionally makes sure that the verifier "remembers" the
value (in src_reg) that we store-release into e.g. a stack slot.
Consider the following test program:

    #0:  "r1 = 8;"
    #1:  "store_release((u64 *)(r10 - 8), r1);"
    #2:  "r1 = *(u64 *)(r10 - 8);"
    #3:  "r2 = r10;"
    #4:  "r2 += r1;"	/* mark_precise */
    #5:  "r0 = 0;"
    #6:  "exit;"

At #1, if the verifier doesn't remember that we wrote 8 to the stack,
then later at #4 we would be adding an unbounded scalar value to the
stack pointer, which would cause the program to be rejected:

  VERIFIER LOG:
  =============
...
  math between fp pointer and register with unbounded min value is not allowed

All new tests depend on the pre-defined __BPF_FEATURE_LOAD_ACQ_STORE_REL
feature macro, which implies -mcpu>=v4.
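
All new BPF C code is gated accordingly, following this pattern (a
minimal sketch mirroring the diffs below):

  #ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
  	/* only compiled when Clang targets -mcpu=v4 or newer */
  	__atomic_store_n(&store_release64_result, val64, __ATOMIC_RELEASE);
  #endif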

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 .../selftests/bpf/prog_tests/arena_atomics.c  |  61 ++++++-
 .../selftests/bpf/prog_tests/atomics.c        |  57 ++++++-
 .../selftests/bpf/prog_tests/verifier.c       |   4 +
 .../selftests/bpf/progs/arena_atomics.c       |  62 ++++++-
 tools/testing/selftests/bpf/progs/atomics.c   |  62 ++++++-
 .../bpf/progs/verifier_load_acquire.c         |  92 +++++++++++
 .../selftests/bpf/progs/verifier_precision.c  |  39 +++++
 .../bpf/progs/verifier_store_release.c        | 153 ++++++++++++++++++
 8 files changed, 524 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_load_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_store_release.c

diff --git a/tools/testing/selftests/bpf/prog_tests/arena_atomics.c b/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
index 26e7c06c6cb4..81d3575d7652 100644
--- a/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
+++ b/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
@@ -162,6 +162,60 @@ static void test_uaf(struct arena_atomics *skel)
 	ASSERT_EQ(skel->arena->uaf_recovery_fails, 0, "uaf_recovery_fails");
 }
 
+static void test_load_acquire(struct arena_atomics *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	if (skel->data->skip_lacq_srel_tests) {
+		printf("%s:SKIP:Clang does not support BPF load-acquire or addr_space_cast\n",
+		       __func__);
+		test__skip();
+		return;
+	}
+
+	/* No need to attach it, just run it directly */
+	prog_fd = bpf_program__fd(skel->progs.load_acquire);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->arena->load_acquire8_result, 0x12, "load_acquire8_result");
+	ASSERT_EQ(skel->arena->load_acquire16_result, 0x1234, "load_acquire16_result");
+	ASSERT_EQ(skel->arena->load_acquire32_result, 0x12345678, "load_acquire32_result");
+	ASSERT_EQ(skel->arena->load_acquire64_result, 0x1234567890abcdef,
+		  "load_acquire64_result");
+}
+
+static void test_store_release(struct arena_atomics *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	if (skel->data->skip_lacq_srel_tests) {
+		printf("%s:SKIP:Clang does not support BPF store-release or addr_space_cast\n",
+		       __func__);
+		test__skip();
+		return;
+	}
+
+	/* No need to attach it, just run it directly */
+	prog_fd = bpf_program__fd(skel->progs.store_release);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->arena->store_release8_result, 0x12, "store_release8_result");
+	ASSERT_EQ(skel->arena->store_release16_result, 0x1234, "store_release16_result");
+	ASSERT_EQ(skel->arena->store_release32_result, 0x12345678, "store_release32_result");
+	ASSERT_EQ(skel->arena->store_release64_result, 0x1234567890abcdef,
+		  "store_release64_result");
+}
+
 void test_arena_atomics(void)
 {
 	struct arena_atomics *skel;
@@ -171,7 +225,7 @@ void test_arena_atomics(void)
 	if (!ASSERT_OK_PTR(skel, "arena atomics skeleton open"))
 		return;
 
-	if (skel->data->skip_tests) {
+	if (skel->data->skip_all_tests) {
 		printf("%s:SKIP:no ENABLE_ATOMICS_TESTS or no addr_space_cast support in clang",
 		       __func__);
 		test__skip();
@@ -199,6 +253,11 @@ void test_arena_atomics(void)
 	if (test__start_subtest("uaf"))
 		test_uaf(skel);
 
+	if (test__start_subtest("load_acquire"))
+		test_load_acquire(skel);
+	if (test__start_subtest("store_release"))
+		test_store_release(skel);
+
 cleanup:
 	arena_atomics__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
index 13e101f370a1..5d7cff3eed2b 100644
--- a/tools/testing/selftests/bpf/prog_tests/atomics.c
+++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
@@ -162,6 +162,56 @@ static void test_xchg(struct atomics_lskel *skel)
 	ASSERT_EQ(skel->bss->xchg32_result, 1, "xchg32_result");
 }
 
+static void test_load_acquire(struct atomics_lskel *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	if (skel->data->skip_lacq_srel_tests) {
+		printf("%s:SKIP:Clang does not support BPF load-acquire\n", __func__);
+		test__skip();
+		return;
+	}
+
+	/* No need to attach it, just run it directly */
+	prog_fd = skel->progs.load_acquire.prog_fd;
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->bss->load_acquire8_result, 0x12, "load_acquire8_result");
+	ASSERT_EQ(skel->bss->load_acquire16_result, 0x1234, "load_acquire16_result");
+	ASSERT_EQ(skel->bss->load_acquire32_result, 0x12345678, "load_acquire32_result");
+	ASSERT_EQ(skel->bss->load_acquire64_result, 0x1234567890abcdef, "load_acquire64_result");
+}
+
+static void test_store_release(struct atomics_lskel *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	if (skel->data->skip_lacq_srel_tests) {
+		printf("%s:SKIP:Clang does not support BPF store-release\n", __func__);
+		test__skip();
+		return;
+	}
+
+	/* No need to attach it, just run it directly */
+	prog_fd = skel->progs.store_release.prog_fd;
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->bss->store_release8_result, 0x12, "store_release8_result");
+	ASSERT_EQ(skel->bss->store_release16_result, 0x1234, "store_release16_result");
+	ASSERT_EQ(skel->bss->store_release32_result, 0x12345678, "store_release32_result");
+	ASSERT_EQ(skel->bss->store_release64_result, 0x1234567890abcdef, "store_release64_result");
+}
+
 void test_atomics(void)
 {
 	struct atomics_lskel *skel;
@@ -170,7 +220,7 @@ void test_atomics(void)
 	if (!ASSERT_OK_PTR(skel, "atomics skeleton load"))
 		return;
 
-	if (skel->data->skip_tests) {
+	if (skel->data->skip_all_tests) {
 		printf("%s:SKIP:no ENABLE_ATOMICS_TESTS (missing Clang BPF atomics support)",
 		       __func__);
 		test__skip();
@@ -193,6 +243,11 @@ void test_atomics(void)
 	if (test__start_subtest("xchg"))
 		test_xchg(skel);
 
+	if (test__start_subtest("load_acquire"))
+		test_load_acquire(skel);
+	if (test__start_subtest("store_release"))
+		test_store_release(skel);
+
 cleanup:
 	atomics_lskel__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 8a0e1ff8a2dc..8bdad4167cf5 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -45,6 +45,7 @@
 #include "verifier_ldsx.skel.h"
 #include "verifier_leak_ptr.skel.h"
 #include "verifier_linked_scalars.skel.h"
+#include "verifier_load_acquire.skel.h"
 #include "verifier_loops1.skel.h"
 #include "verifier_lwt.skel.h"
 #include "verifier_map_in_map.skel.h"
@@ -80,6 +81,7 @@
 #include "verifier_spill_fill.skel.h"
 #include "verifier_spin_lock.skel.h"
 #include "verifier_stack_ptr.skel.h"
+#include "verifier_store_release.skel.h"
 #include "verifier_subprog_precision.skel.h"
 #include "verifier_subreg.skel.h"
 #include "verifier_tailcall_jit.skel.h"
@@ -173,6 +175,7 @@ void test_verifier_int_ptr(void)              { RUN(verifier_int_ptr); }
 void test_verifier_iterating_callbacks(void)  { RUN(verifier_iterating_callbacks); }
 void test_verifier_jeq_infer_not_null(void)   { RUN(verifier_jeq_infer_not_null); }
 void test_verifier_jit_convergence(void)      { RUN(verifier_jit_convergence); }
+void test_verifier_load_acquire(void)	      { RUN(verifier_load_acquire); }
 void test_verifier_ld_ind(void)               { RUN(verifier_ld_ind); }
 void test_verifier_ldsx(void)                  { RUN(verifier_ldsx); }
 void test_verifier_leak_ptr(void)             { RUN(verifier_leak_ptr); }
@@ -211,6 +214,7 @@ void test_verifier_sockmap_mutate(void)       { RUN(verifier_sockmap_mutate); }
 void test_verifier_spill_fill(void)           { RUN(verifier_spill_fill); }
 void test_verifier_spin_lock(void)            { RUN(verifier_spin_lock); }
 void test_verifier_stack_ptr(void)            { RUN(verifier_stack_ptr); }
+void test_verifier_store_release(void)	      { RUN(verifier_store_release); }
 void test_verifier_subprog_precision(void)    { RUN(verifier_subprog_precision); }
 void test_verifier_subreg(void)               { RUN(verifier_subreg); }
 void test_verifier_tailcall_jit(void)         { RUN(verifier_tailcall_jit); }
diff --git a/tools/testing/selftests/bpf/progs/arena_atomics.c b/tools/testing/selftests/bpf/progs/arena_atomics.c
index 40dd57fca5cc..fe8b67d9c87b 100644
--- a/tools/testing/selftests/bpf/progs/arena_atomics.c
+++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
@@ -19,9 +19,15 @@ struct {
 } arena SEC(".maps");
 
 #if defined(ENABLE_ATOMICS_TESTS) && defined(__BPF_FEATURE_ADDR_SPACE_CAST)
-bool skip_tests __attribute((__section__(".data"))) = false;
+bool skip_all_tests __attribute((__section__(".data"))) = false;
 #else
-bool skip_tests = true;
+bool skip_all_tests = true;
+#endif
+
+#if defined(__BPF_FEATURE_LOAD_ACQ_STORE_REL) && defined(__BPF_FEATURE_ADDR_SPACE_CAST)
+bool skip_lacq_srel_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_lacq_srel_tests = true;
 #endif
 
 __u32 pid = 0;
@@ -274,4 +280,56 @@ int uaf(const void *ctx)
 	return 0;
 }
 
+__u8 __arena_global load_acquire8_value = 0x12;
+__u16 __arena_global load_acquire16_value = 0x1234;
+__u32 __arena_global load_acquire32_value = 0x12345678;
+__u64 __arena_global load_acquire64_value = 0x1234567890abcdef;
+
+__u8 __arena_global load_acquire8_result = 0;
+__u16 __arena_global load_acquire16_result = 0;
+__u32 __arena_global load_acquire32_result = 0;
+__u64 __arena_global load_acquire64_result = 0;
+
+SEC("raw_tp/sys_enter")
+int load_acquire(const void *ctx)
+{
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
+	load_acquire8_result = __atomic_load_n(&load_acquire8_value, __ATOMIC_ACQUIRE);
+	load_acquire16_result = __atomic_load_n(&load_acquire16_value, __ATOMIC_ACQUIRE);
+	load_acquire32_result = __atomic_load_n(&load_acquire32_value, __ATOMIC_ACQUIRE);
+	load_acquire64_result = __atomic_load_n(&load_acquire64_value, __ATOMIC_ACQUIRE);
+#endif
+
+	return 0;
+}
+
+__u8 __arena_global store_release8_result = 0;
+__u16 __arena_global store_release16_result = 0;
+__u32 __arena_global store_release32_result = 0;
+__u64 __arena_global store_release64_result = 0;
+
+SEC("raw_tp/sys_enter")
+int store_release(const void *ctx)
+{
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
+	__u8 val8 = 0x12;
+	__u16 val16 = 0x1234;
+	__u32 val32 = 0x12345678;
+	__u64 val64 = 0x1234567890abcdef;
+
+	__atomic_store_n(&store_release8_result, val8, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release16_result, val16, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release32_result, val32, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release64_result, val64, __ATOMIC_RELEASE);
+#endif
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/atomics.c b/tools/testing/selftests/bpf/progs/atomics.c
index f89c7f0cc53b..4c23d7d0d37d 100644
--- a/tools/testing/selftests/bpf/progs/atomics.c
+++ b/tools/testing/selftests/bpf/progs/atomics.c
@@ -5,9 +5,15 @@
 #include <stdbool.h>
 
 #ifdef ENABLE_ATOMICS_TESTS
-bool skip_tests __attribute((__section__(".data"))) = false;
+bool skip_all_tests __attribute((__section__(".data"))) = false;
 #else
-bool skip_tests = true;
+bool skip_all_tests = true;
+#endif
+
+#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
+bool skip_lacq_srel_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_lacq_srel_tests = true;
 #endif
 
 __u32 pid = 0;
@@ -168,3 +174,55 @@ int xchg(const void *ctx)
 
 	return 0;
 }
+
+__u8 load_acquire8_value = 0x12;
+__u16 load_acquire16_value = 0x1234;
+__u32 load_acquire32_value = 0x12345678;
+__u64 load_acquire64_value = 0x1234567890abcdef;
+
+__u8 load_acquire8_result = 0;
+__u16 load_acquire16_result = 0;
+__u32 load_acquire32_result = 0;
+__u64 load_acquire64_result = 0;
+
+SEC("raw_tp/sys_enter")
+int load_acquire(const void *ctx)
+{
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
+	load_acquire8_result = __atomic_load_n(&load_acquire8_value, __ATOMIC_ACQUIRE);
+	load_acquire16_result = __atomic_load_n(&load_acquire16_value, __ATOMIC_ACQUIRE);
+	load_acquire32_result = __atomic_load_n(&load_acquire32_value, __ATOMIC_ACQUIRE);
+	load_acquire64_result = __atomic_load_n(&load_acquire64_value, __ATOMIC_ACQUIRE);
+#endif
+
+	return 0;
+}
+
+__u8 store_release8_result = 0;
+__u16 store_release16_result = 0;
+__u32 store_release32_result = 0;
+__u64 store_release64_result = 0;
+
+SEC("raw_tp/sys_enter")
+int store_release(const void *ctx)
+{
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
+	__u8 val8 = 0x12;
+	__u16 val16 = 0x1234;
+	__u32 val32 = 0x12345678;
+	__u64 val64 = 0x1234567890abcdef;
+
+	__atomic_store_n(&store_release8_result, val8, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release16_result, val16, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release32_result, val32, __ATOMIC_RELEASE);
+	__atomic_store_n(&store_release64_result, val64, __ATOMIC_RELEASE);
+#endif
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/verifier_load_acquire.c b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
new file mode 100644
index 000000000000..506df4b8231b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#if defined(__TARGET_ARCH_arm64) && defined(__BPF_FEATURE_LOAD_ACQ_STORE_REL)
+
+SEC("socket")
+__description("load-acquire, 8-bit")
+__success __success_unpriv __retval(0x12)
+__naked void load_acquire_8(void)
+{
+	asm volatile (
+	"*(u8 *)(r10 - 1) = 0x12;"
+	"w0 = load_acquire((u8 *)(r10 - 1));"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 16-bit")
+__success __success_unpriv __retval(0x1234)
+__naked void load_acquire_16(void)
+{
+	asm volatile (
+	"*(u16 *)(r10 - 2) = 0x1234;"
+	"w0 = load_acquire((u16 *)(r10 - 2));"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 32-bit")
+__success __success_unpriv __retval(0x12345678)
+__naked void load_acquire_32(void)
+{
+	asm volatile (
+	"*(u32 *)(r10 - 4) = 0x12345678;"
+	"w0 = load_acquire((u32 *)(r10 - 4));"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 64-bit")
+__success __success_unpriv __retval(0x1234567890abcdef)
+__naked void load_acquire_64(void)
+{
+	asm volatile (
+	"*(u64 *)(r10 - 8) = 0x1234567890abcdef;"
+	"r0 = load_acquire((u64 *)(r10 - 8));"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire with uninitialized src_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void load_acquire_with_uninitialized_src_reg(void)
+{
+	asm volatile (
+	"r0 = load_acquire((u64 *)(r2 + 0));"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire with non-pointer src_reg")
+__failure __failure_unpriv __msg("R1 invalid mem access 'scalar'")
+__naked void load_acquire_with_non_pointer_src_reg(void)
+{
+	asm volatile (
+	"r1 = 0;"
+	"r0 = load_acquire((u64 *)(r1 + 0));"
+	"exit;"
+	::: __clobber_all);
+}
+
+#else
+
+SEC("socket")
+__description("load-acquire is not supported by compiler or jit, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c
index 6b564d4c0986..7d5b9e95e3cf 100644
--- a/tools/testing/selftests/bpf/progs/verifier_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_precision.c
@@ -90,6 +90,45 @@ __naked int bpf_end_bswap(void)
 		::: __clobber_all);
 }
 
+#if defined(__TARGET_ARCH_arm64) && defined(__BPF_FEATURE_LOAD_ACQ_STORE_REL)
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("mark_precise: frame0: regs=r1 stack= before 2: (bf) r2 = r10")
+__msg("mark_precise: frame0: regs=r1 stack= before 1: (db) r1 = load_acquire((u64 *)(r10 -8))")
+__msg("mark_precise: frame0: regs= stack=-8 before 0: (7a) *(u64 *)(r10 -8) = 8")
+__naked int bpf_load_acquire(void)
+{
+	asm volatile (
+		"*(u64 *)(r10 - 8) = 8;"
+		"r1 = load_acquire((u64 *)(r10 - 8));"
+		"r2 = r10;"
+		"r2 += r1;"	/* mark_precise */
+		"r0 = 0;"
+		"exit;"
+		::: __clobber_all);
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r2 = r10")
+__msg("mark_precise: frame0: regs=r1 stack= before 2: (79) r1 = *(u64 *)(r10 -8)")
+__msg("mark_precise: frame0: regs= stack=-8 before 1: (db) store_release((u64 *)(r10 -8), r1)")
+__msg("mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 8")
+__naked int bpf_store_release(void)
+{
+	asm volatile (
+		"r1 = 8;"
+		"store_release((u64 *)(r10 - 8), r1);"
+		"r1 = *(u64 *)(r10 - 8);"
+		"r2 = r10;"
+		"r2 += r1;"	/* mark_precise */
+		"r0 = 0;"
+		"exit;"
+		::: __clobber_all);
+}
+
+#endif /* load-acquire, store-release */
 #endif /* v4 instruction */
 
 SEC("?raw_tp")
diff --git a/tools/testing/selftests/bpf/progs/verifier_store_release.c b/tools/testing/selftests/bpf/progs/verifier_store_release.c
new file mode 100644
index 000000000000..d8c3b73388cb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_store_release.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+#if defined(__TARGET_ARCH_arm64) && defined(__BPF_FEATURE_LOAD_ACQ_STORE_REL)
+
+SEC("socket")
+__description("store-release, 8-bit")
+__success __success_unpriv __retval(0x12)
+__naked void store_release_8(void)
+{
+	asm volatile (
+	"w1 = 0x12;"
+	"store_release((u8 *)(r10 - 1), w1);"
+	"w0 = *(u8 *)(r10 - 1);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 16-bit")
+__success __success_unpriv __retval(0x1234)
+__naked void store_release_16(void)
+{
+	asm volatile (
+	"w1 = 0x1234;"
+	"store_release((u16 *)(r10 - 2), w1);"
+	"w0 = *(u16 *)(r10 - 2);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 32-bit")
+__success __success_unpriv __retval(0x12345678)
+__naked void store_release_32(void)
+{
+	asm volatile (
+	"w1 = 0x12345678;"
+	"store_release((u32 *)(r10 - 4), w1);"
+	"w0 = *(u32 *)(r10 - 4);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 64-bit")
+__success __success_unpriv __retval(0x1234567890abcdef)
+__naked void store_release_64(void)
+{
+	asm volatile (
+	"r1 = 0x1234567890abcdef;"
+	"store_release((u64 *)(r10 - 8), r1);"
+	"r0 = *(u64 *)(r10 - 8);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with uninitialized src_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void store_release_with_uninitialized_src_reg(void)
+{
+	asm volatile (
+	"store_release((u64 *)(r10 - 8), r2);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with uninitialized dst_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void store_release_with_uninitialized_dst_reg(void)
+{
+	asm volatile (
+	"r1 = 0x1234567890abcdef;"
+	"store_release((u64 *)(r2 - 8), r1);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with non-pointer dst_reg")
+__failure __failure_unpriv __msg("R1 invalid mem access 'scalar'")
+__naked void store_release_with_non_pointer_dst_reg(void)
+{
+	asm volatile (
+	"r1 = 0;"
+	"store_release((u64 *)(r1 + 0), r1);"
+	"exit;"
+	::: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, leak pointer to stack")
+__success __retval(0)
+__failure_unpriv __msg_unpriv("R1 leaks addr into mem")
+__naked void store_release_leak_pointer_to_stack(void)
+{
+	asm volatile (
+	"store_release((u64 *)(r10 - 8), r1);"
+	"r0 = 0;"
+	"exit;"
+	::: __clobber_all);
+}
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__type(key, long long);
+	__type(value, long long);
+} map_hash_8b SEC(".maps");
+
+SEC("socket")
+__description("store-release, leak pointer to map")
+__success __retval(0)
+__failure_unpriv __msg_unpriv("R6 leaks addr into mem")
+__naked void store_release_leak_pointer_to_map(void)
+{
+	asm volatile (
+	"r6 = r1;"
+	"r1 = 0;"
+	"*(u64 *)(r10 - 8) = r1;"
+	"r2 = r10;"
+	"r2 += -8;"
+	"r1 = %[map_hash_8b] ll;"
+	"call %[bpf_map_lookup_elem];"
+	"if r0 == 0 goto l0_%=;"
+	"store_release((u64 *)(r0 + 0), r6);"
+"l0_%=:"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_hash_8b)
+	: __clobber_all);
+}
+
+#else
+
+SEC("socket")
+__description("store-release is not supported by compiler or jit, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
-- 
2.48.1.262.g85cc9f2d1e-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH bpf-next v1 8/8] bpf, docs: Update instruction-set.rst for load-acquire and store-release instructions
  2025-01-25  2:16 [PATCH bpf-next v1 0/8] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (6 preceding siblings ...)
  2025-01-25  2:19 ` [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for " Peilin Ye
@ 2025-01-25  2:19 ` Peilin Ye
  2025-01-30  0:44   ` Alexei Starovoitov
  7 siblings, 1 reply; 27+ messages in thread
From: Peilin Ye @ 2025-01-25  2:19 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Update documentation for the new load-acquire and store-release
instructions.  Rename existing atomic operations as "atomic
read-modify-write (RMW) operations".

Following RFC 9669, section 7.3. "Adding Instructions", create new
conformance groups "atomic32v2" and "atomic64v2", where:

  * atomic32v2: includes all instructions in "atomic32", plus the new
                8-bit, 16-bit and 32-bit atomic load-acquire and
                store-release instructions

  * atomic64v2: includes all instructions in "atomic64" and
                "atomic32v2", plus the new 64-bit atomic load-acquire
                and store-release instructions

Cc: bpf@ietf.org
Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 .../bpf/standardization/instruction-set.rst   | 114 +++++++++++++++---
 1 file changed, 98 insertions(+), 16 deletions(-)

diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
index ab820d565052..86917932e9ef 100644
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@@ -139,8 +139,14 @@ This document defines the following conformance groups:
   specification unless otherwise noted.
 * base64: includes base32, plus instructions explicitly noted
   as being in the base64 conformance group.
-* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
-* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
+* atomic32: includes 32-bit atomic read-modify-write instructions (see
+  `Atomic operations`_).
+* atomic32v2: includes atomic32, plus 8-bit, 16-bit and 32-bit atomic
+  load-acquire and store-release instructions.
+* atomic64: includes atomic32, plus 64-bit atomic read-modify-write
+  instructions.
+* atomic64v2: includes atomic64 and atomic32v2, plus 64-bit atomic load-acquire
+  and store-release instructions.
 * divmul32: includes 32-bit division, multiplication, and modulo instructions.
 * divmul64: includes divmul32, plus 64-bit division, multiplication,
   and modulo instructions.
@@ -653,20 +659,31 @@ Atomic operations are operations that operate on memory and can not be
 interrupted or corrupted by other access to the same memory region
 by other BPF programs or means outside of this specification.
 
-All atomic operations supported by BPF are encoded as store operations
-that use the ``ATOMIC`` mode modifier as follows:
+All atomic operations supported by BPF are encoded as ``STX`` instructions
+that use the ``ATOMIC`` mode modifier, with the 'imm' field encoding the
+actual atomic operation.  These operations are categorized based on the second
+lowest nibble (bits 4-7) of the 'imm' field:
 
-* ``{ATOMIC, W, STX}`` for 32-bit operations, which are
+* ``ATOMIC_LOAD`` and ``ATOMIC_STORE`` indicate atomic load and store
+  operations, respectively (see `Atomic load and store operations`_).
+* All other defined values indicate an atomic read-modify-write operation, as
+  described in the following section.
+
+Atomic read-modify-write operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The atomic read-modify-write (RMW) operations are encoded as follows:
+
+* ``{ATOMIC, W, STX}`` for 32-bit RMW operations, which are
   part of the "atomic32" conformance group.
-* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are
+* ``{ATOMIC, DW, STX}`` for 64-bit RMW operations, which are
   part of the "atomic64" conformance group.
-* 8-bit and 16-bit wide atomic operations are not supported.
+* 8-bit and 16-bit wide atomic RMW operations are not supported.
 
-The 'imm' field is used to encode the actual atomic operation.
-Simple atomic operation use a subset of the values defined to encode
-arithmetic operations in the 'imm' field to encode the atomic operation:
+Simple atomic RMW operations use a subset of the values defined to encode
+arithmetic operations in the 'imm' field to encode the atomic RMW operation:
 
-.. table:: Simple atomic operations
+.. table:: Simple atomic read-modify-write operations
 
   ========  =====  ===========
   imm       value  description
@@ -686,10 +703,10 @@ arithmetic operations in the 'imm' field to encode the atomic operation:
 
   *(u64 *)(dst + offset) += src
 
-In addition to the simple atomic operations, there also is a modifier and
-two complex atomic operations:
+In addition to the simple atomic RMW operations, there also is a modifier and
+two complex atomic RMW operations:
 
-.. table:: Complex atomic operations
+.. table:: Complex atomic read-modify-write operations
 
   ===========  ================  ===========================
   imm          value             description
@@ -699,8 +716,8 @@ two complex atomic operations:
   CMPXCHG      0xf0 | FETCH      atomic compare and exchange
   ===========  ================  ===========================
 
-The ``FETCH`` modifier is optional for simple atomic operations, and
-always set for the complex atomic operations.  If the ``FETCH`` flag
+The ``FETCH`` modifier is optional for simple atomic RMW operations, and
+always set for the complex atomic RMW operations.  If the ``FETCH`` flag
 is set, then the operation also overwrites ``src`` with the value that
 was in memory before it was modified.
 
@@ -713,6 +730,71 @@ The ``CMPXCHG`` operation atomically compares the value addressed by
 value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
+Atomic load and store operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To encode an atomic load or store operation, the lowest 8 bits of the 'imm'
+field are divided as follows::
+
+  +-+-+-+-+-+-+-+-+
+  | type  | order |
+  +-+-+-+-+-+-+-+-+
+
+**type**
+  The operation type is one of:
+
+.. table:: Atomic load and store operation types
+
+  ============  =====  ============
+  type          value  description
+  ============  =====  ============
+  ATOMIC_LOAD   0x1    atomic load
+  ATOMIC_STORE  0x2    atomic store
+  ============  =====  ============
+
+**order**
+  The memory order is one of:
+
+.. table:: Memory orders
+
+  =======  =====  =======================
+  order    value  description
+  =======  =====  =======================
+  RELAXED  0x0    relaxed
+  ACQUIRE  0x1    acquire
+  RELEASE  0x2    release
+  ACQ_REL  0x3    acquire and release
+  SEQ_CST  0x4    sequentially consistent
+  =======  =====  =======================
+
+Currently the following combinations of ``type`` and ``order`` are allowed:
+
+.. table:: Atomic load and store operations
+
+  ========= =====  ====================
+  imm       value  description
+  ========= =====  ====================
+  LOAD_ACQ  0x11   atomic load-acquire
+  STORE_REL 0x22   atomic store-release
+  ========= =====  ====================
+
+``{ATOMIC, <size>, STX}`` with 'imm' = LOAD_ACQ means::
+
+  dst = load_acquire((unsigned size *)(src + offset))
+
+``{ATOMIC, <size>, STX}`` with 'imm' = STORE_REL means::
+
+  store_release((unsigned size *)(dst + offset), src)
+
+Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and 'unsigned size'
+is one of: u8, u16, u32, or u64.
+
+8-bit, 16-bit and 32-bit atomic load-acquire and store-release instructions
+are part of the "atomic32v2" conformance group.
+
+64-bit atomic load-acquire and store-release instructions are part of the
+"atomic64v2" conformance group.
+
 64-bit immediate instructions
 -----------------------------
 
-- 
2.48.1.262.g85cc9f2d1e-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-25  2:18 ` [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions Peilin Ye
@ 2025-01-29  0:19   ` Eduard Zingerman
  2025-01-29 22:04     ` Peilin Ye
  2025-01-29  1:30   ` Eduard Zingerman
  2025-01-30  0:41   ` Alexei Starovoitov
  2 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2025-01-29  0:19 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Sat, 2025-01-25 at 02:18 +0000, Peilin Ye wrote:
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1].  The following new flags are defined:
> 
>   BPF_ATOMIC_LOAD         0x10
>   BPF_ATOMIC_STORE        0x20
>   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
> 
>   BPF_RELAXED        0x0
>   BPF_ACQUIRE        0x1
>   BPF_RELEASE        0x2
>   BPF_ACQ_REL        0x3
>   BPF_SEQ_CST        0x4
> 
>   BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
>   BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)
> 
> A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
> field set to BPF_LOAD_ACQ (0x11).
> 
> Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
> the 'imm' field set to BPF_STORE_REL (0x22).
> 
> Unlike existing atomic operations that only support BPF_W (32-bit) and
> BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
> support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
> zero-extends the value before writing it to a 32-bit register, just like
> ARM64 instruction LDARH and friends.
> 
> As an example, consider the following 64-bit load-acquire BPF
> instruction:
> 
>   db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
> 
>   opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
>   imm (0x00000011): BPF_LOAD_ACQ
> 
> Similarly, a 16-bit BPF store-release:
> 
>   cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
> 
>   opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
>   imm (0x00000022): BPF_STORE_REL
> 
> [1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
> 
> Signed-off-by: Peilin Ye <yepeilin@google.com>
> ---

I think bpf_jit_supports_insn() in arch/{x86,s390}/net/bpf_jit_comp.c
need an update, as both would accept BPF_LOAD_ACQ/BPF_STORE_REL at the
moment.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +static int check_atomic_load(struct bpf_verifier_env *env, int insn_idx,
> +			     struct bpf_insn *insn)
> +{
> +	struct bpf_reg_state *regs = cur_regs(env);
> +	int err;
> +
> +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> +	if (err)
> +		return err;
> +
> +	err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
> +	if (err)
> +		return err;
> +
> +	if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
> +		verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
> +			insn->src_reg,
> +			reg_type_str(env, reg_state(env, insn->src_reg)->type));
> +		return -EACCES;
> +	}
> +
> +	if (is_arena_reg(env, insn->src_reg)) {
> +		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
> +		if (err)
> +			return err;

Nit: this and the next function look very similar to the processing of
     generic loads and stores in do_check(). Maybe extract that code
     as an auxiliary function and call it in both places?
     The only major difference is the is_arena_reg() check guarding
     save_aux_ptr_type(), but I think it is ok to call save_aux_ptr_type()
     unconditionally. Fwiw, the code would be a bit simpler; I just
     spent half an hour convincing myself that such conditional handling
     is not an error. Wdyt?
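
For reference, a rough sketch of the kind of helper I mean (the name
and exact argument list are invented here, not actual patch code):

	static int check_load_store_common(struct bpf_verifier_env *env,
					   int insn_idx, struct bpf_insn *insn,
					   int ptr_regno, int value_regno,
					   enum bpf_access_type t)
	{
		int err;

		err = check_mem_access(env, insn_idx, ptr_regno, insn->off,
				       BPF_SIZE(insn->code), t, value_regno,
				       true, false);
		if (err)
			return err;

		/* unconditional, mirroring do_check()'s BPF_LDX/BPF_STX paths */
		return save_aux_ptr_type(env, reg_state(env, ptr_regno)->type,
					 false);
	}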

> +	}
> +
> +	/* Check whether we can read the memory. */
> +	err = check_mem_access(env, insn_idx, insn->src_reg, insn->off,
> +			       BPF_SIZE(insn->code), BPF_READ, insn->dst_reg,
> +			       true, false);
> +	if (err)
> +		return err;
> +
> +	err = reg_bounds_sanity_check(env, &regs[insn->dst_reg], "atomic_load");
> +	if (err)
> +		return err;
> +	return 0;
> +}
> +
> +static int check_atomic_store(struct bpf_verifier_env *env, int insn_idx,
> +			      struct bpf_insn *insn)
> +{
> +	int err;
> +
> +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> +	if (err)
> +		return err;
> +
> +	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> +	if (err)
> +		return err;
> +
> +	if (is_pointer_value(env, insn->src_reg)) {
> +		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
> +		return -EACCES;
> +	}
> +
> +	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
> +		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
> +			insn->dst_reg,
> +			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
> +		return -EACCES;
> +	}
> +
> +	if (is_arena_reg(env, insn->dst_reg)) {
> +		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* Check whether we can write into the memory. */
> +	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> +			       BPF_SIZE(insn->code), BPF_WRITE, insn->src_reg,
> +			       true, false);
> +	if (err)
> +		return err;
> +	return 0;
> +}
> +

[...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-01-25  2:19 ` [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for " Peilin Ye
@ 2025-01-29  1:06   ` Eduard Zingerman
  2025-01-29  2:07     ` Ihor Solodrai
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Eduard Zingerman @ 2025-01-29  1:06 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Sat, 2025-01-25 at 02:19 +0000, Peilin Ye wrote:
> Add several ./test_progs tests:
> 
>   - atomics/load_acquire
>   - atomics/store_release
>   - arena_atomics/load_acquire
>   - arena_atomics/store_release
>   - verifier_load_acquire/*
>   - verifier_store_release/*
>   - verifier_precision/bpf_load_acquire
>   - verifier_precision/bpf_store_release
> 
> The last two tests are added to check if backtrack_insn() handles the
> new instructions correctly.
> 
> Additionally, the last test also makes sure that the verifier
> "remembers" the value (in src_reg) we store-release into e.g. a stack
> slot.  For example, if we take a look at the test program:
> 
>     #0:  "r1 = 8;"
>     #1:  "store_release((u64 *)(r10 - 8), r1);"
>     #2:  "r1 = *(u64 *)(r10 - 8);"
>     #3:  "r2 = r10;"
>     #4:  "r2 += r1;"	/* mark_precise */
>     #5:  "r0 = 0;"
>     #6:  "exit;"
> 
> At #1, if the verifier doesn't remember that we wrote 8 to the stack,
> then later at #4 we would be adding an unbounded scalar value to the
> stack pointer, which would cause the program to be rejected:
> 
>   VERIFIER LOG:
>   =============
> ...
>   math between fp pointer and register with unbounded min value is not allowed
> 
> All new tests depend on the pre-defined __BPF_FEATURE_LOAD_ACQ_STORE_REL
> feature macro, which implies -mcpu>=v4.

This restriction would mean that these tests are skipped on BPF CI, as
it currently runs using llvm 17 and 18. Instead, I suggest using a
macro that hides inline assembly like the below:

	asm volatile (".8byte %[insn];"
	              :
	              : [insn]"i"(*(long *)&(BPF_RAW_INSN(...)))
	              : /* correct clobbers here */);

See the usage of the __imm_insn() macro in the test suite.
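
For example, a 64-bit load-acquire test could look roughly like this
(a sketch only: it assumes BPF_LOAD_ACQ (0x11) is visible from the
updated uapi header, and it uses BPF_RAW_INSN() from linux/filter.h,
whose operand order is code/dst/src/off/imm):

	asm volatile (
	"r1 = 8;"
	"*(u64 *)(r10 - 8) = r1;"
	".8byte %[load_acquire_insn];" /* r0 = load_acquire((u64 *)(r10 - 8)) */
	"exit;"
	:
	: __imm_insn(load_acquire_insn,
		     BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_DW,
				  BPF_REG_0, BPF_REG_10, -8, BPF_LOAD_ACQ))
	: __clobber_all);

That way the tests would not depend on -mcpu=v4 compiler support at all.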

Also, "BPF_ATOMIC loads from R%d %s is not allowed\n" and
      "BPF_ATOMIC stores into R%d %s is not allowed\n"
situations are not tested.

[...]

> diff --git a/tools/testing/selftests/bpf/prog_tests/atomics.c b/tools/testing/selftests/bpf/prog_tests/atomics.c
> index 13e101f370a1..5d7cff3eed2b 100644
> --- a/tools/testing/selftests/bpf/prog_tests/atomics.c
> +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
> @@ -162,6 +162,56 @@ static void test_xchg(struct atomics_lskel *skel)
>  	ASSERT_EQ(skel->bss->xchg32_result, 1, "xchg32_result");
>  }

Nit: Given the tests in verifier_load_acquire.c and verifier_store_release.c
     that use __retval annotation, are these tests really necessary?
     (assuming that verifier_store_release.c tests are modified to read
      stored location into r0 before exit).

> +static void test_load_acquire(struct atomics_lskel *skel)
> +{
> +	LIBBPF_OPTS(bpf_test_run_opts, topts);
> +	int err, prog_fd;
> +
> +	if (skel->data->skip_lacq_srel_tests) {
> +		printf("%s:SKIP:Clang does not support BPF load-acquire\n", __func__);
> +		test__skip();
> +		return;
> +	}
> +
> +	/* No need to attach it, just run it directly */
> +	prog_fd = skel->progs.load_acquire.prog_fd;
> +	err = bpf_prog_test_run_opts(prog_fd, &topts);
> +	if (!ASSERT_OK(err, "test_run_opts err"))
> +		return;
> +	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
> +		return;
> +
> +	ASSERT_EQ(skel->bss->load_acquire8_result, 0x12, "load_acquire8_result");
> +	ASSERT_EQ(skel->bss->load_acquire16_result, 0x1234, "load_acquire16_result");
> +	ASSERT_EQ(skel->bss->load_acquire32_result, 0x12345678, "load_acquire32_result");
> +	ASSERT_EQ(skel->bss->load_acquire64_result, 0x1234567890abcdef, "load_acquire64_result");
> +}

[...]

> --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
[...]

> +SEC("raw_tp/sys_enter")
> +int load_acquire(const void *ctx)
> +{
> +	if (pid != (bpf_get_current_pid_tgid() >> 32))
> +		return 0;

Nit: This check is not needed, since bpf_prog_test_run_opts() is used
     to run the tests.

> +
> +#ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
> +	load_acquire8_result = __atomic_load_n(&load_acquire8_value, __ATOMIC_ACQUIRE);
> +	load_acquire16_result = __atomic_load_n(&load_acquire16_value, __ATOMIC_ACQUIRE);
> +	load_acquire32_result = __atomic_load_n(&load_acquire32_value, __ATOMIC_ACQUIRE);
> +	load_acquire64_result = __atomic_load_n(&load_acquire64_value, __ATOMIC_ACQUIRE);
> +#endif
> +
> +	return 0;
> +}

[...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-25  2:18 ` [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions Peilin Ye
  2025-01-29  0:19   ` Eduard Zingerman
@ 2025-01-29  1:30   ` Eduard Zingerman
  2025-01-29 22:17     ` Peilin Ye
  2025-01-30  0:41   ` Alexei Starovoitov
  2 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2025-01-29  1:30 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Sat, 2025-01-25 at 02:18 +0000, Peilin Ye wrote:

[...]

> +static int check_atomic_store(struct bpf_verifier_env *env, int insn_idx,
> +			      struct bpf_insn *insn)
> +{
> +	int err;
> +
> +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> +	if (err)
> +		return err;
> +
> +	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> +	if (err)
> +		return err;
> +
> +	if (is_pointer_value(env, insn->src_reg)) {
> +		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
> +		return -EACCES;
> +	}

Nit: this check is done by check_mem_access(), albeit only for
     PTR_TO_MEM. I think it's better to be consistent with
     what happens for regular stores and avoid this check here.

> +
> +	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
> +		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
> +			insn->dst_reg,
> +			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
> +		return -EACCES;
> +	}
> +
> +	if (is_arena_reg(env, insn->dst_reg)) {
> +		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* Check whether we can write into the memory. */
> +	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> +			       BPF_SIZE(insn->code), BPF_WRITE, insn->src_reg,
> +			       true, false);
> +	if (err)
> +		return err;
> +	return 0;
> +}

[...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-01-29  1:06   ` Eduard Zingerman
@ 2025-01-29  2:07     ` Ihor Solodrai
  2025-01-29  2:17       ` Eduard Zingerman
  2025-01-30  0:03     ` Peilin Ye
  2025-02-04  0:30     ` Peilin Ye
  2 siblings, 1 reply; 27+ messages in thread
From: Ihor Solodrai @ 2025-01-29  2:07 UTC (permalink / raw)
  To: Eduard Zingerman, Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai  Lau, Song Liu, Yonghong Song,
	John Fastabend, KP  Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will  Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

January 28, 2025 at 5:06 PM, "Eduard Zingerman" <eddyz87@gmail.com> wrote:

> On Sat, 2025-01-25 at 02:19 +0000, Peilin Ye wrote:
> 
> > Add several ./test_progs tests:
> > 
> >   - atomics/load_acquire
> >   - atomics/store_release
> >   - arena_atomics/load_acquire
> >   - arena_atomics/store_release
> >   - verifier_load_acquire/*
> >   - verifier_store_release/*
> >   - verifier_precision/bpf_load_acquire
> >   - verifier_precision/bpf_store_release
> > 
> > The last two tests are added to check if backtrack_insn() handles the
> > new instructions correctly.
> > 
> > Additionally, the last test also makes sure that the verifier
> > "remembers" the value (in src_reg) we store-release into e.g. a stack
> > slot.  For example, if we take a look at the test program:
> > 
> >     #0: "r1 = 8;"
> >     #1: "store_release((u64 *)(r10 - 8), r1);"
> >     #2: "r1 = *(u64 *)(r10 - 8);"
> >     #3: "r2 = r10;"
> >     #4: "r2 += r1;" /* mark_precise */
> >     #5: "r0 = 0;"
> >     #6: "exit;"
> > 
> > At #1, if the verifier doesn't remember that we wrote 8 to the stack,
> > then later at #4 we would be adding an unbounded scalar value to the
> > stack pointer, which would cause the program to be rejected:
> > 
> >   VERIFIER LOG:
> >   =============
> >   ...
> >   math between fp pointer and register with unbounded min value is not allowed
> > 
> > All new tests depend on the pre-defined __BPF_FEATURE_LOAD_ACQ_STORE_REL
> > feature macro, which implies -mcpu>=v4.
> 
> This restriction would mean that these tests are skipped on BPF CI, as
> it currently runs using llvm 17 and 18. Instead, I suggest using a
> macro that hides inline assembly like the below:

Eduard, if this feature requires a particular version of LLVM, it's
not difficult to add a configuration for it to BPF CI.

Whether we want to do it is an open question though. Issues may come
up with other tests when newer LLVM is used.

> 	asm volatile (".8byte %[insn];"
> 	              :
> 	              : [insn]"i"(*(long *)&(BPF_RAW_INSN(...)))
> 	              : /* correct clobbers here */);
> 
> See the usage of the __imm_insn() macro in the test suite.
> 
> Also, "BPF_ATOMIC loads from R%d %s is not allowed\n" and
>       "BPF_ATOMIC stores into R%d %s is not allowed\n"
> situations are not tested.
> 
> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-01-29  2:07     ` Ihor Solodrai
@ 2025-01-29  2:17       ` Eduard Zingerman
  0 siblings, 0 replies; 27+ messages in thread
From: Eduard Zingerman @ 2025-01-29  2:17 UTC (permalink / raw)
  To: Ihor Solodrai, Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Wed, 2025-01-29 at 02:07 +0000, Ihor Solodrai wrote:

[...]

> > This restriction would mean that these tests are skipped on BPF CI, as
> > it currently runs using llvm 17 and 18. Instead, I suggest using a
> > macro that hides inline assembly like the below:
> 
> Eduard, if this feature requires a particular version of LLVM, it's
> not difficult to add a configuration for it to BPF CI.
> 
> Whether we want to do it is an open question though. Issues may come
> up with other tests when newer LLVM is used.

These changes have not landed yet, and once they do they will only be in
llvm main for some time. The workaround is straightforward, and I don't
see a reason to add more jobs to CI.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-29  0:19   ` Eduard Zingerman
@ 2025-01-29 22:04     ` Peilin Ye
  2025-01-29 22:42       ` Eduard Zingerman
  0 siblings, 1 reply; 27+ messages in thread
From: Peilin Ye @ 2025-01-29 22:04 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Tue, Jan 28, 2025 at 04:19:25PM -0800, Eduard Zingerman wrote:
> On Sat, 2025-01-25 at 02:18 +0000, Peilin Ye wrote:
> > Signed-off-by: Peilin Ye <yepeilin@google.com>
> > ---
> 
> I think bpf_jit_supports_insn() in arch/{x86,s390}/net/bpf_jit_comp.c
> need an update, as both would accept BPF_LOAD_ACQ/BPF_STORE_REL at the
> moment.

Got it - I will move is_atomic_load_store() into <linux/bpf.h> for that.
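
Roughly like this (sketch; the helper body just restates the flags from
the commit message):

	/* in <linux/bpf.h>: */
	static inline bool is_atomic_load_store(const s32 imm)
	{
		const s32 type = BPF_ATOMIC_TYPE(imm);

		return type == BPF_ATOMIC_LOAD || type == BPF_ATOMIC_STORE;
	}

...so that the x86 and s390 bpf_jit_supports_insn() can return false for
such 'imm' values until those JITs implement the new instructions.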

> Acked-by: Eduard Zingerman <eddyz87@gmail.com>

Thanks!

> > +static int check_atomic_load(struct bpf_verifier_env *env, int insn_idx,
> > +			     struct bpf_insn *insn)
> > +{
> > +	struct bpf_reg_state *regs = cur_regs(env);
> > +	int err;
> > +
> > +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> > +	if (err)
> > +		return err;
> > +
> > +	err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
> > +	if (err)
> > +		return err;
> > +
> > +	if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
> > +		verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
> > +			insn->src_reg,
> > +			reg_type_str(env, reg_state(env, insn->src_reg)->type));
> > +		return -EACCES;
> > +	}
> > +
> > +	if (is_arena_reg(env, insn->src_reg)) {
> > +		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
> > +		if (err)
> > +			return err;
> 
> Nit: this and the next function look very similar to the processing of
>      generic loads and stores in do_check(). Maybe extract that code
>      as an auxiliary function and call it in both places?

Sure, I agree that they look a bit repetitive.

>      The only major difference is the is_arena_reg() check guarding
>      save_aux_ptr_type(), but I think it is ok to call save_aux_ptr_type()
>      unconditionally. Fwiw, the code would be a bit simpler; I just
>      spent half an hour convincing myself that such conditional handling
>      is not an error. Wdyt?

:-O

Thanks a lot for that; would you mind sharing a bit more on how you
reasoned about it (i.e., why is it OK to call save_aux_ptr_type()
unconditionally)?

> > +	}
> > +
> > +	/* Check whether we can read the memory. */
> > +	err = check_mem_access(env, insn_idx, insn->src_reg, insn->off,
> > +			       BPF_SIZE(insn->code), BPF_READ, insn->dst_reg,
> > +			       true, false);
> > +	if (err)
> > +		return err;
> > +
> > +	err = reg_bounds_sanity_check(env, &regs[insn->dst_reg], "atomic_load");
> > +	if (err)
> > +		return err;
> > +	return 0;
> > +}
> > +
> > +static int check_atomic_store(struct bpf_verifier_env *env, int insn_idx,
> > +			      struct bpf_insn *insn)

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-29  1:30   ` Eduard Zingerman
@ 2025-01-29 22:17     ` Peilin Ye
  0 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-01-29 22:17 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Tue, Jan 28, 2025 at 05:30:19PM -0800, Eduard Zingerman wrote:
> On Sat, 2025-01-25 at 02:18 +0000, Peilin Ye wrote:
> > +static int check_atomic_store(struct bpf_verifier_env *env, int insn_idx,
> > +			      struct bpf_insn *insn)
> > +{
> > +	int err;
> > +
> > +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> > +	if (err)
> > +		return err;
> > +
> > +	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
> > +	if (err)
> > +		return err;
> > +
> > +	if (is_pointer_value(env, insn->src_reg)) {
> > +		verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
> > +		return -EACCES;
> > +	}
> 
> Nit: this check is done by check_mem_access(), albeit only for
>      PTR_TO_MEM. I think it's better to be consistent with
>      what happens for regular stores and avoid this check here.

Got it.  Unprivileged programs will be able to store-release pointers to
the stack, then.  I'll update selftests accordingly.
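
E.g. the "store-release, leak pointer to stack" test would presumably
flip to something like (sketch):

	__description("store-release with pointer to stack")
	__success __success_unpriv __retval(0)

while the map-value variant should keep failing for unprivileged, just
with check_mem_access()'s "leaks addr into map" message instead.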

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-29 22:04     ` Peilin Ye
@ 2025-01-29 22:42       ` Eduard Zingerman
  2025-01-30  3:10         ` Peilin Ye
  0 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2025-01-29 22:42 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Wed, 2025-01-29 at 22:04 +0000, Peilin Ye wrote:

[...]

> > > +static int check_atomic_load(struct bpf_verifier_env *env, int insn_idx,
> > > +			     struct bpf_insn *insn)
> > > +{
> > > +	struct bpf_reg_state *regs = cur_regs(env);
> > > +	int err;
> > > +
> > > +	err = check_reg_arg(env, insn->src_reg, SRC_OP);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
> > > +	if (err)
> > > +		return err;
> > > +
> > > +	if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
> > > +		verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
> > > +			insn->src_reg,
> > > +			reg_type_str(env, reg_state(env, insn->src_reg)->type));
> > > +		return -EACCES;
> > > +	}
> > > +
> > > +	if (is_arena_reg(env, insn->src_reg)) {
> > > +		err = save_aux_ptr_type(env, PTR_TO_ARENA, false);
> > > +		if (err)
> > > +			return err;
> > 
> > Nit: this and the next function look very similar to the processing of
> >      generic loads and stores in do_check(). Maybe extract that code
> >      as an auxiliary function and call it in both places?
> 
> Sure, I agree that they look a bit repetitive.
> 
> >      The only major difference is the is_arena_reg() check guarding
> >      save_aux_ptr_type(), but I think it is ok to call save_aux_ptr_type()
> >      unconditionally. Fwiw, the code would be a bit simpler; I just
> >      spent half an hour convincing myself that such conditional handling
> >      is not an error. Wdyt?
> 
> :-O
> 
> Thanks a lot for that; would you mind sharing a bit more on how you
> reasoned about it (i.e., why is it OK to call save_aux_ptr_type()
> unconditionally)?

Well, save_aux_ptr_type() does two things:
- if there is no env->insn_aux_data[env->insn_idx].ptr_type associated
  with the instruction, it saves one;
- if there is a .ptr_type, it checks whether the new one is compatible
  and errors out if it's not.

The .ptr_type is used in convert_ctx_accesses() to rewrite access
instructions (STX/LDX, atomic or not) in a way specific to the pointer
type.

So, doing save_aux_ptr_type() conditionally is already sketchy, as
there is a risk of missing a case where an instruction is used in a
context whose pointer type requires different rewrites.

convert_ctx_accesses() rewrites instruction for pointer following
types:
- PTR_TO_CTX
- PTR_TO_SOCKET
- PTR_TO_SOCK_COMMON
- PTR_TO_TCP_SOCK
- PTR_TO_XDP_SOCK
- PTR_TO_BTF_ID
- PTR_TO_ARENA

atomic_ptr_type_ok() allows the following pointer types:
- CONST_PTR_TO_MAP
- PTR_TO_MAP_VALUE
- PTR_TO_MAP_KEY
- PTR_TO_STACK
- PTR_TO_BTF_ID
- PTR_TO_MEM
- PTR_TO_ARENA
- PTR_TO_BUF
- PTR_TO_FUNC
- CONST_PTR_TO_DYNPTR

One has to check the rewrites applied by convert_ctx_accesses() to
atomic instructions to reason about the correctness of the conditional
save_aux_ptr_type() call.

If the is_arena_reg() guard around save_aux_ptr_type() is removed, we
risk rejecting programs that do an atomic load/store where the same
instruction is used with pointers of more than one of the above types.
I speculate that this is not a problem, as do_check()'s processing of
BPF_STX/BPF_LDX already calls save_aux_ptr_type() unconditionally.
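
For context, the relevant BPF_LDX handling in do_check() looks roughly
like this (simplified from memory, details trimmed):

	} else if (class == BPF_LDX) {
		enum bpf_reg_type src_reg_type = regs[insn->src_reg].type;

		/* check that memory (src_reg + off) is readable ... */
		err = check_mem_access(env, env->insn_idx, insn->src_reg,
				       insn->off, BPF_SIZE(insn->code),
				       BPF_READ, insn->dst_reg, true,
				       BPF_MODE(insn->code) == BPF_MEMSX);
		/* ... then save the pointer type, with no is_arena_reg() guard: */
		err = err ?: save_aux_ptr_type(env, src_reg_type, true);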

[...]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-01-29  1:06   ` Eduard Zingerman
  2025-01-29  2:07     ` Ihor Solodrai
@ 2025-01-30  0:03     ` Peilin Ye
  2025-02-04  0:30     ` Peilin Ye
  2 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-01-30  0:03 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel,
	Ihor Solodrai

On Tue, Jan 28, 2025 at 05:06:03PM -0800, Eduard Zingerman wrote:
> On Sat, 2025-01-25 at 02:19 +0000, Peilin Ye wrote:
> > All new tests depend on the pre-defined __BPF_FEATURE_LOAD_ACQ_STORE_REL
> > feature macro, which implies -mcpu>=v4.
> 
> This restriction would mean that tests are skipped on BPF CI, as it
> currently runs using llvm 17 and 18. Instead, I suggest using some
> macro hiding an inline assembly as below:
> 
> 	asm volatile (".8byte %[insn];"
> 	              :
> 	              : [insn]"i"(*(long *)&(BPF_RAW_INSN(...)))
> 	              : /* correct clobbers here */);
> 
> See the usage of the __imm_insn() macro in the test suite.

I see, I'll do this in v2.

> Also, "BPF_ATOMIC loads from R%d %s is not allowed\n" and
>       "BPF_ATOMIC stores into R%d %s is not allowed\n"
> situations are not tested.

Thanks!

> > --- a/tools/testing/selftests/bpf/prog_tests/atomics.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/atomics.c
> > @@ -162,6 +162,56 @@ static void test_xchg(struct atomics_lskel *skel)
> >  	ASSERT_EQ(skel->bss->xchg32_result, 1, "xchg32_result");
> >  }
> 
> Nit: Given the tests in verifier_load_acquire.c and verifier_store_release.c
>      that use __retval annotation, are these tests really necessary?
>      (assuming that verifier_store_release.c tests are modified to read
>       stored location into r0 before exit).
> 
> > +static void test_load_acquire(struct atomics_lskel *skel)

Ah, right.  I'll drop them and modify verifier_store_release.c
accordingly.

> > --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> > +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> [...]
> 
> > +SEC("raw_tp/sys_enter")
> > +int load_acquire(const void *ctx)
> > +{
> > +	if (pid != (bpf_get_current_pid_tgid() >> 32))
> > +		return 0;
> 
> Nit: This check is not needed, since bpf_prog_test_run_opts() is used
>      to run the tests.

Got it!

Thanks,
Peilin Ye


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-25  2:18 ` [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions Peilin Ye
  2025-01-29  0:19   ` Eduard Zingerman
  2025-01-29  1:30   ` Eduard Zingerman
@ 2025-01-30  0:41   ` Alexei Starovoitov
  2025-01-30  3:38     ` Peilin Ye
  2 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2025-01-30  0:41 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	LKML

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index da729cbbaeb9..ab082ab9d535 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -1663,14 +1663,17 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
>         INSN_3(JMP, JSET, K),                   \
>         INSN_2(JMP, JA),                        \
>         INSN_2(JMP32, JA),                      \
> +       /* Atomic operations. */                \
> +       INSN_3(STX, ATOMIC, B),                 \
> +       INSN_3(STX, ATOMIC, H),                 \
> +       INSN_3(STX, ATOMIC, W),                 \
> +       INSN_3(STX, ATOMIC, DW),                \
>         /* Store instructions. */               \
>         /*   Register based. */                 \
>         INSN_3(STX, MEM,  B),                   \
>         INSN_3(STX, MEM,  H),                   \
>         INSN_3(STX, MEM,  W),                   \
>         INSN_3(STX, MEM,  DW),                  \
> -       INSN_3(STX, ATOMIC, W),                 \
> -       INSN_3(STX, ATOMIC, DW),                \
>         /*   Immediate based. */                \
>         INSN_3(ST, MEM, B),                     \
>         INSN_3(ST, MEM, H),                     \
> @@ -2169,6 +2172,8 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
>
>         STX_ATOMIC_DW:
>         STX_ATOMIC_W:
> +       STX_ATOMIC_H:
> +       STX_ATOMIC_B:
>                 switch (IMM) {
>                 ATOMIC_ALU_OP(BPF_ADD, add)
>                 ATOMIC_ALU_OP(BPF_AND, and)

STX_ATOMIC_[BH] looks wrong.
It will do atomic64_*() ops at the wrong size.
The new BPF_INSN_MAP() entries make bpf_opcode_in_insntable()
return true for these opcodes, so the verifier will allow such new
insns.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v1 8/8] bpf, docs: Update instruction-set.rst for load-acquire and store-release instructions
  2025-01-25  2:19 ` [PATCH bpf-next v1 8/8] bpf, docs: Update instruction-set.rst " Peilin Ye
@ 2025-01-30  0:44   ` Alexei Starovoitov
  2025-01-30  7:33     ` Peilin Ye
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2025-01-30  0:44 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	LKML

On Fri, Jan 24, 2025 at 6:19 PM Peilin Ye <yepeilin@google.com> wrote:
>
> Update documentation for the new load-acquire and store-release
> instructions.  Rename existing atomic operations as "atomic
> read-modify-write (RMW) operations".
>
> Following RFC 9669, section 7.3. "Adding Instructions", create new
> conformance groups "atomic32v2" and "atomic64v2", where:
>
>   * atomic32v2: includes all instructions in "atomic32", plus the new
>                 8-bit, 16-bit and 32-bit atomic load-acquire and
>                 store-release instructions
>
>   * atomic64v2: includes all instructions in "atomic64" and
>                 "atomic32v2", plus the new 64-bit atomic load-acquire
>                 and store-release instructions
>
> Cc: bpf@ietf.org
> Signed-off-by: Peilin Ye <yepeilin@google.com>
> ---
>  .../bpf/standardization/instruction-set.rst   | 114 +++++++++++++++---
>  1 file changed, 98 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> index ab820d565052..86917932e9ef 100644
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst
> @@ -139,8 +139,14 @@ This document defines the following conformance groups:
>    specification unless otherwise noted.
>  * base64: includes base32, plus instructions explicitly noted
>    as being in the base64 conformance group.
> -* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
> -* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
> +* atomic32: includes 32-bit atomic read-modify-write instructions (see
> +  `Atomic operations`_).
> +* atomic32v2: includes atomic32, plus 8-bit, 16-bit and 32-bit atomic
> +  load-acquire and store-release instructions.
> +* atomic64: includes atomic32, plus 64-bit atomic read-modify-write
> +  instructions.
> +* atomic64v2: unifies atomic32v2 and atomic64, plus 64-bit atomic load-acquire
> +  and store-release instructions.
>  * divmul32: includes 32-bit division, multiplication, and modulo instructions.
>  * divmul64: includes divmul32, plus 64-bit division, multiplication,
>    and modulo instructions.
> @@ -653,20 +659,31 @@ Atomic operations are operations that operate on memory and can not be
>  interrupted or corrupted by other access to the same memory region
>  by other BPF programs or means outside of this specification.
>
> -All atomic operations supported by BPF are encoded as store operations
> -that use the ``ATOMIC`` mode modifier as follows:
> +All atomic operations supported by BPF are encoded as ``STX`` instructions
> +that use the ``ATOMIC`` mode modifier, with the 'imm' field encoding the
> +actual atomic operation.  These operations are categorized based on the second
> +lowest nibble (bits 4-7) of the 'imm' field:
>
> -* ``{ATOMIC, W, STX}`` for 32-bit operations, which are
> +* ``ATOMIC_LOAD`` and ``ATOMIC_STORE`` indicate atomic load and store
> +  operations, respectively (see `Atomic load and store operations`_).
> +* All other defined values indicate an atomic read-modify-write operation, as
> +  described in the following section.
> +
> +Atomic read-modify-write operations
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The atomic read-modify-write (RMW) operations are encoded as follows:
> +
> +* ``{ATOMIC, W, STX}`` for 32-bit RMW operations, which are
>    part of the "atomic32" conformance group.
> -* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are
> +* ``{ATOMIC, DW, STX}`` for 64-bit RMW operations, which are
>    part of the "atomic64" conformance group.
> -* 8-bit and 16-bit wide atomic operations are not supported.
> +* 8-bit and 16-bit wide atomic RMW operations are not supported.
>
> -The 'imm' field is used to encode the actual atomic operation.
> -Simple atomic operation use a subset of the values defined to encode
> -arithmetic operations in the 'imm' field to encode the atomic operation:
> +Simple atomic RMW operation use a subset of the values defined to encode
> +arithmetic operations in the 'imm' field to encode the atomic RMW operation:
>
> -.. table:: Simple atomic operations
> +.. table:: Simple atomic read-modify-write operations
>
>    ========  =====  ===========
>    imm       value  description
> @@ -686,10 +703,10 @@ arithmetic operations in the 'imm' field to encode the atomic operation:
>
>    *(u64 *)(dst + offset) += src
>
> -In addition to the simple atomic operations, there also is a modifier and
> -two complex atomic operations:
> +In addition to the simple atomic RMW operations, there also is a modifier and
> +two complex atomic RMW operations:
>
> -.. table:: Complex atomic operations
> +.. table:: Complex atomic read-modify-write operations
>
>    ===========  ================  ===========================
>    imm          value             description
> @@ -699,8 +716,8 @@ two complex atomic operations:
>    CMPXCHG      0xf0 | FETCH      atomic compare and exchange
>    ===========  ================  ===========================
>
> -The ``FETCH`` modifier is optional for simple atomic operations, and
> -always set for the complex atomic operations.  If the ``FETCH`` flag
> +The ``FETCH`` modifier is optional for simple atomic RMW operations, and
> +always set for the complex atomic RMW operations.  If the ``FETCH`` flag
>  is set, then the operation also overwrites ``src`` with the value that
>  was in memory before it was modified.
>
> @@ -713,6 +730,71 @@ The ``CMPXCHG`` operation atomically compares the value addressed by
>  value that was at ``dst + offset`` before the operation is zero-extended
>  and loaded back to ``R0``.
>
> +Atomic load and store operations
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To encode an atomic load or store operation, the lowest 8 bits of the 'imm'
> +field are divided as follows::
> +
> +  +-+-+-+-+-+-+-+-+
> +  | type  | order |
> +  +-+-+-+-+-+-+-+-+
> +
> +**type**
> +  The operation type is one of:
> +
> +.. table:: Atomic load and store operation types
> +
> +  ============  =====  ============
> +  type          value  description
> +  ============  =====  ============
> +  ATOMIC_LOAD   0x1    atomic load
> +  ATOMIC_STORE  0x2    atomic store
> +  ============  =====  ============
> +
> +**order**
> +  The memory order is one of:
> +
> +.. table:: Memory orders
> +
> +  =======  =====  =======================
> +  order    value  description
> +  =======  =====  =======================
> +  RELAXED  0x0    relaxed
> +  ACQUIRE  0x1    acquire
> +  RELEASE  0x2    release
> +  ACQ_REL  0x3    acquire and release
> +  SEQ_CST  0x4    sequentially consistent
> +  =======  =====  =======================

I understand that this is inspired by C,
but what are the chances this will map meaningfully to hw?
What are JITs supposed to do with all the other combinations?

> +Currently the following combinations of ``type`` and ``order`` are allowed:
> +
> +.. table:: Atomic load and store operations
> +
> +  ========= =====  ====================
> +  imm       value  description
> +  ========= =====  ====================
> +  LOAD_ACQ  0x11   atomic load-acquire
> +  STORE_REL 0x22   atomic store-release
> +  ========= =====  ====================

Should we do LOAD_ACQ=1 and STORE_REL=2 and
not add anything else?
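
For reference, the two allowed values follow directly from the nibble
layout quoted above; a minimal sketch in C (the macro names here are
illustrative, not taken from the patchset):

  /* Illustrative only; these names are not from the patchset. */
  #define ATOMIC_LOAD   0x1                     /* 'type' nibble, imm<4-7>  */
  #define ATOMIC_STORE  0x2
  #define ACQUIRE       0x1                     /* 'order' nibble, imm<0-3> */
  #define RELEASE       0x2

  #define LOAD_ACQ      ((ATOMIC_LOAD  << 4) | ACQUIRE)   /* = 0x11 */
  #define STORE_REL     ((ATOMIC_STORE << 4) | RELEASE)   /* = 0x22 */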


* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-29 22:42       ` Eduard Zingerman
@ 2025-01-30  3:10         ` Peilin Ye
  0 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-01-30  3:10 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Wed, Jan 29, 2025 at 02:42:41PM -0800, Eduard Zingerman wrote:
> On Wed, 2025-01-29 at 22:04 +0000, Peilin Ye wrote:
> > Thanks a lot for that; would you mind sharing a bit more on how you
> > reasoned about it (i.e., why is it OK to save_aux_ptr_type()
> > unconditionally) ?
> 
> Well, save_aux_ptr_type() does two things:
> - if there is no env->insn_aux_data[env->insn_idx].ptr_type associated
>   with the instruction, it saves one;
> - if there is a .ptr_type, it checks whether the new one is compatible and
>   errors out if it's not.
> 
> The .ptr_type is used in convert_ctx_accesses() to rewrite access
> instruction (STX/LDX, atomic or not) in a way specific to pointer
> type.
> 
> So, doing save_aux_ptr_type() conditionally is already sketchy,
> as there is a risk of missing a case where some instruction is used in
> a context where the pointer type requires different rewrites.
> 
> convert_ctx_accesses() rewrites instruction for pointer following
> types:
> - PTR_TO_CTX
> - PTR_TO_SOCKET
> - PTR_TO_SOCK_COMMON
> - PTR_TO_TCP_SOCK
> - PTR_TO_XDP_SOCK
> - PTR_TO_BTF_ID
> - PTR_TO_ARENA
> 
> atomic_ptr_type_ok() allows the following pointer types:
> - CONST_PTR_TO_MAP
> - PTR_TO_MAP_VALUE
> - PTR_TO_MAP_KEY
> - PTR_TO_STACK
> - PTR_TO_BTF_ID
> - PTR_TO_MEM
> - PTR_TO_ARENA
> - PTR_TO_BUF
> - PTR_TO_FUNC
> - CONST_PTR_TO_DYNPTR
> 
> One has to check the rewrites applied by convert_ctx_accesses() to atomic
> instructions to reason about the correctness of the conditional
> save_aux_ptr_type() call.
>
> If the is_arena_reg() guard is removed from the save_aux_ptr_type() call,
> we risk rejecting programs that do an atomic load/store where the same
> instruction is used to modify a pointer that can be any of the above types.
> I speculate that this is not a problem, as do_check() processing for
> BPF_STX/BPF_LDX already calls save_aux_ptr_type() unconditionally.
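
Restating those two behaviors as code, a simplified sketch (not the
actual kernel source; type_compatible() here is a hypothetical stand-in
for the real compatibility check):

  /* Simplified sketch, not the actual kernel source. */
  static int save_aux_ptr_type(struct bpf_verifier_env *env,
                               enum bpf_reg_type type)
  {
          struct bpf_insn_aux_data *aux = &env->insn_aux_data[env->insn_idx];

          if (aux->ptr_type == NOT_INIT) {
                  /* first visit: remember the pointer type */
                  aux->ptr_type = type;
                  return 0;
          }
          /* revisited: same insn reached with another pointer type */
          if (!type_compatible(aux->ptr_type, type))  /* hypothetical helper */
                  return -EINVAL;
          return 0;
  }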

I see, thanks for the explanation!

Peilin Ye



* Re: [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions
  2025-01-30  0:41   ` Alexei Starovoitov
@ 2025-01-30  3:38     ` Peilin Ye
  0 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-01-30  3:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	LKML

Hi Alexei,

On Wed, Jan 29, 2025 at 04:41:31PM -0800, Alexei Starovoitov wrote:
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index da729cbbaeb9..ab082ab9d535 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -1663,14 +1663,17 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
> >         INSN_3(JMP, JSET, K),                   \
> >         INSN_2(JMP, JA),                        \
> >         INSN_2(JMP32, JA),                      \
> > +       /* Atomic operations. */                \
> > +       INSN_3(STX, ATOMIC, B),                 \
> > +       INSN_3(STX, ATOMIC, H),                 \
> > +       INSN_3(STX, ATOMIC, W),                 \
> > +       INSN_3(STX, ATOMIC, DW),                \
> >         /* Store instructions. */               \
> >         /*   Register based. */                 \
> >         INSN_3(STX, MEM,  B),                   \
> >         INSN_3(STX, MEM,  H),                   \
> >         INSN_3(STX, MEM,  W),                   \
> >         INSN_3(STX, MEM,  DW),                  \
> > -       INSN_3(STX, ATOMIC, W),                 \
> > -       INSN_3(STX, ATOMIC, DW),                \
> >         /*   Immediate based. */                \
> >         INSN_3(ST, MEM, B),                     \
> >         INSN_3(ST, MEM, H),                     \
> > @@ -2169,6 +2172,8 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
> >
> >         STX_ATOMIC_DW:
> >         STX_ATOMIC_W:
> > +       STX_ATOMIC_H:
> > +       STX_ATOMIC_B:
> >                 switch (IMM) {
> >                 ATOMIC_ALU_OP(BPF_ADD, add)
> >                 ATOMIC_ALU_OP(BPF_AND, and)
> 
> STX_ATOMIC_[BH] looks wrong.
> It will do atomic64_*() ops with the wrong size.
> BPF_INSN_MAP() feeds bpf_opcode_in_insntable(),
> so the verifier will allow such new insns.

We still have this check in check_atomic():

  if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
          verbose(env, "invalid atomic operand size\n");
          return -EINVAL;
  }

(moved to check_atomic_rmw() in PATCH 2/8)

Looks like it cannot be triggered before this patch, since
STX_ATOMIC_[BH] would've already been rejected by that
bpf_opcode_in_insntable() check before reaching check_atomic().

I agree that the interpreter code handling RMW atomics might now look a
bit confusing though.  In v2 I'll refactor that part and/or add comments
to make it clearer in the code that:

  * only W and DW are allowed for atomic RMW
  * all of B, H, W and DW are allowed for atomic load/store
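
For illustration, that rule could be captured by a helper along these
lines (a hypothetical sketch, not the actual v2 code; LOAD_ACQ and
STORE_REL stand for the new 'imm' values):

  /* Hypothetical sketch, not the actual v2 code. */
  static bool atomic_size_ok(const struct bpf_insn *insn)
  {
          if (insn->imm == LOAD_ACQ || insn->imm == STORE_REL)
                  return true;                       /* B, H, W and DW */
          return BPF_SIZE(insn->code) == BPF_W ||
                 BPF_SIZE(insn->code) == BPF_DW;     /* RMW: W and DW only */
  }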

Thanks,
Peilin Ye



* Re: [PATCH bpf-next v1 8/8] bpf, docs: Update instruction-set.rst for load-acquire and store-release instructions
  2025-01-30  0:44   ` Alexei Starovoitov
@ 2025-01-30  7:33     ` Peilin Ye
  0 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-01-30  7:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	LKML, Yingchi Long

+Cc: Yingchi Long

On Wed, Jan 29, 2025 at 04:44:02PM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 24, 2025 at 6:19 PM Peilin Ye <yepeilin@google.com> wrote:
> > +Atomic load and store operations
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +To encode an atomic load or store operation, the lowest 8 bits of the 'imm'
> > +field are divided as follows::
> > +
> > +  +-+-+-+-+-+-+-+-+
> > +  | type  | order |
> > +  +-+-+-+-+-+-+-+-+
> > +
> > +**type**
> > +  The operation type is one of:
> > +
> > +.. table:: Atomic load and store operation types
> > +
> > +  ============  =====  ============
> > +  type          value  description
> > +  ============  =====  ============
> > +  ATOMIC_LOAD   0x1    atomic load
> > +  ATOMIC_STORE  0x2    atomic store
> > +  ============  =====  ============
> > +
> > +**order**
> > +  The memory order is one of:
> > +
> > +.. table:: Memory orders
> > +
> > +  =======  =====  =======================
> > +  order    value  description
> > +  =======  =====  =======================
> > +  RELAXED  0x0    relaxed
> > +  ACQUIRE  0x1    acquire
> > +  RELEASE  0x2    release
> > +  ACQ_REL  0x3    acquire and release
> > +  SEQ_CST  0x4    sequentially consistent
> > +  =======  =====  =======================
> 
> I understand that this is inspired by C,
> but what are the chances this will map meaningfully to hw?
> What are JITs supposed to do with all the other combinations?

For context, those memorder flags were added after a discussion about
the SEQ_CST case on GitHub [1].

Do you anticipate we'll ever need BPF atomic seq_cst load/store
instructions?

If yes, I think we can either:

  (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and
      STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by
      RMW atomics
  (b) specify memorder in imm<0-3>

I chose (b) so there are fewer "What would be a good numerical value so
that RMW atomics won't need to use it in imm<4-7>?" questions to answer.

If we're going to have a dedicated memorder field, I think it's better to
define all possible values once and for all, just so that e.g. 0x2 will
always mean RELEASE in a memorder field.  Initially I defined all six of
them [2], then Yonghong suggested dropping CONSUME [3].
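
As a usage example, with the corresponding LLVM changes a load-acquire
or store-release is written as a plain C11-style builtin in BPF C code
(assuming the builtins are lowered to the new instructions, as in the
LLVM pull request):

  /* Assumes the LLVM changes lower these builtins to the new insns. */
  long read_counter(long *p)
  {
          return __atomic_load_n(p, __ATOMIC_ACQUIRE);  /* load-acquire */
  }

  void write_counter(long *p, long v)
  {
          __atomic_store_n(p, v, __ATOMIC_RELEASE);     /* store-release */
  }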

[1] https://github.com/llvm/llvm-project/pull/108636#discussion_r1817555681
[2] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf#page=1817
[3] https://github.com/llvm/llvm-project/pull/108636#discussion_r1819380536

Thanks,
Peilin Ye



* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-01-29  1:06   ` Eduard Zingerman
  2025-01-29  2:07     ` Ihor Solodrai
  2025-01-30  0:03     ` Peilin Ye
@ 2025-02-04  0:30     ` Peilin Ye
  2025-02-04  0:52       ` Eduard Zingerman
  2 siblings, 1 reply; 27+ messages in thread
From: Peilin Ye @ 2025-02-04  0:30 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Hi Eduard,

One more question (for my understanding):

On Tue, Jan 28, 2025 at 05:06:03PM -0800, Eduard Zingerman wrote:
> On Sat, 2025-01-25 at 02:19 +0000, Peilin Ye wrote:
> > --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> > +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> [...]
> 
> > +SEC("raw_tp/sys_enter")
> > +int load_acquire(const void *ctx)
> > +{
> > +	if (pid != (bpf_get_current_pid_tgid() >> 32))
> > +		return 0;
> 
> Nit: This check is not needed, since bpf_prog_test_run_opts() is used
>      to run the tests.

Could you explain a bit more why it's not needed?

I read commit 0f4feacc9155 ("selftests/bpf: Adding pid filtering for
atomics test") which added those 'pid' checks to atomics/ tests.  The
commit message [1] says the purpose was to "make atomics test able to
run in parallel with other tests", which I couldn't understand.

How does using bpf_prog_test_run_opts() make those 'pid' checks unnecessary?

[1] https://lore.kernel.org/bpf/20211006185619.364369-11-fallentree@fb.com/#r

Thanks,
Peilin Ye



* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-04  0:30     ` Peilin Ye
@ 2025-02-04  0:52       ` Eduard Zingerman
  2025-02-04  1:29         ` Peilin Ye
  0 siblings, 1 reply; 27+ messages in thread
From: Eduard Zingerman @ 2025-02-04  0:52 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Tue, 2025-02-04 at 00:30 +0000, Peilin Ye wrote:
> Hi Eduard,
> 
> One more question (for my understanding):
> 
> On Tue, Jan 28, 2025 at 05:06:03PM -0800, Eduard Zingerman wrote:
> > On Sat, 2025-01-25 at 02:19 +0000, Peilin Ye wrote:
> > > --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> > > +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> > [...]
> > 
> > > +SEC("raw_tp/sys_enter")
> > > +int load_acquire(const void *ctx)
> > > +{
> > > +	if (pid != (bpf_get_current_pid_tgid() >> 32))
> > > +		return 0;
> > 
> > Nit: This check is not needed, since bpf_prog_test_run_opts() is used
> >      to run the tests.
> 
> Could you explain a bit more why it's not needed?
> 
> I read commit 0f4feacc9155 ("selftests/bpf: Adding pid filtering for
> atomics test") which added those 'pid' checks to atomics/ tests.  The
> commit message [1] says the purpose was to "make atomics test able to
> run in parallel with other tests", which I couldn't understand.
> 
> How does using bpf_prog_test_run_opts() make those 'pid' checks unnecessary?
> 
> [1] https://lore.kernel.org/bpf/20211006185619.364369-11-fallentree@fb.com/#r


Hi Peilin,

The entry point for the test looks as follows:

    void test_arena_atomics(void)
    {
    	...
    	skel = arena_atomics__open();
    	if (!ASSERT_OK_PTR(skel, "arena atomics skeleton open"))
    		return;
    
    	if (skel->data->skip_tests) { ... }
    	err = arena_atomics__load(skel);
    	if (!ASSERT_OK(err, "arena atomics skeleton load"))
    		return;
    	skel->bss->pid = getpid();
    
    	if (test__start_subtest("add"))
    		test_add(skel);
            ...
    
    cleanup:
    	arena_atomics__destroy(skel);
    }

Note the arena_atomics__{open,load} calls but the absence of an
arena_atomics__attach call.  Without the arena_atomics__attach call, the
programs would not be hooked to the designated attach points,
e.g. "raw_tp/sys_enter".

bpf_prog_test_run_opts() invokes the BPF_PROG_TEST_RUN command of the
bpf() system call, which does not attach the program either,
but executes the jitted code directly with a fake context.
(See the bpf_prog_ops->test_run callback and
 bpf_prog_test_run_raw_tp().)

The same happens in prog{,_tests}/arena.c: no attachment happens after
commit [2], so commit [1] is unnecessary after [2].

[2] commit 04fcb5f9a104 ("selftests/bpf: Migrate from bpf_prog_test_run")
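
For comparison, a subtest then drives each program roughly like this (a
minimal libbpf sketch, assuming the skeleton and program names from the
patch):

    /* Runs the program via BPF_PROG_TEST_RUN; no attach involved. */
    LIBBPF_OPTS(bpf_test_run_opts, topts);
    int err;

    err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.load_acquire),
                                 &topts);
    ASSERT_OK(err, "test_run_opts err");
    ASSERT_OK(topts.retval, "test_run_opts retval");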



* Re: [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-04  0:52       ` Eduard Zingerman
@ 2025-02-04  1:29         ` Peilin Ye
  0 siblings, 0 replies; 27+ messages in thread
From: Peilin Ye @ 2025-02-04  1:29 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

On Mon, Feb 03, 2025 at 04:52:52PM -0800, Eduard Zingerman wrote:
> > > > --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> > > > +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> > > [...]
> > > 
> > > > +SEC("raw_tp/sys_enter")
> > > > +int load_acquire(const void *ctx)
> > > > +{
> > > > +	if (pid != (bpf_get_current_pid_tgid() >> 32))
> > > > +		return 0;
> > > 
> > > Nit: This check is not needed, since bpf_prog_test_run_opts() is used
> > >      to run the tests.
> > 
> > Could you explain a bit more why it's not needed?
> > 
> > I read commit 0f4feacc9155 ("selftests/bpf: Adding pid filtering for
> > atomics test") which added those 'pid' checks to atomics/ tests.  The
> > commit message [1] says the purpose was to "make atomics test able to
> > run in parallel with other tests", which I couldn't understand.
> > 
> > How does using bpf_prog_test_run_opts() make those 'pid' checks unnecessary?
> > 
> > [1] https://lore.kernel.org/bpf/20211006185619.364369-11-fallentree@fb.com/#r
> 
> Hi Peilin,
> 
> The entry point for the test looks as follows:
> 
>     void test_arena_atomics(void)
>     {
>     	...
>     	skel = arena_atomics__open();
>     	if (!ASSERT_OK_PTR(skel, "arena atomics skeleton open"))
>     		return;
>     
>     	if (skel->data->skip_tests) { ... }
>     	err = arena_atomics__load(skel);
>     	if (!ASSERT_OK(err, "arena atomics skeleton load"))
>     		return;
>     	skel->bss->pid = getpid();
>     
>     	if (test__start_subtest("add"))
>     		test_add(skel);
>             ...
>     
>     cleanup:
>     	arena_atomics__destroy(skel);
>     }
> 
> Note the arena_atomics__{open,load} calls but the absence of an
> arena_atomics__attach call.  Without the arena_atomics__attach call, the
> programs would not be hooked to the designated attach points,
> e.g. "raw_tp/sys_enter".
> 
> bpf_prog_test_run_opts() invokes the BPF_PROG_TEST_RUN command of the
> bpf() system call, which does not attach the program either,
> but executes the jitted code directly with a fake context.
> (See the bpf_prog_ops->test_run callback and
>  bpf_prog_test_run_raw_tp().)
> 
> The same happens in prog{,_tests}/arena.c: no attachment happens after
> commit [2], so commit [1] is unnecessary after [2].
> 
> [2] commit 04fcb5f9a104 ("selftests/bpf: Migrate from bpf_prog_test_run")

I see.  Thanks for the quick reply as always!

Peilin Ye




Thread overview: 27+ messages
2025-01-25  2:16 [PATCH bpf-next v1 0/8] Introduce load-acquire and store-release BPF instructions Peilin Ye
2025-01-25  2:17 ` [PATCH bpf-next v1 1/8] bpf/verifier: Factor out atomic_ptr_type_ok() Peilin Ye
2025-01-25  2:18 ` [PATCH bpf-next v1 2/8] bpf/verifier: Factor out check_atomic_rmw() Peilin Ye
2025-01-25  2:18 ` [PATCH bpf-next v1 3/8] bpf: Introduce load-acquire and store-release instructions Peilin Ye
2025-01-29  0:19   ` Eduard Zingerman
2025-01-29 22:04     ` Peilin Ye
2025-01-29 22:42       ` Eduard Zingerman
2025-01-30  3:10         ` Peilin Ye
2025-01-29  1:30   ` Eduard Zingerman
2025-01-29 22:17     ` Peilin Ye
2025-01-30  0:41   ` Alexei Starovoitov
2025-01-30  3:38     ` Peilin Ye
2025-01-25  2:18 ` [PATCH bpf-next v1 4/8] arm64: insn: Add BIT(23) to {load,store}_ex's mask Peilin Ye
2025-01-25  2:19 ` [PATCH bpf-next v1 5/8] arm64: insn: Add load-acquire and store-release instructions Peilin Ye
2025-01-25  2:19 ` [PATCH bpf-next v1 6/8] bpf, arm64: Support " Peilin Ye
2025-01-25  2:19 ` [PATCH bpf-next v1 7/8] selftests/bpf: Add selftests for " Peilin Ye
2025-01-29  1:06   ` Eduard Zingerman
2025-01-29  2:07     ` Ihor Solodrai
2025-01-29  2:07       ` [Bpf] " Ihor Solodrai
2025-01-29  2:17       ` Eduard Zingerman
2025-01-30  0:03     ` Peilin Ye
2025-02-04  0:30     ` Peilin Ye
2025-02-04  0:52       ` Eduard Zingerman
2025-02-04  1:29         ` Peilin Ye
2025-01-25  2:19 ` [PATCH bpf-next v1 8/8] bpf, docs: Update instruction-set.rst " Peilin Ye
2025-01-30  0:44   ` Alexei Starovoitov
2025-01-30  7:33     ` Peilin Ye
