* [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions
@ 2025-02-07  2:04 Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 1/9] bpf/verifier: Factor out atomic_ptr_type_ok() Peilin Ye
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:04 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Hi all!

This patchset adds kernel support for BPF load-acquire and store-release
instructions (for background, please see [1]), mainly including
core/verifier, arm64 JIT compiler, and Documentation/ changes.  Support
for x86-64 and riscv64 is also planned.  The corresponding LLVM
changes can be found at:
  https://github.com/llvm/llvm-project/pull/108636

An atomic load/store is a BPF_STX | BPF_ATOMIC instruction, with the
lowest 8 bits of the 'imm' field in the following format:

  +-+-+-+-+-+-+-+-+
  | type  | order |
  +-+-+-+-+-+-+-+-+

  o 'type' indicates ATOMIC_LOAD (0x1) or ATOMIC_STORE (0x2)
  o 'order' is one of RELAXED (0x0), ACQUIRE (0x1), RELEASE (0x2),
    ACQ_REL (0x3, acquire-and-release) and SEQ_CST (0x4, sequentially
    consistent)

Currently only two combinations are legal:

  o LOAD_ACQ (0x11: ATOMIC_LOAD, ACQUIRE)
  o STORE_REL (0x22: ATOMIC_STORE, RELEASE)
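
As an illustration (a minimal sketch, not part of this series: only
BPF_ATOMIC_TYPE() below is defined in PATCH v2 4/9; the hypothetical
BPF_ATOMIC_ORDER() is shown just for symmetry), the two legal 'imm'
values decompose as:

  #define BPF_ATOMIC_TYPE(imm)   ((imm) & 0xf0)  /* upper nibble: type */
  #define BPF_ATOMIC_ORDER(imm)  ((imm) & 0x0f)  /* lower nibble: order */

  /* LOAD_ACQ  (0x11) == BPF_ATOMIC_LOAD  (0x10) | BPF_ACQUIRE (0x1)
   * STORE_REL (0x22) == BPF_ATOMIC_STORE (0x20) | BPF_RELEASE (0x2)
   */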

During v1 there was a discussion on whether it is necessary to have a
dedicated 'order' field and define all possible values, considering that
we only allow two combinations at the moment.  Please refer to [2] for
more context.

v1: https://lore.kernel.org/all/cover.1737763916.git.yepeilin@google.com/
v1..v2 notable changes:

  o (Eduard) for x86 and s390, make
             bpf_jit_supports_insn(..., /*in_arena=*/true) return false
             for load_acq/store_rel
  o add Eduard's Acked-by: tag
  o (Eduard) extract LDX and non-ATOMIC STX handling into helpers, see
             PATCH v2 3/9
  o allow unpriv programs to store-release pointers to stack
  o (Alexei) make it clearer in the interpreter code (PATCH v2 4/9) that
             only W and DW are supported for atomic RMW
  o test misaligned load_acq/store_rel
  o (Eduard) other selftests/ changes:
    * test load_acq/store_rel with !atomic_ptr_type_ok() pointers:
      - PTR_TO_CTX, for is_ctx_reg()
      - PTR_TO_PACKET, for is_pkt_reg()
      - PTR_TO_FLOW_KEYS, for is_flow_key_reg()
      - PTR_TO_SOCKET, for is_sk_reg()
    * drop atomics/ tests
    * delete unnecessary 'pid' checks from arena_atomics/ tests
    * avoid depending on __BPF_FEATURE_LOAD_ACQ_STORE_REL, use
      __imm_insn() and inline asm macros instead

RFC v1: https://lore.kernel.org/all/cover.1734742802.git.yepeilin@google.com
RFC v1..v1 notable changes:

  o 1-2/8: minor verifier.c refactoring patches
  o   3/8: core/verifier changes
         * (Eduard) handle load-acquire properly in backtrack_insn()
         * (Eduard) avoid skipping checks (e.g.,
                    bpf_jit_supports_insn()) for load-acquires
         * track the value stored by store-releases, just like how
           non-atomic STX instructions are handled
         * (Eduard) add missing link in commit message
         * (Eduard) always print 'r' for disasm.c changes
  o   4/8: arm64/insn: avoid treating load_acq/store_rel as
           load_ex/store_ex
  o   5/8: arm64/insn: add load_acq/store_rel
         * (Xu) include Should-Be-One (SBO) bits in "mask" and "value",
                to avoid setting fixed bits during runtime (JIT-compile
                time)
  o   6/8: arm64 JIT compiler changes
         * (Xu) use emit_a64_add_i() for "pointer + offset" to optimize
                code emission
  o   7/8: selftests
         * (Eduard) avoid adding new tests to the 'test_verifier' runner
         * add more tests, e.g., checking mark_precise logic
  o   8/8: instruction-set.rst changes

Please see the individual kernel patches (and LLVM commits) for details.
The LLVM GitHub PR will soon be split into three PRs, each containing a
single commit.  Feedback is much appreciated!

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
[2] https://lore.kernel.org/bpf/Z5srM--fdH_JAgYT@google.com

Thanks,
Peilin Ye (9):
  bpf/verifier: Factor out atomic_ptr_type_ok()
  bpf/verifier: Factor out check_atomic_rmw()
  bpf/verifier: Factor out check_load_mem() and check_store_reg()
  bpf: Introduce load-acquire and store-release instructions
  arm64: insn: Add BIT(23) to {load,store}_ex's mask
  arm64: insn: Add load-acquire and store-release instructions
  bpf, arm64: Support load-acquire and store-release instructions
  selftests/bpf: Add selftests for load-acquire and store-release
    instructions
  bpf, docs: Update instruction-set.rst for load-acquire and
    store-release instructions

 .../bpf/standardization/instruction-set.rst   | 114 ++++++--
 arch/arm64/include/asm/insn.h                 |  12 +-
 arch/arm64/lib/insn.c                         |  29 ++
 arch/arm64/net/bpf_jit.h                      |  20 ++
 arch/arm64/net/bpf_jit_comp.c                 |  87 +++++-
 arch/s390/net/bpf_jit_comp.c                  |  14 +-
 arch/x86/net/bpf_jit_comp.c                   |   4 +
 include/linux/bpf.h                           |  11 +
 include/linux/filter.h                        |   2 +
 include/uapi/linux/bpf.h                      |  13 +
 kernel/bpf/core.c                             |  63 ++++-
 kernel/bpf/disasm.c                           |  12 +
 kernel/bpf/verifier.c                         | 234 +++++++++++-----
 tools/include/uapi/linux/bpf.h                |  13 +
 .../selftests/bpf/prog_tests/arena_atomics.c  |  50 ++++
 .../selftests/bpf/prog_tests/verifier.c       |   4 +
 .../selftests/bpf/progs/arena_atomics.c       |  88 ++++++
 .../bpf/progs/verifier_load_acquire.c         | 190 +++++++++++++
 .../selftests/bpf/progs/verifier_precision.c  |  47 ++++
 .../bpf/progs/verifier_store_release.c        | 262 ++++++++++++++++++
 20 files changed, 1164 insertions(+), 105 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_load_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_store_release.c

-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 1/9] bpf/verifier: Factor out atomic_ptr_type_ok()
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
@ 2025-02-07  2:05 ` Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 2/9] bpf/verifier: Factor out check_atomic_rmw() Peilin Ye
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:05 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Factor out atomic_ptr_type_ok() as a helper function to be used later.

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 kernel/bpf/verifier.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9971c03adfd5..0935b72fe716 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6116,6 +6116,26 @@ static bool is_arena_reg(struct bpf_verifier_env *env, int regno)
 	return reg->type == PTR_TO_ARENA;
 }
 
+/* Return false if @regno contains a pointer whose type isn't supported for
+ * atomic instruction @insn.
+ */
+static bool atomic_ptr_type_ok(struct bpf_verifier_env *env, int regno,
+			       struct bpf_insn *insn)
+{
+	if (is_ctx_reg(env, regno))
+		return false;
+	if (is_pkt_reg(env, regno))
+		return false;
+	if (is_flow_key_reg(env, regno))
+		return false;
+	if (is_sk_reg(env, regno))
+		return false;
+	if (is_arena_reg(env, regno))
+		return bpf_jit_supports_insn(insn, true);
+
+	return true;
+}
+
 static u32 *reg2btf_ids[__BPF_REG_TYPE_MAX] = {
 #ifdef CONFIG_NET
 	[PTR_TO_SOCKET] = &btf_sock_ids[BTF_SOCK_TYPE_SOCK],
@@ -7572,11 +7592,7 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 		return -EACCES;
 	}
 
-	if (is_ctx_reg(env, insn->dst_reg) ||
-	    is_pkt_reg(env, insn->dst_reg) ||
-	    is_flow_key_reg(env, insn->dst_reg) ||
-	    is_sk_reg(env, insn->dst_reg) ||
-	    (is_arena_reg(env, insn->dst_reg) && !bpf_jit_supports_insn(insn, true))) {
+	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
 		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
 			insn->dst_reg,
 			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 2/9] bpf/verifier: Factor out check_atomic_rmw()
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 1/9] bpf/verifier: Factor out atomic_ptr_type_ok() Peilin Ye
@ 2025-02-07  2:05 ` Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 3/9] bpf/verifier: Factor out check_load_mem() and check_store_reg() Peilin Ye
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:05 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Currently, check_atomic() only handles atomic read-modify-write (RMW)
instructions.  Since we are planning to introduce other types of atomic
instructions (i.e., atomic load/store), extract the existing RMW
handling logic into its own function named check_atomic_rmw().

Remove the @insn_idx parameter as it is not really necessary.  Use
'env->insn_idx' instead, as in other places in verifier.c.

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 kernel/bpf/verifier.c | 53 +++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0935b72fe716..39eb990ec003 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7536,28 +7536,12 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
 			     bool allow_trust_mismatch);
 
-static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+static int check_atomic_rmw(struct bpf_verifier_env *env,
+			    struct bpf_insn *insn)
 {
 	int load_reg;
 	int err;
 
-	switch (insn->imm) {
-	case BPF_ADD:
-	case BPF_ADD | BPF_FETCH:
-	case BPF_AND:
-	case BPF_AND | BPF_FETCH:
-	case BPF_OR:
-	case BPF_OR | BPF_FETCH:
-	case BPF_XOR:
-	case BPF_XOR | BPF_FETCH:
-	case BPF_XCHG:
-	case BPF_CMPXCHG:
-		break;
-	default:
-		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
-		return -EINVAL;
-	}
-
 	if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
 		verbose(env, "invalid atomic operand size\n");
 		return -EINVAL;
@@ -7619,12 +7603,12 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 	/* Check whether we can read the memory, with second call for fetch
 	 * case to simulate the register fill.
 	 */
-	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+	err = check_mem_access(env, env->insn_idx, insn->dst_reg, insn->off,
 			       BPF_SIZE(insn->code), BPF_READ, -1, true, false);
 	if (!err && load_reg >= 0)
-		err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-				       BPF_SIZE(insn->code), BPF_READ, load_reg,
-				       true, false);
+		err = check_mem_access(env, env->insn_idx, insn->dst_reg,
+				       insn->off, BPF_SIZE(insn->code),
+				       BPF_READ, load_reg, true, false);
 	if (err)
 		return err;
 
@@ -7634,13 +7618,34 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 			return err;
 	}
 	/* Check whether we can write into the same memory. */
-	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+	err = check_mem_access(env, env->insn_idx, insn->dst_reg, insn->off,
 			       BPF_SIZE(insn->code), BPF_WRITE, -1, true, false);
 	if (err)
 		return err;
 	return 0;
 }
 
+static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
+{
+	switch (insn->imm) {
+	case BPF_ADD:
+	case BPF_ADD | BPF_FETCH:
+	case BPF_AND:
+	case BPF_AND | BPF_FETCH:
+	case BPF_OR:
+	case BPF_OR | BPF_FETCH:
+	case BPF_XOR:
+	case BPF_XOR | BPF_FETCH:
+	case BPF_XCHG:
+	case BPF_CMPXCHG:
+		return check_atomic_rmw(env, insn);
+	default:
+		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
+			insn->imm);
+		return -EINVAL;
+	}
+}
+
 /* When register 'regno' is used to read the stack (either directly or through
  * a helper function) make sure that it's within stack boundary and, depending
  * on the access type and privileges, that all elements of the stack are
@@ -19076,7 +19081,7 @@ static int do_check(struct bpf_verifier_env *env)
 			enum bpf_reg_type dst_reg_type;
 
 			if (BPF_MODE(insn->code) == BPF_ATOMIC) {
-				err = check_atomic(env, env->insn_idx, insn);
+				err = check_atomic(env, insn);
 				if (err)
 					return err;
 				env->insn_idx++;
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 3/9] bpf/verifier: Factor out check_load_mem() and check_store_reg()
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 1/9] bpf/verifier: Factor out atomic_ptr_type_ok() Peilin Ye
  2025-02-07  2:05 ` [PATCH bpf-next v2 2/9] bpf/verifier: Factor out check_atomic_rmw() Peilin Ye
@ 2025-02-07  2:05 ` Peilin Ye
  2025-02-10 23:45   ` Eduard Zingerman
  2025-02-07  2:05 ` [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions Peilin Ye
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:05 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Extract BPF_LDX and most non-atomic BPF_STX instruction handling logic
in do_check() into helper functions to be used later.  While we are
here, make that comment about "reserved fields" more specific.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 kernel/bpf/verifier.c | 110 +++++++++++++++++++++++++-----------------
 1 file changed, 67 insertions(+), 43 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 39eb990ec003..82a5a4acf576 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7536,6 +7536,67 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 static int save_aux_ptr_type(struct bpf_verifier_env *env, enum bpf_reg_type type,
 			     bool allow_trust_mismatch);
 
+static int check_load_mem(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			  bool strict_alignment_once, bool is_ldsx,
+			  bool allow_trust_mismatch, const char *ctx)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	enum bpf_reg_type src_reg_type;
+	int err;
+
+	/* check src operand */
+	err = check_reg_arg(env, insn->src_reg, SRC_OP);
+	if (err)
+		return err;
+
+	/* check dst operand */
+	err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
+	if (err)
+		return err;
+
+	src_reg_type = regs[insn->src_reg].type;
+
+	/* Check if (src_reg + off) is readable. The state of dst_reg will be
+	 * updated by this call.
+	 */
+	err = check_mem_access(env, env->insn_idx, insn->src_reg, insn->off,
+			       BPF_SIZE(insn->code), BPF_READ, insn->dst_reg,
+			       strict_alignment_once, is_ldsx);
+	err = err ?: save_aux_ptr_type(env, src_reg_type,
+				       allow_trust_mismatch);
+	err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], ctx);
+
+	return err;
+}
+
+static int check_store_reg(struct bpf_verifier_env *env, struct bpf_insn *insn,
+			   bool strict_alignment_once)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	enum bpf_reg_type dst_reg_type;
+	int err;
+
+	/* check src1 operand */
+	err = check_reg_arg(env, insn->src_reg, SRC_OP);
+	if (err)
+		return err;
+
+	/* check src2 operand */
+	err = check_reg_arg(env, insn->dst_reg, SRC_OP);
+	if (err)
+		return err;
+
+	dst_reg_type = regs[insn->dst_reg].type;
+
+	/* Check if (dst_reg + off) is writeable. */
+	err = check_mem_access(env, env->insn_idx, insn->dst_reg, insn->off,
+			       BPF_SIZE(insn->code), BPF_WRITE, insn->src_reg,
+			       strict_alignment_once, false);
+	err = err ?: save_aux_ptr_type(env, dst_reg_type, false);
+
+	return err;
+}
+
 static int check_atomic_rmw(struct bpf_verifier_env *env,
 			    struct bpf_insn *insn)
 {
@@ -19051,35 +19112,16 @@ static int do_check(struct bpf_verifier_env *env)
 				return err;
 
 		} else if (class == BPF_LDX) {
-			enum bpf_reg_type src_reg_type;
-
-			/* check for reserved fields is already done */
-
-			/* check src operand */
-			err = check_reg_arg(env, insn->src_reg, SRC_OP);
-			if (err)
-				return err;
+			bool is_ldsx = BPF_MODE(insn->code) == BPF_MEMSX;
 
-			err = check_reg_arg(env, insn->dst_reg, DST_OP_NO_MARK);
-			if (err)
-				return err;
-
-			src_reg_type = regs[insn->src_reg].type;
-
-			/* check that memory (src_reg + off) is readable,
-			 * the state of dst_reg will be updated by this func
+			/* Check for reserved fields is already done in
+			 * resolve_pseudo_ldimm64().
 			 */
-			err = check_mem_access(env, env->insn_idx, insn->src_reg,
-					       insn->off, BPF_SIZE(insn->code),
-					       BPF_READ, insn->dst_reg, false,
-					       BPF_MODE(insn->code) == BPF_MEMSX);
-			err = err ?: save_aux_ptr_type(env, src_reg_type, true);
-			err = err ?: reg_bounds_sanity_check(env, &regs[insn->dst_reg], "ldx");
+			err = check_load_mem(env, insn, false, is_ldsx, true,
+					     "ldx");
 			if (err)
 				return err;
 		} else if (class == BPF_STX) {
-			enum bpf_reg_type dst_reg_type;
-
 			if (BPF_MODE(insn->code) == BPF_ATOMIC) {
 				err = check_atomic(env, insn);
 				if (err)
@@ -19093,25 +19135,7 @@ static int do_check(struct bpf_verifier_env *env)
 				return -EINVAL;
 			}
 
-			/* check src1 operand */
-			err = check_reg_arg(env, insn->src_reg, SRC_OP);
-			if (err)
-				return err;
-			/* check src2 operand */
-			err = check_reg_arg(env, insn->dst_reg, SRC_OP);
-			if (err)
-				return err;
-
-			dst_reg_type = regs[insn->dst_reg].type;
-
-			/* check that memory (dst_reg + off) is writeable */
-			err = check_mem_access(env, env->insn_idx, insn->dst_reg,
-					       insn->off, BPF_SIZE(insn->code),
-					       BPF_WRITE, insn->src_reg, false, false);
-			if (err)
-				return err;
-
-			err = save_aux_ptr_type(env, dst_reg_type, false);
+			err = check_store_reg(env, insn, false);
 			if (err)
 				return err;
 		} else if (class == BPF_ST) {
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (2 preceding siblings ...)
  2025-02-07  2:05 ` [PATCH bpf-next v2 3/9] bpf/verifier: Factor out check_load_mem() and check_store_reg() Peilin Ye
@ 2025-02-07  2:05 ` Peilin Ye
  2025-02-07 11:28   ` Ilya Leoshkevich
  2025-02-08 21:30   ` Alexei Starovoitov
  2025-02-07  2:06 ` [PATCH bpf-next v2 5/9] arm64: insn: Add BIT(23) to {load,store}_ex's mask Peilin Ye
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:05 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Introduce BPF instructions with load-acquire and store-release
semantics, as discussed in [1].  The following new flags are defined:

  BPF_ATOMIC_LOAD         0x10
  BPF_ATOMIC_STORE        0x20
  BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)

  BPF_RELAXED        0x0
  BPF_ACQUIRE        0x1
  BPF_RELEASE        0x2
  BPF_ACQ_REL        0x3
  BPF_SEQ_CST        0x4

  BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
  BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x11).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
the ARM64 LDARH instruction and friends.

As an example, consider the following 64-bit load-acquire BPF
instruction:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000011): BPF_LOAD_ACQ

Similarly, a 16-bit BPF store-release:

  cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000022): BPF_STORE_REL
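
Such encodings can be constructed programmatically with the
BPF_RAW_INSN() macro (a sketch mirroring the two examples above):

  /* r0 = load_acquire((u64 *)(r1 + 0x0)) */
  BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_DW, BPF_REG_0, BPF_REG_1,
               0, BPF_LOAD_ACQ);

  /* store_release((u16 *)(r1 + 0x0), w2) */
  BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_H, BPF_REG_1, BPF_REG_2,
               0, BPF_STORE_REL);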

In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
instructions, until the corresponding JIT compiler supports them.

[1] https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/net/bpf_jit_comp.c  |  4 +++
 arch/s390/net/bpf_jit_comp.c   | 14 +++++---
 arch/x86/net/bpf_jit_comp.c    |  4 +++
 include/linux/bpf.h            | 11 ++++++
 include/linux/filter.h         |  2 ++
 include/uapi/linux/bpf.h       | 13 +++++++
 kernel/bpf/core.c              | 63 ++++++++++++++++++++++++++++++----
 kernel/bpf/disasm.c            | 12 +++++++
 kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 13 +++++++
 10 files changed, 168 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8446848edddb..8c3b47d9e441 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2667,8 +2667,12 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
 	if (!in_arena)
 		return true;
 	switch (insn->code) {
+	case BPF_STX | BPF_ATOMIC | BPF_B:
+	case BPF_STX | BPF_ATOMIC | BPF_H:
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
+		if (bpf_atomic_is_load_store(insn))
+			return false;
 		if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
 			return false;
 	}
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 9d440a0b729e..0776dfde2dba 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -2919,10 +2919,16 @@ bool bpf_jit_supports_arena(void)
 
 bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
 {
-	/*
-	 * Currently the verifier uses this function only to check which
-	 * atomic stores to arena are supported, and they all are.
-	 */
+	if (!in_arena)
+		return true;
+	switch (insn->code) {
+	case BPF_STX | BPF_ATOMIC | BPF_B:
+	case BPF_STX | BPF_ATOMIC | BPF_H:
+	case BPF_STX | BPF_ATOMIC | BPF_W:
+	case BPF_STX | BPF_ATOMIC | BPF_DW:
+		if (bpf_atomic_is_load_store(insn))
+			return false;
+	}
 	return true;
 }
 
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a43fc5af973d..f0c31c940fb8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3771,8 +3771,12 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
 	if (!in_arena)
 		return true;
 	switch (insn->code) {
+	case BPF_STX | BPF_ATOMIC | BPF_B:
+	case BPF_STX | BPF_ATOMIC | BPF_H:
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
+		if (bpf_atomic_is_load_store(insn))
+			return false;
 		if (insn->imm == (BPF_AND | BPF_FETCH) ||
 		    insn->imm == (BPF_OR | BPF_FETCH) ||
 		    insn->imm == (BPF_XOR | BPF_FETCH))
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f3f50e29d639..96c936fd125f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -990,6 +990,17 @@ static inline bool bpf_pseudo_func(const struct bpf_insn *insn)
 	return bpf_is_ldimm64(insn) && insn->src_reg == BPF_PSEUDO_FUNC;
 }
 
+/* Given a BPF_ATOMIC instruction @atomic_insn, return true if it is an
+ * atomic load or store, and false if it is a read-modify-write instruction.
+ */
+static inline bool
+bpf_atomic_is_load_store(const struct bpf_insn *atomic_insn)
+{
+	const s32 type = BPF_ATOMIC_TYPE(atomic_insn->imm);
+
+	return type == BPF_ATOMIC_LOAD || type == BPF_ATOMIC_STORE;
+}
+
 struct bpf_prog_ops {
 	int (*test_run)(struct bpf_prog *prog, const union bpf_attr *kattr,
 			union bpf_attr __user *uattr);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a3ea46281595..e36812a5b01f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -364,6 +364,8 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn)
  *   BPF_XOR | BPF_FETCH      src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
  *   BPF_XCHG                 src_reg = atomic_xchg(dst_reg + off16, src_reg)
  *   BPF_CMPXCHG              r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
+ *   BPF_LOAD_ACQ             dst_reg = smp_load_acquire(src_reg + off16)
+ *   BPF_STORE_REL            smp_store_release(dst_reg + off16, src_reg)
  */
 
 #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF)			\
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fff6cdb8d11a..e78306e6e2be 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -51,6 +51,19 @@
 #define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
 #define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
 
+#define BPF_ATOMIC_LOAD		0x10
+#define BPF_ATOMIC_STORE	0x20
+#define BPF_ATOMIC_TYPE(imm)	((imm) & 0xf0)
+
+#define BPF_RELAXED	0x00
+#define BPF_ACQUIRE	0x01
+#define BPF_RELEASE	0x02
+#define BPF_ACQ_REL	0x03
+#define BPF_SEQ_CST	0x04
+
+#define BPF_LOAD_ACQ	(BPF_ATOMIC_LOAD | BPF_ACQUIRE)		/* load-acquire */
+#define BPF_STORE_REL	(BPF_ATOMIC_STORE | BPF_RELEASE)	/* store-release */
+
 enum bpf_cond_pseudo_jmp {
 	BPF_MAY_GOTO = 0,
 };
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index da729cbbaeb9..3f3127479a93 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1663,14 +1663,17 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
 	INSN_3(JMP, JSET, K),			\
 	INSN_2(JMP, JA),			\
 	INSN_2(JMP32, JA),			\
+	/* Atomic operations. */		\
+	INSN_3(STX, ATOMIC, B),			\
+	INSN_3(STX, ATOMIC, H),			\
+	INSN_3(STX, ATOMIC, W),			\
+	INSN_3(STX, ATOMIC, DW),		\
 	/* Store instructions. */		\
 	/*   Register based. */			\
 	INSN_3(STX, MEM,  B),			\
 	INSN_3(STX, MEM,  H),			\
 	INSN_3(STX, MEM,  W),			\
 	INSN_3(STX, MEM,  DW),			\
-	INSN_3(STX, ATOMIC, W),			\
-	INSN_3(STX, ATOMIC, DW),		\
 	/*   Immediate based. */		\
 	INSN_3(ST, MEM, B),			\
 	INSN_3(ST, MEM, H),			\
@@ -2152,24 +2155,33 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 			if (BPF_SIZE(insn->code) == BPF_W)		\
 				atomic_##KOP((u32) SRC, (atomic_t *)(unsigned long) \
 					     (DST + insn->off));	\
-			else						\
+			else if (BPF_SIZE(insn->code) == BPF_DW)	\
 				atomic64_##KOP((u64) SRC, (atomic64_t *)(unsigned long) \
 					       (DST + insn->off));	\
+			else						\
+				goto default_label;			\
 			break;						\
 		case BOP | BPF_FETCH:					\
 			if (BPF_SIZE(insn->code) == BPF_W)		\
 				SRC = (u32) atomic_fetch_##KOP(		\
 					(u32) SRC,			\
 					(atomic_t *)(unsigned long) (DST + insn->off)); \
-			else						\
+			else if (BPF_SIZE(insn->code) == BPF_DW)	\
 				SRC = (u64) atomic64_fetch_##KOP(	\
 					(u64) SRC,			\
 					(atomic64_t *)(unsigned long) (DST + insn->off)); \
+			else						\
+				goto default_label;			\
 			break;
 
 	STX_ATOMIC_DW:
 	STX_ATOMIC_W:
+	STX_ATOMIC_H:
+	STX_ATOMIC_B:
 		switch (IMM) {
+		/* Atomic read-modify-write instructions support only W and DW
+		 * size modifiers.
+		 */
 		ATOMIC_ALU_OP(BPF_ADD, add)
 		ATOMIC_ALU_OP(BPF_AND, and)
 		ATOMIC_ALU_OP(BPF_OR, or)
@@ -2181,20 +2193,59 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 				SRC = (u32) atomic_xchg(
 					(atomic_t *)(unsigned long) (DST + insn->off),
 					(u32) SRC);
-			else
+			else if (BPF_SIZE(insn->code) == BPF_DW)
 				SRC = (u64) atomic64_xchg(
 					(atomic64_t *)(unsigned long) (DST + insn->off),
 					(u64) SRC);
+			else
+				goto default_label;
 			break;
 		case BPF_CMPXCHG:
 			if (BPF_SIZE(insn->code) == BPF_W)
 				BPF_R0 = (u32) atomic_cmpxchg(
 					(atomic_t *)(unsigned long) (DST + insn->off),
 					(u32) BPF_R0, (u32) SRC);
-			else
+			else if (BPF_SIZE(insn->code) == BPF_DW)
 				BPF_R0 = (u64) atomic64_cmpxchg(
 					(atomic64_t *)(unsigned long) (DST + insn->off),
 					(u64) BPF_R0, (u64) SRC);
+			else
+				goto default_label;
+			break;
+		/* Atomic load and store instructions support all size
+		 * modifiers.
+		 */
+		case BPF_LOAD_ACQ:
+			switch (BPF_SIZE(insn->code)) {
+#define LOAD_ACQUIRE(SIZEOP, SIZE)				\
+			case BPF_##SIZEOP:			\
+				DST = (SIZE)smp_load_acquire(	\
+					(SIZE *)(unsigned long)(SRC + insn->off));	\
+				break;
+			LOAD_ACQUIRE(B,   u8)
+			LOAD_ACQUIRE(H,  u16)
+			LOAD_ACQUIRE(W,  u32)
+			LOAD_ACQUIRE(DW, u64)
+#undef LOAD_ACQUIRE
+			default:
+				goto default_label;
+			}
+			break;
+		case BPF_STORE_REL:
+			switch (BPF_SIZE(insn->code)) {
+#define STORE_RELEASE(SIZEOP, SIZE)			\
+			case BPF_##SIZEOP:		\
+				smp_store_release(	\
+					(SIZE *)(unsigned long)(DST + insn->off), (SIZE)SRC);	\
+				break;
+			STORE_RELEASE(B,   u8)
+			STORE_RELEASE(H,  u16)
+			STORE_RELEASE(W,  u32)
+			STORE_RELEASE(DW, u64)
+#undef STORE_RELEASE
+			default:
+				goto default_label;
+			}
 			break;
 
 		default:
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 309c4aa1b026..974d172d6735 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -267,6 +267,18 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 				BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
 				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
 				insn->dst_reg, insn->off, insn->src_reg);
+		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+			   insn->imm == BPF_LOAD_ACQ) {
+			verbose(cbs->private_data, "(%02x) r%d = load_acquire((%s *)(r%d %+d))\n",
+				insn->code, insn->dst_reg,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->src_reg, insn->off);
+		} else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+			   insn->imm == BPF_STORE_REL) {
+			verbose(cbs->private_data, "(%02x) store_release((%s *)(r%d %+d), r%d)\n",
+				insn->code,
+				bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+				insn->dst_reg, insn->off, insn->src_reg);
 		} else {
 			verbose(cbs->private_data, "BUG_%02x\n", insn->code);
 		}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 82a5a4acf576..7ebc224bf9cb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -579,6 +579,13 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 	       insn->imm == BPF_CMPXCHG;
 }
 
+static bool is_atomic_load_insn(const struct bpf_insn *insn)
+{
+	return BPF_CLASS(insn->code) == BPF_STX &&
+	       BPF_MODE(insn->code) == BPF_ATOMIC &&
+	       BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD;
+}
+
 static int __get_spi(s32 off)
 {
 	return (-off - 1) / BPF_REG_SIZE;
@@ -3481,7 +3488,7 @@ static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	}
 
 	if (class == BPF_STX) {
-		/* BPF_STX (including atomic variants) has multiple source
+		/* BPF_STX (including atomic variants) has one or more source
 		 * operands, one of which is a ptr. Check whether the caller is
 		 * asking about it.
 		 */
@@ -4095,7 +4102,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
 			   * dreg still needs precision before this insn
 			   */
 		}
-	} else if (class == BPF_LDX) {
+	} else if (class == BPF_LDX || is_atomic_load_insn(insn)) {
 		if (!bt_is_reg_set(bt, dreg))
 			return 0;
 		bt_clear_reg(bt, dreg);
@@ -7686,6 +7693,32 @@ static int check_atomic_rmw(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int check_atomic_load(struct bpf_verifier_env *env,
+			     struct bpf_insn *insn)
+{
+	if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) {
+		verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n",
+			insn->src_reg,
+			reg_type_str(env, reg_state(env, insn->src_reg)->type));
+		return -EACCES;
+	}
+
+	return check_load_mem(env, insn, true, false, false, "atomic_load");
+}
+
+static int check_atomic_store(struct bpf_verifier_env *env,
+			      struct bpf_insn *insn)
+{
+	if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) {
+		verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
+			insn->dst_reg,
+			reg_type_str(env, reg_state(env, insn->dst_reg)->type));
+		return -EACCES;
+	}
+
+	return check_store_reg(env, insn, true);
+}
+
 static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
 	switch (insn->imm) {
@@ -7700,6 +7733,10 @@ static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	case BPF_XCHG:
 	case BPF_CMPXCHG:
 		return check_atomic_rmw(env, insn);
+	case BPF_LOAD_ACQ:
+		return check_atomic_load(env, insn);
+	case BPF_STORE_REL:
+		return check_atomic_store(env, insn);
 	default:
 		verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
 			insn->imm);
@@ -20445,7 +20482,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			   insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
 			   insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
 			type = BPF_WRITE;
-		} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
+		} else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_B) ||
+			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_H) ||
+			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) ||
 			    insn->code == (BPF_STX | BPF_ATOMIC | BPF_DW)) &&
 			   env->insn_aux_data[i + delta].ptr_type == PTR_TO_ARENA) {
 			insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2acf9b336371..4a20a125eb46 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -51,6 +51,19 @@
 #define BPF_XCHG	(0xe0 | BPF_FETCH)	/* atomic exchange */
 #define BPF_CMPXCHG	(0xf0 | BPF_FETCH)	/* atomic compare-and-write */
 
+#define BPF_ATOMIC_LOAD		0x10
+#define BPF_ATOMIC_STORE	0x20
+#define BPF_ATOMIC_TYPE(imm)	((imm) & 0xf0)
+
+#define BPF_RELAXED	0x00
+#define BPF_ACQUIRE	0x01
+#define BPF_RELEASE	0x02
+#define BPF_ACQ_REL	0x03
+#define BPF_SEQ_CST	0x04
+
+#define BPF_LOAD_ACQ	(BPF_ATOMIC_LOAD | BPF_ACQUIRE)		/* load-acquire */
+#define BPF_STORE_REL	(BPF_ATOMIC_STORE | BPF_RELEASE)	/* store-release */
+
 enum bpf_cond_pseudo_jmp {
 	BPF_MAY_GOTO = 0,
 };
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 5/9] arm64: insn: Add BIT(23) to {load,store}_ex's mask
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (3 preceding siblings ...)
  2025-02-07  2:05 ` [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions Peilin Ye
@ 2025-02-07  2:06 ` Peilin Ye
  2025-02-07  2:06 ` [PATCH bpf-next v2 6/9] arm64: insn: Add load-acquire and store-release instructions Peilin Ye
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:06 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

We are planning to add load-acquire (LDAR{,B,H}) and store-release
(STLR{,B,H}) instructions to insn.{c,h}; add BIT(23) to the masks of
load_ex and store_ex to prevent aarch64_insn_is_{load,store}_ex() from
returning false positives for load-acquire and store-release
instructions.
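
As a worked check (using the LDAR encoding from the next patch in this
series): LDAR is 0x08dffc00, and 0x08dffc00 & 0x3f400000 == 0x08400000,
which equals load_ex's "value", a false positive.  With BIT(23) added,
0x08dffc00 & 0x3fc00000 == 0x08c00000, which no longer matches, while
LDXR and LDAXR (bit 23 cleared) still do.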

Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
           ID032224),

  * C6.2.228 LDXR
  * C6.2.165 LDAXR
  * C6.2.161 LDAR
  * C6.2.393 STXR
  * C6.2.360 STLXR
  * C6.2.353 STLR

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/include/asm/insn.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index e390c432f546..2d8316b3abaf 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -351,8 +351,8 @@ __AARCH64_INSN_FUNCS(ldr_imm,	0x3FC00000, 0x39400000)
 __AARCH64_INSN_FUNCS(ldr_lit,	0xBF000000, 0x18000000)
 __AARCH64_INSN_FUNCS(ldrsw_lit,	0xFF000000, 0x98000000)
 __AARCH64_INSN_FUNCS(exclusive,	0x3F800000, 0x08000000)
-__AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
-__AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)
+__AARCH64_INSN_FUNCS(load_ex,	0x3FC00000, 0x08400000)
+__AARCH64_INSN_FUNCS(store_ex,	0x3FC00000, 0x08000000)
 __AARCH64_INSN_FUNCS(mops,	0x3B200C00, 0x19000400)
 __AARCH64_INSN_FUNCS(stp,	0x7FC00000, 0x29000000)
 __AARCH64_INSN_FUNCS(ldp,	0x7FC00000, 0x29400000)
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 6/9] arm64: insn: Add load-acquire and store-release instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (4 preceding siblings ...)
  2025-02-07  2:06 ` [PATCH bpf-next v2 5/9] arm64: insn: Add BIT(23) to {load,store}_ex's mask Peilin Ye
@ 2025-02-07  2:06 ` Peilin Ye
  2025-02-07  2:06 ` [PATCH bpf-next v2 7/9] bpf, arm64: Support " Peilin Ye
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:06 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Add load-acquire ("load_acq", LDAR{,B,H}) and store-release
("store_rel", STLR{,B,H}) instructions.  Breakdown of encoding:

                                size        L   (Rs)  o0 (Rt2) Rn    Rt
             mask (0x3fdffc00): 00 111111 1 1 0 11111 1  11111 00000 00000
  value, load_acq (0x08dffc00): 00 001000 1 1 0 11111 1  11111 00000 00000
 value, store_rel (0x089ffc00): 00 001000 1 0 0 11111 1  11111 00000 00000

As suggested by Xu [1], include all Should-Be-One (SBO) bits ("Rs" and
"Rt2" fields) in the "mask" and "value" numbers.

It is worth noting that we are adding the "no offset" variant of STLR
instead of the "pre-index" variant, which has a different encoding.
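
For example, the following call (a sketch; the register choice is
illustrative) would encode "ldarh w7, [x0]":

  u32 insn = aarch64_insn_gen_load_acq_store_rel(AARCH64_INSN_REG_7,
                                                 AARCH64_INSN_REG_0,
                                                 AARCH64_INSN_SIZE_16,
                                                 AARCH64_INSN_LDST_LOAD_ACQ);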

Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
           ID032224),

  * C6.2.161 LDAR
  * C6.2.353 STLR

[1] https://lore.kernel.org/bpf/4e6641ce-3f1e-4251-8daf-4dd4b77d08c4@huaweicloud.com/

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/include/asm/insn.h |  8 ++++++++
 arch/arm64/lib/insn.c         | 29 +++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 2d8316b3abaf..39577f1d079a 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -188,8 +188,10 @@ enum aarch64_insn_ldst_type {
 	AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX,
 	AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX,
 	AARCH64_INSN_LDST_STORE_PAIR_POST_INDEX,
+	AARCH64_INSN_LDST_LOAD_ACQ,
 	AARCH64_INSN_LDST_LOAD_EX,
 	AARCH64_INSN_LDST_LOAD_ACQ_EX,
+	AARCH64_INSN_LDST_STORE_REL,
 	AARCH64_INSN_LDST_STORE_EX,
 	AARCH64_INSN_LDST_STORE_REL_EX,
 	AARCH64_INSN_LDST_SIGNED_LOAD_IMM_OFFSET,
@@ -351,6 +353,8 @@ __AARCH64_INSN_FUNCS(ldr_imm,	0x3FC00000, 0x39400000)
 __AARCH64_INSN_FUNCS(ldr_lit,	0xBF000000, 0x18000000)
 __AARCH64_INSN_FUNCS(ldrsw_lit,	0xFF000000, 0x98000000)
 __AARCH64_INSN_FUNCS(exclusive,	0x3F800000, 0x08000000)
+__AARCH64_INSN_FUNCS(load_acq,  0x3FDFFC00, 0x08DFFC00)
+__AARCH64_INSN_FUNCS(store_rel, 0x3FDFFC00, 0x089FFC00)
 __AARCH64_INSN_FUNCS(load_ex,	0x3FC00000, 0x08400000)
 __AARCH64_INSN_FUNCS(store_ex,	0x3FC00000, 0x08000000)
 __AARCH64_INSN_FUNCS(mops,	0x3B200C00, 0x19000400)
@@ -602,6 +606,10 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
 				     int offset,
 				     enum aarch64_insn_variant variant,
 				     enum aarch64_insn_ldst_type type);
+u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
+					enum aarch64_insn_register base,
+					enum aarch64_insn_size_type size,
+					enum aarch64_insn_ldst_type type);
 u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
 				   enum aarch64_insn_register base,
 				   enum aarch64_insn_register state,
diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c
index b008a9b46a7f..9bef696e2230 100644
--- a/arch/arm64/lib/insn.c
+++ b/arch/arm64/lib/insn.c
@@ -540,6 +540,35 @@ u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
 					     offset >> shift);
 }
 
+u32 aarch64_insn_gen_load_acq_store_rel(enum aarch64_insn_register reg,
+					enum aarch64_insn_register base,
+					enum aarch64_insn_size_type size,
+					enum aarch64_insn_ldst_type type)
+{
+	u32 insn;
+
+	switch (type) {
+	case AARCH64_INSN_LDST_LOAD_ACQ:
+		insn = aarch64_insn_get_load_acq_value();
+		break;
+	case AARCH64_INSN_LDST_STORE_REL:
+		insn = aarch64_insn_get_store_rel_value();
+		break;
+	default:
+		pr_err("%s: unknown load-acquire/store-release encoding %d\n",
+		       __func__, type);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	insn = aarch64_insn_encode_ldst_size(size, insn);
+
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
+					    reg);
+
+	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
+					    base);
+}
+
 u32 aarch64_insn_gen_load_store_ex(enum aarch64_insn_register reg,
 				   enum aarch64_insn_register base,
 				   enum aarch64_insn_register state,
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 7/9] bpf, arm64: Support load-acquire and store-release instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (5 preceding siblings ...)
  2025-02-07  2:06 ` [PATCH bpf-next v2 6/9] arm64: insn: Add load-acquire and store-release instructions Peilin Ye
@ 2025-02-07  2:06 ` Peilin Ye
  2025-02-07  2:06 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for " Peilin Ye
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:06 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Support BPF load-acquire (BPF_LOAD_ACQ) and store-release
(BPF_STORE_REL) instructions in the arm64 JIT compiler.  For example:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000011): BPF_LOAD_ACQ

The JIT compiler would emit an LDAR instruction for the above, e.g.:

  ldar  x7, [x0]

Similarly, consider the following 16-bit store-release:

  cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000022): BPF_ATOMIC_STORE | BPF_RELEASE

An STLRH instruction would be emitted, e.g.:

  stlrh  w1, [x0]

For a complete mapping:

  load-acquire     8-bit  LDARB
 (BPF_LOAD_ACQ)   16-bit  LDARH
                  32-bit  LDAR (32-bit)
                  64-bit  LDAR (64-bit)
  store-release    8-bit  STLRB
 (BPF_STORE_REL)  16-bit  STLRH
                  32-bit  STLR (32-bit)
                  64-bit  STLR (64-bit)

Arena accesses are supported.
bpf_jit_supports_insn(..., /*in_arena=*/true) always returns true for
BPF_LOAD_ACQ and BPF_STORE_REL instructions, as they don't depend on
ARM64_HAS_LSE_ATOMICS.
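
For an arena access with a non-zero offset, e.g.
store_release((u16 *)(r1 + 0x10), w2), the emitted sequence would
roughly be (a sketch; x10 is TMP_REG_1 and x28 holds ARENA_VM_START in
this JIT's register mapping):

  add   x10, x0, #0x10     // tmp = r1 + off
  add   x10, x10, x28      // tmp += arena VM start
  stlrh w1, [x10]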

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 arch/arm64/net/bpf_jit.h      | 20 ++++++++
 arch/arm64/net/bpf_jit_comp.c | 91 ++++++++++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index b22ab2f97a30..a3b0e693a125 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -119,6 +119,26 @@
 	aarch64_insn_gen_load_store_ex(Rt, Rn, Rs, A64_SIZE(sf), \
 				       AARCH64_INSN_LDST_STORE_REL_EX)
 
+/* Load-acquire & store-release */
+#define A64_LDAR(Rt, Rn, size)  \
+	aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
+					    AARCH64_INSN_LDST_LOAD_ACQ)
+#define A64_STLR(Rt, Rn, size)  \
+	aarch64_insn_gen_load_acq_store_rel(Rt, Rn, AARCH64_INSN_SIZE_##size, \
+					    AARCH64_INSN_LDST_STORE_REL)
+
+/* Rt = [Rn] (load acquire) */
+#define A64_LDARB(Wt, Xn)	A64_LDAR(Wt, Xn, 8)
+#define A64_LDARH(Wt, Xn)	A64_LDAR(Wt, Xn, 16)
+#define A64_LDAR32(Wt, Xn)	A64_LDAR(Wt, Xn, 32)
+#define A64_LDAR64(Xt, Xn)	A64_LDAR(Xt, Xn, 64)
+
+/* [Rn] = Rt (store release) */
+#define A64_STLRB(Wt, Xn)	A64_STLR(Wt, Xn, 8)
+#define A64_STLRH(Wt, Xn)	A64_STLR(Wt, Xn, 16)
+#define A64_STLR32(Wt, Xn)	A64_STLR(Wt, Xn, 32)
+#define A64_STLR64(Xt, Xn)	A64_STLR(Xt, Xn, 64)
+
 /*
  * LSE atomics
  *
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8c3b47d9e441..c439df0233d1 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -647,6 +647,82 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	return 0;
 }
 
+static int emit_atomic_load_store(const struct bpf_insn *insn,
+				  struct jit_ctx *ctx)
+{
+	const s32 type = BPF_ATOMIC_TYPE(insn->imm);
+	const s16 off = insn->off;
+	const u8 code = insn->code;
+	const bool arena = BPF_MODE(code) == BPF_PROBE_ATOMIC;
+	const u8 arena_vm_base = bpf2a64[ARENA_VM_START];
+	const u8 dst = bpf2a64[insn->dst_reg];
+	const u8 src = bpf2a64[insn->src_reg];
+	const u8 tmp = bpf2a64[TMP_REG_1];
+	u8 reg;
+
+	switch (type) {
+	case BPF_ATOMIC_LOAD:
+		reg = src;
+		break;
+	case BPF_ATOMIC_STORE:
+		reg = dst;
+		break;
+	default:
+		pr_err_once("unknown atomic load/store op type %02x\n", type);
+		return -EINVAL;
+	}
+
+	if (off) {
+		emit_a64_add_i(1, tmp, reg, tmp, off, ctx);
+		reg = tmp;
+	}
+	if (arena) {
+		emit(A64_ADD(1, tmp, reg, arena_vm_base), ctx);
+		reg = tmp;
+	}
+
+	switch (insn->imm) {
+	case BPF_LOAD_ACQ:
+		switch (BPF_SIZE(code)) {
+		case BPF_B:
+			emit(A64_LDARB(dst, reg), ctx);
+			break;
+		case BPF_H:
+			emit(A64_LDARH(dst, reg), ctx);
+			break;
+		case BPF_W:
+			emit(A64_LDAR32(dst, reg), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_LDAR64(dst, reg), ctx);
+			break;
+		}
+		break;
+	case BPF_STORE_REL:
+		switch (BPF_SIZE(code)) {
+		case BPF_B:
+			emit(A64_STLRB(src, reg), ctx);
+			break;
+		case BPF_H:
+			emit(A64_STLRH(src, reg), ctx);
+			break;
+		case BPF_W:
+			emit(A64_STLR32(src, reg), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_STLR64(src, reg), ctx);
+			break;
+		}
+		break;
+	default:
+		pr_err_once("unknown atomic load/store op code %02x\n",
+			    insn->imm);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 #ifdef CONFIG_ARM64_LSE_ATOMICS
 static int emit_lse_atomic(const struct bpf_insn *insn, struct jit_ctx *ctx)
 {
@@ -1641,11 +1717,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			return ret;
 		break;
 
+	case BPF_STX | BPF_ATOMIC | BPF_B:
+	case BPF_STX | BPF_ATOMIC | BPF_H:
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
+	case BPF_STX | BPF_PROBE_ATOMIC | BPF_B:
+	case BPF_STX | BPF_PROBE_ATOMIC | BPF_H:
 	case BPF_STX | BPF_PROBE_ATOMIC | BPF_W:
 	case BPF_STX | BPF_PROBE_ATOMIC | BPF_DW:
-		if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
+		if (bpf_atomic_is_load_store(insn))
+			ret = emit_atomic_load_store(insn, ctx);
+		else if (cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
 			ret = emit_lse_atomic(insn, ctx);
 		else
 			ret = emit_ll_sc_atomic(insn, ctx);
@@ -2667,13 +2749,10 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena)
 	if (!in_arena)
 		return true;
 	switch (insn->code) {
-	case BPF_STX | BPF_ATOMIC | BPF_B:
-	case BPF_STX | BPF_ATOMIC | BPF_H:
 	case BPF_STX | BPF_ATOMIC | BPF_W:
 	case BPF_STX | BPF_ATOMIC | BPF_DW:
-		if (bpf_atomic_is_load_store(insn))
-			return false;
-		if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
+		if (!bpf_atomic_is_load_store(insn) &&
+		    !cpus_have_cap(ARM64_HAS_LSE_ATOMICS))
 			return false;
 	}
 	return true;
-- 
2.48.1.502.g6dc24dfdaf-goog




* [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (6 preceding siblings ...)
  2025-02-07  2:06 ` [PATCH bpf-next v2 7/9] bpf, arm64: Support " Peilin Ye
@ 2025-02-07  2:06 ` Peilin Ye
  2025-02-07 23:47   ` Peilin Ye
                     ` (2 more replies)
  2025-02-07  2:06 ` [PATCH bpf-next v2 9/9] bpf, docs: Update instruction-set.rst " Peilin Ye
  2025-02-07 21:29 ` [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
  9 siblings, 3 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:06 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Add several ./test_progs tests:

  - arena_atomics/load_acquire
  - arena_atomics/store_release
  - verifier_load_acquire/*
  - verifier_store_release/*
  - verifier_precision/bpf_load_acquire
  - verifier_precision/bpf_store_release

The last two tests are added to check if backtrack_insn() handles the
new instructions correctly.

Additionally, the last test makes sure that the verifier "remembers" the
value (in src_reg) we store-release into e.g. a stack slot.  For
example, if we take a look at the test program:

    #0:  r1 = 8;
      /* store_release((u64 *)(r10 - 8), r1); */
    #1:  .8byte %[store_release];
    #2:  r1 = *(u64 *)(r10 - 8);
    #3:  r2 = r10;
    #4:  r2 += r1;
    #5:  r0 = 0;
    #6:  exit;

At #1, if the verifier doesn't remember that we wrote 8 to the stack,
then later at #4 we would be adding an unbounded scalar value to the
stack pointer, which would cause the program to be rejected:

  VERIFIER LOG:
  =============
...
  math between fp pointer and register with unbounded min value is not allowed

All new tests depend on #ifdef ENABLE_ATOMICS_TESTS.  Currently they
only run for arm64.
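
For reference, the ".8byte %[store_release]" line in the program above
is materialized via __imm_insn() and BPF_RAW_INSN(), roughly as follows
(a sketch of the inline-asm pattern these tests use):

  asm volatile (
  "r1 = 8;"
  ".8byte %[store_release];"
  "r1 = *(u64 *)(r10 - 8);"
  "r2 = r10;"
  "r2 += r1;"
  "r0 = 0;"
  "exit;"
  :
  : __imm_insn(store_release,
               BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_DW,
                            BPF_REG_10, BPF_REG_1, -8, BPF_STORE_REL))
  : __clobber_all);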

Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 .../selftests/bpf/prog_tests/arena_atomics.c  |  50 ++++
 .../selftests/bpf/prog_tests/verifier.c       |   4 +
 .../selftests/bpf/progs/arena_atomics.c       |  88 ++++++
 .../bpf/progs/verifier_load_acquire.c         | 190 +++++++++++++
 .../selftests/bpf/progs/verifier_precision.c  |  47 ++++
 .../bpf/progs/verifier_store_release.c        | 262 ++++++++++++++++++
 6 files changed, 641 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_load_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_store_release.c

diff --git a/tools/testing/selftests/bpf/prog_tests/arena_atomics.c b/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
index 26e7c06c6cb4..d714af9af50c 100644
--- a/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
+++ b/tools/testing/selftests/bpf/prog_tests/arena_atomics.c
@@ -162,6 +162,52 @@ static void test_uaf(struct arena_atomics *skel)
 	ASSERT_EQ(skel->arena->uaf_recovery_fails, 0, "uaf_recovery_fails");
 }
 
+static void test_load_acquire(struct arena_atomics *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	/* No need to attach it, just run it directly */
+	prog_fd = bpf_program__fd(skel->progs.load_acquire);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->arena->load_acquire8_result, 0x12,
+		  "load_acquire8_result");
+	ASSERT_EQ(skel->arena->load_acquire16_result, 0x1234,
+		  "load_acquire16_result");
+	ASSERT_EQ(skel->arena->load_acquire32_result, 0x12345678,
+		  "load_acquire32_result");
+	ASSERT_EQ(skel->arena->load_acquire64_result, 0x1234567890abcdef,
+		  "load_acquire64_result");
+}
+
+static void test_store_release(struct arena_atomics *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+	int err, prog_fd;
+
+	/* No need to attach it, just run it directly */
+	prog_fd = bpf_program__fd(skel->progs.store_release);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	if (!ASSERT_OK(err, "test_run_opts err"))
+		return;
+	if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+		return;
+
+	ASSERT_EQ(skel->arena->store_release8_result, 0x12,
+		  "store_release8_result");
+	ASSERT_EQ(skel->arena->store_release16_result, 0x1234,
+		  "store_release16_result");
+	ASSERT_EQ(skel->arena->store_release32_result, 0x12345678,
+		  "store_release32_result");
+	ASSERT_EQ(skel->arena->store_release64_result, 0x1234567890abcdef,
+		  "store_release64_result");
+}
+
 void test_arena_atomics(void)
 {
 	struct arena_atomics *skel;
@@ -198,6 +244,10 @@ void test_arena_atomics(void)
 		test_xchg(skel);
 	if (test__start_subtest("uaf"))
 		test_uaf(skel);
+	if (test__start_subtest("load_acquire"))
+		test_load_acquire(skel);
+	if (test__start_subtest("store_release"))
+		test_store_release(skel);
 
 cleanup:
 	arena_atomics__destroy(skel);
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 8a0e1ff8a2dc..8bdad4167cf5 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -45,6 +45,7 @@
 #include "verifier_ldsx.skel.h"
 #include "verifier_leak_ptr.skel.h"
 #include "verifier_linked_scalars.skel.h"
+#include "verifier_load_acquire.skel.h"
 #include "verifier_loops1.skel.h"
 #include "verifier_lwt.skel.h"
 #include "verifier_map_in_map.skel.h"
@@ -80,6 +81,7 @@
 #include "verifier_spill_fill.skel.h"
 #include "verifier_spin_lock.skel.h"
 #include "verifier_stack_ptr.skel.h"
+#include "verifier_store_release.skel.h"
 #include "verifier_subprog_precision.skel.h"
 #include "verifier_subreg.skel.h"
 #include "verifier_tailcall_jit.skel.h"
@@ -173,6 +175,7 @@ void test_verifier_int_ptr(void)              { RUN(verifier_int_ptr); }
 void test_verifier_iterating_callbacks(void)  { RUN(verifier_iterating_callbacks); }
 void test_verifier_jeq_infer_not_null(void)   { RUN(verifier_jeq_infer_not_null); }
 void test_verifier_jit_convergence(void)      { RUN(verifier_jit_convergence); }
+void test_verifier_load_acquire(void)	      { RUN(verifier_load_acquire); }
 void test_verifier_ld_ind(void)               { RUN(verifier_ld_ind); }
 void test_verifier_ldsx(void)                  { RUN(verifier_ldsx); }
 void test_verifier_leak_ptr(void)             { RUN(verifier_leak_ptr); }
@@ -211,6 +214,7 @@ void test_verifier_sockmap_mutate(void)       { RUN(verifier_sockmap_mutate); }
 void test_verifier_spill_fill(void)           { RUN(verifier_spill_fill); }
 void test_verifier_spin_lock(void)            { RUN(verifier_spin_lock); }
 void test_verifier_stack_ptr(void)            { RUN(verifier_stack_ptr); }
+void test_verifier_store_release(void)	      { RUN(verifier_store_release); }
 void test_verifier_subprog_precision(void)    { RUN(verifier_subprog_precision); }
 void test_verifier_subreg(void)               { RUN(verifier_subreg); }
 void test_verifier_tailcall_jit(void)         { RUN(verifier_tailcall_jit); }
diff --git a/tools/testing/selftests/bpf/progs/arena_atomics.c b/tools/testing/selftests/bpf/progs/arena_atomics.c
index 40dd57fca5cc..62189b428fe3 100644
--- a/tools/testing/selftests/bpf/progs/arena_atomics.c
+++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
@@ -6,6 +6,8 @@
 #include <stdbool.h>
 #include <stdatomic.h>
 #include "bpf_arena_common.h"
+#include "../../../include/linux/filter.h"
+#include "bpf_misc.h"
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARENA);
@@ -274,4 +276,90 @@ int uaf(const void *ctx)
 	return 0;
 }
 
+__u8 __arena_global load_acquire8_value = 0x12;
+__u16 __arena_global load_acquire16_value = 0x1234;
+__u32 __arena_global load_acquire32_value = 0x12345678;
+__u64 __arena_global load_acquire64_value = 0x1234567890abcdef;
+
+__u8 __arena_global load_acquire8_result = 0x12;
+__u16 __arena_global load_acquire16_result = 0x1234;
+__u32 __arena_global load_acquire32_result = 0x12345678;
+__u64 __arena_global load_acquire64_result = 0x1234567890abcdef;
+
+SEC("raw_tp/sys_enter")
+int load_acquire(const void *ctx)
+{
+#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
+	load_acquire8_result = 0;
+	load_acquire16_result = 0;
+	load_acquire32_result = 0;
+	load_acquire64_result = 0;
+
+#define LOAD_ACQUIRE_ARENA(SIZEOP, SIZE, SRC, DST)	\
+	{ asm volatile (				\
+	"r1 = %[" #SRC "] ll;"				\
+	"r1 = addr_space_cast(r1, 0x0, 0x1);"		\
+	".8byte %[load_acquire_insn];"			\
+	"r3 = %[" #DST "] ll;"				\
+	"r3 = addr_space_cast(r3, 0x0, 0x1);"		\
+	"*(" #SIZE " *)(r3 + 0) = r2;"			\
+	:						\
+	: __imm_addr(SRC),				\
+	  __imm_insn(load_acquire_insn,			\
+		     BPF_ATOMIC_OP(BPF_##SIZEOP, BPF_LOAD_ACQ,	\
+				   BPF_REG_2, BPF_REG_1, 0)),	\
+	  __imm_addr(DST)				\
+	: __clobber_all); }				\
+
+	LOAD_ACQUIRE_ARENA(B, u8, load_acquire8_value, load_acquire8_result)
+	LOAD_ACQUIRE_ARENA(H, u16, load_acquire16_value,
+			   load_acquire16_result)
+	LOAD_ACQUIRE_ARENA(W, u32, load_acquire32_value,
+			   load_acquire32_result)
+	LOAD_ACQUIRE_ARENA(DW, u64, load_acquire64_value,
+			   load_acquire64_result)
+#undef LOAD_ACQUIRE_ARENA
+#endif
+
+	return 0;
+}
+
+__u8 __arena_global store_release8_result = 0x12;
+__u16 __arena_global store_release16_result = 0x1234;
+__u32 __arena_global store_release32_result = 0x12345678;
+__u64 __arena_global store_release64_result = 0x1234567890abcdef;
+
+SEC("raw_tp/sys_enter")
+int store_release(const void *ctx)
+{
+#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
+	store_release8_result = 0;
+	store_release16_result = 0;
+	store_release32_result = 0;
+	store_release64_result = 0;
+
+#define STORE_RELEASE_ARENA(SIZEOP, DST, VAL)	\
+	{ asm volatile (			\
+	"r1 = " VAL ";"				\
+	"r2 = %[" #DST "] ll;"			\
+	"r2 = addr_space_cast(r2, 0x0, 0x1);"	\
+	".8byte %[store_release_insn];"		\
+	:					\
+	: __imm_addr(DST),			\
+	  __imm_insn(store_release_insn,	\
+		     BPF_ATOMIC_OP(BPF_##SIZEOP, BPF_STORE_REL,	\
+				   BPF_REG_2, BPF_REG_1, 0))	\
+	: __clobber_all); }			\
+
+	STORE_RELEASE_ARENA(B, store_release8_result, "0x12")
+	STORE_RELEASE_ARENA(H, store_release16_result, "0x1234")
+	STORE_RELEASE_ARENA(W, store_release32_result, "0x12345678")
+	STORE_RELEASE_ARENA(DW, store_release64_result,
+			    "0x1234567890abcdef ll")
+#undef STORE_RELEASE_ARENA
+#endif
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_load_acquire.c b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
new file mode 100644
index 000000000000..6327638f031c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
+#include "bpf_misc.h"
+
+#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
+
+SEC("socket")
+__description("load-acquire, 8-bit")
+__success __success_unpriv __retval(0x12)
+__naked void load_acquire_8(void)
+{
+	asm volatile (
+	"*(u8 *)(r10 - 1) = 0x12;"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r10 - 1));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -1))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 16-bit")
+__success __success_unpriv __retval(0x1234)
+__naked void load_acquire_16(void)
+{
+	asm volatile (
+	"*(u16 *)(r10 - 2) = 0x1234;"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u16 *)(r10 - 2));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_H, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -2))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 32-bit")
+__success __success_unpriv __retval(0x12345678)
+__naked void load_acquire_32(void)
+{
+	asm volatile (
+	"*(u32 *)(r10 - 4) = 0x12345678;"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u32 *)(r10 - 4));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_W, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -4))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire, 64-bit")
+__success __success_unpriv __retval(0x1234567890abcdef)
+__naked void load_acquire_64(void)
+{
+	asm volatile (
+	"*(u64 *)(r10 - 8) = 0x1234567890abcdef;"
+	".8byte %[load_acquire_insn];" // r0 = load_acquire((u64 *)(r10 - 8));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -8))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire with uninitialized src_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void load_acquire_with_uninitialized_src_reg(void)
+{
+	asm volatile (
+	".8byte %[load_acquire_insn];" // r0 = load_acquire((u64 *)(r2 + 0));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_2, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire with non-pointer src_reg")
+__failure __failure_unpriv __msg("R1 invalid mem access 'scalar'")
+__naked void load_acquire_with_non_pointer_src_reg(void)
+{
+	asm volatile (
+	"r1 = 0;"
+	".8byte %[load_acquire_insn];" // r0 = load_acquire((u64 *)(r1 + 0));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_1, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("misaligned load-acquire")
+__failure __failure_unpriv __msg("misaligned stack access off")
+__flag(BPF_F_ANY_ALIGNMENT)
+__naked void load_acquire_misaligned(void)
+{
+	asm volatile (
+	"*(u64 *)(r10 - 8) = 0;"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u32 *)(r10 - 5));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_W, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -5))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("load-acquire from ctx pointer")
+__failure __failure_unpriv __msg("BPF_ATOMIC loads from R1 ctx is not allowed")
+__naked void load_acquire_from_ctx_pointer(void)
+{
+	asm volatile (
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r1 + 0));
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_1, 0))
+	: __clobber_all);
+}
+
+SEC("xdp")
+__description("load-acquire from pkt pointer")
+__failure __msg("BPF_ATOMIC loads from R2 pkt is not allowed")
+__naked void load_acquire_from_pkt_pointer(void)
+{
+	asm volatile (
+	"r2 = *(u32 *)(r1 + %[xdp_md_data]);"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r2 + 0));
+	"exit;"
+	:
+	: __imm_const(xdp_md_data, offsetof(struct xdp_md, data)),
+	  __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_2, 0))
+	: __clobber_all);
+}
+
+SEC("flow_dissector")
+__description("load-acquire from flow_keys pointer")
+__failure __msg("BPF_ATOMIC loads from R2 flow_keys is not allowed")
+__naked void load_acquire_from_flow_keys_pointer(void)
+{
+	asm volatile (
+	"r2 = *(u64 *)(r1 + %[__sk_buff_flow_keys]);"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r2 + 0));
+	"exit;"
+	:
+	: __imm_const(__sk_buff_flow_keys,
+		      offsetof(struct __sk_buff, flow_keys)),
+	  __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_2, 0))
+	: __clobber_all);
+}
+
+SEC("sk_reuseport")
+__description("load-acquire from sock pointer")
+__failure __msg("BPF_ATOMIC loads from R2 sock is not allowed")
+__naked void load_acquire_from_sock_pointer(void)
+{
+	asm volatile (
+	"r2 = *(u64 *)(r1 + %[sk_reuseport_md_sk]);"
+	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r2 + 0));
+	"exit;"
+	:
+	: __imm_const(sk_reuseport_md_sk, offsetof(struct sk_reuseport_md, sk)),
+	  __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_2, 0))
+	: __clobber_all);
+}
+
+#else
+
+SEC("socket")
+__description("load-acquire is not supported by compiler or jit, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_precision.c b/tools/testing/selftests/bpf/progs/verifier_precision.c
index 6b564d4c0986..3bc4f50015e6 100644
--- a/tools/testing/selftests/bpf/progs/verifier_precision.c
+++ b/tools/testing/selftests/bpf/progs/verifier_precision.c
@@ -2,6 +2,7 @@
 /* Copyright (C) 2023 SUSE LLC */
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
 #include "bpf_misc.h"
 
 SEC("?raw_tp")
@@ -92,6 +93,52 @@ __naked int bpf_end_bswap(void)
 
 #endif /* v4 instruction */
 
+#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("mark_precise: frame0: regs=r1 stack= before 2: (bf) r2 = r10")
+__msg("mark_precise: frame0: regs=r1 stack= before 1: (db) r1 = load_acquire((u64 *)(r10 -8))")
+__msg("mark_precise: frame0: regs= stack=-8 before 0: (7a) *(u64 *)(r10 -8) = 8")
+__naked int bpf_load_acquire(void)
+{
+	asm volatile (
+	"*(u64 *)(r10 - 8) = 8;"
+	".8byte %[load_acquire_insn];" /* r1 = load_acquire((u64 *)(r10 - 8)); */
+	"r2 = r10;"
+	"r2 += r1;" /* mark_precise */
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm_insn(load_acquire_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_1, BPF_REG_10, -8))
+	: __clobber_all);
+}
+
+SEC("?raw_tp")
+__success __log_level(2)
+__msg("mark_precise: frame0: regs=r1 stack= before 3: (bf) r2 = r10")
+__msg("mark_precise: frame0: regs=r1 stack= before 2: (79) r1 = *(u64 *)(r10 -8)")
+__msg("mark_precise: frame0: regs= stack=-8 before 1: (db) store_release((u64 *)(r10 -8), r1)")
+__msg("mark_precise: frame0: regs=r1 stack= before 0: (b7) r1 = 8")
+__naked int bpf_store_release(void)
+{
+	asm volatile (
+	"r1 = 8;"
+	".8byte %[store_release_insn];" /* store_release((u64 *)(r10 - 8), r1); */
+	"r1 = *(u64 *)(r10 - 8);"
+	"r2 = r10;"
+	"r2 += r1;" /* mark_precise */
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -8))
+	: __clobber_all);
+}
+
+#endif /* load-acquire, store-release */
+
 SEC("?raw_tp")
 __success __log_level(2)
 /*
diff --git a/tools/testing/selftests/bpf/progs/verifier_store_release.c b/tools/testing/selftests/bpf/progs/verifier_store_release.c
new file mode 100644
index 000000000000..f6e7cc6538c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_store_release.c
@@ -0,0 +1,262 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "../../../include/linux/filter.h"
+#include "bpf_misc.h"
+
+#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
+
+SEC("socket")
+__description("store-release, 8-bit")
+__success __success_unpriv __retval(0x12)
+__naked void store_release_8(void)
+{
+	asm volatile (
+	"w1 = 0x12;"
+	".8byte %[store_release_insn];" // store_release((u8 *)(r10 - 1), w1);
+	"w0 = *(u8 *)(r10 - 1);"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -1))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 16-bit")
+__success __success_unpriv __retval(0x1234)
+__naked void store_release_16(void)
+{
+	asm volatile (
+	"w1 = 0x1234;"
+	".8byte %[store_release_insn];" // store_release((u16 *)(r10 - 2), w1);
+	"w0 = *(u16 *)(r10 - 2);"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_H, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -2))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 32-bit")
+__success __success_unpriv __retval(0x12345678)
+__naked void store_release_32(void)
+{
+	asm volatile (
+	"w1 = 0x12345678;"
+	".8byte %[store_release_insn];" // store_release((u32 *)(r10 - 4), w1);
+	"w0 = *(u32 *)(r10 - 4);"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_W, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -4))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, 64-bit")
+__success __success_unpriv __retval(0x1234567890abcdef)
+__naked void store_release_64(void)
+{
+	asm volatile (
+	"r1 = 0x1234567890abcdef ll;"
+	".8byte %[store_release_insn];" // store_release((u64 *)(r10 - 8), r1);
+	"r0 = *(u64 *)(r10 - 8);"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -8))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with uninitialized src_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void store_release_with_uninitialized_src_reg(void)
+{
+	asm volatile (
+	".8byte %[store_release_insn];" // store_release((u64 *)(r10 - 8), r2);
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_10, BPF_REG_2, -8))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with uninitialized dst_reg")
+__failure __failure_unpriv __msg("R2 !read_ok")
+__naked void store_release_with_uninitialized_dst_reg(void)
+{
+	asm volatile (
+	"r1 = 0;"
+	".8byte %[store_release_insn];" // store_release((u64 *)(r2 - 8), r1);
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_2, BPF_REG_1, -8))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release with non-pointer dst_reg")
+__failure __failure_unpriv __msg("R1 invalid mem access 'scalar'")
+__naked void store_release_with_non_pointer_dst_reg(void)
+{
+	asm volatile (
+	"r1 = 0;"
+	".8byte %[store_release_insn];" // store_release((u64 *)(r1 + 0), r1);
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_1, BPF_REG_1, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("misaligned store-release")
+__failure __failure_unpriv __msg("misaligned stack access off")
+__flag(BPF_F_ANY_ALIGNMENT)
+__naked void store_release_misaligned(void)
+{
+	asm volatile (
+	"w0 = 0;"
+	".8byte %[store_release_insn];" // store_release((u32 *)(r10 - 5), w0);
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_W, BPF_STORE_REL, BPF_REG_10, BPF_REG_0, -5))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release to ctx pointer")
+__failure __failure_unpriv __msg("BPF_ATOMIC stores into R1 ctx is not allowed")
+__naked void store_release_to_ctx_pointer(void)
+{
+	asm volatile (
+	"w0 = 0;"
+	".8byte %[store_release_insn];" // store_release((u8 *)(r1 + 0), w0);
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_STORE_REL, BPF_REG_1, BPF_REG_0, 0))
+	: __clobber_all);
+}
+
+SEC("xdp")
+__description("store-release to pkt pointer")
+__failure __msg("BPF_ATOMIC stores into R2 pkt is not allowed")
+__naked void store_release_to_pkt_pointer(void)
+{
+	asm volatile (
+	"w0 = 0;"
+	"r2 = *(u32 *)(r1 + %[xdp_md_data]);"
+	".8byte %[store_release_insn];" // store_release((u8 *)(r2 + 0), w0);
+	"exit;"
+	:
+	: __imm_const(xdp_md_data, offsetof(struct xdp_md, data)),
+	  __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_STORE_REL, BPF_REG_2, BPF_REG_0, 0))
+	: __clobber_all);
+}
+
+SEC("flow_dissector")
+__description("store-release to flow_keys pointer")
+__failure __msg("BPF_ATOMIC stores into R2 flow_keys is not allowed")
+__naked void store_release_to_flow_keys_pointer(void)
+{
+	asm volatile (
+	"w0 = 0;"
+	"r2 = *(u64 *)(r1 + %[__sk_buff_flow_keys]);"
+	".8byte %[store_release_insn];" // store_release((u8 *)(r2 + 0), w0);
+	"exit;"
+	:
+	: __imm_const(__sk_buff_flow_keys,
+		      offsetof(struct __sk_buff, flow_keys)),
+	  __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_STORE_REL, BPF_REG_2, BPF_REG_0, 0))
+	: __clobber_all);
+}
+
+SEC("sk_reuseport")
+__description("store-release to sock pointer")
+__failure __msg("BPF_ATOMIC stores into R2 sock is not allowed")
+__naked void store_release_to_sock_pointer(void)
+{
+	asm volatile (
+	"w0 = 0;"
+	"r2 = *(u64 *)(r1 + %[sk_reuseport_md_sk]);"
+	".8byte %[store_release_insn];" // store_release((u8 *)(r2 + 0), w0);
+	"exit;"
+	:
+	: __imm_const(sk_reuseport_md_sk, offsetof(struct sk_reuseport_md, sk)),
+	  __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_B, BPF_STORE_REL, BPF_REG_2, BPF_REG_0, 0))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("store-release, leak pointer to stack")
+__success __success_unpriv __retval(0)
+__naked void store_release_leak_pointer_to_stack(void)
+{
+	asm volatile (
+	".8byte %[store_release_insn];" // store_release((u64 *)(r10 - 8), r1);
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_10, BPF_REG_1, -8))
+	: __clobber_all);
+}
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__type(key, long long);
+	__type(value, long long);
+} map_hash_8b SEC(".maps");
+
+SEC("socket")
+__description("store-release, leak pointer to map")
+__success __retval(0)
+__failure_unpriv __msg_unpriv("R6 leaks addr into map")
+__naked void store_release_leak_pointer_to_map(void)
+{
+	asm volatile (
+	"r6 = r1;"
+	"r1 = %[map_hash_8b] ll;"
+	"r2 = 0;"
+	"*(u64 *)(r10 - 8) = r2;"
+	"r2 = r10;"
+	"r2 += -8;"
+	"call %[bpf_map_lookup_elem];"
+	"if r0 == 0 goto l0_%=;"
+	".8byte %[store_release_insn];" // store_release((u64 *)(r0 + 0), r6);
+"l0_%=:"
+	"r0 = 0;"
+	"exit;"
+	:
+	: __imm_addr(map_hash_8b),
+	  __imm(bpf_map_lookup_elem),
+	  __imm_insn(store_release_insn,
+		     BPF_ATOMIC_OP(BPF_DW, BPF_STORE_REL, BPF_REG_0, BPF_REG_6, 0))
+	: __clobber_all);
+}
+
+#else
+
+SEC("socket")
+__description("store-release is not supported by compiler or jit, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";
-- 
2.48.1.502.g6dc24dfdaf-goog



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH bpf-next v2 9/9] bpf, docs: Update instruction-set.rst for load-acquire and store-release instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (7 preceding siblings ...)
  2025-02-07  2:06 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for " Peilin Ye
@ 2025-02-07  2:06 ` Peilin Ye
  2025-02-07 21:29 ` [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07  2:06 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: Peilin Ye, bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

Update documentation for the new load-acquire and store-release
instructions.  Rename existing atomic operations as "atomic
read-modify-write (RMW) operations".

Following RFC 9669, section 7.3. "Adding Instructions", create new
conformance groups "atomic32v2" and "atomic64v2", where:

  * atomic32v2: includes all instructions in "atomic32", plus the new
                8-bit, 16-bit and 32-bit atomic load-acquire and
                store-release instructions

  * atomic64v2: includes all instructions in "atomic64" and
                "atomic32v2", plus the new 64-bit atomic load-acquire
                and store-release instructions

Cc: bpf@ietf.org
Signed-off-by: Peilin Ye <yepeilin@google.com>
---
 .../bpf/standardization/instruction-set.rst   | 114 +++++++++++++++---
 1 file changed, 98 insertions(+), 16 deletions(-)

diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
index ab820d565052..86917932e9ef 100644
--- a/Documentation/bpf/standardization/instruction-set.rst
+++ b/Documentation/bpf/standardization/instruction-set.rst
@@ -139,8 +139,14 @@ This document defines the following conformance groups:
   specification unless otherwise noted.
 * base64: includes base32, plus instructions explicitly noted
   as being in the base64 conformance group.
-* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
-* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
+* atomic32: includes 32-bit atomic read-modify-write instructions (see
+  `Atomic operations`_).
+* atomic32v2: includes atomic32, plus 8-bit, 16-bit and 32-bit atomic
+  load-acquire and store-release instructions.
+* atomic64: includes atomic32, plus 64-bit atomic read-modify-write
+  instructions.
+* atomic64v2: includes atomic32v2 and atomic64, plus 64-bit atomic load-acquire
+  and store-release instructions.
 * divmul32: includes 32-bit division, multiplication, and modulo instructions.
 * divmul64: includes divmul32, plus 64-bit division, multiplication,
   and modulo instructions.
@@ -653,20 +659,31 @@ Atomic operations are operations that operate on memory and can not be
 interrupted or corrupted by other access to the same memory region
 by other BPF programs or means outside of this specification.
 
-All atomic operations supported by BPF are encoded as store operations
-that use the ``ATOMIC`` mode modifier as follows:
+All atomic operations supported by BPF are encoded as ``STX`` instructions
+that use the ``ATOMIC`` mode modifier, with the 'imm' field encoding the
+actual atomic operation.  These operations are categorized based on the second
+lowest nibble (bits 4-7) of the 'imm' field:
 
-* ``{ATOMIC, W, STX}`` for 32-bit operations, which are
+* ``ATOMIC_LOAD`` and ``ATOMIC_STORE`` indicate atomic load and store
+  operations, respectively (see `Atomic load and store operations`_).
+* All other defined values indicate an atomic read-modify-write operation, as
+  described in the following section.
+
+Atomic read-modify-write operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The atomic read-modify-write (RMW) operations are encoded as follows:
+
+* ``{ATOMIC, W, STX}`` for 32-bit RMW operations, which are
   part of the "atomic32" conformance group.
-* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are
+* ``{ATOMIC, DW, STX}`` for 64-bit RMW operations, which are
   part of the "atomic64" conformance group.
-* 8-bit and 16-bit wide atomic operations are not supported.
+* 8-bit and 16-bit wide atomic RMW operations are not supported.
 
-The 'imm' field is used to encode the actual atomic operation.
-Simple atomic operation use a subset of the values defined to encode
-arithmetic operations in the 'imm' field to encode the atomic operation:
+Simple atomic RMW operations use a subset of the values defined to encode
+arithmetic operations in the 'imm' field to encode the atomic RMW operation:
 
-.. table:: Simple atomic operations
+.. table:: Simple atomic read-modify-write operations
 
   ========  =====  ===========
   imm       value  description
@@ -686,10 +703,10 @@ arithmetic operations in the 'imm' field to encode the atomic operation:
 
   *(u64 *)(dst + offset) += src
 
-In addition to the simple atomic operations, there also is a modifier and
-two complex atomic operations:
+In addition to the simple atomic RMW operations, there is also a modifier and
+two complex atomic RMW operations:
 
-.. table:: Complex atomic operations
+.. table:: Complex atomic read-modify-write operations
 
   ===========  ================  ===========================
   imm          value             description
@@ -699,8 +716,8 @@ two complex atomic operations:
   CMPXCHG      0xf0 | FETCH      atomic compare and exchange
   ===========  ================  ===========================
 
-The ``FETCH`` modifier is optional for simple atomic operations, and
-always set for the complex atomic operations.  If the ``FETCH`` flag
+The ``FETCH`` modifier is optional for simple atomic RMW operations, and
+always set for the complex atomic RMW operations.  If the ``FETCH`` flag
 is set, then the operation also overwrites ``src`` with the value that
 was in memory before it was modified.
 
@@ -713,6 +730,71 @@ The ``CMPXCHG`` operation atomically compares the value addressed by
 value that was at ``dst + offset`` before the operation is zero-extended
 and loaded back to ``R0``.
 
+Atomic load and store operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To encode an atomic load or store operation, the lowest 8 bits of the 'imm'
+field are divided as follows::
+
+  +-+-+-+-+-+-+-+-+
+  | type  | order |
+  +-+-+-+-+-+-+-+-+
+
+**type**
+  The operation type is one of:
+
+.. table:: Atomic load and store operation types
+
+  ============  =====  ============
+  type          value  description
+  ============  =====  ============
+  ATOMIC_LOAD   0x1    atomic load
+  ATOMIC_STORE  0x2    atomic store
+  ============  =====  ============
+
+**order**
+  The memory order is one of:
+
+.. table:: Memory orders
+
+  =======  =====  =======================
+  order    value  description
+  =======  =====  =======================
+  RELAXED  0x0    relaxed
+  ACQUIRE  0x1    acquire
+  RELEASE  0x2    release
+  ACQ_REL  0x3    acquire and release
+  SEQ_CST  0x4    sequentially consistent
+  =======  =====  =======================
+
+Currently the following combinations of ``type`` and ``order`` are allowed:
+
+.. table:: Atomic load and store operations
+
+  ========= =====  ====================
+  imm       value  description
+  ========= =====  ====================
+  LOAD_ACQ  0x11   atomic load-acquire
+  STORE_REL 0x22   atomic store-release
+  ========= =====  ====================
+
+``{ATOMIC, <size>, STX}`` with 'imm' = LOAD_ACQ means::
+
+  dst = load_acquire((unsigned size *)(src + offset))
+
+``{ATOMIC, <size>, STX}`` with 'imm' = STORE_REL means::
+
+  store_release((unsigned size *)(dst + offset), src)
+
+Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and 'unsigned size'
+is one of: u8, u16, u32, or u64.
+
+8-bit, 16-bit and 32-bit atomic load-acquire and store-release instructions
+are part of the "atomic32v2" conformance group.
+
+64-bit atomic load-acquire and store-release instructions are part of the
+"atomic64v2" conformance group.
+
 64-bit immediate instructions
 -----------------------------
 
-- 
2.48.1.502.g6dc24dfdaf-goog



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-07  2:05 ` [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions Peilin Ye
@ 2025-02-07 11:28   ` Ilya Leoshkevich
  2025-02-07 21:23     ` Peilin Ye
  2025-02-08 21:30   ` Alexei Starovoitov
  1 sibling, 1 reply; 35+ messages in thread
From: Ilya Leoshkevich @ 2025-02-07 11:28 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Heiko Carstens, Vasily Gorbik,
	Catalin Marinas, Will Deacon, Quentin Monnet, Mykola Lysenko,
	Shuah Khan, Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden,
	Neel Natu, Benjamin Segall, linux-kernel

On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote:
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1].  The following new flags are defined:
> 
>   BPF_ATOMIC_LOAD         0x10
>   BPF_ATOMIC_STORE        0x20
>   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
> 
>   BPF_RELAXED        0x0
>   BPF_ACQUIRE        0x1
>   BPF_RELEASE        0x2
>   BPF_ACQ_REL        0x3
>   BPF_SEQ_CST        0x4
> 
>   BPF_LOAD_ACQ       (BPF_ATOMIC_LOAD | BPF_ACQUIRE)
>   BPF_STORE_REL      (BPF_ATOMIC_STORE | BPF_RELEASE)
> 
> A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
> field set to BPF_LOAD_ACQ (0x11).
> 
> Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction
> with
> the 'imm' field set to BPF_STORE_REL (0x22).
> 
> Unlike existing atomic operations that only support BPF_W (32-bit)
> and
> BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
> support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-
> acquire
> zero-extends the value before writing it to a 32-bit register, just
> like
> ARM64 instruction LDARH and friends.
> 
> As an example, consider the following 64-bit load-acquire BPF
> instruction:
> 
>   db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
> 
>   opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
>   imm (0x00000011): BPF_LOAD_ACQ
> 
> Similarly, a 16-bit BPF store-release:
> 
>   cb 21 00 00 22 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
> 
>   opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
>   imm (0x00000022): BPF_STORE_REL
> 
> In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
> bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the
> new
> instructions, until the corresponding JIT compiler supports them.
> 
> [1]
> https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/
> 
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Peilin Ye <yepeilin@google.com>
> ---
>  arch/arm64/net/bpf_jit_comp.c  |  4 +++
>  arch/s390/net/bpf_jit_comp.c   | 14 +++++---
>  arch/x86/net/bpf_jit_comp.c    |  4 +++
>  include/linux/bpf.h            | 11 ++++++
>  include/linux/filter.h         |  2 ++
>  include/uapi/linux/bpf.h       | 13 +++++++
>  kernel/bpf/core.c              | 63 ++++++++++++++++++++++++++++++--
> --
>  kernel/bpf/disasm.c            | 12 +++++++
>  kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 13 +++++++
>  10 files changed, 168 insertions(+), 13 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>

s390x has a strong memory model, and the regular load and store
instructions are atomic as long as operand addresses are aligned.

IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT
is set, in which case whoever loaded the program is responsible for the
consequences: memory accesses that happen to be unaligned would
not trigger an exception, but they would not be atomic either.

So I can implement the new instructions as normal loads/stores after
this series is merged.
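
As a rough sketch, that mapping could look something like this (purely
illustrative - emit_atomic_ld_st(), emit_load() and emit_store() below
are hypothetical stand-ins, not the actual s390 JIT helpers):

  /* purely illustrative - not the actual s390 JIT code */
  static int emit_atomic_ld_st(struct bpf_jit *jit, struct bpf_insn *insn)
  {
          switch (insn->imm) {
          case BPF_LOAD_ACQ:
                  /* an aligned regular load is already atomic, and the
                   * strong memory model makes it ordered, too
                   */
                  return emit_load(jit, insn);
          case BPF_STORE_REL:
                  return emit_store(jit, insn);  /* likewise for stores */
          default:
                  return -EOPNOTSUPP;
          }
  }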


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-07 11:28   ` Ilya Leoshkevich
@ 2025-02-07 21:23     ` Peilin Ye
  0 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07 21:23 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan, Heiko Carstens,
	Vasily Gorbik, Catalin Marinas, Will Deacon, Quentin Monnet,
	Mykola Lysenko, Shuah Khan, Ihor Solodrai, Yingchi Long, Josh Don,
	Barret Rhoden, Neel Natu, Benjamin Segall, linux-kernel

Hi Ilya,

On Fri, Feb 07, 2025 at 12:28:43PM +0100, Ilya Leoshkevich wrote:
> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>

Thanks!

> s390x has a strong memory model, and the regular load and store
> instructions are atomic as long as operand addresses are aligned.

I see.

> IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT
> is set, in which case whoever loaded the program is responsible for the
> consequences: memory accesses that happen to be unaligned would
> not trigger an exception, but they would not be atomic either.

The verifier has rejected misaligned BPF_ATOMIC instructions since
commit ca36960211eb ("bpf: allow xadd only on aligned memory"), even if
BPF_F_ANY_ALIGNMENT is set - so this patch makes the verifier reject
misaligned load-acquires and store-releases too, to keep the behaviour
consistent.

Specifically, check_atomic_load() calls check_load_mem() (and
check_atomic_store() calls check_store_reg()) with the
@strict_alignment_once argument set to true.  See also selftests
load_acquire_misaligned() and store_release_misaligned() in PATCH 8/9.
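
To illustrate the rule being enforced (a simplified sketch, not the
verifier's actual code):

  /* a BPF_ATOMIC access must be naturally aligned for its access
   * size, even when BPF_F_ANY_ALIGNMENT is set; size is 1, 2, 4 or 8
   */
  static bool atomic_access_aligned(__u64 addr, __u32 size)
  {
          return (addr & (size - 1)) == 0;
  }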

> So I can implement the new instructions as normal loads/stores after
> this series is merged.

That would be great!

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions
  2025-02-07  2:04 [PATCH bpf-next v2 0/9] Introduce load-acquire and store-release BPF instructions Peilin Ye
                   ` (8 preceding siblings ...)
  2025-02-07  2:06 ` [PATCH bpf-next v2 9/9] bpf, docs: Update instruction-set.rst " Peilin Ye
@ 2025-02-07 21:29 ` Peilin Ye
  9 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07 21:29 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Fri, Feb 07, 2025 at 02:04:54AM +0000, Peilin Ye wrote:
> Peilin Ye (9):
>   bpf/verifier: Factor out atomic_ptr_type_ok()
>   bpf/verifier: Factor out check_atomic_rmw()
>   bpf/verifier: Factor out check_load_mem() and check_store_reg()
>   bpf: Introduce load-acquire and store-release instructions
>   arm64: insn: Add BIT(23) to {load,store}_ex's mask
>   arm64: insn: Add load-acquire and store-release instructions
>   bpf, arm64: Support load-acquire and store-release instructions
>   selftests/bpf: Add selftests for load-acquire and store-release
>     instructions
>   bpf, docs: Update instruction-set.rst for load-acquire and
>     store-release instructions
> 
>  .../bpf/standardization/instruction-set.rst   | 114 ++++++--
>  arch/arm64/include/asm/insn.h                 |  12 +-
>  arch/arm64/lib/insn.c                         |  29 ++
>  arch/arm64/net/bpf_jit.h                      |  20 ++
>  arch/arm64/net/bpf_jit_comp.c                 |  87 +++++-
>  arch/s390/net/bpf_jit_comp.c                  |  14 +-
>  arch/x86/net/bpf_jit_comp.c                   |   4 +
>  include/linux/bpf.h                           |  11 +
>  include/linux/filter.h                        |   2 +
>  include/uapi/linux/bpf.h                      |  13 +
>  kernel/bpf/core.c                             |  63 ++++-
>  kernel/bpf/disasm.c                           |  12 +
>  kernel/bpf/verifier.c                         | 234 +++++++++++-----
>  tools/include/uapi/linux/bpf.h                |  13 +
>  .../selftests/bpf/prog_tests/arena_atomics.c  |  50 ++++
>  .../selftests/bpf/prog_tests/verifier.c       |   4 +
>  .../selftests/bpf/progs/arena_atomics.c       |  88 ++++++
>  .../bpf/progs/verifier_load_acquire.c         | 190 +++++++++++++
>  .../selftests/bpf/progs/verifier_precision.c  |  47 ++++
>  .../bpf/progs/verifier_store_release.c        | 262 ++++++++++++++++++
>  20 files changed, 1164 insertions(+), 105 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/verifier_load_acquire.c
>  create mode 100644 tools/testing/selftests/bpf/progs/verifier_store_release.c

Looks like the llvm-18 CI job passed but the llvm-17/gcc ones failed.
I'll debug with llvm-17 and see if I need different #ifdef guards for
the new tests.

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-07  2:06 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for " Peilin Ye
@ 2025-02-07 23:47   ` Peilin Ye
  2025-02-08  0:20   ` Peilin Ye
  2025-02-11  0:08   ` Eduard Zingerman
  2 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-07 23:47 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Fri, Feb 07, 2025 at 02:06:29AM +0000, Peilin Ye wrote:
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
> @@ -0,0 +1,190 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "../../../include/linux/filter.h"
> +#include "bpf_misc.h"
> +
> +#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
> +
> +SEC("socket")
> +__description("load-acquire, 8-bit")
> +__success __success_unpriv __retval(0x12)
> +__naked void load_acquire_8(void)
> +{
> +	asm volatile (
> +	"*(u8 *)(r10 - 1) = 0x12;"
                            ~~~~
I realized that I am using STORE_imm<> instructions in load-acquire
tests, and llvm-17 -mcpu=v3 cannot build them.

Can be fixed by simply doing e.g. the following instead:

        "r1 = 0x12;"
	"*(u8 *)(r10 - 1) = r1;"

> +	".8byte %[load_acquire_insn];" // w0 = load_acquire((u8 *)(r10 - 1));
> +	"exit;"
> +	:
> +	: __imm_insn(load_acquire_insn,
> +		     BPF_ATOMIC_OP(BPF_B, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -1))
> +	: __clobber_all);
> +}

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-07  2:06 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for " Peilin Ye
  2025-02-07 23:47   ` Peilin Ye
@ 2025-02-08  0:20   ` Peilin Ye
  2025-02-08  2:20     ` Peilin Ye
  2025-02-11  0:08   ` Eduard Zingerman
  2 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-08  0:20 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Fri, Feb 07, 2025 at 02:06:29AM +0000, Peilin Ye wrote:
> --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> @@ -6,6 +6,8 @@
>  #include <stdbool.h>
>  #include <stdatomic.h>
>  #include "bpf_arena_common.h"
> +#include "../../../include/linux/filter.h"
> +#include "bpf_misc.h"
>  
>  struct {
>  	__uint(type, BPF_MAP_TYPE_ARENA);
> @@ -274,4 +276,90 @@ int uaf(const void *ctx)
>  	return 0;
>  }
>  
> +__u8 __arena_global load_acquire8_value = 0x12;
   ~~~~

CI job x86_64-llvm-17 [1] failed because clang-17 crashed when compiling
this file (arena_atomics.c):

  fatal error: error in backend: unable to write nop sequence of 1 bytes

After some digging, I believe I am hitting a known issue that Yonghong
described in [2].  Changing __u8 and __u16 variables to __u32 seems to
resolve/work around the issue:

  +__u32 __arena_global load_acquire8_value = 0x12;

Will look into it more.

[1] https://github.com/kernel-patches/bpf/actions/runs/13191887466/job/36826207612
[2] https://lore.kernel.org/bpf/d56223f9-483e-fbc1-4564-44c0858a1e3e@meta.com/

> +__u16 __arena_global load_acquire16_value = 0x1234;
> +__u32 __arena_global load_acquire32_value = 0x12345678;
> +__u64 __arena_global load_acquire64_value = 0x1234567890abcdef;
> +
> +__u8 __arena_global load_acquire8_result = 0x12;
> +__u16 __arena_global load_acquire16_result = 0x1234;
> +__u32 __arena_global load_acquire32_result = 0x12345678;
> +__u64 __arena_global load_acquire64_result = 0x1234567890abcdef;

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-08  0:20   ` Peilin Ye
@ 2025-02-08  2:20     ` Peilin Ye
  2025-02-11  0:20       ` Eduard Zingerman
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-08  2:20 UTC (permalink / raw)
  To: bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, Eduard Zingerman, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Sat, Feb 08, 2025 at 12:20:58AM +0000, Peilin Ye wrote:
> > --- a/tools/testing/selftests/bpf/progs/arena_atomics.c
> > +++ b/tools/testing/selftests/bpf/progs/arena_atomics.c
> > @@ -6,6 +6,8 @@
> >  #include <stdbool.h>
> >  #include <stdatomic.h>
> >  #include "bpf_arena_common.h"
> > +#include "../../../include/linux/filter.h"
> > +#include "bpf_misc.h"
> >  
> >  struct {
> >  	__uint(type, BPF_MAP_TYPE_ARENA);
> > @@ -274,4 +276,90 @@ int uaf(const void *ctx)
> >  	return 0;
> >  }
> >  
> > +__u8 __arena_global load_acquire8_value = 0x12;
>    ~~~~
> 
> CI job x86_64-llvm-17 [1] failed because clang-17 crashed when compiling
> this file (arena_atomics.c):
> 
>   fatal error: error in backend: unable to write nop sequence of 1 bytes
> 
> After some digging, I believe I am hitting a known issue that Yonghong
> described in [2].

> Changing __u8 and __u16 variables to __u32 seems to resolve/work
> around the issue

Sorry, that wasn't very accurate - we need to make sure there are no
"holes" in the .addr_space.1 ELF section, e.g.:

  /* 8-byte-aligned */
  __u8 __arena_global load_acquire8_value = 0x12;
  /* 1-byte hole, causing clang-17 to crash */
  __u16 __arena_global load_acquire16_value = 0x1234;

LLVM commit f27c4903c43b ("MC: Add .data. and .rodata. prefixes to
MCContext section classification") fixed this issue.

- - -
For now, I think I should:

  1. change existing #if guards to
     "#if defined(__TARGET_ARCH_arm64) && __clang_major__ >= 18"

  2. additionally, guard "__arena_global" variable definitions behind
     "#if __clang_major >= 18" so that clang-17 doesn't try to compile
     that part (then crash)

Will fix in v3.

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-07  2:05 ` [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions Peilin Ye
  2025-02-07 11:28   ` Ilya Leoshkevich
@ 2025-02-08 21:30   ` Alexei Starovoitov
  2025-02-09  2:21     ` Peilin Ye
  1 sibling, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2025-02-08 21:30 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Thu, Feb 6, 2025 at 6:06 PM Peilin Ye <yepeilin@google.com> wrote:
>
> Introduce BPF instructions with load-acquire and store-release
> semantics, as discussed in [1].  The following new flags are defined:
>
>   BPF_ATOMIC_LOAD         0x10
>   BPF_ATOMIC_STORE        0x20
>   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
>
>   BPF_RELAXED        0x0
>   BPF_ACQUIRE        0x1
>   BPF_RELEASE        0x2
>   BPF_ACQ_REL        0x3
>   BPF_SEQ_CST        0x4

I still don't like this.

Earlier you said:

> If yes, I think we either:
>
>  (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and
>      STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by
>      RMW atomics
>  (b) specify memorder in imm<0-3>
>
> I chose (b) for fewer "What would be a good numerical value so that RMW
> atomics won't need to use it in imm<4-7>?" questions to answer.
>
> If we're having dedicated fields for memorder, I think it's better to
> define all possible values once and for all, just so that e.g. 0x2 will
> always mean RELEASE in a memorder field.  Initially I defined all six of
> them [2], then Yonghong suggested dropping CONSUME [3].

I don't think we should be defining "all possible values",
since these are the values that llvm and the C model support,
but do we have any plans to support anything but ld_acq/st_rel?
I haven't heard anything.
What even is the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ?

What is the verifier supposed to do? Reject for now? And then what?
Map to what insn?

These values might imply that bpf infra is supposed to map all the values
to cpu instructions, but that's not what we're doing here.
We're only dealing with two specific instructions.
We're not defining a memory model for all future new instructions.

pw-bot: cr


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-08 21:30   ` Alexei Starovoitov
@ 2025-02-09  2:21     ` Peilin Ye
  2025-02-09  3:46       ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-09  2:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

Hi Alexei,

On Sat, Feb 08, 2025 at 01:30:46PM -0800, Alexei Starovoitov wrote:
> > Introduce BPF instructions with load-acquire and store-release
> > semantics, as discussed in [1].  The following new flags are defined:
> >
> >   BPF_ATOMIC_LOAD         0x10
> >   BPF_ATOMIC_STORE        0x20
> >   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
> >
> >   BPF_RELAXED        0x0
> >   BPF_ACQUIRE        0x1
> >   BPF_RELEASE        0x2
> >   BPF_ACQ_REL        0x3
> >   BPF_SEQ_CST        0x4
> 
> I still don't like this.
> 
> Earlier you said:
> 
> > If yes, I think we either:
> >
> >  (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and
> >      STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by
> >      RMW atomics
> >  (b) specify memorder in imm<0-3>
> >
> > I chose (b) for fewer "What would be a good numerical value so that RMW
> > atomics won't need to use it in imm<4-7>?" questions to answer.
> >
> > If we're having dedicated fields for memorder, I think it's better to
> > define all possible values once and for all, just so that e.g. 0x2 will
> > always mean RELEASE in a memorder field.  Initially I defined all six of
> > them [2], then Yonghong suggested dropping CONSUME [3].
> 
> I don't think we should be defining "all possible values",
> since these are the values that llvm and the C model support,
> but do we have any plans to support anything but ld_acq/st_rel?
> I haven't heard anything.
> What even is the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ?
> 
> What is the verifier supposed to do? Reject for now? And then what?
> Map to what insn?
> 
> These values might imply that bpf infra is supposed to map all the values
> to cpu instructions, but that's not what we're doing here.
> We're only dealing with two specific instructions.
> We're not defining a memory model for all future new instructions.

Got it!  In v3, I'll change it back to:

  #define BPF_LOAD_ACQ   0x10
  #define BPF_STORE_REL  0x20

Thanks,
Peilin Ye



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-09  2:21     ` Peilin Ye
@ 2025-02-09  3:46       ` Alexei Starovoitov
  2025-02-09 23:41         ` Peilin Ye
  2025-02-10 22:51         ` Peilin Ye
  0 siblings, 2 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2025-02-09  3:46 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Sat, Feb 8, 2025 at 6:21 PM Peilin Ye <yepeilin@google.com> wrote:
>
> Hi Alexei,
>
> On Sat, Feb 08, 2025 at 01:30:46PM -0800, Alexei Starovoitov wrote:
> > > Introduce BPF instructions with load-acquire and store-release
> > > semantics, as discussed in [1].  The following new flags are defined:
> > >
> > >   BPF_ATOMIC_LOAD         0x10
> > >   BPF_ATOMIC_STORE        0x20
> > >   BPF_ATOMIC_TYPE(imm)    ((imm) & 0xf0)
> > >
> > >   BPF_RELAXED        0x0
> > >   BPF_ACQUIRE        0x1
> > >   BPF_RELEASE        0x2
> > >   BPF_ACQ_REL        0x3
> > >   BPF_SEQ_CST        0x4
> >
> > I still don't like this.
> >
> > Earlier you said:
> >
> > > If yes, I think we either:
> > >
> > >  (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and
> > >      STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by
> > >      RMW atomics
> > >  (b) specify memorder in imm<0-3>
> > >
> > > I chose (b) for fewer "What would be a good numerical value so that RMW
> > > atomics won't need to use it in imm<4-7>?" questions to answer.
> > >
> > > If we're having dedicated fields for memorder, I think it's better to
> > > define all possible values once and for all, just so that e.g. 0x2 will
> > > always mean RELEASE in a memorder field.  Initially I defined all six of
> > > them [2], then Yonghong suggested dropping CONSUME [3].
> >
> > I don't think we should be defining "all possible values",
> > since these are the values that llvm and the C model support,
> > but do we have any plans to support anything but ld_acq/st_rel?
> > I haven't heard anything.
> > What even is the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ?
> >
> > What is the verifier supposed to do? Reject for now? And then what?
> > Map to what insn?
> >
> > These values might imply that bpf infra is supposed to map all the values
> > to cpu instructions, but that's not what we're doing here.
> > We're only dealing with two specific instructions.
> > We're not defining a memory model for all future new instructions.
>
> Got it!  In v3, I'll change it back to:
>
>   #define BPF_LOAD_ACQ   0x10
>   #define BPF_STORE_REL  0x20

why not 1 and 2 ?
All other bits are reserved and the verifier will make sure they're zero,
so when/if we need to extend it then it wouldn't matter whether
lower 4 bits are reserved or other bits.
Say, we decide to support cmpwait_relaxed as a new insn.
It can take the value 3 and arm64 JIT will map it to ldxr+wfe+...

Then with this new load_acq and cmpwait_relaxed we can efficiently
implement both smp_cond_load_relaxed and smp_cond_load_acquire.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-09  3:46       ` Alexei Starovoitov
@ 2025-02-09 23:41         ` Peilin Ye
  2025-02-10 22:51         ` Peilin Ye
  1 sibling, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-09 23:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Sat, Feb 08, 2025 at 07:46:54PM -0800, Alexei Starovoitov wrote:
> > > These values might imply that bpf infra is supposed to map all the values
> > > to cpu instructions, but that's not what we're doing here.
> > > We're only dealing with two specific instructions.
> > > We're not defining a memory model for all future new instructions.
> >
> > Got it!  In v3, I'll change it back to:
> >
> >   #define BPF_LOAD_ACQ   0x10
> >   #define BPF_STORE_REL  0x20
> 
> why not 1 and 2 ?
> All other bits are reserved and the verifier will make sure they're zero,
> so when/if we need to extend it, it wouldn't matter whether the
> lower 4 bits or some other bits are reserved.
> Say, we decide to support cmpwait_relaxed as a new insn.
> It can take the value 3 and arm64 JIT will map it to ldxr+wfe+...
> 
> Then with this new load_acq and cmpwait_relaxed we can efficiently
> implement both smp_cond_load_relaxed and smp_cond_load_acquire.

Ah, I see.  When you suggested "LOAD_ACQ=1 and STORE_REL=2" earlier, I
didn't realize you meant 1 and 2 in imm<0-3>.

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-09  3:46       ` Alexei Starovoitov
  2025-02-09 23:41         ` Peilin Ye
@ 2025-02-10 22:51         ` Peilin Ye
  2025-02-12 22:14           ` Peilin Ye
  1 sibling, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-10 22:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

Hi Alexei,

On Sat, Feb 08, 2025 at 07:46:54PM -0800, Alexei Starovoitov wrote:
> > Got it!  In v3, I'll change it back to:
> >
> >   #define BPF_LOAD_ACQ   0x10
> >   #define BPF_STORE_REL  0x20
> 
> why not 1 and 2 ?

I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH
also equals 1.
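
For context, the relevant existing BPF_ATOMIC 'imm' encodings (from the
UAPI headers, include/uapi/linux/bpf{,_common}.h) are:

  #define BPF_ADD      0x00
  #define BPF_SUB      0x10
  #define BPF_FETCH    0x01
  #define BPF_XCHG     (0xe0 | BPF_FETCH) /* 0xe1 */
  #define BPF_CMPXCHG  (0xf0 | BPF_FETCH) /* 0xf1 */

so a bare 1 in imm would be indistinguishable from
BPF_ADD | BPF_FETCH (0x00 | 0x01).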

> All other bits are reserved and the verifier will make sure they're zero

IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00).  What
would you suggest?  Maybe:

  #define BPF_ATOMIC_LD_ST 0x10

  #define BPF_LOAD_ACQ      0x1
  #define BPF_STORE_REL     0x2

?

> , so when/if we need to extend it, it wouldn't matter whether the
> lower 4 bits or some other bits are reserved.
> Say, we decide to support cmpwait_relaxed as a new insn.
> It can take the value 3 and arm64 JIT will map it to ldxr+wfe+...
> 
> Then with this new load_acq and cmpwait_relaxed we can efficiently
> implement both smp_cond_load_relaxed and smp_cond_load_acquire.

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 3/9] bpf/verifier: Factor out check_load_mem() and check_store_reg()
  2025-02-07  2:05 ` [PATCH bpf-next v2 3/9] bpf/verifier: Factor out check_load_mem() and check_store_reg() Peilin Ye
@ 2025-02-10 23:45   ` Eduard Zingerman
  0 siblings, 0 replies; 35+ messages in thread
From: Eduard Zingerman @ 2025-02-10 23:45 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, linux-kernel

On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote:
> Extract BPF_LDX and most non-atomic BPF_STX instruction handling logic
> in do_check() into helper functions to be used later.  While we are
> here, make that comment about "reserved fields" more specific.
> 
> Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Peilin Ye <yepeilin@google.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-07  2:06 ` [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for " Peilin Ye
  2025-02-07 23:47   ` Peilin Ye
  2025-02-08  0:20   ` Peilin Ye
@ 2025-02-11  0:08   ` Eduard Zingerman
  2025-02-11 19:09     ` Peilin Ye
  2 siblings, 1 reply; 35+ messages in thread
From: Eduard Zingerman @ 2025-02-11  0:08 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, linux-kernel

On Fri, 2025-02-07 at 02:06 +0000, Peilin Ye wrote:
> Add several ./test_progs tests:
> 
>   - arena_atomics/load_acquire
>   - arena_atomics/store_release
>   - verifier_load_acquire/*
>   - verifier_store_release/*
>   - verifier_precision/bpf_load_acquire
>   - verifier_precision/bpf_store_release
> 
> The last two tests are added to check if backtrack_insn() handles the
> new instructions correctly.
> 
> Additionally, the last test also makes sure that the verifier
> "remembers" the value (in src_reg) we store-release into e.g. a stack
> slot.  For example, if we take a look at the test program:
> 
>     #0:  r1 = 8;
>       /* store_release((u64 *)(r10 - 8), r1); */
>     #1:  .8byte %[store_release];
>     #2:  r1 = *(u64 *)(r10 - 8);
>     #3:  r2 = r10;
>     #4:  r2 += r1;
>     #5:  r0 = 0;
>     #6:  exit;
> 
> At #1, if the verifier doesn't remember that we wrote 8 to the stack,
> then later at #4 we would be adding an unbounded scalar value to the
> stack pointer, which would cause the program to be rejected:
> 
>   VERIFIER LOG:
>   =============
> ...
>   math between fp pointer and register with unbounded min value is not allowed
> 
> All new tests depend on #ifdef ENABLE_ATOMICS_TESTS.  Currently they
> only run for arm64.
> 
> Signed-off-by: Peilin Ye <yepeilin@google.com>
> ---

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> +++ b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
> @@ -0,0 +1,190 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "../../../include/linux/filter.h"
> +#include "bpf_misc.h"
> +
> +#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)

[...]

> +#else
> +
> +SEC("socket")
> +__description("load-acquire is not supported by compiler or jit, use a dummy test")
> +__success
> +int dummy_test(void)
> +{
> +	return 0;
> +}

Nit: why is dummy_test() necessary?

> +
> +#endif
> +
> +char _license[] SEC("license") = "GPL";

[...]




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-08  2:20     ` Peilin Ye
@ 2025-02-11  0:20       ` Eduard Zingerman
  0 siblings, 0 replies; 35+ messages in thread
From: Eduard Zingerman @ 2025-02-11  0:20 UTC (permalink / raw)
  To: Peilin Ye, bpf, linux-arm-kernel
  Cc: bpf, Xu Kuohai, David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, linux-kernel

On Sat, 2025-02-08 at 02:20 +0000, Peilin Ye wrote:

[...]

> Sorry, that wasn't very accurate - we need to make sure there are no
> "holes" in the .addr_space.1 ELF section, e.g.:
> 
>   /* 8-byte-aligned */
>   __u8 __arena_global load_acquire8_value = 0x12;
>   /* 1-byte hole, causing clang-17 to crash */
>   __u16 __arena_global load_acquire16_value = 0x1234;
> 
> LLVM commit f27c4903c43b ("MC: Add .data. and .rodata. prefixes to
> MCContext section classification") fixed this issue.

This is a bit strange; from the commit log it looks like LLVM-17 should
include this commit. But in any case, targeting LLVM >= 18 seems
reasonable to me; the main goal is to have these tests run by CI for
some compiler version.

> - - -
> For now, I think I should:
> 
>   1. change existing #if guards to
>      "#if defined(__TARGET_ARCH_arm64) && __clang_major__ >= 18"
> 
>   2. additionally, guard "__arena_global" variable definitions behind
>      "#if __clang_major__ >= 18" so that clang-17 doesn't try to compile
>      that part (then crash)
> 
> Will fix in v3.
> 
> Thanks,
> Peilin Ye
> 





* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-11  0:08   ` Eduard Zingerman
@ 2025-02-11 19:09     ` Peilin Ye
  2025-02-11 20:15       ` Eduard Zingerman
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-11 19:09 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Mon, Feb 10, 2025 at 04:08:44PM -0800, Eduard Zingerman wrote:
> > +++ b/tools/testing/selftests/bpf/progs/verifier_load_acquire.c
> > @@ -0,0 +1,190 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include "../../../include/linux/filter.h"
> > +#include "bpf_misc.h"
> > +
> > +#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
> 
> [...]
> 
> > +#else
> > +
> > +SEC("socket")
> > +__description("load-acquire is not supported by compiler or jit, use a dummy test")
> > +__success
> > +int dummy_test(void)
> > +{
> > +	return 0;
> > +}
> 
> Nit: why is dummy_test() necessary?

It's just to make it clear when these tests are (effectively) skipped.
Otherwise, e.g. -cpuv4 runner with LLVM-18 on x86-64 would give:

  #518     verifier_load_acquire:OK

With dummy_test(), we would see:

(FWIW, for v3 I'm planning to change __description() to the following,
since new tests no longer depend on __BPF_FEATURE_LOAD_ACQ_STORE_REL.)

  #518/1   verifier_load_acquire/Clang version < 18, or JIT does not support load-acquire; use a dummy test:OK
  #518     verifier_load_acquire:OK

Commit 147c8f4470ee ("selftests/bpf: Add unit tests for new
sign-extension load insns") did similar thing in verifier_ldsx.c.

> > +
> > +#endif
> > +
> > +char _license[] SEC("license") = "GPL";
> 
> [...]

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-11 19:09     ` Peilin Ye
@ 2025-02-11 20:15       ` Eduard Zingerman
  2025-02-11 20:51         ` Peilin Ye
  0 siblings, 1 reply; 35+ messages in thread
From: Eduard Zingerman @ 2025-02-11 20:15 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Tue, 2025-02-11 at 19:09 +0000, Peilin Ye wrote:

[...]

> > Nit: why is dummy_test() necessary?
> 
> It's just to make it clear when these tests are (effectively) skipped.
> Otherwise, e.g. -cpuv4 runner with LLVM-18 on x86-64 would give:
> 
>   #518     verifier_load_acquire:OK
> 
> With dummy_test(), we would see:
> 
> (FWIW, for v3 I'm planning to change __description() to the following,
> since new tests no longer depend on __BPF_FEATURE_LOAD_ACQ_STORE_REL.)
> 
>   #518/1   verifier_load_acquire/Clang version < 18, or JIT does not support load-acquire; use a dummy test:OK
>   #518     verifier_load_acquire:OK
> 
> Commit 147c8f4470ee ("selftests/bpf: Add unit tests for new
> > sign-extension load insns") did a similar thing in verifier_ldsx.c.

I see, thank you for explaining.
We do have a concept of skipped tests in the test-suite,
but it is implemented by calling test__skip() from the prog_tests/<smth>.c.
This would translate as something like below in prog_tests/verifier.c:

	void test_verifier_store_release(void) {
	#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
		RUN(verifier_store_release);
	#else
		test__skip()
	#endif
	}

The number of tests skipped is printed after tests execution.
Up to you if you'd like to change it like that or not.




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-11 20:15       ` Eduard Zingerman
@ 2025-02-11 20:51         ` Peilin Ye
  2025-02-20  1:08           ` Peilin Ye
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-11 20:51 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Tue, Feb 11, 2025 at 12:15:25PM -0800, Eduard Zingerman wrote:
> > > Nit: why is dummy_test() necessary?
> > 
> > It's just to make it clear when these tests are (effectively) skipped.
> > Otherwise, e.g. -cpuv4 runner with LLVM-18 on x86-64 would give:
> > 
> >   #518     verifier_load_acquire:OK
> > 
> > With dummy_test(), we would see:
> > 
> > (FWIW, for v3 I'm planning to change __description() to the following,
> > since new tests no longer depend on __BPF_FEATURE_LOAD_ACQ_STORE_REL.)
> > 
> >   #518/1   verifier_load_acquire/Clang version < 18, or JIT does not support load-acquire; use a dummy test:OK
> >   #518     verifier_load_acquire:OK
> > 
> > Commit 147c8f4470ee ("selftests/bpf: Add unit tests for new
> > > sign-extension load insns") did a similar thing in verifier_ldsx.c.
> 
> I see, thank you for explaining.
> We do have a concept of skipped tests in the test-suite,
> but it is implemented by calling test__skip() from the prog_tests/<smth>.c.
> This would translate as something like below in prog_tests/verifier.c:
> 
> 	void test_verifier_store_release(void) {
> 	#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
> 		RUN(verifier_store_release);
> 	#else
> 		test__skip()
> 	#endif
> 	}

> The number of tests skipped is printed after tests execution.

Sounds nice!

> Up to you if you'd like to change it like that or not.

I'll do that in v3, thanks for the suggestion.

Peilin Ye




* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-10 22:51         ` Peilin Ye
@ 2025-02-12 22:14           ` Peilin Ye
  2025-02-13  5:55             ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-12 22:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Mon, Feb 10, 2025 at 10:51:11PM +0000, Peilin Ye wrote:
> > >   #define BPF_LOAD_ACQ   0x10
> > >   #define BPF_STORE_REL  0x20
> > 
> > why not 1 and 2 ?
> 
> I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH
> also equals 1.
> 
> > All other bits are reserved and the verifier will make sure they're zero
> 
> IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00).  What
> would you suggest?  Maybe:
> 
>   #define BPF_ATOMIC_LD_ST 0x10
> 
>   #define BPF_LOAD_ACQ      0x1
>   #define BPF_STORE_REL     0x2
> 
> ?

Or, how about reusing 0xb in imm<4-7>:

  #define BPF_ATOMIC_LD_ST 0xb0

  #define BPF_LOAD_ACQ      0x1
  #define BPF_STORE_REL     0x2

0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC.
Instead of moving values between registers, we now "move" values from/to
the memory - if I can think of it that way.
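
A hypothetical composition under this proposal (these names are part of
the suggestion above, not existing UAPI):

  load_acq:  imm = BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ;   /* 0xb0 | 0x1 == 0xb1 */
  store_rel: imm = BPF_ATOMIC_LD_ST | BPF_STORE_REL;  /* 0xb0 | 0x2 == 0xb2 */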

- - -
Or, do we want to start to use the remaining bits of the imm field (i.e.
imm<8-31>) ?

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-12 22:14           ` Peilin Ye
@ 2025-02-13  5:55             ` Alexei Starovoitov
  2025-02-13 11:41               ` Peilin Ye
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2025-02-13  5:55 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Wed, Feb 12, 2025 at 2:14 PM Peilin Ye <yepeilin@google.com> wrote:
>
> On Mon, Feb 10, 2025 at 10:51:11PM +0000, Peilin Ye wrote:
> > > >   #define BPF_LOAD_ACQ   0x10
> > > >   #define BPF_STORE_REL  0x20

so that was broken then,
since BPF_SUB == 0x10?

And the original thing was also completely broken, since
BPF_ATOMIC_LOAD | BPF_RELAXED == 0x10 == BPF_SUB?

so much for "let's define relaxed, acquire,
release, acq_rel for completeness".
:(

> > >
> > > why not 1 and 2 ?
> >
> > I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH
> > also equals 1.
> >
> > > All other bits are reserved and the verifier will make sure they're zero
> >
> > IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00).  What
> > would you suggest?  Maybe:
> >
> >   #define BPF_ATOMIC_LD_ST 0x10
> >
> >   #define BPF_LOAD_ACQ      0x1
> >   #define BPF_STORE_REL     0x2

This is also broken, since
BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ == 0x11 == BPF_SUB | BPF_FETCH.

BPF_SUB | BPF_FETCH is invalid at the moment,
but such aliasing is bad.

> >
> > ?
>
> Or, how about reusing 0xb in imm<4-7>:
>
>   #define BPF_ATOMIC_LD_ST 0xb0
>
>   #define BPF_LOAD_ACQ      0x1
>   #define BPF_STORE_REL     0x2
>
> 0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC.
> Instead of moving values between registers, we now "move" values from/to
> the memory - if I can think of it that way.

and BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ would == BPF_MOV | BPF_FETCH ?

Not pretty and confusing.

The BPF_FETCH modifier means "do whatever the opcode says to do,
like add in memory, but also return the value into insn->src_reg".

That doesn't fit the BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ semantics,
which loads into _dst_reg_.

How about:
#define BPF_LOAD_ACQ 2
#define BPF_STORE_REL 3

and only use them with BPF_MOV like

imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire
imm = BPF_MOV | BPF_STORE_REL - release

Though 2 stands on its own,
it's also equal to BPF_ADD | BPF_LOAD_ACQ
which is kinda ugly, so I don't like to use 2 alone.

> Or, do we want to start to use the remaining bits of the imm field (i.e.
> imm<8-31>) ?

Maybe.
Sort-of.
Since #define BPF_CMPXCHG     (0xf0 | BPF_FETCH)
another option would be:

#define BPF_LOAD_ACQ 0x100
#define BPF_STORE_REL 0x110

essentially extending op type to:
BPF_ATOMIC_TYPE(imm)    ((imm) & 0x1f0)

All options are not great.
I feel we need to step back.
Is there an architecture that has a sign-extending load-acquire?

Looks like arm doesn't, and I couldn't find any arch that does.
Then maybe we should reconsider BPF_LDX/STX and use BPF_MODE
to distinguish from normal ldx/stx

#define BPF_ACQ_REL 0xe0

BPF_LDX | BPF_ACQ_REL | BPF_W
BPF_STX | BPF_ACQ_REL | BPF_W

?
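
A quick sketch of how the opcode byte would compose under this idea
(0xe0 is a currently-unused BPF_MODE value; BPF_ACQ_REL is part of the
proposal above, not existing UAPI):

  BPF_LDX | BPF_ACQ_REL | BPF_W  /* 0x01 | 0xe0 | 0x00 == 0xe1 */
  BPF_STX | BPF_ACQ_REL | BPF_W  /* 0x03 | 0xe0 | 0x00 == 0xe3 */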



* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-13  5:55             ` Alexei Starovoitov
@ 2025-02-13 11:41               ` Peilin Ye
  2025-02-15  2:34                 ` Peilin Ye
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-13 11:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

Hi Alexei,

On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote:
> > > > >   #define BPF_LOAD_ACQ   0x10
> > > > >   #define BPF_STORE_REL  0x20
> 
> so that was broken then,
> since BPF_SUB 0x10 ?
> 
> And original thing was also completely broken for
> BPF_ATOMIC_LOAD | BPF_RELAXED == 0x10 == BPF_SUB ?
> 
> so much for "lets define relaxed, acquire,
> release, acq_rel for completeness".
> :(
> 
> > > > why not 1 and 2 ?
> > >
> > > I just realized

To clarify, by "just realized" I meant I forgot BPF_ADD equals 0x00
until (I had coffee on) Monday :-)

I wouldn't call it completely broken though.  For full context,
initially I picked [1] 0x1 and 0xb in imm<4-7> because:

  * 0x1 is BPF_SUB in BPFArithOp<>, and atomic SUB is implemented using
    NEG + ADD, quoting a comment in LLVM source:

    // atomic_load_sub can be represented as a neg followed
    // by an atomic_load_add.

    Though admittedly atomic SUB _could_ have its own insn.

  * 0xb is BPF_MOV, which is not applicable for atomic (memory)
    operations, as already discussed

After discussing this with Yonghong [2], I changed it to 0x1 and 0x2,
because 0x2 is BPF_MUL and we are unlikely to support atomic
multiplication.  Then, following your suggestion to discuss the encoding
on-list, I left this as an open topic in the RFC v1 cover letter (then
documented it in PATCH v1 8/8 and v2 9/9).

TL;DR: I wasn't aware that you were against having "aliases" (I do still
believe it's safe to pick 0xb).

> > > that we can't do 1 and 2 because BPF_ADD | BPF_FETCH also equals
> > > 1.
> > >
> > > > All other bits are reserved and the verifier will make sure they're zero
> > >
> > > IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00).  What
> > > would you suggest?  Maybe:
> > >
> > >   #define BPF_ATOMIC_LD_ST 0x10
> > >
> > >   #define BPF_LOAD_ACQ      0x1
> > >   #define BPF_STORE_REL     0x2
> 
> This is also broken, since
> BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ == 0x11 == BPF_SUB | BPF_FETCH.
> 
> BPF_SUB | BPF_FETCH is invalid at the moment,
> but such aliasing is bad.
> 
> > > ?
> >
> > Or, how about reusing 0xb in imm<4-7>:
> >
> >   #define BPF_ATOMIC_LD_ST 0xb0
> >
> >   #define BPF_LOAD_ACQ      0x1
> >   #define BPF_STORE_REL     0x2
> >
> > 0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC.
> > Instead of moving values between registers, we now "move" values from/to
> > the memory - if I can think of it that way.
> 
> and BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ would == BPF_MOV | BPF_FETCH ?
> 
> Not pretty and confusing.
> 
> The BPF_FETCH modifier means "do whatever the opcode says to do,
> like add in memory, but also return the value into insn->src_reg".
> 
> That doesn't fit the BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ semantics,
> which loads into _dst_reg_.

I think we can have different imm<0-3> "namespace"s depending on
different imm<4-7> values?  So that 0x1 in imm<0-3> means BPF_FETCH for
existing RMW operations, and BPF_LOAD_ACQ for loads/stores.

Just like (browsing instruction-set.rst) for "64-bit immediate
instructions", the imm field means different things depending on the
value in src_reg?
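
(For example, for BPF_LD | BPF_IMM | BPF_DW, include/uapi/linux/bpf.h
already defines src_reg values that change how imm is interpreted:

  #define BPF_PSEUDO_MAP_FD	1	/* imm is a map fd */
  #define BPF_PSEUDO_MAP_VALUE	2	/* imm is a map fd; the insn pair
					 * loads a map value pointer */
  #define BPF_PSEUDO_BTF_ID	3	/* imm is a BTF ID */

with src_reg == 0 meaning a plain 64-bit immediate.)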

If I change PATCH v2 9/9 to say the following in instruction-set.rst:

  ```
  These operations are categorized based on the second lowest nibble
  (bits 4-7) of the 'imm' field:

  * ``ATOMIC_LD_ST`` indicates an atomic load or store operation (see
    `Atomic load and store operations`_).

  * All other defined values indicate an atomic read-modify-write
    operation, as described in the following section.
  ```

The section for loads/stores will have its own table explaining what
imm<0-3> means.
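
A sketch of what that per-subclass imm<0-3> table could look like, using
the values proposed above:

  =========  =====  ====================
  name       value  description
  =========  =====  ====================
  LOAD_ACQ   0x1    atomic load-acquire
  STORE_REL  0x2    atomic store-release
  =========  =====  ====================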

> How about:
> #define BPF_LOAD_ACQ 2
> #define BPF_STORE_REL 3
> 
> and only use them with BPF_MOV like
> 
> imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire
> imm = BPF_MOV | BPF_STORE_REL - release
> 
> Though 2 stands on its own,
> it's also equal to BPF_ADD | BPF_LOAD_ACQ
> which is kinda ugly,

> so I don't like to use 2 alone.

Totally agree - if we use 2 and 3 alone, zero in imm<4-7> would mean
"reserved" for load_acq/store_rel, and "BPF_ADD" for add/fetch_add.

> > Or, do we want to start to use the remaining bits of the imm field (i.e.
> > imm<8-31>) ?
> 
> Maybe.
> Sort-of.
> Since #define BPF_CMPXCHG     (0xf0 | BPF_FETCH)
> another option would be:
> 
> #define BPF_LOAD_ACQ 0x100
> #define BPF_STORE_REL 0x110
> 
> essentially extending op type to:
> BPF_ATOMIC_TYPE(imm)    ((imm) & 0x1f0)

Why, it sounds like a great idea!  If we extend the op_type field from
imm<4-7> to imm<4-11>, 256 numbers is more than we'll ever need?

After all, we'd still need to worry about e.g. the cmpwait_relaxed you
mentioned earlier.  I am guessing that we'll want to put it under
BPF_ATOMIC as well, since XCHG and CMPXCHG are here.  If we take your
approach, cmpwait_relaxed can be easily defined as e.g.:

  #define BPF_CMPWAIT_RELAXED   0x120

(FWIW, I was imagining a subtype/subclass flag in maybe imm<8-11> or
 imm<28-31> (or make it 8 bits for 256 subtypes/subclasses), so that 0x0
 means the read-modify-write subclass, 0x1 means maybe the load/store
 subclass, etc.)

> All options are not great.
> I feel we need to step back.
> Is there an architecture that has sign extending load acquire ?

IIUC, if I grep the LLVM source like:

  $ git grep -B 100 -A 100 getExtendForAtomicOps -- llvm/lib/Target/ \
	| grep ISD::SIGN_EXTEND
  llvm/lib/Target/LoongArch/LoongArchISelLowering.h-    return ISD::SIGN_EXTEND;
  llvm/lib/Target/Mips/MipsISelLowering.h-      return ISD::SIGN_EXTEND;
  llvm/lib/Target/RISCV/RISCVISelLowering.h-    return ISD::SIGN_EXTEND;

So LoongArch, Mips and RISCV it seems?

Semi-related, but it would be non-trivial (if not infeasible) to support
both zext and sext load-acquire in the LLVM BPF backend, because LLVM core
expects each arch to pick from SIGN_EXTEND, ZERO_EXTEND and ANY_EXTEND
for its atomic ops.  See my earlier investigation:

  https://github.com/llvm/llvm-project/pull/108636#issuecomment-2433844760

> Looks like arm doesn't, and I couldn't find any arch that does.
> Then maybe we should reconsider BPF_LDX/STX and use BPF_MODE
> to distinguish from normal ldx/stx
> 
> #define BPF_ACQ_REL 0xe0
> 
> BPF_LDX | BPF_ACQ_REL | BPF_W
> BPF_STX | BPF_ACQ_REL | BPF_W
> 
> ?

[1] https://github.com/llvm/llvm-project/pull/108636#issuecomment-2398916882
[2] https://github.com/llvm/llvm-project/pull/108636#discussion_r1815927568

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-13 11:41               ` Peilin Ye
@ 2025-02-15  2:34                 ` Peilin Ye
  2025-02-15  3:04                   ` Alexei Starovoitov
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-15  2:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

> On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote:
> > How about:
> > #define BPF_LOAD_ACQ 2
> > #define BPF_STORE_REL 3
> > 
> > and only use them with BPF_MOV like
> > 
> > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire
> > imm = BPF_MOV | BPF_STORE_REL - release

Based on everything discussed, should we proceed with the above
suggestion?  Specifically:

  #define BPF_LD_ST     BPF_MOV /* 0xb0 */

  #define BPF_LOAD_ACQ  0x2
  #define BPF_STORE_REL 0x3

(And document that BPF_LD_ST cannot be used together with BPF_FETCH.)
So that:

  1. We avoid "aliasing" with BPF_SUB or BPF_MUL altogether.

  2. If we need to add cmpwait_relaxed, we can then expand imm<4-7> to
     e.g. imm<4-11> and do something similar to:

     XCHG             0x0e0 | FETCH
     CMPXCHG          0x0f0 | FETCH
    +CMPWAIT_RELAXED  0x100

     So that <asm/cmpxchg.h> operations can "stay together".

  3. In the hypothetical scenario where we need seq_cst loads/stores, we
     add new flags to imm<0-3>.

Though considering that:

  * BPF_FETCH doesn't apply to loads/stores, and
  * BPF_LOAD_ACQ and BPF_STORE_REL don't apply to RMW operations
  * We only have 15 numbers for imm<0-3> flags

I do think it makes sense to define BPF_LOAD_ACQ and BPF_STORE_REL as 1
and 2 (instead of 2 and 3).  With proper documentation I believe it'll
be clear that load/store and RMW are separate categories, with different
ways of using imm<0-3> (or, different imm<0-3> "namespace"s).  That
said, I'm happy to do either 2 and 3, or 1 and 2.

I'll start making changes for v3 and the LLVM PR, according to the
description above (with BPF_LOAD_ACQ=2, BPF_STORE_REL=3).  Please advise
me of any further concerns.

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-15  2:34                 ` Peilin Ye
@ 2025-02-15  3:04                   ` Alexei Starovoitov
  2025-02-15  6:17                     ` Peilin Ye
  0 siblings, 1 reply; 35+ messages in thread
From: Alexei Starovoitov @ 2025-02-15  3:04 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Fri, Feb 14, 2025 at 6:34 PM Peilin Ye <yepeilin@google.com> wrote:
>
> > On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote:
> > > How about:
> > > #define BPF_LOAD_ACQ 2
> > > #define BPF_STORE_REL 3
> > >
> > > and only use them with BPF_MOV like
> > >
> > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire
> > > imm = BPF_MOV | BPF_STORE_REL - release
>
> Based on everything discussed, should we proceed with the above
> suggestion?  Specifically:
>
>   #define BPF_LD_ST     BPF_MOV /* 0xb0 */

The aliasing still bothers me.
I hated doing it when going from cBPF to eBPF,
but we only had 8-bit to work with.
Here we have 32-bit.
Aliases make disassemblers trickier, since a value no longer
translates to a string as-is; it depends on the context.
There is probably no use for a BPF_MOV operation in an atomic
context, but by reusing BPF_ADD, BPF_XOR, etc. in atomics
we signed ourselves up for all of the alu ops.
That's why BPF_XCHG and BPF_CMPXCHG are outside
of alu op range.

So my preference is to do:
#define BPF_LOAD_ACQ 0x100
#define BPF_STORE_REL 0x110
#define BPF_CMPWAIT_RELAXED   0x120

and keep growing it.
We burn the first nibble, but so be it.



* Re: [PATCH bpf-next v2 4/9] bpf: Introduce load-acquire and store-release instructions
  2025-02-15  3:04                   ` Alexei Starovoitov
@ 2025-02-15  6:17                     ` Peilin Ye
  0 siblings, 0 replies; 35+ messages in thread
From: Peilin Ye @ 2025-02-15  6:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, Eduard Zingerman,
	David Vernet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Jonathan Corbet, Paul E. McKenney, Puranjay Mohan,
	Ilya Leoshkevich, Heiko Carstens, Vasily Gorbik, Catalin Marinas,
	Will Deacon, Quentin Monnet, Mykola Lysenko, Shuah Khan,
	Ihor Solodrai, Yingchi Long, Josh Don, Barret Rhoden, Neel Natu,
	Benjamin Segall, LKML

On Fri, Feb 14, 2025 at 07:04:13PM -0800, Alexei Starovoitov wrote:
> > > > How about:
> > > > #define BPF_LOAD_ACQ 2
> > > > #define BPF_STORE_REL 3
> > > >
> > > > and only use them with BPF_MOV like
> > > >
> > > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire
> > > > imm = BPF_MOV | BPF_STORE_REL - release
> >
> > Based on everything discussed, should we proceed with the above
> > suggestion?  Specifically:
> >
> >   #define BPF_LD_ST     BPF_MOV /* 0xb0 */
> 
> The aliasing still bothers me.
> I hated doing it when going from cBPF to eBPF,
> but we only had 8-bit to work with.
> Here we have 32-bit.
> Aliases make disassemblers trickier, since value no longer
> translates to string as-is. It depends on the context.
> There is probably no use for BPF_MOV operation in atomic
> context, but by reusing BPF_ADD, BPF_XOR, etc in atomic
> we signed up ourselves for all of alu ops.

I see.

> That's why BPF_XCHG and BPF_CMPXCHG are outside
> of alu op range.
> 
> So my preference is to do:
> #define BPF_LOAD_ACQ 0x100
> #define BPF_STORE_REL 0x110
> #define BPF_CMPWAIT_RELAXED   0x120
>
> and keep growing it.
> We burn the first nibble, but so be it.

Got it!  In instruction-set.rst I'll make it clear that imm<0-3> must be
zero for load_acq/store_rel.

Cheers,
Peilin Ye




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-11 20:51         ` Peilin Ye
@ 2025-02-20  1:08           ` Peilin Ye
  2025-02-20  1:21             ` Eduard Zingerman
  0 siblings, 1 reply; 35+ messages in thread
From: Peilin Ye @ 2025-02-20  1:08 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Tue, Feb 11, 2025 at 08:51:07PM +0000, Peilin Ye wrote:
> > > > Nit: why is dummy_test() necessary?
> > > 
> > > It's just to make it clear when these tests are (effectively) skipped.

<...>

> > > Commit 147c8f4470ee ("selftests/bpf: Add unit tests for new
> > > sign-extension load insns") did a similar thing in verifier_ldsx.c.
> > 
> > I see, thank you for explaining.
> > We do have a concept of skipped tests in the test-suite,
> > but it is implemented by calling test__skip() from the prog_tests/<smth>.c.
> > This would translate as something like below in prog_tests/verifier.c:
> > 
> > 	void test_verifier_store_release(void) {
> > 	#if defined(ENABLE_ATOMICS_TESTS) && defined(__TARGET_ARCH_arm64)
> > 		RUN(verifier_store_release);
> > 	#else
> > 		test__skip();
> > 	#endif
> > 	}
> 
> > The number of tests skipped is printed after tests execution.

I tried:

  void test_verifier_load_acquire(void)
  {
  #if __clang_major__ >= 18 && defined(ENABLE_ATOMICS_TESTS) && defined(__aarch64__)
          RUN(verifier_load_acquire);
  #else
          printf("%s:SKIP: Clang version < 18, ENABLE_ATOMICS_TESTS not defined, and/or JIT doesn't support load-acquire\n",
                 __func__);
          test__skip();
  #endif
  }

Then I realized that I can't check __clang_major__ in .../prog_tests/*
files (e.g. I was building prog_tests/verifier.c using GCC).  I think
ideally we should do something similar to prog{,_test}s/arena_atomics.c,
i.e. use a global bool in BPF source to indicate if we should skip this
test, but that seems to require non-trivial changes to
prog_tests/verifier.c?

For the purpose of this patchset, let me keep dummy_test(), like what we
have now in verifier_ldsx.c.

Thanks,
Peilin Ye




* Re: [PATCH bpf-next v2 8/9] selftests/bpf: Add selftests for load-acquire and store-release instructions
  2025-02-20  1:08           ` Peilin Ye
@ 2025-02-20  1:21             ` Eduard Zingerman
  0 siblings, 0 replies; 35+ messages in thread
From: Eduard Zingerman @ 2025-02-20  1:21 UTC (permalink / raw)
  To: Peilin Ye
  Cc: bpf, linux-arm-kernel, bpf, Xu Kuohai, David Vernet,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Jonathan Corbet,
	Paul E. McKenney, Puranjay Mohan, Ilya Leoshkevich,
	Heiko Carstens, Vasily Gorbik, Catalin Marinas, Will Deacon,
	Quentin Monnet, Mykola Lysenko, Shuah Khan, Ihor Solodrai,
	Yingchi Long, Josh Don, Barret Rhoden, Neel Natu, Benjamin Segall,
	linux-kernel

On Thu, 2025-02-20 at 01:08 +0000, Peilin Ye wrote:

[...]

> Then realized that I can't check __clang_major__ in .../prog_tests/*
> files (e.g. I was building prog_tests/verifier.c using GCC).  I think
> ideally we should do something similar to prog{,_test}s/arena_atomics.c,
> i.e. use a global bool in BPF source to indicate if we should skip this
> test, but that seems to require non-trivial changes to
> prog_tests/verifier.c?
> 
> For the purpose of this patchset, let me keep dummy_test(), like what we
> have now in verifier_ldsx.c.

Well, that's unfortunate.
Thank you for trying.



