LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 7/8] bpf ppc32: Add BPF_PROBE_MEM support for JIT
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| lwz   r28,4(r4)  |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r24)   |
             | lwz  r4,4(r24)   |
             |                  |
             |                  |
             |------------------|
   0x4278 -->| li  r28,0        |  \
             | li  r27,0        |  | fixup entry
             | b   0x4024       |  /
   0x4284 -->| li  r4,0         |
             | li  r3,0         |
             | b   0x40b4       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffe4 |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffe8 |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v3:
* Changed how BPF_FIXUP_LEN is defined based on Chris' suggestion.


 arch/powerpc/net/bpf_jit.h        |  4 ++++
 arch/powerpc/net/bpf_jit_comp.c   |  2 ++
 arch/powerpc/net/bpf_jit_comp32.c | 34 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 561689a2abdf..800734056200 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -144,7 +144,11 @@ struct codegen_context {
 	unsigned int exentry_idx;
 };
 
+#ifdef CONFIG_PPC32
+#define BPF_FIXUP_LEN	3 /* Three instructions => 12 bytes */
+#else
 #define BPF_FIXUP_LEN	2 /* Two instructions => 8 bytes */
+#endif
 
 static inline void bpf_flush_icache(void *start, void *end)
 {
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index f02457c6b54f..1a0041997050 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -297,6 +297,8 @@ int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct code
 		(ctx->exentry_idx * BPF_FIXUP_LEN * 4);
 
 	fixup[0] = PPC_RAW_LI(dst_reg, 0);
+	if (IS_ENABLED(CONFIG_PPC32))
+		fixup[1] = PPC_RAW_LI(dst_reg - 1, 0); /* clear higher 32-bit register too */
 
 	fixup[BPF_FIXUP_LEN - 1] =
 		PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[BPF_FIXUP_LEN - 1]);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 820c7848434e..1239643f532c 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -812,11 +812,19 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
+			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
@@ -841,6 +849,32 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 			if (size != BPF_DW && !fp->aux->verifier_zext)
 				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				int insn_idx = ctx->idx - 1;
+				int jmp_off = 4;
+
+				/*
+				 * In case of BPF_DW, two lwz instructions are emitted, one
+				 * for higher 32-bit and another for lower 32-bit. So, set
+				 * ex->insn to the first of the two and jump over both
+				 * instructions in fixup.
+				 *
+				 * Similarly, with !verifier_zext, two instructions are
+				 * emitted for BPF_B/H/W case. So, set ex->insn to the
+				 * instruction that could fault and skip over both
+				 * instructions.
+				 */
+				if (size == BPF_DW || !fp->aux->verifier_zext) {
+					insn_idx -= 1;
+					jmp_off += 4;
+				}
+
+				ret = bpf_add_extable_entry(fp, image, pass, ctx, insn_idx,
+							    jmp_off, dst_reg);
+				if (ret)
+					return ret;
+			}
 			break;
 
 		/*
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 6/8] bpf ppc64: Access only if addr is kernel address
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

On PPC64 with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d0000000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr is kernel address, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v3:
* Used is_kernel_addr() logic instead of using TASK_SIZE_MAX check
  all the time.
* Addressed other comments from Christophe.


 arch/powerpc/net/bpf_jit_comp64.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 506934c13ef7..06e1206a4266 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -734,6 +734,35 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		case BPF_LDX | BPF_MEM | BPF_DW:
 			fallthrough;
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+			/*
+			 * As PTR_TO_BTF_ID that uses BPF_PROBE_MEM mode could either be a valid
+			 * kernel pointer or NULL but not a userspace address, execute BPF_PROBE_MEM
+			 * load only if addr is kernel address (see is_kernel_addr()), otherwise
+			 * set dst_reg=0 and move on.
+			 */
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off));
+#ifdef CONFIG_PPC_BOOK3E_64
+				PPC_LI64(b2p[TMP_REG_2], 0x8000000000000000ul);
+#elif defined(CONFIG_PPC_BOOK3S_64)
+				PPC_LI64(b2p[TMP_REG_2], PAGE_OFFSET);
+#else
+				PPC_LI64(b2p[TMP_REG_2], TASK_SIZE);
+#endif
+				EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2]));
+				PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+				/*
+				 * Check if 'off' is word aligned because PPC_BPF_LL()
+				 * (BPF_DW case) generates two instructions if 'off' is not
+				 * word-aligned and one instruction otherwise.
+				 */
+				if (BPF_SIZE(code) == BPF_DW && (off & 3))
+					PPC_JMP((ctx->idx + 3) * 4);
+				else
+					PPC_JMP((ctx->idx + 2) * 4);
+			}
+
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 5/8] bpf ppc64: Add BPF_PROBE_MEM support for JIT
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| ld   r27,4(r3)   |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r4)    |
             |                  |
             |                  |
             |------------------|
   0x4280 -->| li  r27,0        |  \ fixup entry
             | b   0x4024       |  /
   0x4288 -->| li  r3,0         |
             | b   0x40b0       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffec |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffec |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v3:
* Made all changes for bpf_add_extable_entry() in this patch instead
  of updating it again in patch #7.
* Fixed a coding style issue.


 arch/powerpc/net/bpf_jit.h        |  8 +++-
 arch/powerpc/net/bpf_jit_comp.c   | 70 ++++++++++++++++++++++++++++---
 arch/powerpc/net/bpf_jit_comp32.c |  2 +-
 arch/powerpc/net/bpf_jit_comp64.c | 17 +++++++-
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 0c8f885b8f48..561689a2abdf 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -141,8 +141,11 @@ struct codegen_context {
 	unsigned int idx;
 	unsigned int stack_size;
 	int b2p[ARRAY_SIZE(b2p)];
+	unsigned int exentry_idx;
 };
 
+#define BPF_FIXUP_LEN	2 /* Two instructions => 8 bytes */
+
 static inline void bpf_flush_icache(void *start, void *end)
 {
 	smp_wmb();	/* smp write barrier */
@@ -166,11 +169,14 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs);
+		       u32 *addrs, int pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
+			  int insn_idx, int jmp_off, int dst_reg);
+
 #endif
 
 #endif
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c5c9e8ad1de7..f02457c6b54f 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -101,6 +101,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
 	bool extra_pass = false;
+	u32 extable_len;
+	u32 fixup_len;
 
 	if (!fp->jit_requested)
 		return org_fp;
@@ -131,7 +133,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		image = jit_data->image;
 		bpf_hdr = jit_data->header;
 		proglen = jit_data->proglen;
-		alloclen = proglen + FUNCTION_DESCR_SIZE;
 		extra_pass = true;
 		goto skip_init_ctx;
 	}
@@ -149,7 +150,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -162,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -177,8 +178,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	bpf_jit_build_prologue(0, &cgctx);
 	bpf_jit_build_epilogue(0, &cgctx);
 
+	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN * 4;
+	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
+
 	proglen = cgctx.idx * 4;
-	alloclen = proglen + FUNCTION_DESCR_SIZE;
+	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
 	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
@@ -186,6 +190,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		goto out_addrs;
 	}
 
+	if (extable_len)
+		fp->aux->extable = (void *)image + FUNCTION_DESCR_SIZE + proglen + fixup_len;
+
 skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
 
@@ -210,7 +217,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+		if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, pass)) {
+			bpf_jit_binary_free(bpf_hdr);
+			fp = org_fp;
+			goto out_addrs;
+		}
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
@@ -234,7 +245,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
-	fp->jited_len = alloclen;
+	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
 
 	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
 	bpf_jit_binary_lock_ro(bpf_hdr);
@@ -258,3 +269,50 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	return fp;
 }
+
+/*
+ * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
+ * this function, as this only applies to BPF_PROBE_MEM, for now.
+ */
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
+			  int insn_idx, int jmp_off, int dst_reg)
+{
+	off_t offset;
+	unsigned long pc;
+	struct exception_table_entry *ex;
+	u32 *fixup;
+
+	/* Populate extable entries only in the last pass */
+	if (pass != 2)
+		return 0;
+
+	if (!fp->aux->extable ||
+	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
+		return -EINVAL;
+
+	pc = (unsigned long)&image[insn_idx];
+
+	fixup = (void *)fp->aux->extable -
+		(fp->aux->num_exentries * BPF_FIXUP_LEN * 4) +
+		(ctx->exentry_idx * BPF_FIXUP_LEN * 4);
+
+	fixup[0] = PPC_RAW_LI(dst_reg, 0);
+
+	fixup[BPF_FIXUP_LEN - 1] =
+		PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[BPF_FIXUP_LEN - 1]);
+
+	ex = &fp->aux->extable[ctx->exentry_idx];
+
+	offset = pc - (long)&ex->insn;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->insn = offset;
+
+	offset = (long)fixup - (long)&ex->fixup;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->fixup = offset;
+
+	ctx->exentry_idx++;
+	return 0;
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 6e4956cd52e7..820c7848434e 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs)
+		       u32 *addrs, int pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 991eb43d4cd2..506934c13ef7 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -272,7 +272,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs)
+		       u32 *addrs, int pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -718,14 +718,22 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		/* dst = *(u8 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
+			fallthrough;
 		/* dst = *(u16 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_H:
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+			fallthrough;
 		/* dst = *(u32 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_W:
 			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+			fallthrough;
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
+			fallthrough;
+		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
@@ -749,6 +757,13 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 			if (size != BPF_DW && insn_is_zext(&insn[i + 1]))
 				addrs[++i] = ctx->idx * 4;
+
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				ret = bpf_add_extable_entry(fp, image, pass, ctx, ctx->idx - 1,
+							    4, dst_reg);
+				if (ret)
+					return ret;
+			}
 			break;
 
 		/*
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 4/8] powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---

Changes in v3:
* Added Reviewed-by tag from Chris.


 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 arch/powerpc/net/bpf_jit.h            | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index baea657bc868..f50213e2a3e0 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -566,6 +566,8 @@
 #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
+#define PPC_RAW_BRANCH(addr)		(PPC_INST_BRANCH | ((addr) & 0x03fffffc))
+
 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
 #define	PPC_CP_ABORT		stringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 411c63d945c7..0c8f885b8f48 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -24,8 +24,8 @@
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
 
 /* Long jump; (unconditional 'branch') */
-#define PPC_JMP(dest)		EMIT(PPC_INST_BRANCH |			      \
-				     (((dest) - (ctx->idx * 4)) & 0x03fffffc))
+#define PPC_JMP(dest)		EMIT(PPC_RAW_BRANCH((dest) - (ctx->idx * 4)))
+
 /* blr; (unconditional 'branch' with link) to absolute address */
 #define PPC_BL_ABS(dest)	EMIT(PPC_INST_BL |			      \
 				     (((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 3/8] bpf powerpc: refactor JIT compiler code
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

Refactor powerpc LDX JITing code to simplify adding BPF_PROBE_MEM
support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v3:
* Dropped ST(X) JITing coderefactoring and ensured the changes are
  minimal and relevant to BPF_PROBE_MEM support.
* Added a default for size switch case to ensure compiler would
  not complain.


 arch/powerpc/net/bpf_jit_comp32.c | 42 ++++++++++++++++++++-----------
 arch/powerpc/net/bpf_jit_comp64.c | 40 +++++++++++++++++++----------
 2 files changed, 55 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index b60b59426a24..6e4956cd52e7 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -282,6 +282,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		u32 src_reg = bpf_to_ppc(ctx, insn[i].src_reg);
 		u32 src_reg_h = src_reg - 1;
 		u32 tmp_reg = bpf_to_ppc(ctx, TMP_REG);
+		u32 size = BPF_SIZE(code);
 		s16 off = insn[i].off;
 		s32 imm = insn[i].imm;
 		bool func_addr_fixed;
@@ -810,23 +811,36 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 * BPF_LDX
 		 */
 		case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
-				EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			break;
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
-				EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			break;
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
-				EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			break;
+			fallthrough;
 		case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
+			switch (size) {
+			case BPF_B:
+				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+				break;
+			case BPF_H:
+				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+				break;
+			case BPF_W:
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+				break;
+			case BPF_DW:
+				EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
+				break;
+			/*
+			 * With size not being an enum and BPF_B/H/W/DW being macros, ensure no
+			 * compiler warning/error by adding a default case that never reaches.
+			 */
+			default:
+				break;
+			}
+
+			if (size != BPF_DW && !fp->aux->verifier_zext)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
 			break;
 
 		/*
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 2a87da50d9a4..991eb43d4cd2 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -285,6 +285,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		u32 code = insn[i].code;
 		u32 dst_reg = b2p[insn[i].dst_reg];
 		u32 src_reg = b2p[insn[i].src_reg];
+		u32 size = BPF_SIZE(code);
 		s16 off = insn[i].off;
 		s32 imm = insn[i].imm;
 		bool func_addr_fixed;
@@ -716,25 +717,38 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		/* dst = *(u8 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
-			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
+			fallthrough;
 		/* dst = *(u16 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_H:
-			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
+			fallthrough;
 		/* dst = *(u32 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_W:
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
+			fallthrough;
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
-			PPC_BPF_LL(dst_reg, src_reg, off);
+			switch (size) {
+			case BPF_B:
+				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+				break;
+			case BPF_H:
+				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+				break;
+			case BPF_W:
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+				break;
+			case BPF_DW:
+				PPC_BPF_LL(dst_reg, src_reg, off);
+				break;
+			/*
+			 * With size not being an enum and BPF_B/H/W/DW being macros, ensure no
+			 * compiler warning/error by adding a default case that never reaches.
+			 */
+			default:
+				break;
+			}
+
+			if (size != BPF_DW && insn_is_zext(&insn[i + 1]))
+				addrs[++i] = ctx->idx * 4;
 			break;
 
 		/*
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 2/8] bpf powerpc: Remove extra_pass from bpf_jit_build_body()
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

In case of extra_pass, usual JIT passes are always skipped. So,
extra_pass is always false while calling bpf_jit_build_body() and
can be removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---

Changes in v3:
* Updated the changelog wording a bit.


 arch/powerpc/net/bpf_jit.h        | 2 +-
 arch/powerpc/net/bpf_jit_comp.c   | 6 +++---
 arch/powerpc/net/bpf_jit_comp32.c | 4 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index d6267e93027a..411c63d945c7 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -166,7 +166,7 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass);
+		       u32 *addrs);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53aefee3fe70..c5c9e8ad1de7 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -149,7 +149,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -162,7 +162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+		if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -210,7 +210,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index beb12cbc8c29..b60b59426a24 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass)
+		       u32 *addrs)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -860,7 +860,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
 
-			ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+			ret = bpf_jit_get_func_addr(fp, &insn[i], false,
 						    &func_addr, &func_addr_fixed);
 			if (ret < 0)
 				return ret;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index b87a63dba9c8..2a87da50d9a4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -272,7 +272,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass)
+		       u32 *addrs)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -769,7 +769,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
 
-			ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+			ret = bpf_jit_get_func_addr(fp, &insn[i], false,
 						    &func_addr, &func_addr_fixed);
 			if (ret < 0)
 				return ret;
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 1/8] bpf powerpc: Remove unused SEEN_STACK
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210921132943.489732-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---

Changes in v3:
* Added Reviewed-by tag from Chris.


 arch/powerpc/net/bpf_jit.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43e..d6267e93027a 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset)
 #define COND_LE		(CR0_GT | COND_CMP_FALSE)
 
 #define SEEN_FUNC	0x20000000 /* might call external helpers */
-#define SEEN_STACK	0x40000000 /* uses BPF stack */
-#define SEEN_TAILCALL	0x80000000 /* uses tail calls */
+#define SEEN_TAILCALL	0x40000000 /* uses tail calls */
 
 #define SEEN_VREG_MASK	0x1ff80000 /* Volatile registers r3-r12 */
 #define SEEN_NVREG_MASK	0x0003ffff /* Non volatile registers r14-r31 */
-- 
2.31.1


^ permalink raw reply related

* [PATCH v3 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler
From: Hari Bathini @ 2021-09-21 13:29 UTC (permalink / raw)
  To: naveen.n.rao, christophe.leroy, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini

Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT
compiler code with the aim to simplify adding BPF_PROBE_MEM support.
Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding
branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64
& PPC32 JIT compilers respectively. Patch #6 & #8 handle bad userspace
pointers for PPC64 & PPC32 cases respectively.

Changes in v3:
* Addressed all the review comments from Christophe.

Hari Bathini (4):
  bpf powerpc: refactor JIT compiler code
  powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
  bpf ppc32: Add BPF_PROBE_MEM support for JIT
  bpf ppc32: Access only if addr is kernel address

Ravi Bangoria (4):
  bpf powerpc: Remove unused SEEN_STACK
  bpf powerpc: Remove extra_pass from bpf_jit_build_body()
  bpf ppc64: Add BPF_PROBE_MEM support for JIT
  bpf ppc64: Access only if addr is kernel address

 arch/powerpc/include/asm/ppc-opcode.h |   2 +
 arch/powerpc/net/bpf_jit.h            |  19 +++--
 arch/powerpc/net/bpf_jit_comp.c       |  72 ++++++++++++++--
 arch/powerpc/net/bpf_jit_comp32.c     | 115 ++++++++++++++++++++++----
 arch/powerpc/net/bpf_jit_comp64.c     |  88 ++++++++++++++++----
 5 files changed, 254 insertions(+), 42 deletions(-)

-- 
2.31.1


^ permalink raw reply

* Re: [RFC PATCH 5/8] sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y
From: Catalin Marinas @ 2021-09-21 13:12 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Paul Mackerras, linux-riscv, Will Deacon,
	linux-s390, Arnd Bergmann, Russell King, Christian Borntraeger,
	Ingo Molnar, Albert Ou, Kees Cook, Vasily Gorbik, Heiko Carstens,
	Keith Packard, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Thomas Gleixner, linux-arm-kernel, linuxppc-dev, linux-kernel,
	Palmer Dabbelt, Linus Torvalds
In-Reply-To: <20210914121036.3975026-6-ardb@kernel.org>

On Tue, Sep 14, 2021 at 02:10:33PM +0200, Ard Biesheuvel wrote:
> THREAD_INFO_IN_TASK moved the CPU field out of thread_info, but this
> causes some issues on architectures that define raw_smp_processor_id()
> in terms of this field, due to the fact that #include'ing linux/sched.h
> to get at struct task_struct is problematic in terms of circular
> dependencies.
> 
> Given that thread_info and task_struct are the same data structure
> anyway when THREAD_INFO_IN_TASK=y, let's move it back so that having
> access to the type definition of struct thread_info is sufficient to
> reference the CPU number of the current task.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply

* Re: [RFC PATCH 2/8] x86: add CPU field to struct thread_info
From: Borislav Petkov @ 2021-09-21 12:54 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Peter Zijlstra, Paul Mackerras, linux-riscv, Will Deacon,
	linux-s390, Arnd Bergmann, Russell King, Christian Borntraeger,
	Ingo Molnar, Catalin Marinas, Albert Ou, Kees Cook, Vasily Gorbik,
	Heiko Carstens, Keith Packard, Andy Lutomirski, Paul Walmsley,
	Thomas Gleixner, linux-arm-kernel, linuxppc-dev, linux-kernel,
	Palmer Dabbelt, Linus Torvalds
In-Reply-To: <20210914121036.3975026-3-ardb@kernel.org>

On Tue, Sep 14, 2021 at 02:10:30PM +0200, Ard Biesheuvel wrote:
> The CPU field will be moved back into thread_info even when
> THREAD_INFO_IN_TASK is enabled, so add it back to x86's definition of
> struct thread_info.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/include/asm/thread_info.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index cf132663c219..ebec69c35e95 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -57,6 +57,9 @@ struct thread_info {
>  	unsigned long		flags;		/* low level flags */
>  	unsigned long		syscall_work;	/* SYSCALL_WORK_ flags */
>  	u32			status;		/* thread synchronous flags */
> +#ifdef CONFIG_SMP
> +	u32			cpu;		/* current CPU */
> +#endif
>  };
>  
>  #define INIT_THREAD_INFO(tsk)			\
> -- 

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [RFC PATCH 0/8] Move task_struct::cpu back into thread_info
From: Ard Biesheuvel @ 2021-09-21 12:06 UTC (permalink / raw)
  To: Mark Rutland, Michael Ellerman, Benjamin Herrenschmidt,
	Palmer Dabbelt, Paul Walmsley, Albert Ou, Heiko Carstens,
	Paul Mackerras, Borislav Petkov, Peter Zijlstra, Thomas Gleixner,
	Christian Borntraeger, Vasily Gorbik, Ingo Molnar
  Cc: open list:S390, Kees Cook, Arnd Bergmann, Catalin Marinas,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Linux Kernel Mailing List, Russell King, Linus Torvalds,
	Keith Packard, Andy Lutomirski, linux-riscv, Will Deacon,
	Linux ARM
In-Reply-To: <20210914135527.GC30247@C02TD0UTHF1T.local>

On Tue, 14 Sept 2021 at 15:55, Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Sep 14, 2021 at 02:10:28PM +0200, Ard Biesheuvel wrote:
> > Commit c65eacbe290b ("sched/core: Allow putting thread_info into
> > task_struct") mentions that, along with moving thread_info into
> > task_struct, the cpu field is moved out of the former into the latter,
> > but does not explain why.
>
> From what I recall of talking to Andy around that time, when converting
> arm64 over, the theory was that over time we'd move more and more out of
> thread_info and into task_struct or thread_struct, until task_struct
> supplanted thread_info entirely, and that all became generic.
>
> I think the key gain there was making things more *generic*, and there
> are other ways we could do that in future without moving more into
> task_struct (e.g. with a geenric thread_info and arch_thread_info inside
> that).
>
> With that in mind, and given the diffstat, I think this is worthwhile.
>
> FWIW, for the series:
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
>

Thanks.

Any comments on this from the various arch maintainers? Especially
power, as Christophe seems happy with this but there are 3 different
patches affecting power that need a maintainer ack.

^ permalink raw reply

* Re: [RFC PATCH 1/8] arm64: add CPU field to struct thread_info
From: Ard Biesheuvel @ 2021-09-21 12:04 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Peter Zijlstra, Paul Mackerras, linux-riscv, Will Deacon,
	open list:S390, Arnd Bergmann, Russell King,
	Christian Borntraeger, Ingo Molnar, Albert Ou, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Keith Packard, Borislav Petkov,
	Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Linux Kernel Mailing List, Palmer Dabbelt, Linus Torvalds
In-Reply-To: <YUNXfWKZ7XYvw2EK@arm.com>

On Thu, 16 Sept 2021 at 16:41, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Sep 14, 2021 at 02:10:29PM +0200, Ard Biesheuvel wrote:
> > The CPU field will be moved back into thread_info even when
> > THREAD_INFO_IN_TASK is enabled, so add it back to arm64's definition of
> > struct thread_info.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks. I take it this applies to patch #5 as well?

^ permalink raw reply

* Re: [PATCH v3] lib/zlib_inflate/inffast: Check config in C to avoid unused function warning
From: Paul Menzel @ 2021-09-21 10:11 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: llvm, Zhen Lei, Nick Desaulniers, linux-kernel, Paul Mackerras,
	Andrew Morton, linuxppc-dev
In-Reply-To: <YUishGbHeaDMJDj+@archlinux-ax161>

Dear Linux folks,


Am 20.09.21 um 17:45 schrieb Nathan Chancellor:
> On Mon, Sep 20, 2021 at 10:43:33AM +0200, Paul Menzel wrote:
>> Building Linux for ppc64le with Ubuntu clang version 12.0.0-3ubuntu1~21.04.1
>> shows the warning below.
>>
>>      arch/powerpc/boot/inffast.c:20:1: warning: unused function 'get_unaligned16' [-Wunused-function]
>>      get_unaligned16(const unsigned short *p)
>>      ^
>>      1 warning generated.
>>
>> Fix it, by moving the check from the preprocessor to C, so the compiler
>> sees the use.
>>
>> Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
> 
> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
> Tested-by: Nathan Chancellor <nathan@kernel.org>
> 
>> ---
>> v2: Use IS_ENABLED
>> v3: Use if statement over ternary operator as requested by Christophe
>>
>>   lib/zlib_inflate/inffast.c | 13 ++++++-------
>>   1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/zlib_inflate/inffast.c b/lib/zlib_inflate/inffast.c
>> index f19c4fbe1be7..2843f9bb42ac 100644
>> --- a/lib/zlib_inflate/inffast.c
>> +++ b/lib/zlib_inflate/inffast.c
>> @@ -253,13 +253,12 @@ void inflate_fast(z_streamp strm, unsigned start)
>>   
>>   			sfrom = (unsigned short *)(from);
>>   			loops = len >> 1;
>> -			do
>> -#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> -			    *sout++ = *sfrom++;
>> -#else
>> -			    *sout++ = get_unaligned16(sfrom++);
>> -#endif
>> -			while (--loops);
>> +			do {
>> +			    if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
>> +				*sout++ = *sfrom++;
>> +			    else
>> +				*sout++ = get_unaligned16(sfrom++);
>> +			} while (--loops);
>>   			out = (unsigned char *)sout;
>>   			from = (unsigned char *)sfrom;
>>   		    } else { /* dist == 1 or dist == 2 */
>> -- 
>> 2.33.0

Just for the record,

I compared both object files by running `objdump -d`, and the result is 
the same.

The binary differed (`vbindiff`), but I guess this is due to the 
increased revision (`make bindeb-pkg`).

without a change (Linus’ current master):

0000 0B50: 00 00 00 00 00 00 00 00  1F 01 00 00 36 00 00 00  ........ 
....6...
                                      ^
0000 0B60: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ 
........
0000 0B70: 00 00 00 00 00 00 00 00  29 01 00 00 32 00 00 00  ........ 
)...2...
                                      ^

v2 (ternary operator):

0000 0B50: 00 00 00 00 00 00 00 00  1C 01 00 00 36 00 00 00  ........ 
....6...
0000 0B60: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ 
........
0000 0B70: 00 00 00 00 00 00 00 00  26 01 00 00 32 00 00 00  ........ 
&...2...

v3 (if-else statement):

0000 0B50: 00 00 00 00 00 00 00 00  1E 01 00 00 36 00 00 00  ........ 
....6...
0000 0B60: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ........ 
........
0000 0B70: 00 00 00 00 00 00 00 00  28 01 00 00 32 00 00 00  ........ 
(...2...


Kind regards,

Paul

^ permalink raw reply

* [PATCH] mm: Remove HARDENED_USERCOPY_FALLBACK
From: Stephen Kitt @ 2021-09-21  6:11 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka, James Morris, Serge E . Hallyn
  Cc: Stephen Kitt, Kees Cook, linux-kernel, linux-mm,
	linux-security-module, linux-hardening, linuxppc-dev

This has served its purpose and is no longer used. All usercopy
violations appear to have been handled by now, any remaining
instances (or new bugs) will cause copies to be rejected.

This isn't a direct revert of commit 2d891fbc3bb6 ("usercopy: Allow
strict enforcement of whitelists"); since usercopy_fallback is
effectively 0, the fallback handling is removed too.

This also removes the usercopy_fallback module parameter on
slab_common.

Link: https://github.com/KSPP/linux/issues/153
Signed-off-by: Stephen Kitt <steve@sk2.org>
Suggested-by: Kees Cook <keescook@chromium.org>
---
 arch/powerpc/configs/skiroot_defconfig |  1 -
 include/linux/slab.h                   |  2 --
 mm/slab.c                              | 13 -------------
 mm/slab_common.c                       |  8 --------
 mm/slub.c                              | 14 --------------
 security/Kconfig                       | 14 --------------
 6 files changed, 52 deletions(-)

diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index b806a5d3a695..c3ba614c973d 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -275,7 +275,6 @@ CONFIG_NLS_UTF8=y
 CONFIG_ENCRYPTED_KEYS=y
 CONFIG_SECURITY=y
 CONFIG_HARDENED_USERCOPY=y
-# CONFIG_HARDENED_USERCOPY_FALLBACK is not set
 CONFIG_HARDENED_USERCOPY_PAGESPAN=y
 CONFIG_FORTIFY_SOURCE=y
 CONFIG_SECURITY_LOCKDOWN_LSM=y
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 0c97d788762c..5b21515afae0 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -142,8 +142,6 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-extern bool usercopy_fallback;
-
 struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
 			unsigned int align, slab_flags_t flags,
 			void (*ctor)(void *));
diff --git a/mm/slab.c b/mm/slab.c
index d0f725637663..4d826394ffcb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4207,19 +4207,6 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 	    n <= cachep->useroffset - offset + cachep->usersize)
 		return;
 
-	/*
-	 * If the copy is still within the allocated object, produce
-	 * a warning instead of rejecting the copy. This is intended
-	 * to be a temporary method to find any missing usercopy
-	 * whitelists.
-	 */
-	if (usercopy_fallback &&
-	    offset <= cachep->object_size &&
-	    n <= cachep->object_size - offset) {
-		usercopy_warn("SLAB object", cachep->name, to_user, offset, n);
-		return;
-	}
-
 	usercopy_abort("SLAB object", cachep->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a4a571428c51..925b00c1d4e8 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -37,14 +37,6 @@ LIST_HEAD(slab_caches);
 DEFINE_MUTEX(slab_mutex);
 struct kmem_cache *kmem_cache;
 
-#ifdef CONFIG_HARDENED_USERCOPY
-bool usercopy_fallback __ro_after_init =
-		IS_ENABLED(CONFIG_HARDENED_USERCOPY_FALLBACK);
-module_param(usercopy_fallback, bool, 0400);
-MODULE_PARM_DESC(usercopy_fallback,
-		"WARN instead of reject usercopy whitelist violations");
-#endif
-
 static LIST_HEAD(slab_caches_to_rcu_destroy);
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work);
 static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
diff --git a/mm/slub.c b/mm/slub.c
index 3f96e099817a..77f53e76a3c3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4125,7 +4125,6 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 {
 	struct kmem_cache *s;
 	unsigned int offset;
-	size_t object_size;
 	bool is_kfence = is_kfence_address(ptr);
 
 	ptr = kasan_reset_tag(ptr);
@@ -4158,19 +4157,6 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
 	    n <= s->useroffset - offset + s->usersize)
 		return;
 
-	/*
-	 * If the copy is still within the allocated object, produce
-	 * a warning instead of rejecting the copy. This is intended
-	 * to be a temporary method to find any missing usercopy
-	 * whitelists.
-	 */
-	object_size = slab_ksize(s);
-	if (usercopy_fallback &&
-	    offset <= object_size && n <= object_size - offset) {
-		usercopy_warn("SLUB object", s->name, to_user, offset, n);
-		return;
-	}
-
 	usercopy_abort("SLUB object", s->name, to_user, offset, n);
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
diff --git a/security/Kconfig b/security/Kconfig
index 0ced7fd33e4d..d9698900c9b7 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -163,20 +163,6 @@ config HARDENED_USERCOPY
 	  or are part of the kernel text. This kills entire classes
 	  of heap overflow exploits and similar kernel memory exposures.
 
-config HARDENED_USERCOPY_FALLBACK
-	bool "Allow usercopy whitelist violations to fallback to object size"
-	depends on HARDENED_USERCOPY
-	default y
-	help
-	  This is a temporary option that allows missing usercopy whitelists
-	  to be discovered via a WARN() to the kernel log, instead of
-	  rejecting the copy, falling back to non-whitelisted hardened
-	  usercopy that checks the slab allocation size instead of the
-	  whitelist size. This option will be removed once it seems like
-	  all missing usercopy whitelists have been identified and fixed.
-	  Booting with "slab_common.usercopy_fallback=Y/N" can change
-	  this setting.
-
 config HARDENED_USERCOPY_PAGESPAN
 	bool "Refuse to copy allocations that span multiple pages"
 	depends on HARDENED_USERCOPY

base-commit: 368094df48e680fa51cedb68537408cfa64b788e
-- 
2.30.2


^ permalink raw reply related

* [PATCH] powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()
From: Nathan Lynch @ 2021-09-21  3:12 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srikar, npiggin

vcpu_is_preempted() can be used outside of preempt-disabled critical
sections, yielding warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
caller is rwsem_spin_on_owner+0x1cc/0x2d0
CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
Call Trace:
[c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
[c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
[c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
[c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
[c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
[c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
[c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
[c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
[c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250

The result of vcpu_is_preempted() is always subject to invalidation by
events inside and outside of Linux; it's just a best guess at a point in
time. Use raw_smp_processor_id() to avoid such warnings.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
---
 arch/powerpc/include/asm/paravirt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index bcb7b5f917be..e429aca566de 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -97,7 +97,14 @@ static inline bool vcpu_is_preempted(int cpu)
 
 #ifdef CONFIG_PPC_SPLPAR
 	if (!is_kvm_guest()) {
-		int first_cpu = cpu_first_thread_sibling(smp_processor_id());
+		int first_cpu;
+
+		/*
+		 * This is only a guess at best, and this function may be
+		 * called with preemption enabled. Using raw_smp_processor_id()
+		 * does not damage accuracy.
+		 */
+		first_cpu = cpu_first_thread_sibling(raw_smp_processor_id());
 
 		/*
 		 * Preemption can only happen at core granularity. This CPU
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH 1/2] powerpc/perf: Expose instruction and data address registers as part of extended regs
From: Athira Rajeev @ 2021-09-21  3:01 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Madhavan Srinivasan, rnsastry, kajoljain,
	Arnaldo Carvalho de Melo, Jiri Olsa, linuxppc-dev
In-Reply-To: <87ilyviusy.fsf@mpe.ellerman.id.au>



> On 20-Sep-2021, at 12:43 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>>> On 08-Sep-2021, at 10:47 AM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>>> 
>>> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>>>> Patch adds support to include Sampled Instruction Address Register
>>>> (SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended
>>>> registers. Update the definition of PERF_REG_PMU_MASK_300/31 and
>>>> PERF_REG_EXTENDED_MAX to include these SPR's.
>>>> 
>>>> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
>>>> ---
>>>> arch/powerpc/include/uapi/asm/perf_regs.h | 12 +++++++-----
>>>> arch/powerpc/perf/perf_regs.c             |  4 ++++
>>>> 2 files changed, 11 insertions(+), 5 deletions(-)
>>>> 
>>> ...
>>>> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
>>>> index b931eed..51d31b6 100644
>>>> --- a/arch/powerpc/perf/perf_regs.c
>>>> +++ b/arch/powerpc/perf/perf_regs.c
>>>> @@ -90,7 +90,11 @@ static u64 get_ext_regs_value(int idx)
>>>> 		return mfspr(SPRN_SIER2);
>>>> 	case PERF_REG_POWERPC_SIER3:
>>>> 		return mfspr(SPRN_SIER3);
>>>> +	case PERF_REG_POWERPC_SDAR:
>>>> +		return mfspr(SPRN_SDAR);
>>>> #endif
>>>> +	case PERF_REG_POWERPC_SIAR:
>>>> +		return mfspr(SPRN_SIAR);
>>>> 	default: return 0;
>>>> 	}
>>> 
>>> This file is built for all powerpc configs that have PERF_EVENTS. Which
>>> includes CPUs that don't have SDAR or SIAR.
>>> 
>>> Don't we need checks in perf_reg_value() like we do for SIER?
>> 
>> Hi Michael,
>> 
>> Thanks for the review.
>> 
>> SIER is part of PERF_REG_PMU_MASK and hence check is needed to see if platform supports SIER.
>> Incase of extended regs, they are part of PERF_REG_EXTENDED_MASK and this mask is
>> filled with supported registers while registering the PMU ( ie during init_power9/10_pmu ). So these registers will be added
>> only for supported platforms. The validity of extended mask is also done in PMU common code 
>> ( In kernel/events/core.c with PERF_REG_EXTENDED_MASK check ). So an unsupported platform requesting for extended
>> registers won’t get it.
> 
> Right, I'd forgotten how that works.
> 
> But I think part of the reason I didn't remember is that
> PERF_REG_PMU_MASK_31 doesn't mention those regs by name, it's just a hex
> constant, ie:
> 
> -#define PERF_REG_PMU_MASK_31   (0xfffULL << PERF_REG_POWERPC_MMCR0)
> +#define PERF_REG_PMU_MASK_31   (0x3fffULL << PERF_REG_POWERPC_MMCR0)
> 
> Presumably you tested that the added 0x3 there sets the right bits for
> SDAR and SIAR, but it's 1) not obvious and 2) fragile.
> 
> So I'd like it better if we constructed the PERF_REG_PMU_MASK_31, and
> other similar masks, by or'ing together the actual register value
> constants.
> 
> eg. something like:
> 
> #define PERF_REG_PMU_MASK_31	\
> 	((1ul << PERF_REG_POWERPC_MMCR0) | (1ul << PERF_REG_POWERPC_MMCR1) | \
> 	(1ul << PERF_REG_POWERPC_MMCR2) | (1ul << PERF_REG_POWERPC_MMCR3) | \
> 	(1ul << PERF_REG_POWERPC_SIER2) | (1ul << PERF_REG_POWERPC_SIER3) | \
> 	(1ul << PERF_REG_POWERPC_PMC1) | (1ul << PERF_REG_POWERPC_PMC2) | \
> 	(1ul << PERF_REG_POWERPC_PMC3) | (1ul << PERF_REG_POWERPC_PMC4) | \
> 	(1ul << PERF_REG_POWERPC_PMC5) | (1ul << PERF_REG_POWERPC_PMC6))
> 
> 
> Also PERF_REG_EXTENDED_MAX should be part of the enum, just like
> PERF_REG_POWERPC_MAX.

Sure Michael,

I will address these changes in next version

Thanks
Athira
> 
> cheers


^ permalink raw reply

* Re: [PATCH 3/3] powerpc/pseries/cpuhp: delete add/remove_by_count code
From: Nathan Lynch @ 2021-09-21  0:43 UTC (permalink / raw)
  To: Daniel Henrique Barboza, linuxppc-dev; +Cc: tyreld, ldufour, aneesh.kumar
In-Reply-To: <19c1f412-ef08-f1ea-8c6b-45921bd1f627@gmail.com>

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> On 9/20/21 10:55, Nathan Lynch wrote:
>> The core DLPAR code supports two actions (add and remove) and three
>> subtypes of action:
>> 
>> * By DRC index: the action is attempted on a single specified resource.
>>    This is the usual case for processors.
>> * By indexed count: the action is attempted on a range of resources
>>    beginning at the specified index. This is implemented only by the memory
>>    DLPAR code.
>> * By count: the lower layer (CPU or memory) is responsible for locating the
>>    specified number of resources to which the action can be applied.
>> 
>> I cannot find any evidence of the "by count" subtype being used by drmgr or
>> qemu for processors. And when I try to exercise this code, the add case
>> does not work:
>
>
> Just to clarify: did you check both CPU and memory cases and found out that the
> 'by count' subtype isn't used with CPUs, but drmgr has some cases in which
> 'by count' is used with LMBs?

Yes, drmgr uses both the 'by count' and the 'by index' methods for
memory currently on PowerVM.

> I'm asking because I worked with a part of the LMB removal code a few months ago,
> and got stuck in a situation in which the 'by count' and 'by indexed count' are
> similar enough to feel repetitive, but distinct enough to not be easily reduced
> into a single function. If drmgr wasn't using the 'by count' subtypes for LMBs
> that would be a good chance for more code redux.

The 'by count' method is definitely used for memory on PowerVM. I was
under the impression that the 'by indexed count' method was used by qemu
for memory sometimes; I'm pretty sure it's not used on PowerVM.

>> Summary:
>> 
>> * This code has not worked reliably since its introduction.
>> * There is no evidence that it is used.
>> * It contains questionable rollback behaviors in error paths which are
>>    difficult to test.
>> 
>> So let's remove it.
>> 
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>> Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality")
>> Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
>> Fixes: b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property")
>> ---
>
> Tested with a QEMU pseries guest, no issues found.
>
>
> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Thanks!

^ permalink raw reply

* Re: [PATCH] powerpc/476: Fix sparse report
From: Alistair Popple @ 2021-09-21  0:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Christophe Leroy
  Cc: linuxppc-dev, linux-kernel, kernel test robot
In-Reply-To: <aa6055769b92a5d8685b8d0adab99c48a0b0ef4b.1631956926.git.christophe.leroy@csgroup.eu>

Thanks, looks reasonable.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

On Saturday, 18 September 2021 7:22:32 PM AEST Christophe Leroy wrote:
> 	arch/powerpc/platforms/44x/ppc476.c:236:17: warning: cast removes address space '__iomem' of expression
> 	arch/powerpc/platforms/44x/ppc476.c:241:34: warning: incorrect type in argument 1 (different address spaces)
> 	arch/powerpc/platforms/44x/ppc476.c:241:34:    expected void const volatile [noderef] __iomem *addr
> 	arch/powerpc/platforms/44x/ppc476.c:241:34:    got unsigned char [usertype] *
> 	arch/powerpc/platforms/44x/ppc476.c:243:17: warning: incorrect type in argument 1 (different address spaces)
> 	arch/powerpc/platforms/44x/ppc476.c:243:17:    expected void volatile [noderef] __iomem *addr
> 	arch/powerpc/platforms/44x/ppc476.c:243:17:    got unsigned char [usertype] *[assigned] fpga
> 
> Mark 'fpga' pointer as __iomem.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Fixes: ab9a4183fddf ("powerpc: Update currituck pci/usb fixup for new board revision")
> Cc: Alistair Popple <alistair@popple.id.au>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/platforms/44x/ppc476.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/44x/ppc476.c b/arch/powerpc/platforms/44x/ppc476.c
> index 07f7e3ce67b5..fb7db5cedd4e 100644
> --- a/arch/powerpc/platforms/44x/ppc476.c
> +++ b/arch/powerpc/platforms/44x/ppc476.c
> @@ -219,7 +219,7 @@ static int board_rev = -1;
>  static int __init ppc47x_get_board_rev(void)
>  {
>  	int reg;
> -	u8 *fpga;
> +	u8 __iomem *fpga;
>  	struct device_node *np = NULL;
>  
>  	if (of_machine_is_compatible("ibm,currituck")) {
> @@ -233,7 +233,7 @@ static int __init ppc47x_get_board_rev(void)
>  	if (!np)
>  		goto fail;
>  
> -	fpga = (u8 *) of_iomap(np, 0);
> +	fpga = of_iomap(np, 0);
>  	of_node_put(np);
>  	if (!fpga)
>  		goto fail;
> 





^ permalink raw reply

* Re: [PATCH] KVM: PPC: Replace zero-length array with flexible array member
From: Gustavo A. R. Silva @ 2021-09-21  0:05 UTC (permalink / raw)
  To: Len Baker, Paul Mackerras, Michael Ellerman,
	Benjamin Herrenschmidt
  Cc: Kees Cook, linux-kernel, kvm-ppc, Gustavo A. R. Silva,
	linux-hardening, linuxppc-dev
In-Reply-To: <20210918142138.17709-1-len.baker@gmx.com>



On 9/18/21 09:21, Len Baker wrote:
> There is a regular need in the kernel to provide a way to declare having
> a dynamically sized set of trailing elements in a structure. Kernel code
> should always use "flexible array members" [1] for these cases. The
> older style of one-element or zero-length arrays should no longer be
> used[2].
> 
> Also, make use of the struct_size() helper in kzalloc().
> 
> [1] https://en.wikipedia.org/wiki/Flexible_array_member
> [2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
> 
> Signed-off-by: Len Baker <len.baker@gmx.com>

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Thanks
--
Gustavo

> ---
>  arch/powerpc/include/asm/kvm_host.h | 2 +-
>  arch/powerpc/kvm/book3s_64_vio.c    | 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 080a7feb7731..3aed653373a5 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -190,7 +190,7 @@ struct kvmppc_spapr_tce_table {
>  	u64 size;		/* window size in pages */
>  	struct list_head iommu_tables;
>  	struct mutex alloc_lock;
> -	struct page *pages[0];
> +	struct page *pages[];
>  };
> 
>  /* XICS components, defined in book3s_xics.c */
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 6365087f3160..d42b4b6d4a79 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -295,8 +295,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>  		return ret;
> 
>  	ret = -ENOMEM;
> -	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
> -		      GFP_KERNEL);
> +	stt = kzalloc(struct_size(stt, pages, npages), GFP_KERNEL);
>  	if (!stt)
>  		goto fail_acct;
> 
> --
> 2.25.1
> 

^ permalink raw reply

* Re: [PATCH 3/3] powerpc/pseries/cpuhp: delete add/remove_by_count code
From: Daniel Henrique Barboza @ 2021-09-21  0:18 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ldufour, aneesh.kumar
In-Reply-To: <20210920135504.1792219-4-nathanl@linux.ibm.com>



On 9/20/21 10:55, Nathan Lynch wrote:
> The core DLPAR code supports two actions (add and remove) and three
> subtypes of action:
> 
> * By DRC index: the action is attempted on a single specified resource.
>    This is the usual case for processors.
> * By indexed count: the action is attempted on a range of resources
>    beginning at the specified index. This is implemented only by the memory
>    DLPAR code.
> * By count: the lower layer (CPU or memory) is responsible for locating the
>    specified number of resources to which the action can be applied.
> 
> I cannot find any evidence of the "by count" subtype being used by drmgr or
> qemu for processors. And when I try to exercise this code, the add case
> does not work:


Just to clarify: did you check both CPU and memory cases and found out that the
'by count' subtype isn't used with CPUs, but drmgr has some cases in which
'by count' is used with LMBs?

I'm asking because I worked with a part of the LMB removal code a few months ago,
and got stuck in a situation in which the 'by count' and 'by indexed count' are
similar enough to feel repetitive, but distinct enough to not be easily reduced
into a single function. If drmgr wasn't using the 'by count' subtypes for LMBs
that would be a good chance for more code redux.


> 
>    $ ppc64_cpu --smt ; nproc
>    SMT=8
>    24
>    $ printf "cpu remove count 2" > /sys/kernel/dlpar
>    $ nproc
>    8
>    $ printf "cpu add count 2" > /sys/kernel/dlpar
>    -bash: printf: write error: Invalid argument
>    $ dmesg | tail -2
>    pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add
>    dlpar: Could not handle DLPAR request "cpu add count 2"
>    $ nproc
>    8
>    $ drmgr -c cpu -a -q 2         # this uses the by-index method
>    Validating CPU DLPAR capability...yes.
>    CPU 1
>    CPU 17
>    $ nproc
>    24
> 
> This is because find_drc_info_cpus_to_add() does not increment drc_index
> appropriately during its search.
> 
> This is not hard to fix. But the _by_count() functions also have the
> property that they attempt to roll back all prior operations if the entire
> request cannot be satisfied, even though the rollback itself can encounter
> errors. It's not possible to provide transaction-like behavior at this
> level, and it's undesirable to have code that can only pretend to do that.
> Any users of these functions cannot know what the state of the system is in
> the error case. And the error paths are, to my knowledge, impossible to
> test without adding custom error injection code.
> 
> Summary:
> 
> * This code has not worked reliably since its introduction.
> * There is no evidence that it is used.
> * It contains questionable rollback behaviors in error paths which are
>    difficult to test.
> 
> So let's remove it.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality")
> Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality")
> Fixes: b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property")
> ---

Tested with a QEMU pseries guest, no issues found.


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>



>   arch/powerpc/platforms/pseries/hotplug-cpu.c | 218 +------------------
>   1 file changed, 2 insertions(+), 216 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 87a0fbe9cf12..768997261ce8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -741,216 +741,6 @@ static int dlpar_cpu_remove_by_index(u32 drc_index)
>   	return rc;
>   }
>   
> -static int find_dlpar_cpus_to_remove(u32 *cpu_drcs, int cpus_to_remove)
> -{
> -	struct device_node *dn;
> -	int cpus_found = 0;
> -	int rc;
> -
> -	/* We want to find cpus_to_remove + 1 CPUs to ensure we do not
> -	 * remove the last CPU.
> -	 */
> -	for_each_node_by_type(dn, "cpu") {
> -		cpus_found++;
> -
> -		if (cpus_found > cpus_to_remove) {
> -			of_node_put(dn);
> -			break;
> -		}
> -
> -		/* Note that cpus_found is always 1 ahead of the index
> -		 * into the cpu_drcs array, so we use cpus_found - 1
> -		 */
> -		rc = of_property_read_u32(dn, "ibm,my-drc-index",
> -					  &cpu_drcs[cpus_found - 1]);
> -		if (rc) {
> -			pr_warn("Error occurred getting drc-index for %pOFn\n",
> -				dn);
> -			of_node_put(dn);
> -			return -1;
> -		}
> -	}
> -
> -	if (cpus_found < cpus_to_remove) {
> -		pr_warn("Failed to find enough CPUs (%d of %d) to remove\n",
> -			cpus_found, cpus_to_remove);
> -	} else if (cpus_found == cpus_to_remove) {
> -		pr_warn("Cannot remove all CPUs\n");
> -	}
> -
> -	return cpus_found;
> -}
> -
> -static int dlpar_cpu_remove_by_count(u32 cpus_to_remove)
> -{
> -	u32 *cpu_drcs;
> -	int cpus_found;
> -	int cpus_removed = 0;
> -	int i, rc;
> -
> -	pr_debug("Attempting to hot-remove %d CPUs\n", cpus_to_remove);
> -
> -	cpu_drcs = kcalloc(cpus_to_remove, sizeof(*cpu_drcs), GFP_KERNEL);
> -	if (!cpu_drcs)
> -		return -EINVAL;
> -
> -	cpus_found = find_dlpar_cpus_to_remove(cpu_drcs, cpus_to_remove);
> -	if (cpus_found <= cpus_to_remove) {
> -		kfree(cpu_drcs);
> -		return -EINVAL;
> -	}
> -
> -	for (i = 0; i < cpus_to_remove; i++) {
> -		rc = dlpar_cpu_remove_by_index(cpu_drcs[i]);
> -		if (rc)
> -			break;
> -
> -		cpus_removed++;
> -	}
> -
> -	if (cpus_removed != cpus_to_remove) {
> -		pr_warn("CPU hot-remove failed, adding back removed CPUs\n");
> -
> -		for (i = 0; i < cpus_removed; i++)
> -			dlpar_cpu_add(cpu_drcs[i]);
> -
> -		rc = -EINVAL;
> -	} else {
> -		rc = 0;
> -	}
> -
> -	kfree(cpu_drcs);
> -	return rc;
> -}
> -
> -static int find_drc_info_cpus_to_add(struct device_node *cpus,
> -				     struct property *info,
> -				     u32 *cpu_drcs, u32 cpus_to_add)
> -{
> -	struct of_drc_info drc;
> -	const __be32 *value;
> -	u32 count, drc_index;
> -	int cpus_found = 0;
> -	int i, j;
> -
> -	if (!info)
> -		return -1;
> -
> -	value = of_prop_next_u32(info, NULL, &count);
> -	if (value)
> -		value++;
> -
> -	for (i = 0; i < count; i++) {
> -		of_read_drc_info_cell(&info, &value, &drc);
> -		if (strncmp(drc.drc_type, "CPU", 3))
> -			break;
> -
> -		drc_index = drc.drc_index_start;
> -		for (j = 0; j < drc.num_sequential_elems; j++) {
> -			if (dlpar_cpu_exists(cpus, drc_index))
> -				continue;
> -
> -			cpu_drcs[cpus_found++] = drc_index;
> -
> -			if (cpus_found == cpus_to_add)
> -				return cpus_found;
> -
> -			drc_index += drc.sequential_inc;
> -		}
> -	}
> -
> -	return cpus_found;
> -}
> -
> -static int find_drc_index_cpus_to_add(struct device_node *cpus,
> -				      u32 *cpu_drcs, u32 cpus_to_add)
> -{
> -	int cpus_found = 0;
> -	int index, rc;
> -	u32 drc_index;
> -
> -	/* Search the ibm,drc-indexes array for possible CPU drcs to
> -	 * add. Note that the format of the ibm,drc-indexes array is
> -	 * the number of entries in the array followed by the array
> -	 * of drc values so we start looking at index = 1.
> -	 */
> -	index = 1;
> -	while (cpus_found < cpus_to_add) {
> -		rc = of_property_read_u32_index(cpus, "ibm,drc-indexes",
> -						index++, &drc_index);
> -
> -		if (rc)
> -			break;
> -
> -		if (dlpar_cpu_exists(cpus, drc_index))
> -			continue;
> -
> -		cpu_drcs[cpus_found++] = drc_index;
> -	}
> -
> -	return cpus_found;
> -}
> -
> -static int dlpar_cpu_add_by_count(u32 cpus_to_add)
> -{
> -	struct device_node *parent;
> -	struct property *info;
> -	u32 *cpu_drcs;
> -	int cpus_added = 0;
> -	int cpus_found;
> -	int i, rc;
> -
> -	pr_debug("Attempting to hot-add %d CPUs\n", cpus_to_add);
> -
> -	cpu_drcs = kcalloc(cpus_to_add, sizeof(*cpu_drcs), GFP_KERNEL);
> -	if (!cpu_drcs)
> -		return -EINVAL;
> -
> -	parent = of_find_node_by_path("/cpus");
> -	if (!parent) {
> -		pr_warn("Could not find CPU root node in device tree\n");
> -		kfree(cpu_drcs);
> -		return -1;
> -	}
> -
> -	info = of_find_property(parent, "ibm,drc-info", NULL);
> -	if (info)
> -		cpus_found = find_drc_info_cpus_to_add(parent, info, cpu_drcs, cpus_to_add);
> -	else
> -		cpus_found = find_drc_index_cpus_to_add(parent, cpu_drcs, cpus_to_add);
> -
> -	of_node_put(parent);
> -
> -	if (cpus_found < cpus_to_add) {
> -		pr_warn("Failed to find enough CPUs (%d of %d) to add\n",
> -			cpus_found, cpus_to_add);
> -		kfree(cpu_drcs);
> -		return -EINVAL;
> -	}
> -
> -	for (i = 0; i < cpus_to_add; i++) {
> -		rc = dlpar_cpu_add(cpu_drcs[i]);
> -		if (rc)
> -			break;
> -
> -		cpus_added++;
> -	}
> -
> -	if (cpus_added < cpus_to_add) {
> -		pr_warn("CPU hot-add failed, removing any added CPUs\n");
> -
> -		for (i = 0; i < cpus_added; i++)
> -			dlpar_cpu_remove_by_index(cpu_drcs[i]);
> -
> -		rc = -EINVAL;
> -	} else {
> -		rc = 0;
> -	}
> -
> -	kfree(cpu_drcs);
> -	return rc;
> -}
> -
>   int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>   {
>   	u32 count, drc_index;
> @@ -963,9 +753,7 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>   
>   	switch (hp_elog->action) {
>   	case PSERIES_HP_ELOG_ACTION_REMOVE:
> -		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
> -			rc = dlpar_cpu_remove_by_count(count);
> -		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX) {
>   			rc = dlpar_cpu_remove_by_index(drc_index);
>   			/*
>   			 * Setting the isolation state of an UNISOLATED/CONFIGURED
> @@ -979,9 +767,7 @@ int dlpar_cpu(struct pseries_hp_errorlog *hp_elog)
>   			rc = -EINVAL;
>   		break;
>   	case PSERIES_HP_ELOG_ACTION_ADD:
> -		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_COUNT)
> -			rc = dlpar_cpu_add_by_count(count);
> -		else if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
> +		if (hp_elog->id_type == PSERIES_HP_ELOG_ID_DRC_INDEX)
>   			rc = dlpar_cpu_add(drc_index);
>   		else
>   			rc = -EINVAL;
> 

^ permalink raw reply

* Re: [PATCH 2/3] powerpc/cpuhp: BUG -> WARN conversion in offline path
From: Daniel Henrique Barboza @ 2021-09-21  0:05 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ldufour, aneesh.kumar
In-Reply-To: <20210920135504.1792219-3-nathanl@linux.ibm.com>



On 9/20/21 10:55, Nathan Lynch wrote:
> If, due to bugs elsewhere, we get into unregister_cpu_online() with a CPU
> that isn't marked hotpluggable, we can emit a warning and return an
> appropriate error instead of crashing.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---

As mentioned by Christophe this will not solve the crash for kernels with
panic_on_warn, but at least it will panic with a clearer message on those
and will not panic for everyone else.


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>



>   arch/powerpc/kernel/sysfs.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index defecb3b1b15..08d8072d6e7a 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -928,7 +928,8 @@ static int unregister_cpu_online(unsigned int cpu)
>   	struct device_attribute *attrs, *pmc_attrs;
>   	int i, nattrs;
>   
> -	BUG_ON(!c->hotpluggable);
> +	if (WARN_RATELIMIT(!c->hotpluggable, "cpu %d can't be offlined\n", cpu))
> +		return -EBUSY;
>   
>   #ifdef CONFIG_PPC64
>   	if (cpu_has_feature(CPU_FTR_SMT))
> 

^ permalink raw reply

* Re: [PATCH 1/3] powerpc/pseries/cpuhp: cache node corrections
From: Daniel Henrique Barboza @ 2021-09-20 23:59 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: tyreld, ldufour, aneesh.kumar
In-Reply-To: <20210920135504.1792219-2-nathanl@linux.ibm.com>



On 9/20/21 10:55, Nathan Lynch wrote:
> On pseries, cache nodes in the device tree can be added and removed by the
> CPU DLPAR code as well as the partition migration (mobility) code. PowerVM
> partitions in dedicated processor mode typically have L2 and L3 cache
> nodes.
> 
> The CPU DLPAR code has the following shortcomings:
> 
> * Cache nodes returned as siblings of a new CPU node by
>    ibm,configure-connector are silently discarded; only the CPU node is
>    added to the device tree.
> 
> * Cache nodes which become unreferenced in the processor removal path are
>    not removed from the device tree. This can lead to duplicate nodes when
>    the post-migration device tree update code replaces cache nodes.
> 
> This is long-standing behavior. Presumably it has gone mostly unnoticed
> because the two bugs have the property of obscuring each other in common
> simple scenarios (e.g. remove a CPU and add it back). Likely you'd notice
> only if you cared to inspect the device tree or the sysfs cacheinfo
> information.
> 
> Booted with two processors:
> 
>    $ pwd
>    /sys/firmware/devicetree/base/cpus
>    $ ls -1d */
>    l2-cache@2010/
>    l2-cache@2011/
>    l3-cache@3110/
>    l3-cache@3111/
>    PowerPC,POWER9@0/
>    PowerPC,POWER9@8/
>    $ lsprop */l2-cache
>    l2-cache@2010/l2-cache
>                   00003110 (12560)
>    l2-cache@2011/l2-cache
>                   00003111 (12561)
>    PowerPC,POWER9@0/l2-cache
>                   00002010 (8208)
>    PowerPC,POWER9@8/l2-cache
>                   00002011 (8209)
>    $ ls /sys/devices/system/cpu/cpu0/cache/
>    index0  index1  index2  index3
> 
> After DLPAR-adding PowerPC,POWER9@10, we see that its associated cache
> nodes are absent, its threads' L2+L3 cacheinfo is unpopulated, and it is
> missing a cache level in its sched domain hierarchy:
> 
>    $ ls -1d */
>    l2-cache@2010/
>    l2-cache@2011/
>    l3-cache@3110/
>    l3-cache@3111/
>    PowerPC,POWER9@0/
>    PowerPC,POWER9@10/
>    PowerPC,POWER9@8/
>    $ lsprop PowerPC\,POWER9@10/l2-cache
>    PowerPC,POWER9@10/l2-cache
>                   00002012 (8210)
>    $ ls /sys/devices/system/cpu/cpu16/cache/
>    index0  index1
>    $ grep . /sys/kernel/debug/sched/domains/cpu{0,8,16}/domain*/name
>    /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
>    /sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE
>    /sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE
>    /sys/kernel/debug/sched/domains/cpu8/domain0/name:SMT
>    /sys/kernel/debug/sched/domains/cpu8/domain1/name:CACHE
>    /sys/kernel/debug/sched/domains/cpu8/domain2/name:DIE
>    /sys/kernel/debug/sched/domains/cpu16/domain0/name:SMT
>    /sys/kernel/debug/sched/domains/cpu16/domain1/name:DIE
> 
> When removing PowerPC,POWER9@8, we see that its cache nodes are left
> behind:
> 
>    $ ls -1d */
>    l2-cache@2010/
>    l2-cache@2011/
>    l3-cache@3110/
>    l3-cache@3111/
>    PowerPC,POWER9@0/
> 
> When DLPAR is combined with VM migration, we can get duplicate nodes. E.g.
> removing one processor, then migrating, adding a processor, and then
> migrating again can result in warnings from the OF core during
> post-migration device tree updates:
> 
>    Duplicate name in cpus, renamed to "l2-cache@2011#1"
>    Duplicate name in cpus, renamed to "l3-cache@3111#1"
> 
> and nodes with duplicated phandles in the tree, making lookup behavior
> unpredictable:
> 
>    $ lsprop l[23]-cache@*/ibm,phandle
>    l2-cache@2010/ibm,phandle
>                     00002010 (8208)
>    l2-cache@2011#1/ibm,phandle
>                     00002011 (8209)
>    l2-cache@2011/ibm,phandle
>                     00002011 (8209)
>    l3-cache@3110/ibm,phandle
>                     00003110 (12560)
>    l3-cache@3111#1/ibm,phandle
>                     00003111 (12561)
>    l3-cache@3111/ibm,phandle
>                     00003111 (12561)
> 
> Address these issues by:
> 
> * Correctly processing siblings of the node returned from
>    dlpar_configure_connector().
> * Removing cache nodes in the CPU remove path when it can be determined
>    that they are not associated with other CPUs or caches.
> 
> Use the of_changeset API in both cases, which allows us to keep the error
> handling in this code from becoming more complex while ensuring that the
> device tree cannot become inconsistent.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: ac71380 ("powerpc/pseries: Add CPU dlpar remove functionality")
> Fixes: 90edf18 ("powerpc/pseries: Add CPU dlpar add functionality")
> ---
>   arch/powerpc/platforms/pseries/hotplug-cpu.c | 72 +++++++++++++++++++-
>   1 file changed, 70 insertions(+), 2 deletions(-)

Tested with a QEMU pseries guest, multiple CPU add/removals of the same CPU,
and no issues found with these new pseries_cpuhp* functions.

Code LGTM as well.


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>




> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index d646c22e94ab..87a0fbe9cf12 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -521,6 +521,27 @@ static bool valid_cpu_drc_index(struct device_node *parent, u32 drc_index)
>   	return found;
>   }
>   
> +static int pseries_cpuhp_attach_nodes(struct device_node *dn)
> +{
> +	struct of_changeset cs;
> +	int ret;
> +
> +	/*
> +	 * This device node is unattached but may have siblings; open-code the
> +	 * traversal.
> +	 */
> +	for (of_changeset_init(&cs); dn != NULL; dn = dn->sibling) {
> +		ret = of_changeset_attach_node(&cs, dn);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	ret = of_changeset_apply(&cs);
> +out:
> +	of_changeset_destroy(&cs);
> +	return ret;
> +}
> +
>   static ssize_t dlpar_cpu_add(u32 drc_index)
>   {
>   	struct device_node *dn, *parent;
> @@ -563,7 +584,7 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   		return -EINVAL;
>   	}
>   
> -	rc = dlpar_attach_node(dn, parent);
> +	rc = pseries_cpuhp_attach_nodes(dn);
>   
>   	/* Regardless we are done with parent now */
>   	of_node_put(parent);
> @@ -600,6 +621,53 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   	return rc;
>   }
>   
> +static unsigned int pseries_cpuhp_cache_use_count(const struct device_node *cachedn)
> +{
> +	unsigned int use_count = 0;
> +	struct device_node *dn;
> +
> +	WARN_ON(!of_node_is_type(cachedn, "cache"));
> +
> +	for_each_of_cpu_node(dn) {
> +		if (of_find_next_cache_node(dn) == cachedn)
> +			use_count++;
> +	}
> +
> +	for_each_node_by_type(dn, "cache") {
> +		if (of_find_next_cache_node(dn) == cachedn)
> +			use_count++;
> +	}
> +
> +	return use_count;
> +}
> +
> +static int pseries_cpuhp_detach_nodes(struct device_node *cpudn)
> +{
> +	struct device_node *dn;
> +	struct of_changeset cs;
> +	int ret = 0;
> +
> +	of_changeset_init(&cs);
> +	ret = of_changeset_detach_node(&cs, cpudn);
> +	if (ret)
> +		goto out;
> +
> +	dn = cpudn;
> +	while ((dn = of_find_next_cache_node(dn))) {
> +		if (pseries_cpuhp_cache_use_count(dn) > 1)
> +			break;
> +
> +		ret = of_changeset_detach_node(&cs, dn);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	ret = of_changeset_apply(&cs);
> +out:
> +	of_changeset_destroy(&cs);
> +	return ret;
> +}
> +
>   static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>   {
>   	int rc;
> @@ -621,7 +689,7 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>   		return rc;
>   	}
>   
> -	rc = dlpar_detach_node(dn);
> +	rc = pseries_cpuhp_detach_nodes(dn);
>   	if (rc) {
>   		int saved_rc = rc;
>   
> 

^ permalink raw reply

* Re: [PATCH v4 1/3] powerpc/bitops: Use immediate operand when possible
From: Segher Boessenkool @ 2021-09-20 21:23 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <db9d01d5c543c5add4b2beadb03d39e99c7ada2c.1632126669.git.christophe.leroy@csgroup.eu>

Hi!

On Mon, Sep 20, 2021 at 10:31:17AM +0200, Christophe Leroy wrote:
> Today we get the following code generation for bitops like
> set or clear bit:
> 
> 	c0009fe0:	39 40 08 00 	li      r10,2048
> 	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
> 	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
> 	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8
> 
> 	c000d568:	39 00 18 00 	li      r8,6144
> 	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
> 	c000d570:	7c c6 40 78 	andc    r6,r6,r8
> 	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7
> 
> Most set bits are constant on lower 16 bits, so it can easily
> be replaced by the "immediate" version of the operation. Allow
> GCC to choose between the normal or immediate form.

You can also handle the second sixteen bits (the "shifted" half), by
using oris etc.  The "%eN" output modifier prints an "s" for this:
  /* If the low 16 bits are 0, but some other bit is set, write 's'.  */
But this doesn't handle non-constant arguments, so you're likely better
off using what you have noe.

> For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
> when all bits to be cleared are consecutive.

Or when all you want to keep are consecutive (you do handle that now :-) )

> On 64 bits we don't have any equivalent single operation for clearing,
> single bits or a few bits, we'd need two 'rldicl' so it is not
> worth it, the li/andc sequence is doing the same.

You can use rlwinm whenever you want to clear all top 32 bits.

A sometimes nice idiom is  ori x,x,N ; xori x,x,N  to clear the bits N
(or oris/xoris).  But it's two insns no matter what (but no spare
register is needed).

> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

> +static inline unsigned long test_and_clear_bits(unsigned long mask, volatile unsigned long *_p)
> +{
> +	unsigned long old, t;
> +	unsigned long *p = (unsigned long *)_p;
> +
> +	if (IS_ENABLED(CONFIG_PPC32) &&
> +	    __builtin_constant_p(mask) && is_rlwinm_mask_valid(mask)) {

is_rlwinm_mask_valid(~mask)?  So that test_and_clear_bits(0, ...) will
work with rlwinm, and test_and_clear_bits(0xffffffff, ...) will not make
gas scream bloody murder ("illegal bitmask").  Tha mask you pass to the
instruction is ~mask after all.

Looks great except that one nit.  Thanks :-)

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

Segher

^ permalink raw reply

* [PATCH] ASoC: fsl: Constify static snd_soc_ops
From: Rikard Falkeborn @ 2021-09-20 19:39 UTC (permalink / raw)
  To: Nicolin Chen, Xiubo Li, Jaroslav Kysela, Takashi Iwai, Shawn Guo,
	Sascha Hauer
  Cc: alsa-devel, Fabio Estevam, linuxppc-dev, Liam Girdwood,
	Rikard Falkeborn, linux-kernel, Mark Brown, NXP Linux Team,
	Pengutronix Kernel Team, Shengjiu Wang, linux-arm-kernel

These are only assigned to the ops field in the snd_soc_dai_link struct
which is a pointer to const struct snd_soc_ops. Make them const to allow
the compiler to put them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
---
 sound/soc/fsl/imx-audmix.c | 4 ++--
 sound/soc/fsl/imx-card.c   | 4 ++--
 sound/soc/fsl/imx-hdmi.c   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index a364e2415de0..0d637929bfef 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -132,12 +132,12 @@ static int imx_audmix_be_hw_params(struct snd_pcm_substream *substream,
 	return ret;
 }
 
-static struct snd_soc_ops imx_audmix_fe_ops = {
+static const struct snd_soc_ops imx_audmix_fe_ops = {
 	.startup = imx_audmix_fe_startup,
 	.hw_params = imx_audmix_fe_hw_params,
 };
 
-static struct snd_soc_ops imx_audmix_be_ops = {
+static const struct snd_soc_ops imx_audmix_be_ops = {
 	.hw_params = imx_audmix_be_hw_params,
 };
 
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 58fd0639a069..05dff2dc1d19 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -442,12 +442,12 @@ static int imx_aif_startup(struct snd_pcm_substream *substream)
 	return ret;
 }
 
-static struct snd_soc_ops imx_aif_ops = {
+static const struct snd_soc_ops imx_aif_ops = {
 	.hw_params = imx_aif_hw_params,
 	.startup = imx_aif_startup,
 };
 
-static struct snd_soc_ops imx_aif_ops_be = {
+static const struct snd_soc_ops imx_aif_ops_be = {
 	.hw_params = imx_aif_hw_params,
 };
 
diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
index 34a0dceae621..2b663c39edb2 100644
--- a/sound/soc/fsl/imx-hdmi.c
+++ b/sound/soc/fsl/imx-hdmi.c
@@ -59,7 +59,7 @@ static int imx_hdmi_hw_params(struct snd_pcm_substream *substream,
 	return 0;
 }
 
-static struct snd_soc_ops imx_hdmi_ops = {
+static const struct snd_soc_ops imx_hdmi_ops = {
 	.hw_params = imx_hdmi_hw_params,
 };
 
-- 
2.33.0


^ permalink raw reply related

* Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()
From: Kirill A. Shutemov @ 2021-09-20 19:23 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Sathyanarayanan Kuppuswamy, linux-efi, Brijesh Singh, kvm,
	Peter Zijlstra, Dave Hansen, dri-devel, platform-driver-x86,
	Will Deacon, linux-s390, Andi Kleen, Joerg Roedel, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, linux-graphics-maintainer,
	Tianyu Lan, Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	kexec, linux-kernel, iommu, linux-fsdevel, linuxppc-dev
In-Reply-To: <367624d43d35d61d5c97a8b289d9ddae223636e9.1631141919.git.thomas.lendacky@amd.com>

On Wed, Sep 08, 2021 at 05:58:36PM -0500, Tom Lendacky wrote:
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index 470b20208430..eff4d19f9cb4 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -30,6 +30,7 @@
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/cc_platform.h>
>  
>  #include <asm/setup.h>
>  #include <asm/sections.h>
> @@ -287,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
>  	unsigned long pgtable_area_len;
>  	unsigned long decrypted_base;
>  
> -	if (!sme_active())
> +	if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
>  		return;
>  
>  	/*

This change break boot for me (in KVM on Intel host). It only reproduces
with allyesconfig. More reasonable config works fine, but I didn't try to
find exact cause in config.

Convertion to cc_platform_has() in __startup_64() in 8/8 has the same
effect.

I believe it caused by sme_me_mask access from __startup_64() without
fixup_pointer() magic. I think __startup_64() requires special treatement
and we should avoid cc_platform_has() there (or have a special version of
the helper). Note that only AMD requires these cc_platform_has() to return
true.

-- 
 Kirill A. Shutemov

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox