LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.4 14/33] powerpc/64: Fix stack trace not displaying final frame
From: Sasha Levin @ 2021-03-02 11:57 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev
In-Reply-To: <20210302115749.62653-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit e3de1e291fa58a1ab0f471a4b458eff2514e4b5f ]

In commit bf13718bc57a ("powerpc: show registers when unwinding
interrupt frames") we changed our stack dumping logic to show the full
registers whenever we find an interrupt frame on the stack.

However we didn't notice that on 64-bit this doesn't show the final
frame, ie. the interrupt that brought us in from userspace, whereas on
32-bit it does.

That is due to confusion about the size of that last frame. The code
in show_stack() calls validate_sp(), passing it STACK_INT_FRAME_SIZE
to check the sp is at least that far below the top of the stack.

However on 64-bit that size is too large for the final frame, because
it includes the red zone, but we don't allocate a red zone for the
first frame.

So add a new define that encodes the correct size for 32-bit and
64-bit, and use it in show_stack().

This results in the full trace being shown on 64-bit, eg:

  sysrq: Trigger a crash
  Kernel panic - not syncing: sysrq triggered crash
  CPU: 0 PID: 83 Comm: sh Not tainted 5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty #649
  Call Trace:
  [c00000000a1c3ac0] [c000000000897b70] dump_stack+0xc4/0x114 (unreliable)
  [c00000000a1c3b00] [c00000000014334c] panic+0x178/0x41c
  [c00000000a1c3ba0] [c00000000094e600] sysrq_handle_crash+0x40/0x50
  [c00000000a1c3c00] [c00000000094ef98] __handle_sysrq+0xd8/0x210
  [c00000000a1c3ca0] [c00000000094f820] write_sysrq_trigger+0x100/0x188
  [c00000000a1c3ce0] [c0000000005559dc] proc_reg_write+0x10c/0x1b0
  [c00000000a1c3d10] [c000000000479950] vfs_write+0xf0/0x360
  [c00000000a1c3d60] [c000000000479d9c] ksys_write+0x7c/0x140
  [c00000000a1c3db0] [c00000000002bf5c] system_call_exception+0x19c/0x2c0
  [c00000000a1c3e10] [c00000000000d35c] system_call_common+0xec/0x278
  --- interrupt: c00 at 0x7fff9fbab428
  NIP:  00007fff9fbab428 LR: 000000001000b724 CTR: 0000000000000000
  REGS: c00000000a1c3e80 TRAP: 0c00   Not tainted  (5.11.0-rc2-gcc-8.2.0-00188-g571abcb96b10-dirty)
  MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 22002884  XER: 00000000
  IRQMASK: 0
  GPR00: 0000000000000004 00007fffc3cb8960 00007fff9fc59900 0000000000000001
  GPR04: 000000002a4b32d0 0000000000000002 0000000000000063 0000000000000063
  GPR08: 000000002a4b32d0 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fff9fcca9a0 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 00000000100b8fd0
  GPR20: 000000002a4b3485 00000000100b8f90 0000000000000000 0000000000000000
  GPR24: 000000002a4b0440 00000000100e77b8 0000000000000020 000000002a4b32d0
  GPR28: 0000000000000001 0000000000000002 000000002a4b32d0 0000000000000001
  NIP [00007fff9fbab428] 0x7fff9fbab428
  LR [000000001000b724] 0x1000b724
  --- interrupt: c00

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210209141627.2898485-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/ptrace.h | 3 +++
 arch/powerpc/kernel/asm-offsets.c | 2 +-
 arch/powerpc/kernel/process.c     | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index c41220f4aad9..5a424f867c82 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -62,6 +62,9 @@ struct pt_regs
 };
 #endif
 
+
+#define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct pt_regs))
+
 #ifdef __powerpc64__
 
 /*
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5c0a1e17219b..af399675248e 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -285,7 +285,7 @@ int main(void)
 
 	/* Interrupt register frame */
 	DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE);
-	DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_OVERHEAD + sizeof(struct pt_regs));
+	DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_WITH_PT_REGS);
 	STACK_PT_REGS_OFFSET(GPR0, gpr[0]);
 	STACK_PT_REGS_OFFSET(GPR1, gpr[1]);
 	STACK_PT_REGS_OFFSET(GPR2, gpr[2]);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index bd0c258a1d5d..c94bba9142e7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2081,7 +2081,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
 		 * See if this is an exception frame.
 		 * We look for the "regshere" marker in the current frame.
 		 */
-		if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
+		if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
 		    && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
 			struct pt_regs *regs = (struct pt_regs *)
 				(sp + STACK_FRAME_OVERHEAD);
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 06/21] powerpc/pci: Add ppc_md.discover_phbs()
From: Sasha Levin @ 2021-03-02 11:58 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, linuxppc-dev, Oliver O'Halloran,
	kernel test robot
In-Reply-To: <20210302115835.63269-1-sashal@kernel.org>

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 5537fcb319d016ce387f818dd774179bc03217f5 ]

On many powerpc platforms the discovery and initalisation of
pci_controllers (PHBs) happens inside of setup_arch(). This is very early
in boot (pre-initcalls) and means that we're initialising the PHB long
before many basic kernel services (slab allocator, debugfs, a real ioremap)
are available.

On PowerNV this causes an additional problem since we map the PHB registers
with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early
use of ioremap()") a warning is printed because we're using the "incorrect"
API to setup and MMIO mapping in searly boot. The kernel does provide
early_ioremap(), but that is not intended to create long-lived MMIO
mappings and a seperate warning is printed by generic code if
early_ioremap() mappings are "leaked."

This is all fixable with dumb hacks like using early_ioremap() to setup
the initial mapping then replacing it with a real ioremap later on in
boot, but it does raise the question: Why the hell are we setting up the
PHB's this early in boot?

The old and wise claim it's due to "hysterical rasins." Aside from amused
grapes there doesn't appear to be any real reason to maintain the current
behaviour. Already most of the newer embedded platforms perform PHB
discovery in an arch_initcall and between the end of setup_arch() and the
start of initcalls none of the generic kernel code does anything PCI
related. On powerpc scanning PHBs occurs in a subsys_initcall so it should
be possible to move the PHB discovery to a core, postcore or arch initcall.

This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that
calls it. The core_initcalls are the earliest to be called so this will
any possibly issues with dependency between initcalls. This isn't just an
academic issue either since on pseries and PowerNV EEH init occurs in an
arch_initcall and depends on the pci_controllers being available, similarly
the creation of pci_dns occurs at core_initcall_sync (i.e. between core and
postcore initcalls). These problems need to be addressed seperately.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Make discover_phbs() static]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201103043523.916109-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/include/asm/machdep.h |  3 +++
 arch/powerpc/kernel/pci-common.c   | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index a47de82fb8e2..bda87cbf106d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -71,6 +71,9 @@ struct machdep_calls {
 	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
 				*bridge);
 
+	/* finds all the pci_controllers present at boot */
+	void 		(*discover_phbs)(void);
+
 	/* To setup PHBs when using automatic OF platform driver for PCI */
 	int		(*pci_setup_phb)(struct pci_controller *host);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 88e4f69a09e5..74628aca2bf1 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1671,3 +1671,13 @@ static void fixup_hide_host_resource_fsl(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MOTOROLA, PCI_ANY_ID, fixup_hide_host_resource_fsl);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_FREESCALE, PCI_ANY_ID, fixup_hide_host_resource_fsl);
+
+
+static int __init discover_phbs(void)
+{
+	if (ppc_md.discover_phbs)
+		ppc_md.discover_phbs();
+
+	return 0;
+}
+core_initcall(discover_phbs);
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 07/21] powerpc: improve handling of unrecoverable system reset
From: Sasha Levin @ 2021-03-02 11:58 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210302115835.63269-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 11cb0a25f71818ca7ab4856548ecfd83c169aa4d ]

If an unrecoverable system reset hits in process context, the system
does not have to panic. Similar to machine check, call nmi_exit()
before die().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-26-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1b2d84cb373b..2379c4bf3979 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -433,8 +433,11 @@ out:
 		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
-	if (!(regs->msr & MSR_RI))
+	if (!(regs->msr & MSR_RI)) {
+		/* For the reason explained in die_mce, nmi_exit before die */
+		nmi_exit();
 		die("Unrecoverable System Reset", regs, SIGABRT);
+	}
 
 	if (!nested)
 		nmi_exit();
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 08/21] powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
From: Sasha Levin @ 2021-03-02 11:58 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Athira Rajeev, linuxppc-dev
In-Reply-To: <20210302115835.63269-1-sashal@kernel.org>

From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ]

While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register(SIAR). But there are some
usecases, where the user is interested only in the PMU stats at each
counter overflow and the exact IP of the overflow event is not
required. Dropping SIAR invalid samples will fail to record some of
the counter overflows in such cases.

Example of such usecase is dumping the PMU stats (event counts) after
some regular amount of instructions/events from the userspace (ex: via
ptrace). Here counter overflow is indicated to userspace via signal
handler, and captured by monitoring and enabling I/O signaling on the
event file descriptor. In these cases, we expect to get
sample/overflow indication after each specified sample_period.

Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter
overflow irrespective of SIAR_VALID check.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Reflow comment and if formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/core-book3s.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 70de13822828..091bdeaf02a3 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2046,7 +2046,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			left += period;
 			if (left <= 0)
 				left = period;
-			record = siar_valid(regs);
+
+			/*
+			 * If address is not requested in the sample via
+			 * PERF_SAMPLE_IP, just record that sample irrespective
+			 * of SIAR valid check.
+			 */
+			if (event->attr.sample_type & PERF_SAMPLE_IP)
+				record = siar_valid(regs);
+			else
+				record = 1;
+
 			event->hw.last_period = event->hw.sample_period;
 		}
 		if (left < 0x80000000LL)
@@ -2064,9 +2074,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	 * MMCR2. Check attr.exclude_kernel and address to drop the sample in
 	 * these cases.
 	 */
-	if (event->attr.exclude_kernel && record)
-		if (is_kernel_addr(mfspr(SPRN_SIAR)))
-			record = 0;
+	if (event->attr.exclude_kernel &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP) &&
+	    is_kernel_addr(mfspr(SPRN_SIAR)))
+		record = 0;
 
 	/*
 	 * Finally record data if requested.
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 04/13] powerpc: improve handling of unrecoverable system reset
From: Sasha Levin @ 2021-03-02 11:58 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210302115903.63458-1-sashal@kernel.org>

From: Nicholas Piggin <npiggin@gmail.com>

[ Upstream commit 11cb0a25f71818ca7ab4856548ecfd83c169aa4d ]

If an unrecoverable system reset hits in process context, the system
does not have to panic. Similar to machine check, call nmi_exit()
before die().

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210130130852.2952424-26-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/traps.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0f1a888c04a8..05c1aabad01c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -360,8 +360,11 @@ out:
 		die("Unrecoverable nested System Reset", regs, SIGABRT);
 #endif
 	/* Must die if the interrupt is not recoverable */
-	if (!(regs->msr & MSR_RI))
+	if (!(regs->msr & MSR_RI)) {
+		/* For the reason explained in die_mce, nmi_exit before die */
+		nmi_exit();
 		die("Unrecoverable System Reset", regs, SIGABRT);
+	}
 
 	if (!nested)
 		nmi_exit();
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 05/13] powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
From: Sasha Levin @ 2021-03-02 11:58 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Athira Rajeev, linuxppc-dev
In-Reply-To: <20210302115903.63458-1-sashal@kernel.org>

From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ]

While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register(SIAR). But there are some
usecases, where the user is interested only in the PMU stats at each
counter overflow and the exact IP of the overflow event is not
required. Dropping SIAR invalid samples will fail to record some of
the counter overflows in such cases.

Example of such usecase is dumping the PMU stats (event counts) after
some regular amount of instructions/events from the userspace (ex: via
ptrace). Here counter overflow is indicated to userspace via signal
handler, and captured by monitoring and enabling I/O signaling on the
event file descriptor. In these cases, we expect to get
sample/overflow indication after each specified sample_period.

Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter
overflow irrespective of SIAR_VALID check.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Reflow comment and if formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/core-book3s.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 56f16c803590..2669847434b8 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2055,7 +2055,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			left += period;
 			if (left <= 0)
 				left = period;
-			record = siar_valid(regs);
+
+			/*
+			 * If address is not requested in the sample via
+			 * PERF_SAMPLE_IP, just record that sample irrespective
+			 * of SIAR valid check.
+			 */
+			if (event->attr.sample_type & PERF_SAMPLE_IP)
+				record = siar_valid(regs);
+			else
+				record = 1;
+
 			event->hw.last_period = event->hw.sample_period;
 		}
 		if (left < 0x80000000LL)
@@ -2073,9 +2083,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	 * MMCR2. Check attr.exclude_kernel and address to drop the sample in
 	 * these cases.
 	 */
-	if (event->attr.exclude_kernel && record)
-		if (is_kernel_addr(mfspr(SPRN_SIAR)))
-			record = 0;
+	if (event->attr.exclude_kernel &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP) &&
+	    is_kernel_addr(mfspr(SPRN_SIAR)))
+		record = 0;
 
 	/*
 	 * Finally record data if requested.
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.9 04/10] powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
From: Sasha Levin @ 2021-03-02 11:59 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Athira Rajeev, linuxppc-dev
In-Reply-To: <20210302115921.63636-1-sashal@kernel.org>

From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ]

While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register(SIAR). But there are some
usecases, where the user is interested only in the PMU stats at each
counter overflow and the exact IP of the overflow event is not
required. Dropping SIAR invalid samples will fail to record some of
the counter overflows in such cases.

Example of such usecase is dumping the PMU stats (event counts) after
some regular amount of instructions/events from the userspace (ex: via
ptrace). Here counter overflow is indicated to userspace via signal
handler, and captured by monitoring and enabling I/O signaling on the
event file descriptor. In these cases, we expect to get
sample/overflow indication after each specified sample_period.

Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter
overflow irrespective of SIAR_VALID check.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Reflow comment and if formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/core-book3s.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1f1ac446ace9..f2d8f35c181f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2010,7 +2010,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			left += period;
 			if (left <= 0)
 				left = period;
-			record = siar_valid(regs);
+
+			/*
+			 * If address is not requested in the sample via
+			 * PERF_SAMPLE_IP, just record that sample irrespective
+			 * of SIAR valid check.
+			 */
+			if (event->attr.sample_type & PERF_SAMPLE_IP)
+				record = siar_valid(regs);
+			else
+				record = 1;
+
 			event->hw.last_period = event->hw.sample_period;
 		}
 		if (left < 0x80000000LL)
@@ -2028,9 +2038,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	 * MMCR2. Check attr.exclude_kernel and address to drop the sample in
 	 * these cases.
 	 */
-	if (event->attr.exclude_kernel && record)
-		if (is_kernel_addr(mfspr(SPRN_SIAR)))
-			record = 0;
+	if (event->attr.exclude_kernel &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP) &&
+	    is_kernel_addr(mfspr(SPRN_SIAR)))
+		record = 0;
 
 	/*
 	 * Finally record data if requested.
-- 
2.30.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.4 3/8] powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
From: Sasha Levin @ 2021-03-02 11:59 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Athira Rajeev, linuxppc-dev
In-Reply-To: <20210302115935.63777-1-sashal@kernel.org>

From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

[ Upstream commit d137845c973147a22622cc76c7b0bc16f6206323 ]

While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register(SIAR). But there are some
usecases, where the user is interested only in the PMU stats at each
counter overflow and the exact IP of the overflow event is not
required. Dropping SIAR invalid samples will fail to record some of
the counter overflows in such cases.

Example of such usecase is dumping the PMU stats (event counts) after
some regular amount of instructions/events from the userspace (ex: via
ptrace). Here counter overflow is indicated to userspace via signal
handler, and captured by monitoring and enabling I/O signaling on the
event file descriptor. In these cases, we expect to get
sample/overflow indication after each specified sample_period.

Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter
overflow irrespective of SIAR_VALID check.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[mpe: Reflow comment and if formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/perf/core-book3s.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index e593e7f856ed..7a80e1cff6e2 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2008,7 +2008,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			left += period;
 			if (left <= 0)
 				left = period;
-			record = siar_valid(regs);
+
+			/*
+			 * If address is not requested in the sample via
+			 * PERF_SAMPLE_IP, just record that sample irrespective
+			 * of SIAR valid check.
+			 */
+			if (event->attr.sample_type & PERF_SAMPLE_IP)
+				record = siar_valid(regs);
+			else
+				record = 1;
+
 			event->hw.last_period = event->hw.sample_period;
 		}
 		if (left < 0x80000000LL)
@@ -2026,9 +2036,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	 * MMCR2. Check attr.exclude_kernel and address to drop the sample in
 	 * these cases.
 	 */
-	if (event->attr.exclude_kernel && record)
-		if (is_kernel_addr(mfspr(SPRN_SIAR)))
-			record = 0;
+	if (event->attr.exclude_kernel &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP) &&
+	    is_kernel_addr(mfspr(SPRN_SIAR)))
+		record = 0;
 
 	/*
 	 * Finally record data if requested.
-- 
2.30.1


^ permalink raw reply related

* Re: [PATCH] perf bench numa: Fix the condition checks for max number of numa nodes
From: Arnaldo Carvalho de Melo @ 2021-03-02 12:33 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: ravi.bangoria, Athira Rajeev, peterz, linux-kernel,
	linux-perf-users, maddy, jolsa, kjain, linuxppc-dev, kan.liang
In-Reply-To: <20210226085827.GF2028034@linux.vnet.ibm.com>

Em Fri, Feb 26, 2021 at 02:28:27PM +0530, Srikar Dronamraju escreveu:
> * Athira Rajeev <atrajeev@linux.vnet.ibm.com> [2021-02-25 11:50:02]:
> 
> > In systems having higher node numbers available like node
> > 255, perf numa bench will fail with SIGABORT.
> > 
> > <<>>
> > perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed.
> > Aborted (core dumped)
> > <<>>
> > 
> 
> Looks good to me.
> 
> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Thanks, applied.

- Arnaldo


^ permalink raw reply

* [PATCH] ASoC: fsl_xcvr: Use devm_platform_ioremap_resource_byname() to simplify code
From: Tang Bin @ 2021-03-02 12:50 UTC (permalink / raw)
  To: broonie, timur, nicoleotsuka, Xiubo.Lee, perex, tiwai
  Cc: alsa-devel, linuxppc-dev, linux-kernel, Tang Bin

In this function, devm_platform_ioremap_resource_byname() should be 
suitable to simplify code.

Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
---
 sound/soc/fsl/fsl_xcvr.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index 6dd0a5fcd455..5e8284db857b 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -1131,7 +1131,7 @@ static int fsl_xcvr_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct fsl_xcvr *xcvr;
-	struct resource *ram_res, *regs_res, *rx_res, *tx_res;
+	struct resource *rx_res, *tx_res;
 	void __iomem *regs;
 	int ret, irq;
 
@@ -1166,13 +1166,11 @@ static int fsl_xcvr_probe(struct platform_device *pdev)
 		return PTR_ERR(xcvr->pll_ipg_clk);
 	}
 
-	ram_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "ram");
-	xcvr->ram_addr = devm_ioremap_resource(dev, ram_res);
+	xcvr->ram_addr = devm_platform_ioremap_resource_byname(pdev, "ram");
 	if (IS_ERR(xcvr->ram_addr))
 		return PTR_ERR(xcvr->ram_addr);
 
-	regs_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
-	regs = devm_ioremap_resource(dev, regs_res);
+	regs = devm_platform_ioremap_resource_byname(pdev, "regs");
 	if (IS_ERR(regs))
 		return PTR_ERR(regs);
 
-- 
2.18.2




^ permalink raw reply related

* Re: [PATCH next v3 12/15] printk: introduce a kmsg_dump iterator
From: John Ogness @ 2021-03-02 13:20 UTC (permalink / raw)
  To: Petr Mladek
  Cc: linux-hyperv, Sergey Senozhatsky, Douglas Anderson,
	Paul Mackerras, Miquel Raynal, K. Y. Srinivasan, Thomas Meyer,
	Vignesh Raghavendra, Wei Liu, Madhavan Srinivasan,
	Stephen Hemminger, Anton Vorontsov, Joel Stanley, Jason Wessel,
	Anton Ivanov, Wei Li, Haiyang Zhang, Ravi Bangoria, Kees Cook,
	Alistair Popple, Jeff Dike, Colin Cross, linux-um,
	Daniel Thompson, Steven Rostedt, Davidlohr Bueso, Nicholas Piggin,
	Oleg Nesterov, Thomas Gleixner, Andy Shevchenko, Jordan Niethe,
	Michael Kelley, Christophe Leroy, Tony Luck, Pavel Tatashin,
	linux-kernel, Sergey Senozhatsky, Richard Weinberger,
	kgdb-bugreport, linux-mtd, linuxppc-dev, Mike Rapoport
In-Reply-To: <YD0tbVV+hZOFvWyB@alley>

On 2021-03-01, Petr Mladek <pmladek@suse.com> wrote:
>> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
>> index 532f22637783..5a64b24a91c2 100644
>> --- a/arch/powerpc/kernel/nvram_64.c
>> +++ b/arch/powerpc/kernel/nvram_64.c
>> @@ -72,8 +72,7 @@ static const char *nvram_os_partitions[] = {
>>  	NULL
>>  };
>>  
>> -static void oops_to_nvram(struct kmsg_dumper *dumper,
>> -			  enum kmsg_dump_reason reason);
>> +static void oops_to_nvram(enum kmsg_dump_reason reason);
>>  
>>  static struct kmsg_dumper nvram_kmsg_dumper = {
>>  	.dump = oops_to_nvram
>> @@ -642,11 +641,11 @@ void __init nvram_init_oops_partition(int rtas_partition_exists)
>>   * that we think will compress sufficiently to fit in the lnx,oops-log
>>   * partition.  If that's too much, go back and capture uncompressed text.
>>   */
>> -static void oops_to_nvram(struct kmsg_dumper *dumper,
>> -			  enum kmsg_dump_reason reason)
>> +static void oops_to_nvram(enum kmsg_dump_reason reason)
>>  {
>>  	struct oops_log_info *oops_hdr = (struct oops_log_info *)oops_buf;
>>  	static unsigned int oops_count = 0;
>> +	static struct kmsg_dump_iter iter;
>>  	static bool panicking = false;
>>  	static DEFINE_SPINLOCK(lock);
>>  	unsigned long flags;
>> @@ -681,13 +680,14 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
>>  		return;
>>  
>>  	if (big_oops_buf) {
>> -		kmsg_dump_get_buffer(dumper, false,
>> +		kmsg_dump_rewind(&iter);
>
> It would be nice to get rid of the kmsg_dump_rewind(&iter) calls
> in all callers.
>
> A solution might be to create the following in include/linux/kmsg_dump.h
>
> #define KMSG_DUMP_ITER_INIT(iter) {	\
> 	.cur_seq = 0,			\
> 	.next_seq = U64_MAX,		\
> 	}
>
> #define DEFINE_KMSG_DUMP_ITER(iter)	\
> 	struct kmsg_dump_iter iter = KMSG_DUMP_ITER_INIT(iter)

For this caller (arch/powerpc/kernel/nvram_64.c) and for
(kernel/debug/kdb/kdb_main.c), kmsg_dump_rewind() is called twice within
the dumper. So rewind will still be used there.

> Then we could do the following at the beginning of both
> kmsg_dump_get_buffer() and kmsg_dump_get_line():
>
> 	u64 clear_seq = latched_seq_read_nolock(&clear_seq);
>
> 	if (iter->cur_seq < clear_seq)
> 		cur_seq = clear_seq;

I suppose we need to add this part anyway, if we want to enforce that
records before @clear_seq are not to be available for dumpers.

> I am not completely sure about next_seq:
>
>    + kmsg_dump_get_buffer() will set it for the next call anyway.
>      It reads the blocks of messages from the newest.
>
>    + kmsg_dump_get_line() wants to read the entire buffer anyway.
>      But there is a small risk of an infinite loop when new messages
>      are printed when dumping each line.
>
> It might be better to avoid the infinite loop. We could do the following:
>
> static void check_and_set_iter(struct kmsg_dump_iter)
> {
> 	if (iter->cur_seq == 0 && iter->next_seq == U64_MAX) {
> 		kmsg_dump_rewind(iter);
> }
>
> and call this at the beginning of both kmsg_dump_get_buffer()
> and kmsg_dump_get_line()
>
> What do you think?

On a technical level, it does not make any difference. It is pure
cosmetic.

Personally, I prefer the rewind directly before the kmsg_dump_get calls
because it puts the initializer directly next to the user.

As an example to illustrate my view, I prefer:

    for (i = 0; i < n; i++)
        ...;

instead of:

    int i = 0;

    ...

    for (; i < n; i++)
        ...;

Also, I do not really like the special use of 0/U64_MAX to identify
special actions of the kmsg_dump_get functions.

> Note that I do not resist on it. But it might make the API easier to
> use from my POV.

Since you do not resist, I will keep the API the same for v4. But I will
add the @clear_seq check to the kmsg_dump_get functions.

John Ogness

^ permalink raw reply

* Re: [PATCH v2 24/37] KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into __kvmhv_vcpu_entry_p9
From: Fabiano Rosas @ 2021-03-02 13:48 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210225134652.2127648-25-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> Now the initial C implementation is done, inline more HV code to make
> rearranging things easier.
>
> And rename __kvmhv_vcpu_entry_p9 to drop the leading underscores as it's
> now C, and is now a more complete vcpu entry.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>

> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |   2 +-
>  arch/powerpc/kvm/book3s_hv.c             | 181 +----------------------
>  arch/powerpc/kvm/book3s_hv_interrupt.c   | 168 ++++++++++++++++++++-
>  3 files changed, 169 insertions(+), 182 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index c214bcffb441..eaf3a562bf1e 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -153,7 +153,7 @@ static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
>  	return radix;
>  }
>
> -int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu);
> +int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr);
>
>  #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
>  #endif
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a2761515e3..f99503acdda5 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3442,183 +3442,6 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	trace_kvmppc_run_core(vc, 1);
>  }
>
> -static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
> -{
> -	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> -	struct kvm_nested_guest *nested = vcpu->arch.nested;
> -	u32 lpid;
> -
> -	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> -
> -	mtspr(SPRN_LPID, lpid);
> -	mtspr(SPRN_LPCR, lpcr);
> -	mtspr(SPRN_PID, vcpu->arch.pid);
> -	isync();
> -
> -	/* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
> -	kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
> -}
> -
> -static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
> -{
> -	mtspr(SPRN_PID, pid);
> -	mtspr(SPRN_LPID, kvm->arch.host_lpid);
> -	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
> -	isync();
> -}
> -
> -/*
> - * Load up hypervisor-mode registers on P9.
> - */
> -static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> -				     unsigned long lpcr)
> -{
> -	struct kvm *kvm = vcpu->kvm;
> -	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> -	s64 hdec;
> -	u64 tb, purr, spurr;
> -	int trap;
> -	unsigned long host_hfscr = mfspr(SPRN_HFSCR);
> -	unsigned long host_ciabr = mfspr(SPRN_CIABR);
> -	unsigned long host_dawr0 = mfspr(SPRN_DAWR0);
> -	unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
> -	unsigned long host_psscr = mfspr(SPRN_PSSCR);
> -	unsigned long host_pidr = mfspr(SPRN_PID);
> -	unsigned long host_dawr1 = 0;
> -	unsigned long host_dawrx1 = 0;
> -
> -	if (cpu_has_feature(CPU_FTR_DAWR1)) {
> -		host_dawr1 = mfspr(SPRN_DAWR1);
> -		host_dawrx1 = mfspr(SPRN_DAWRX1);
> -	}
> -
> -	tb = mftb();
> -	hdec = time_limit - tb;
> -	if (hdec < 0)
> -		return BOOK3S_INTERRUPT_HV_DECREMENTER;
> -
> -	if (vc->tb_offset) {
> -		u64 new_tb = tb + vc->tb_offset;
> -		mtspr(SPRN_TBU40, new_tb);
> -		tb = mftb();
> -		if ((tb & 0xffffff) < (new_tb & 0xffffff))
> -			mtspr(SPRN_TBU40, new_tb + 0x1000000);
> -		vc->tb_offset_applied = vc->tb_offset;
> -	}
> -
> -	if (vc->pcr)
> -		mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
> -	mtspr(SPRN_DPDES, vc->dpdes);
> -	mtspr(SPRN_VTB, vc->vtb);
> -
> -	local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
> -	local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
> -	mtspr(SPRN_PURR, vcpu->arch.purr);
> -	mtspr(SPRN_SPURR, vcpu->arch.spurr);
> -
> -	if (dawr_enabled()) {
> -		mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
> -		mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
> -		if (cpu_has_feature(CPU_FTR_DAWR1)) {
> -			mtspr(SPRN_DAWR1, vcpu->arch.dawr1);
> -			mtspr(SPRN_DAWRX1, vcpu->arch.dawrx1);
> -		}
> -	}
> -	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
> -	mtspr(SPRN_IC, vcpu->arch.ic);
> -
> -	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
> -	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> -
> -	mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
> -
> -	mtspr(SPRN_SPRG0, vcpu->arch.shregs.sprg0);
> -	mtspr(SPRN_SPRG1, vcpu->arch.shregs.sprg1);
> -	mtspr(SPRN_SPRG2, vcpu->arch.shregs.sprg2);
> -	mtspr(SPRN_SPRG3, vcpu->arch.shregs.sprg3);
> -
> -	mtspr(SPRN_AMOR, ~0UL);
> -
> -	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
> -
> -	/*
> -	 * P9 suppresses the HDEC exception when LPCR[HDICE] = 0,
> -	 * so set guest LPCR (with HDICE) before writing HDEC.
> -	 */
> -	mtspr(SPRN_HDEC, hdec);
> -
> -	mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0);
> -	mtspr(SPRN_SRR1, vcpu->arch.shregs.srr1);
> -
> -	trap = __kvmhv_vcpu_entry_p9(vcpu);
> -
> -	/* Advance host PURR/SPURR by the amount used by guest */
> -	purr = mfspr(SPRN_PURR);
> -	spurr = mfspr(SPRN_SPURR);
> -	mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr +
> -	      purr - vcpu->arch.purr);
> -	mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr +
> -	      spurr - vcpu->arch.spurr);
> -	vcpu->arch.purr = purr;
> -	vcpu->arch.spurr = spurr;
> -
> -	vcpu->arch.ic = mfspr(SPRN_IC);
> -	vcpu->arch.pid = mfspr(SPRN_PID);
> -	vcpu->arch.psscr = mfspr(SPRN_PSSCR) & PSSCR_GUEST_VIS;
> -
> -	vcpu->arch.shregs.sprg0 = mfspr(SPRN_SPRG0);
> -	vcpu->arch.shregs.sprg1 = mfspr(SPRN_SPRG1);
> -	vcpu->arch.shregs.sprg2 = mfspr(SPRN_SPRG2);
> -	vcpu->arch.shregs.sprg3 = mfspr(SPRN_SPRG3);
> -
> -	/* Preserve PSSCR[FAKE_SUSPEND] until we've called kvmppc_save_tm_hv */
> -	mtspr(SPRN_PSSCR, host_psscr |
> -	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> -	mtspr(SPRN_HFSCR, host_hfscr);
> -	mtspr(SPRN_CIABR, host_ciabr);
> -	mtspr(SPRN_DAWR0, host_dawr0);
> -	mtspr(SPRN_DAWRX0, host_dawrx0);
> -	if (cpu_has_feature(CPU_FTR_DAWR1)) {
> -		mtspr(SPRN_DAWR1, host_dawr1);
> -		mtspr(SPRN_DAWRX1, host_dawrx1);
> -	}
> -
> -	/*
> -	 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
> -	 * case we interrupted the guest between a tlbie and a ptesync.
> -	 */
> -	asm volatile("eieio; tlbsync; ptesync");
> -
> -	/*
> -	 * cp_abort is required if the processor supports local copy-paste
> -	 * to clear the copy buffer that was under control of the guest.
> -	 */
> -	if (cpu_has_feature(CPU_FTR_ARCH_31))
> -		asm volatile(PPC_CP_ABORT);
> -
> -	vc->dpdes = mfspr(SPRN_DPDES);
> -	vc->vtb = mfspr(SPRN_VTB);
> -	mtspr(SPRN_DPDES, 0);
> -	if (vc->pcr)
> -		mtspr(SPRN_PCR, PCR_MASK);
> -
> -	if (vc->tb_offset_applied) {
> -		u64 new_tb = mftb() - vc->tb_offset_applied;
> -		mtspr(SPRN_TBU40, new_tb);
> -		tb = mftb();
> -		if ((tb & 0xffffff) < (new_tb & 0xffffff))
> -			mtspr(SPRN_TBU40, new_tb + 0x1000000);
> -		vc->tb_offset_applied = 0;
> -	}
> -
> -	/* HDEC must be at least as large as DEC, so decrementer_max fits */
> -	mtspr(SPRN_HDEC, decrementer_max);
> -
> -	switch_mmu_to_host_radix(kvm, host_pidr);
> -
> -	return trap;
> -}
> -
>  /*
>   * Virtual-mode guest entry for POWER9 and later when the host and
>   * guest are both using the radix MMU.  The LPIDR has already been set.
> @@ -3710,7 +3533,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>  		 * We need to save and restore the guest visible part of the
>  		 * psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor
>  		 * doesn't do this for us. Note only required if pseries since
> -		 * this is done in kvmhv_load_hv_regs_and_go() below otherwise.
> +		 * this is done in kvmhv_vcpu_entry_p9() below otherwise.
>  		 */
>  		unsigned long host_psscr;
>  		/* call our hypervisor to load up HV regs and go */
> @@ -3748,7 +3571,7 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	} else {
>  		kvmppc_xive_push_vcpu(vcpu);
> -		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
> +		trap = kvmhv_vcpu_entry_p9(vcpu, time_limit, lpcr);
>  		/* H_CEDE has to be handled now, not later */
>  		/* XICS hcalls must be handled before xive is pulled */
>  		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
> diff --git a/arch/powerpc/kvm/book3s_hv_interrupt.c b/arch/powerpc/kvm/book3s_hv_interrupt.c
> index 5a7b036c447f..dea3eca3648a 100644
> --- a/arch/powerpc/kvm/book3s_hv_interrupt.c
> +++ b/arch/powerpc/kvm/book3s_hv_interrupt.c
> @@ -55,6 +55,31 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator
>  #define accumulate_time(vcpu, next) do {} while (0)
>  #endif
>
> +static void switch_mmu_to_guest_radix(struct kvm *kvm, struct kvm_vcpu *vcpu, u64 lpcr)
> +{
> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> +	struct kvm_nested_guest *nested = vcpu->arch.nested;
> +	u32 lpid;
> +
> +	lpid = nested ? nested->shadow_lpid : kvm->arch.lpid;
> +
> +	mtspr(SPRN_LPID, lpid);
> +	mtspr(SPRN_LPCR, lpcr);
> +	mtspr(SPRN_PID, vcpu->arch.pid);
> +	isync();
> +
> +	/* TLBIEL must have LPIDR set, so set guest LPID before flushing. */
> +	kvmppc_check_need_tlb_flush(kvm, vc->pcpu, nested);
> +}
> +
> +static void switch_mmu_to_host_radix(struct kvm *kvm, u32 pid)
> +{
> +	mtspr(SPRN_PID, pid);
> +	mtspr(SPRN_LPID, kvm->arch.host_lpid);
> +	mtspr(SPRN_LPCR, kvm->arch.host_lpcr);
> +	isync();
> +}
> +
>  static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
>  {
>  	asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
> @@ -94,11 +119,86 @@ static void radix_clear_slb(void)
>  	}
>  }
>
> -int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
> +int kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu, u64 time_limit, unsigned long lpcr)
>  {
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> +	s64 hdec;
> +	u64 tb, purr, spurr;
>  	u64 *exsave;
>  	unsigned long msr = mfmsr();
>  	int trap;
> +	unsigned long host_hfscr = mfspr(SPRN_HFSCR);
> +	unsigned long host_ciabr = mfspr(SPRN_CIABR);
> +	unsigned long host_dawr0 = mfspr(SPRN_DAWR0);
> +	unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
> +	unsigned long host_psscr = mfspr(SPRN_PSSCR);
> +	unsigned long host_pidr = mfspr(SPRN_PID);
> +	unsigned long host_dawr1 = 0;
> +	unsigned long host_dawrx1 = 0;
> +
> +	if (cpu_has_feature(CPU_FTR_DAWR1)) {
> +		host_dawr1 = mfspr(SPRN_DAWR1);
> +		host_dawrx1 = mfspr(SPRN_DAWRX1);
> +	}
> +
> +	tb = mftb();
> +	hdec = time_limit - tb;
> +	if (hdec < 0)
> +		return BOOK3S_INTERRUPT_HV_DECREMENTER;
> +
> +	if (vc->tb_offset) {
> +		u64 new_tb = tb + vc->tb_offset;
> +		mtspr(SPRN_TBU40, new_tb);
> +		tb = mftb();
> +		if ((tb & 0xffffff) < (new_tb & 0xffffff))
> +			mtspr(SPRN_TBU40, new_tb + 0x1000000);
> +		vc->tb_offset_applied = vc->tb_offset;
> +	}
> +
> +	if (vc->pcr)
> +		mtspr(SPRN_PCR, vc->pcr | PCR_MASK);
> +	mtspr(SPRN_DPDES, vc->dpdes);
> +	mtspr(SPRN_VTB, vc->vtb);
> +
> +	local_paca->kvm_hstate.host_purr = mfspr(SPRN_PURR);
> +	local_paca->kvm_hstate.host_spurr = mfspr(SPRN_SPURR);
> +	mtspr(SPRN_PURR, vcpu->arch.purr);
> +	mtspr(SPRN_SPURR, vcpu->arch.spurr);
> +
> +	if (dawr_enabled()) {
> +		mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
> +		mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
> +		if (cpu_has_feature(CPU_FTR_DAWR1)) {
> +			mtspr(SPRN_DAWR1, vcpu->arch.dawr1);
> +			mtspr(SPRN_DAWRX1, vcpu->arch.dawrx1);
> +		}
> +	}
> +	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
> +	mtspr(SPRN_IC, vcpu->arch.ic);
> +
> +	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
> +	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> +
> +	mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
> +
> +	mtspr(SPRN_SPRG0, vcpu->arch.shregs.sprg0);
> +	mtspr(SPRN_SPRG1, vcpu->arch.shregs.sprg1);
> +	mtspr(SPRN_SPRG2, vcpu->arch.shregs.sprg2);
> +	mtspr(SPRN_SPRG3, vcpu->arch.shregs.sprg3);
> +
> +	mtspr(SPRN_AMOR, ~0UL);
> +
> +	switch_mmu_to_guest_radix(kvm, vcpu, lpcr);
> +
> +	/*
> +	 * P9 suppresses the HDEC exception when LPCR[HDICE] = 0,
> +	 * so set guest LPCR (with HDICE) before writing HDEC.
> +	 */
> +	mtspr(SPRN_HDEC, hdec);
> +
> +	mtspr(SPRN_SRR0, vcpu->arch.shregs.srr0);
> +	mtspr(SPRN_SRR1, vcpu->arch.shregs.srr1);
>
>  	start_timing(vcpu, &vcpu->arch.rm_entry);
>
> @@ -216,6 +316,70 @@ int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
>
>  	end_timing(vcpu);
>
> +	/* Advance host PURR/SPURR by the amount used by guest */
> +	purr = mfspr(SPRN_PURR);
> +	spurr = mfspr(SPRN_SPURR);
> +	mtspr(SPRN_PURR, local_paca->kvm_hstate.host_purr +
> +	      purr - vcpu->arch.purr);
> +	mtspr(SPRN_SPURR, local_paca->kvm_hstate.host_spurr +
> +	      spurr - vcpu->arch.spurr);
> +	vcpu->arch.purr = purr;
> +	vcpu->arch.spurr = spurr;
> +
> +	vcpu->arch.ic = mfspr(SPRN_IC);
> +	vcpu->arch.pid = mfspr(SPRN_PID);
> +	vcpu->arch.psscr = mfspr(SPRN_PSSCR) & PSSCR_GUEST_VIS;
> +
> +	vcpu->arch.shregs.sprg0 = mfspr(SPRN_SPRG0);
> +	vcpu->arch.shregs.sprg1 = mfspr(SPRN_SPRG1);
> +	vcpu->arch.shregs.sprg2 = mfspr(SPRN_SPRG2);
> +	vcpu->arch.shregs.sprg3 = mfspr(SPRN_SPRG3);
> +
> +	/* Preserve PSSCR[FAKE_SUSPEND] until we've called kvmppc_save_tm_hv */
> +	mtspr(SPRN_PSSCR, host_psscr |
> +	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> +	mtspr(SPRN_HFSCR, host_hfscr);
> +	mtspr(SPRN_CIABR, host_ciabr);
> +	mtspr(SPRN_DAWR0, host_dawr0);
> +	mtspr(SPRN_DAWRX0, host_dawrx0);
> +	if (cpu_has_feature(CPU_FTR_DAWR1)) {
> +		mtspr(SPRN_DAWR1, host_dawr1);
> +		mtspr(SPRN_DAWRX1, host_dawrx1);
> +	}
> +
> +	/*
> +	 * Since this is radix, do a eieio; tlbsync; ptesync sequence in
> +	 * case we interrupted the guest between a tlbie and a ptesync.
> +	 */
> +	asm volatile("eieio; tlbsync; ptesync");
> +
> +	/*
> +	 * cp_abort is required if the processor supports local copy-paste
> +	 * to clear the copy buffer that was under control of the guest.
> +	 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_31))
> +		asm volatile(PPC_CP_ABORT);
> +
> +	vc->dpdes = mfspr(SPRN_DPDES);
> +	vc->vtb = mfspr(SPRN_VTB);
> +	mtspr(SPRN_DPDES, 0);
> +	if (vc->pcr)
> +		mtspr(SPRN_PCR, PCR_MASK);
> +
> +	if (vc->tb_offset_applied) {
> +		u64 new_tb = mftb() - vc->tb_offset_applied;
> +		mtspr(SPRN_TBU40, new_tb);
> +		tb = mftb();
> +		if ((tb & 0xffffff) < (new_tb & 0xffffff))
> +			mtspr(SPRN_TBU40, new_tb + 0x1000000);
> +		vc->tb_offset_applied = 0;
> +	}
> +
> +	/* HDEC must be at least as large as DEC, so decrementer_max fits */
> +	mtspr(SPRN_HDEC, decrementer_max);
> +
> +	switch_mmu_to_host_radix(kvm, host_pidr);
> +
>  	return trap;
>  }
> -EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9);
> +EXPORT_SYMBOL_GPL(kvmhv_vcpu_entry_p9);

^ permalink raw reply

* Re: [PATCH next v3 12/15] printk: introduce a kmsg_dump iterator
From: Petr Mladek @ 2021-03-02 13:55 UTC (permalink / raw)
  To: John Ogness
  Cc: linux-hyperv, Sergey Senozhatsky, Douglas Anderson,
	Paul Mackerras, Miquel Raynal, K. Y. Srinivasan, Thomas Meyer,
	Vignesh Raghavendra, Wei Liu, Madhavan Srinivasan,
	Stephen Hemminger, Anton Vorontsov, Joel Stanley, Jason Wessel,
	Anton Ivanov, Wei Li, Haiyang Zhang, Ravi Bangoria, Kees Cook,
	Alistair Popple, Jeff Dike, Colin Cross, linux-um,
	Daniel Thompson, Steven Rostedt, Davidlohr Bueso, Nicholas Piggin,
	Oleg Nesterov, Thomas Gleixner, Andy Shevchenko, Jordan Niethe,
	Michael Kelley, Christophe Leroy, Tony Luck, Pavel Tatashin,
	linux-kernel, Sergey Senozhatsky, Richard Weinberger,
	kgdb-bugreport, linux-mtd, linuxppc-dev, Mike Rapoport
In-Reply-To: <87lfb5pu8c.fsf@jogness.linutronix.de>

On Tue 2021-03-02 14:20:51, John Ogness wrote:
> On 2021-03-01, Petr Mladek <pmladek@suse.com> wrote:
> >> diff --git a/arch/powerpc/kernel/nvram_64.c b/arch/powerpc/kernel/nvram_64.c
> >> index 532f22637783..5a64b24a91c2 100644
> >> --- a/arch/powerpc/kernel/nvram_64.c
> >> +++ b/arch/powerpc/kernel/nvram_64.c
> >> @@ -681,13 +680,14 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
> >>  		return;
> >>  
> >>  	if (big_oops_buf) {
> >> -		kmsg_dump_get_buffer(dumper, false,
> >> +		kmsg_dump_rewind(&iter);
> >
> > It would be nice to get rid of the kmsg_dump_rewind(&iter) calls
> > in all callers.
> >
> > A solution might be to create the following in include/linux/kmsg_dump.h
> >
> > Then we could do the following at the beginning of both
> > kmsg_dump_get_buffer() and kmsg_dump_get_line():
> >
> > 	u64 clear_seq = latched_seq_read_nolock(&clear_seq);
> >
> > 	if (iter->cur_seq < clear_seq)
> > 		cur_seq = clear_seq;
> 
> I suppose we need to add this part anyway, if we want to enforce that
> records before @clear_seq are not to be available for dumpers.

Yup.

> > It might be better to avoid the infinite loop. We could do the following:
> >
> > static void check_and_set_iter(struct kmsg_dump_iter)
> > {
> > 	if (iter->cur_seq == 0 && iter->next_seq == U64_MAX) {
> > 		kmsg_dump_rewind(iter);
> > }
> >
> > and call this at the beginning of both kmsg_dump_get_buffer()
> > and kmsg_dump_get_line()
> >
> > What do you think?
> 
> On a technical level, it does not make any difference. It is pure
> cosmetic.

Yup.

> Personally, I prefer the rewind directly before the kmsg_dump_get calls
> because it puts the initializer directly next to the user.
> 
> As an example to illustrate my view, I prefer:
> 
>     for (i = 0; i < n; i++)
>         ...;
> 
> instead of:
> 
>     int i = 0;
> 
>     ...
> 
>     for (; i < n; i++)
>         ...;
> 
> Also, I do not really like the special use of 0/U64_MAX to identify
> special actions of the kmsg_dump_get functions.

Fair enough.

> > Note that I do not resist on it. But it might make the API easier to
> > use from my POV.
> 
> Since you do not resist, I will keep the API the same for v4. But I will
> add the @clear_seq check to the kmsg_dump_get functions.

Go for it.

Best Regards,
Petr

^ permalink raw reply

* Re: [PATCH 0/2] Fix CMDLINE_EXTEND handling for FDT "bootargs"
From: Rob Herring @ 2021-03-02 14:56 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Tyler Hicks, devicetree, Marc Zyngier, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Frank Rowand, linuxppc-dev,
	Doug Anderson, Chris Packham, Palmer Dabbelt, linux-arm-kernel,
	Catalin Marinas, Max Uvarov, Android Kernel Team, Ard Biesheuvel,
	linux-kernel@vger.kernel.org, Daniel Walker
In-Reply-To: <bbbf5def-a168-9a4c-1106-b80883dfd389@csgroup.eu>

On Mon, Mar 1, 2021 at 11:45 AM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 01/03/2021 à 18:26, Rob Herring a écrit :
> > +PPC folks and Daniel W
> >
> > On Mon, Mar 1, 2021 at 8:42 AM Will Deacon <will@kernel.org> wrote:
> >>
> >> On Mon, Mar 01, 2021 at 08:19:32AM -0600, Rob Herring wrote:
> >>> On Thu, Feb 25, 2021 at 6:59 AM Will Deacon <will@kernel.org> wrote:
> >>>> We recently [1] enabled support for CMDLINE_EXTEND on arm64, however
> >>>> when I started looking at replacing Android's out-of-tree implementation [2]
> >>>
> >>> Did anyone go read the common, reworked version of all this I
> >>> referenced that supports prepend and append. Here it is again[1].
> >>> Maybe I should have been more assertive there and said 'extend' is
> >>> ambiguous.
> >>
> >> I tried reading that, but (a) most of the series is not in the mailing list
> >> archives and (b) the patch that _is_ doesn't touch CMDLINE_EXTEND at all.
> >> Right now the code in mainline does the opposite of what it's documented to
> >> do.
> >
> > Actually, there is a newer version I found:
> >
> > https://lore.kernel.org/linuxppc-dev/1551469472-53043-1-git-send-email-danielwa@cisco.com/
> > https://lore.kernel.org/linuxppc-dev/1551469472-53043-2-git-send-email-danielwa@cisco.com/
> > https://lore.kernel.org/linuxppc-dev/1551469472-53043-3-git-send-email-danielwa@cisco.com/
>
> This was seen as too much intrusive into powerpc.

It looked like the main issue was string functions for KASAN?

As far as being too complex, I think that will be needed if you look
at all architectures and non-DT cases.

> I proposed an alternative at
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1554195798.git.christophe.leroy@c-s.fr/ but
> never got any feedback.

Didn't go to a list I subscribe to. In particular, if it had gone to
DT list and into PW you would have gotten a reply from me.

Rob

^ permalink raw reply

* Re: [PATCH v2 28/37] KVM: PPC: Book3S HV P9: Add helpers for OS SPR handling
From: Fabiano Rosas @ 2021-03-02 15:04 UTC (permalink / raw)
  To: Nicholas Piggin, kvm-ppc; +Cc: linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210225134652.2127648-29-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> This is a first step to wrapping supervisor and user SPR saving and
> loading up into helpers, which will then be called independently in
> bare metal and nested HV cases in order to optimise SPR access.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---

<snip>

> +/* vcpu guest regs must already be saved */
> +static void restore_p9_host_os_sprs(struct kvm_vcpu *vcpu,
> +				    struct p9_host_os_sprs *host_os_sprs)
> +{
> +	mtspr(SPRN_PSPB, 0);
> +	mtspr(SPRN_WORT, 0);
> +	mtspr(SPRN_UAMOR, 0);
> +	mtspr(SPRN_PSPB, 0);

Not your fault, but PSPB is set twice here.

> +
> +	mtspr(SPRN_DSCR, host_os_sprs->dscr);
> +	mtspr(SPRN_TIDR, host_os_sprs->tidr);
> +	mtspr(SPRN_IAMR, host_os_sprs->iamr);
> +
> +	if (host_os_sprs->amr != vcpu->arch.amr)
> +		mtspr(SPRN_AMR, host_os_sprs->amr);
> +
> +	if (host_os_sprs->fscr != vcpu->arch.fscr)
> +		mtspr(SPRN_FSCR, host_os_sprs->fscr);
> +}
> +

<snip>

> @@ -3605,34 +3666,10 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>  	vcpu->arch.dec_expires = dec + tb;
>  	vcpu->cpu = -1;
>  	vcpu->arch.thread_cpu = -1;
> -	vcpu->arch.ctrl = mfspr(SPRN_CTRLF);
> -
> -	vcpu->arch.iamr = mfspr(SPRN_IAMR);
> -	vcpu->arch.pspb = mfspr(SPRN_PSPB);
> -	vcpu->arch.fscr = mfspr(SPRN_FSCR);
> -	vcpu->arch.tar = mfspr(SPRN_TAR);
> -	vcpu->arch.ebbhr = mfspr(SPRN_EBBHR);
> -	vcpu->arch.ebbrr = mfspr(SPRN_EBBRR);
> -	vcpu->arch.bescr = mfspr(SPRN_BESCR);
> -	vcpu->arch.wort = mfspr(SPRN_WORT);
> -	vcpu->arch.tid = mfspr(SPRN_TIDR);
> -	vcpu->arch.amr = mfspr(SPRN_AMR);
> -	vcpu->arch.uamor = mfspr(SPRN_UAMOR);
> -	vcpu->arch.dscr = mfspr(SPRN_DSCR);
> -
> -	mtspr(SPRN_PSPB, 0);
> -	mtspr(SPRN_WORT, 0);
> -	mtspr(SPRN_UAMOR, 0);
> -	mtspr(SPRN_DSCR, host_dscr);
> -	mtspr(SPRN_TIDR, host_tidr);
> -	mtspr(SPRN_IAMR, host_iamr);
> -	mtspr(SPRN_PSPB, 0);
>
> -	if (host_amr != vcpu->arch.amr)
> -		mtspr(SPRN_AMR, host_amr);
> +	restore_p9_host_os_sprs(vcpu, &host_os_sprs);
>
> -	if (host_fscr != vcpu->arch.fscr)
> -		mtspr(SPRN_FSCR, host_fscr);
> +	store_spr_state(vcpu);

store_spr_state should come first, right? We want to save the guest
state before restoring the host state.

>
>  	msr_check_and_set(MSR_FP | MSR_VEC | MSR_VSX);
>  	store_fp_state(&vcpu->arch.fp);

^ permalink raw reply

* Re: [PATCH v19 00/13] Carry forward IMA measurement log on kexec on ARM64
From: Rob Herring @ 2021-03-02 15:06 UTC (permalink / raw)
  To: Lakshmi Ramasubramanian
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Stephen Rothwell,
	Masahiro Yamada, James Morris, AKASHI, Takahiro, linux-arm-kernel,
	Catalin Marinas, Serge E. Hallyn, devicetree, Pavel Tatashin,
	Will Deacon, Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <20210221174930.27324-1-nramas@linux.microsoft.com>

On Sun, Feb 21, 2021 at 11:49 AM Lakshmi Ramasubramanian
<nramas@linux.microsoft.com> wrote:
>
> On kexec file load Integrity Measurement Architecture (IMA) subsystem
> may verify the IMA signature of the kernel and initramfs, and measure
> it.  The command line parameters passed to the kernel in the kexec call
> may also be measured by IMA.  A remote attestation service can verify
> a TPM quote based on the TPM event log, the IMA measurement list, and
> the TPM PCR data.  This can be achieved only if the IMA measurement log
> is carried over from the current kernel to the next kernel across
> the kexec call.
>
> powerpc already supports carrying forward the IMA measurement log on
> kexec.  This patch set adds support for carrying forward the IMA
> measurement log on kexec on ARM64.
>
> This patch set moves the platform independent code defined for powerpc
> such that it can be reused for other platforms as well.  A chosen node
> "linux,ima-kexec-buffer" is added to the DTB for ARM64 to hold
> the address and the size of the memory reserved to carry
> the IMA measurement log.
>
> This patch set has been tested for ARM64 platform using QEMU.
> I would like help from the community for testing this change on powerpc.
> Thanks.
>
> This patch set is based on
> commit f31e3386a4e9 ("ima: Free IMA measurement buffer after kexec syscall")
> in https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
> "ima-kexec-fixes" branch.
>
> Changelog:
>
> v19
>   - Moved ELF related fields from "struct kimage_arch" for x86, arm64,
>     and powerpc architectures to "struct kimage".
>
> v18
>   - Added a parameter to of_kexec_alloc_and_setup_fdt() for the caller
>     to specify additional space needed for the FDT buffer
>   - Renamed arm64 and powerpc ELF buffer address field in
>     "struct kimage_arch" to elf_load_addr to match x86_64 name.
>   - Removed of_ima_add_kexec_buffer() and instead directly set
>     ima_buffer_addr and ima_buffer_size in ima_add_kexec_buffer()
>   - Moved FDT_EXTRA_SPACE definition from include/linux/of.h to
>     drivers/of/kexec.c
>
> v17
>   - Renamed of_kexec_setup_new_fdt() to of_kexec_alloc_and_setup_fdt(),
>     and moved memory allocation for the new FDT to this function.
>
> v16
>   - Defined functions to allocate and free buffer for FDT for powerpc
>     and arm64.
>   - Moved ima_buffer_addr and ima_buffer_size fields from
>     "struct kimage_arch" in powerpc to "struct kimage"
> v15
>   - Included Rob's patches in the patch set, and rebased
>     the changes to "next-integrity" branch.
>   - Allocate memory for DTB, on arm64, using kmalloc() instead of
>     vmalloc() to keep it consistent with powerpc implementation.
>   - Call of_kexec_setup_new_fdt() from setup_new_fdt_ppc64() and
>     remove setup_new_fdt() in the same patch to keep it bisect safe.
>
> v14
>   - Select CONFIG_HAVE_IMA_KEXEC for CONFIG_KEXEC_FILE, for powerpc
>     and arm64, if CONFIG_IMA is enabled.
>   - Use IS_ENABLED() macro instead of "#ifdef" in remove_ima_buffer(),
>     ima_get_kexec_buffer(), and ima_free_kexec_buffer().
>   - Call of_kexec_setup_new_fdt() from setup_new_fdt_ppc64() and
>     remove setup_new_fdt() in "arch/powerpc/kexec/file_load.c".
>
> v13
>   - Moved the arch independent functions to drivers/of/kexec.c
>     and then refactored the code.
>   - Moved arch_ima_add_kexec_buffer() to
>     security/integrity/ima/ima_kexec.c
>
> v12
>   - Use fdt_appendprop_addrrange() in setup_ima_buffer()
>     to setup the IMA measurement list property in
>     the device tree.
>   - Moved architecture independent functions from
>     "arch/powerpc/kexec/ima.c" to "drivers/of/kexec."
>   - Deleted "arch/powerpc/kexec/ima.c" and
>     "arch/powerpc/include/asm/ima.h".
>
> v11
>   - Rebased the changes on the kexec code refactoring done by
>     Rob Herring in his "dt/kexec" branch
>   - Removed "extern" keyword in function declarations
>   - Removed unnecessary header files included in C files
>   - Updated patch descriptions per Thiago's comments
>
> v10
>   - Moved delete_fdt_mem_rsv(), remove_ima_buffer(),
>     get_ima_kexec_buffer, and get_root_addr_size_cells()
>     to drivers/of/kexec.c
>   - Moved arch_ima_add_kexec_buffer() to
>     security/integrity/ima/ima_kexec.c
>   - Conditionally define IMA buffer fields in struct kimage_arch
>
> v9
>   - Moved delete_fdt_mem_rsv() to drivers/of/kexec_fdt.c
>   - Defined a new function get_ima_kexec_buffer() in
>     drivers/of/ima_kexec.c to replace do_get_kexec_buffer()
>   - Changed remove_ima_kexec_buffer() to the original function name
>     remove_ima_buffer()
>   - Moved remove_ima_buffer() to drivers/of/ima_kexec.c
>   - Moved ima_get_kexec_buffer() and ima_free_kexec_buffer()
>     to security/integrity/ima/ima_kexec.c
>
> v8:
>   - Moved remove_ima_kexec_buffer(), do_get_kexec_buffer(), and
>     delete_fdt_mem_rsv() to drivers/of/fdt.c
>   - Moved ima_dump_measurement_list() and ima_add_kexec_buffer()
>     back to security/integrity/ima/ima_kexec.c
>
> v7:
>   - Renamed remove_ima_buffer() to remove_ima_kexec_buffer() and moved
>     this function definition to kernel.
>   - Moved delete_fdt_mem_rsv() definition to kernel
>   - Moved ima_dump_measurement_list() and ima_add_kexec_buffer() to
>     a new file namely ima_kexec_fdt.c in IMA
>
> v6:
>   - Remove any existing FDT_PROP_IMA_KEXEC_BUFFER property in the device
>     tree and also its corresponding memory reservation in the currently
>     running kernel.
>   - Moved the function remove_ima_buffer() defined for powerpc to IMA
>     and renamed the function to ima_remove_kexec_buffer(). Also, moved
>     delete_fdt_mem_rsv() from powerpc to IMA.
>
> v5:
>   - Merged get_addr_size_cells() and do_get_kexec_buffer() into a single
>     function when moving the arch independent code from powerpc to IMA
>   - Reverted the change to use FDT functions in powerpc code and added
>     back the original code in get_addr_size_cells() and
>     do_get_kexec_buffer() for powerpc.
>   - Added fdt_add_mem_rsv() for ARM64 to reserve the memory for
>     the IMA log buffer during kexec.
>   - Fixed the warning reported by kernel test bot for ARM64
>     arch_ima_add_kexec_buffer() - moved this function to a new file
>     namely arch/arm64/kernel/ima_kexec.c
>
> v4:
>   - Submitting the patch series on behalf of the original author
>     Prakhar Srivastava <prsriva@linux.microsoft.com>
>   - Moved FDT_PROP_IMA_KEXEC_BUFFER ("linux,ima-kexec-buffer") to
>     libfdt.h so that it can be shared by multiple platforms.
>
> v3:
> Breakup patches further into separate patches.
>   - Refactoring non architecture specific code out of powerpc
>   - Update powerpc related code to use fdt functions
>   - Update IMA buffer read related code to use of functions
>   - Add support to store the memory information of the IMA
>     measurement logs to be carried forward.
>   - Update the property strings to align with documented nodes
>     https://github.com/devicetree-org/dt-schema/pull/46
>
> v2:
>   Break patches into separate patches.
>   - Powerpc related Refactoring
>   - Updating the docuemntation for chosen node
>   - Updating arm64 to support IMA buffer pass
>
> v1:
>   Refactoring carrying over IMA measuremnet logs over Kexec. This patch
>     moves the non-architecture specific code out of powerpc and adds to
>     security/ima.(Suggested by Thiago)
>   Add Documentation regarding the ima-kexec-buffer node in the chosen
>     node documentation
>
> v0:
>   Add a layer of abstraction to use the memory reserved by device tree
>     for ima buffer pass.
>   Add support for ima buffer pass using reserved memory for arm64 kexec.
>     Update the arch sepcific code path in kexec file load to store the
>     ima buffer in the reserved memory. The same reserved memory is read
>     on kexec or cold boot.
>
> Lakshmi Ramasubramanian (10):
>   kexec: Move ELF fields to struct kimage
>   arm64: Use ELF fields defined in 'struct kimage'
>   powerpc: Use ELF fields defined in 'struct kimage'
>   x86: Use ELF fields defined in 'struct kimage'
>   powerpc: Move ima buffer fields to struct kimage
>   powerpc: Enable passing IMA log to next kernel on kexec
>   powerpc: Move arch independent ima kexec functions to
>     drivers/of/kexec.c
>   kexec: Use fdt_appendprop_addrrange() to add ima buffer to FDT
>   powerpc: Delete unused function delete_fdt_mem_rsv()
>   arm64: Enable passing IMA log to next kernel on kexec
>
> Rob Herring (3):
>   of: Add a common kexec FDT setup function
>   arm64: Use common of_kexec_alloc_and_setup_fdt()
>   powerpc: Use common of_kexec_alloc_and_setup_fdt()
>
>  arch/arm64/Kconfig                     |   1 +
>  arch/arm64/include/asm/kexec.h         |   4 -
>  arch/arm64/kernel/machine_kexec_file.c | 194 +----------
>  arch/powerpc/Kconfig                   |   2 +-
>  arch/powerpc/include/asm/ima.h         |  30 --
>  arch/powerpc/include/asm/kexec.h       |  14 +-
>  arch/powerpc/kexec/Makefile            |   7 -
>  arch/powerpc/kexec/elf_64.c            |  30 +-
>  arch/powerpc/kexec/file_load.c         | 183 +---------
>  arch/powerpc/kexec/file_load_64.c      |  21 +-
>  arch/powerpc/kexec/ima.c               | 219 ------------
>  arch/x86/include/asm/kexec.h           |   5 -
>  arch/x86/kernel/crash.c                |  14 +-
>  arch/x86/kernel/kexec-bzimage64.c      |   2 +-
>  arch/x86/kernel/machine_kexec_64.c     |   4 +-
>  drivers/of/Makefile                    |   6 +
>  drivers/of/kexec.c                     | 458 +++++++++++++++++++++++++
>  include/linux/kexec.h                  |   8 +
>  include/linux/of.h                     |   7 +
>  security/integrity/ima/ima.h           |   4 -
>  security/integrity/ima/ima_kexec.c     |   9 +-
>  21 files changed, 539 insertions(+), 683 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/ima.h
>  delete mode 100644 arch/powerpc/kexec/ima.c
>  create mode 100644 drivers/of/kexec.c

I fixed up the Fixes tags and applied for 5.13.

Rob

^ permalink raw reply

* Re: [PATCH 0/2] Fix CMDLINE_EXTEND handling for FDT "bootargs"
From: Christophe Leroy @ 2021-03-02 15:16 UTC (permalink / raw)
  To: Rob Herring
  Cc: Tyler Hicks, devicetree, Marc Zyngier, Arnd Bergmann,
	Greg Kroah-Hartman, Will Deacon, Frank Rowand, linuxppc-dev,
	Doug Anderson, Chris Packham, Palmer Dabbelt, linux-arm-kernel,
	Catalin Marinas, Max Uvarov, Android Kernel Team, Ard Biesheuvel,
	linux-kernel@vger.kernel.org, Daniel Walker
In-Reply-To: <CAL_Jsq+Te5+kQzbAMCzuRCkmoZWBDKGhynUC8BfvOm=R5jT4Jg@mail.gmail.com>



Le 02/03/2021 à 15:56, Rob Herring a écrit :
> On Mon, Mar 1, 2021 at 11:45 AM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>>
>>
>> Le 01/03/2021 à 18:26, Rob Herring a écrit :
>>> +PPC folks and Daniel W
>>>
>>> On Mon, Mar 1, 2021 at 8:42 AM Will Deacon <will@kernel.org> wrote:
>>>>
>>>> On Mon, Mar 01, 2021 at 08:19:32AM -0600, Rob Herring wrote:
>>>>> On Thu, Feb 25, 2021 at 6:59 AM Will Deacon <will@kernel.org> wrote:
>>>>>> We recently [1] enabled support for CMDLINE_EXTEND on arm64, however
>>>>>> when I started looking at replacing Android's out-of-tree implementation [2]
>>>>>
>>>>> Did anyone go read the common, reworked version of all this I
>>>>> referenced that supports prepend and append. Here it is again[1].
>>>>> Maybe I should have been more assertive there and said 'extend' is
>>>>> ambiguous.
>>>>
>>>> I tried reading that, but (a) most of the series is not in the mailing list
>>>> archives and (b) the patch that _is_ doesn't touch CMDLINE_EXTEND at all.
>>>> Right now the code in mainline does the opposite of what it's documented to
>>>> do.
>>>
>>> Actually, there is a newer version I found:
>>>
>>> https://lore.kernel.org/linuxppc-dev/1551469472-53043-1-git-send-email-danielwa@cisco.com/
>>> https://lore.kernel.org/linuxppc-dev/1551469472-53043-2-git-send-email-danielwa@cisco.com/
>>> https://lore.kernel.org/linuxppc-dev/1551469472-53043-3-git-send-email-danielwa@cisco.com/
>>
>> This was seen as too much intrusive into powerpc.
> 
> It looked like the main issue was string functions for KASAN?

This is one issue yes,

> 
> As far as being too complex, I think that will be needed if you look
> at all architectures and non-DT cases.

As far as I remember, I could't understand why we absolutely need to define the command line string 
in the common part of the code, leading to being obliged to use macros in order to allow the 
architecture to specify in which section it wants the string.

Why not leave the definition of the string to the architecture and just declare it in the common 
code, allowing the architecture to put it where it suits it and reducing opacity and allowing use of 
standard static inline functions instead of uggly macros.


> 
>> I proposed an alternative at
>> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1554195798.git.christophe.leroy@c-s.fr/ but
>> never got any feedback.
> 
> Didn't go to a list I subscribe to. In particular, if it had gone to
> DT list and into PW you would have gotten a reply from me.
> 

Sorry for that. Original series from Daniel 
(https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190319232448.45964-2-danielwa@cisco.com/) 
was sent only to linuxppc-dev list, and Michael suggested to also send it to linux-arch list, and I 
also always copy linux-kernel.

If there is new interest for that functionnality, I can try and rebase my series.

Christophe

^ permalink raw reply

* Re: [PATCH v19 00/13] Carry forward IMA measurement log on kexec on ARM64
From: Lakshmi Ramasubramanian @ 2021-03-02 15:25 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, tao.li, Mimi Zohar, Paul Mackerras,
	vincenzo.frascino, Frank Rowand, Sasha Levin, Stephen Rothwell,
	Masahiro Yamada, James Morris, AKASHI, Takahiro, linux-arm-kernel,
	Catalin Marinas, Serge E. Hallyn, devicetree, Pavel Tatashin,
	Will Deacon, Prakhar Srivastava, Hsin-Yi Wang, Allison Randal,
	Christophe Leroy, Matthias Brugger, balajib, dmitry.kasatkin,
	linux-kernel@vger.kernel.org, James Morse, Greg Kroah-Hartman,
	Joe Perches, linux-integrity, linuxppc-dev, Thiago Jung Bauermann
In-Reply-To: <CAL_JsqKiOVo2rDhstAA-jUkMJiajHM=uwfj3JQd64h_eEfSjTw@mail.gmail.com>

On 3/2/21 7:06 AM, Rob Herring wrote:
> On Sun, Feb 21, 2021 at 11:49 AM Lakshmi Ramasubramanian
> <nramas@linux.microsoft.com> wrote:
>>
>> On kexec file load Integrity Measurement Architecture (IMA) subsystem
>> may verify the IMA signature of the kernel and initramfs, and measure
>> it.  The command line parameters passed to the kernel in the kexec call
>> may also be measured by IMA.  A remote attestation service can verify
>> a TPM quote based on the TPM event log, the IMA measurement list, and
>> the TPM PCR data.  This can be achieved only if the IMA measurement log
>> is carried over from the current kernel to the next kernel across
>> the kexec call.
>>
>> powerpc already supports carrying forward the IMA measurement log on
>> kexec.  This patch set adds support for carrying forward the IMA
>> measurement log on kexec on ARM64.
>>
>> This patch set moves the platform independent code defined for powerpc
>> such that it can be reused for other platforms as well.  A chosen node
>> "linux,ima-kexec-buffer" is added to the DTB for ARM64 to hold
>> the address and the size of the memory reserved to carry
>> the IMA measurement log.
>>
>> This patch set has been tested for ARM64 platform using QEMU.
>> I would like help from the community for testing this change on powerpc.
>> Thanks.
>>
>> This patch set is based on
>> commit f31e3386a4e9 ("ima: Free IMA measurement buffer after kexec syscall")
>> in https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
>> "ima-kexec-fixes" branch.

[...]

>>
>> Lakshmi Ramasubramanian (10):
>>    kexec: Move ELF fields to struct kimage
>>    arm64: Use ELF fields defined in 'struct kimage'
>>    powerpc: Use ELF fields defined in 'struct kimage'
>>    x86: Use ELF fields defined in 'struct kimage'
>>    powerpc: Move ima buffer fields to struct kimage
>>    powerpc: Enable passing IMA log to next kernel on kexec
>>    powerpc: Move arch independent ima kexec functions to
>>      drivers/of/kexec.c
>>    kexec: Use fdt_appendprop_addrrange() to add ima buffer to FDT
>>    powerpc: Delete unused function delete_fdt_mem_rsv()
>>    arm64: Enable passing IMA log to next kernel on kexec
>>
>> Rob Herring (3):
>>    of: Add a common kexec FDT setup function
>>    arm64: Use common of_kexec_alloc_and_setup_fdt()
>>    powerpc: Use common of_kexec_alloc_and_setup_fdt()
>>
>>   arch/arm64/Kconfig                     |   1 +
>>   arch/arm64/include/asm/kexec.h         |   4 -
>>   arch/arm64/kernel/machine_kexec_file.c | 194 +----------
>>   arch/powerpc/Kconfig                   |   2 +-
>>   arch/powerpc/include/asm/ima.h         |  30 --
>>   arch/powerpc/include/asm/kexec.h       |  14 +-
>>   arch/powerpc/kexec/Makefile            |   7 -
>>   arch/powerpc/kexec/elf_64.c            |  30 +-
>>   arch/powerpc/kexec/file_load.c         | 183 +---------
>>   arch/powerpc/kexec/file_load_64.c      |  21 +-
>>   arch/powerpc/kexec/ima.c               | 219 ------------
>>   arch/x86/include/asm/kexec.h           |   5 -
>>   arch/x86/kernel/crash.c                |  14 +-
>>   arch/x86/kernel/kexec-bzimage64.c      |   2 +-
>>   arch/x86/kernel/machine_kexec_64.c     |   4 +-
>>   drivers/of/Makefile                    |   6 +
>>   drivers/of/kexec.c                     | 458 +++++++++++++++++++++++++
>>   include/linux/kexec.h                  |   8 +
>>   include/linux/of.h                     |   7 +
>>   security/integrity/ima/ima.h           |   4 -
>>   security/integrity/ima/ima_kexec.c     |   9 +-
>>   21 files changed, 539 insertions(+), 683 deletions(-)
>>   delete mode 100644 arch/powerpc/include/asm/ima.h
>>   delete mode 100644 arch/powerpc/kexec/ima.c
>>   create mode 100644 drivers/of/kexec.c
> 
> I fixed up the Fixes tags and applied for 5.13.
> 

Thanks a lot Rob.

  -lakshmi



^ permalink raw reply

* Re: [PATCH 0/2] Fix CMDLINE_EXTEND handling for FDT "bootargs"
From: Daniel Walker @ 2021-03-02 17:12 UTC (permalink / raw)
  To: Rob Herring
  Cc: Tyler Hicks, devicetree, Android Kernel Team, Catalin Marinas,
	Arnd Bergmann, Frank Rowand, Marc Zyngier, linuxppc-dev,
	linux-kernel@vger.kernel.org, Doug Anderson, Chris Packham,
	Palmer Dabbelt, Greg Kroah-Hartman, Max Uvarov, Will Deacon,
	Ard Biesheuvel, linux-arm-kernel
In-Reply-To: <CAL_JsqJ11D-7a3pwLTVd+rHjqDGBb=b8OU_a6h3Co-at+2qMtQ@mail.gmail.com>

On Mon, Mar 01, 2021 at 11:26:14AM -0600, Rob Herring wrote:
> +PPC folks and Daniel W
> 
> On Mon, Mar 1, 2021 at 8:42 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Mar 01, 2021 at 08:19:32AM -0600, Rob Herring wrote:
> > > On Thu, Feb 25, 2021 at 6:59 AM Will Deacon <will@kernel.org> wrote:
> > > > We recently [1] enabled support for CMDLINE_EXTEND on arm64, however
> > > > when I started looking at replacing Android's out-of-tree implementation [2]
> > >
> > > Did anyone go read the common, reworked version of all this I
> > > referenced that supports prepend and append. Here it is again[1].
> > > Maybe I should have been more assertive there and said 'extend' is
> > > ambiguous.
> >
> > I tried reading that, but (a) most of the series is not in the mailing list
> > archives and (b) the patch that _is_ doesn't touch CMDLINE_EXTEND at all.
> > Right now the code in mainline does the opposite of what it's documented to
> > do.
> 
> Actually, there is a newer version I found:
> 
> https://lore.kernel.org/linuxppc-dev/1551469472-53043-1-git-send-email-danielwa@cisco.com/
> https://lore.kernel.org/linuxppc-dev/1551469472-53043-2-git-send-email-danielwa@cisco.com/
> https://lore.kernel.org/linuxppc-dev/1551469472-53043-3-git-send-email-danielwa@cisco.com/
> 
> (Once again, there's some weird threading going on)
> 

I'm happy to work with anyone to resubmit the changes. We currently use the
changes in Cisco, and we have used them for many years.

I was planning to update and resubmit since someone has recently inquired about
why it wasn't upstream.

Daniel

^ permalink raw reply

* [PATCH v2 2/7] drivers: of: use cmdline building function
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1614705851.git.christophe.leroy@csgroup.eu>

This patch uses the new cmdline building function to
concatenate the of provided cmdline with built-in parts
based on compile-time options.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 drivers/of/fdt.c | 23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index dcc1dd96911a..cf2b95b8f298 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
 #include <linux/serial_core.h>
 #include <linux/sysfs.h>
 #include <linux/random.h>
+#include <linux/cmdline.h>
 
 #include <asm/setup.h>  /* for COMMAND_LINE_SIZE */
 #include <asm/page.h>
@@ -1050,26 +1051,10 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
 
 	/* Retrieve command line */
 	p = of_get_flat_dt_prop(node, "bootargs", &l);
-	if (p != NULL && l > 0)
-		strlcpy(data, p, min(l, COMMAND_LINE_SIZE));
+	if (l <= 0)
+		p = NULL;
 
-	/*
-	 * CONFIG_CMDLINE is meant to be a default in case nothing else
-	 * managed to set the command line, unless CONFIG_CMDLINE_FORCE
-	 * is set in which case we override whatever was found earlier.
-	 */
-#ifdef CONFIG_CMDLINE
-#if defined(CONFIG_CMDLINE_EXTEND)
-	strlcat(data, " ", COMMAND_LINE_SIZE);
-	strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#elif defined(CONFIG_CMDLINE_FORCE)
-	strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#else
-	/* No arguments from boot loader, use kernel's  cmdl*/
-	if (!((char *)data)[0])
-		strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#endif
-#endif /* CONFIG_CMDLINE */
+	cmdline_build(data, p, COMMAND_LINE_SIZE);
 
 	pr_debug("Command line is: %s\n", (char *)data);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 0/7] Improve boot command line handling
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel

The purpose of this series is to improve and enhance the
handling of kernel boot arguments.

It is first focussed on powerpc but also extends the capability
for other arches.

This is based on suggestion from Daniel Walker <danielwa@cisco.com>

Christophe Leroy (7):
  cmdline: Add generic function to build command line.
  drivers: of: use cmdline building function
  powerpc: convert to generic builtin command line
  cmdline: Add capability to prepend the command line
  powerpc: add capability to prepend default command line
  cmdline: Gives architectures opportunity to use generically defined
    boot cmdline manipulation
  powerpc: use generic CMDLINE manipulations

 arch/powerpc/Kconfig            | 37 ++-----------------
 arch/powerpc/kernel/prom_init.c | 35 +++---------------
 drivers/of/fdt.c                | 23 ++----------
 include/linux/cmdline.h         | 65 +++++++++++++++++++++++++++++++++
 init/Kconfig                    | 56 ++++++++++++++++++++++++++++
 5 files changed, 133 insertions(+), 83 deletions(-)
 create mode 100644 include/linux/cmdline.h

-- 
2.25.0


^ permalink raw reply

* [PATCH v2 1/7] cmdline: Add generic function to build command line.
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1614705851.git.christophe.leroy@csgroup.eu>

This code provides architectures with a way to build command line
based on what is built in the kernel and what is handed over by the
bootloader, based on selected compile-time options.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 include/linux/cmdline.h | 62 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 include/linux/cmdline.h

diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
new file mode 100644
index 000000000000..ae3610bb0ee2
--- /dev/null
+++ b/include/linux/cmdline.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CMDLINE_H
+#define _LINUX_CMDLINE_H
+
+static __always_inline size_t cmdline_strlen(const char *s)
+{
+	const char *sc;
+
+	for (sc = s; *sc != '\0'; ++sc)
+		; /* nothing */
+	return sc - s;
+}
+
+static __always_inline size_t cmdline_strlcat(char *dest, const char *src, size_t count)
+{
+	size_t dsize = cmdline_strlen(dest);
+	size_t len = cmdline_strlen(src);
+	size_t res = dsize + len;
+
+	/* This would be a bug */
+	if (dsize >= count)
+		return count;
+
+	dest += dsize;
+	count -= dsize;
+	if (len >= count)
+		len = count - 1;
+	memcpy(dest, src, len);
+	dest[len] = 0;
+	return res;
+}
+
+/*
+ * This function will append a builtin command line to the command
+ * line provided by the bootloader. Kconfig options can be used to alter
+ * the behavior of this builtin command line.
+ * @dest: The destination of the final appended/prepended string.
+ * @src: The starting string or NULL if there isn't one. Must not equal dest.
+ * @length: the length of dest buffer.
+ */
+static __always_inline void cmdline_build(char *dest, const char *src, size_t length)
+{
+	if (length <= 0)
+		return;
+
+	dest[0] = 0;
+
+#ifdef CONFIG_CMDLINE
+	if (IS_ENABLED(CONFIG_CMDLINE_FORCE) || !src || !src[0]) {
+		cmdline_strlcat(dest, CONFIG_CMDLINE, length);
+		return;
+	}
+#endif
+	if (dest != src)
+		cmdline_strlcat(dest, src, length);
+#ifdef CONFIG_CMDLINE
+	if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) && sizeof(CONFIG_CMDLINE) > 1)
+		cmdline_strlcat(dest, " " CONFIG_CMDLINE, length);
+#endif
+}
+
+#endif /* _LINUX_CMDLINE_H */
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 4/7] cmdline: Add capability to prepend the command line
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1614705851.git.christophe.leroy@csgroup.eu>

This patchs adds an option of prepend a text to the command
line instead of appending it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 include/linux/cmdline.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/cmdline.h b/include/linux/cmdline.h
index ae3610bb0ee2..144346051e01 100644
--- a/include/linux/cmdline.h
+++ b/include/linux/cmdline.h
@@ -31,7 +31,7 @@ static __always_inline size_t cmdline_strlcat(char *dest, const char *src, size_
 }
 
 /*
- * This function will append a builtin command line to the command
+ * This function will append or prepend a builtin command line to the command
  * line provided by the bootloader. Kconfig options can be used to alter
  * the behavior of this builtin command line.
  * @dest: The destination of the final appended/prepended string.
@@ -50,6 +50,9 @@ static __always_inline void cmdline_build(char *dest, const char *src, size_t le
 		cmdline_strlcat(dest, CONFIG_CMDLINE, length);
 		return;
 	}
+
+	if (IS_ENABLED(CONFIG_CMDLINE_PREPEND) && sizeof(CONFIG_CMDLINE) > 1)
+		cmdline_strlcat(dest, CONFIG_CMDLINE " ", length);
 #endif
 	if (dest != src)
 		cmdline_strlcat(dest, src, length);
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 3/7] powerpc: convert to generic builtin command line
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1614705851.git.christophe.leroy@csgroup.eu>

This updates the powerpc code to use the new cmdline building function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/kernel/prom_init.c | 35 +++++----------------------------
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ccf77b985c8f..24157e526f80 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -27,6 +27,7 @@
 #include <linux/initrd.h>
 #include <linux/bitops.h>
 #include <linux/pgtable.h>
+#include <linux/cmdline.h>
 #include <asm/prom.h>
 #include <asm/rtas.h>
 #include <asm/page.h>
@@ -152,7 +153,7 @@ static struct prom_t __prombss prom;
 static unsigned long __prombss prom_entry;
 
 static char __prombss of_stdout_device[256];
-static char __prombss prom_scratch[256];
+static char __prombss prom_scratch[COMMAND_LINE_SIZE];
 
 static unsigned long __prombss dt_header_start;
 static unsigned long __prombss dt_struct_start, dt_struct_end;
@@ -304,26 +305,6 @@ static char __init *prom_strstr(const char *s1, const char *s2)
 	return NULL;
 }
 
-static size_t __init prom_strlcat(char *dest, const char *src, size_t count)
-{
-	size_t dsize = prom_strlen(dest);
-	size_t len = prom_strlen(src);
-	size_t res = dsize + len;
-
-	/* This would be a bug */
-	if (dsize >= count)
-		return count;
-
-	dest += dsize;
-	count -= dsize;
-	if (len >= count)
-		len = count-1;
-	memcpy(dest, src, len);
-	dest[len] = 0;
-	return res;
-
-}
-
 #ifdef CONFIG_PPC_PSERIES
 static int __init prom_strtobool(const char *s, bool *res)
 {
@@ -768,19 +749,13 @@ static unsigned long prom_memparse(const char *ptr, const char **retptr)
 static void __init early_cmdline_parse(void)
 {
 	const char *opt;
-
-	char *p;
 	int l = 0;
 
-	prom_cmd_line[0] = 0;
-	p = prom_cmd_line;
-
 	if (!IS_ENABLED(CONFIG_CMDLINE_FORCE) && (long)prom.chosen > 0)
-		l = prom_getprop(prom.chosen, "bootargs", p, COMMAND_LINE_SIZE-1);
+		l = prom_getprop(prom.chosen, "bootargs", prom_scratch,
+				 COMMAND_LINE_SIZE - 1);
 
-	if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) || l <= 0 || p[0] == '\0')
-		prom_strlcat(prom_cmd_line, " " CONFIG_CMDLINE,
-			     sizeof(prom_cmd_line));
+	cmdline_build(prom_cmd_line, l > 0 ? prom_scratch : NULL, sizeof(prom_scratch));
 
 	prom_printf("command line: %s\n", prom_cmd_line);
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH v2 5/7] powerpc: add capability to prepend default command line
From: Christophe Leroy @ 2021-03-02 17:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	danielwa, robh, daniel
  Cc: linux-arch, devicetree, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1614705851.git.christophe.leroy@csgroup.eu>

This patch activates the capability to prepend default
arguments to the command line.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..0ab406f14513 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -912,6 +912,12 @@ config CMDLINE_EXTEND
 	  The command-line arguments provided by the boot loader will be
 	  appended to the default kernel command string.
 
+config CMDLINE_PREPEND
+	bool "Prepend bootloader kernel arguments"
+	help
+	  The default kernel command string will be prepend to the
+	  command-line arguments provided during boot.
+
 config CMDLINE_FORCE
 	bool "Always use the default kernel command string"
 	help
-- 
2.25.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox