* Re: [PATCH] powerpc/features: Remove CPU_FTR_NODSISRALIGN
From: Michael Ellerman @ 2020-10-14 11:00 UTC (permalink / raw)
To: Aneesh Kumar K.V, Christophe Leroy, Benjamin Herrenschmidt,
Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <7b1f7fdd-0256-3370-13ab-d808b9fe9b02@linux.ibm.com>
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> On 10/13/20 3:45 PM, Michael Ellerman wrote:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> Le 13/10/2020 à 09:23, Aneesh Kumar K.V a écrit :
>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>
>>>>> CPU_FTR_NODSISRALIGN has not been used since
>>>>> commit 31bfdb036f12 ("powerpc: Use instruction emulation
>>>>> infrastructure to handle alignment faults")
>>>>>
>>>>> Remove it.
>>>>>
>>>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>>>> ---
>>>>> arch/powerpc/include/asm/cputable.h | 22 ++++++++++------------
>>>>> arch/powerpc/kernel/dt_cpu_ftrs.c | 8 --------
>>>>> arch/powerpc/kernel/prom.c | 2 +-
>>>>> 3 files changed, 11 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>>> index 1098863e17ee..c598961d9f15 100644
>>>>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>>> @@ -273,13 +273,6 @@ static int __init feat_enable_idle_nap(struct dt_cpu_feature *f)
>>>>> return 1;
>>>>> }
>>>>>
>>>>> -static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
>>>>> -{
>>>>> - cur_cpu_spec->cpu_features &= ~CPU_FTR_NODSISRALIGN;
>>>>> -
>>>>> - return 1;
>>>>> -}
>>>>> -
>>>>> static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
>>>>> {
>>>>> u64 lpcr;
>>>>> @@ -641,7 +634,6 @@ static struct dt_cpu_feature_match __initdata
>>>>> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
>>>>> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
>>>>> {"idle-nap", feat_enable_idle_nap, 0},
>>>>> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
>>
>> Rather than removing it entirely, I'd rather we left a comment, so that
>> it's obvious that we are ignoring that feature on purpose, not because
>> we forget about it.
>>
>> eg:
>>
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index f204ad79b6b5..45cb7e59bd13 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -640,7 +640,7 @@ static struct dt_cpu_feature_match __initdata
>> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
>> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
>> {"idle-nap", feat_enable_idle_nap, 0},
>> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
>> + // "alignment-interrupt-dsisr" ignored
>> {"idle-stop", feat_enable_idle_stop, 0},
>> {"machine-check-power8", feat_enable_mce_power8, 0},
>> {"performance-monitor-power8", feat_enable_pmu_power8, 0},
>>
>
>
> why not do it as
> static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
> {
> /* This feature should not be enabled */
> #ifdef DEBUG
> WARN(1);
> #endif
>
> return 1;
> }
No one will ever turn that #define on.
No one will ever turn the feature on either.
Even if they did, it's not a bug because the kernel doesn't use the
DSISR for alignment interrupts any more.
All I want is to be able to compare the list of features defined in
skiboot vs the ones in Linux and see that none are missing in Linux.
cheers
^ permalink raw reply
* Re: [PATCH 20/20] arch: dts: Fix DWC USB3 DT nodes name
From: Krzysztof Kozlowski @ 2020-10-14 10:33 UTC (permalink / raw)
To: Serge Semin
Cc: Andrew Lunn, linux-usb, Neil Armstrong, Tony Lindgren,
Bjorn Andersson, Pavel Parkhomenko,
linux-samsung-soc@vger.kernel.org, Kevin Hilman, Gregory Clement,
Wei Xu, Chen-Yu Tsai, Kukjin Kim, Andy Gross, linux-arm-msm,
linux-snps-arc, Sebastian Hesselbarth, devicetree, Jason Cooper,
Mathias Nyman, linux-kernel@vger.kernel.org, Lad Prabhakar,
Maxime Ripard, Alexey Malahov, Rob Herring, Santosh Shilimkar,
linux-omap, linux-arm-kernel, Roger Quadros, Felipe Balbi,
linux-mips, Greg Kroah-Hartman, Yoshihiro Shimoda, linuxppc-dev,
Patrice Chotard, Serge Semin, Li Yang, Manu Gautam,
Benoît Cousson, Shawn Guo
In-Reply-To: <20201014101402.18271-21-Sergey.Semin@baikalelectronics.ru>
On Wed, 14 Oct 2020 at 12:23, Serge Semin
<Sergey.Semin@baikalelectronics.ru> wrote:
>
> In accordance with the DWC USB3 bindings the corresponding node name is
> suppose to comply with Generic USB HCD DT schema, which requires the USB
> nodes to have the name acceptable by the regexp: "^usb(@.*)?" . But a lot
> of the DWC USB3-compatible nodes defined in the ARM/ARM64 DTS files have
> name as "^dwc3@.*" or "^usb[1-3]@.*" or even "^dwusb@.*", which will cause
> the dtbs_check procedure failure. Let's fix the nodes naming to be
> compatible with the DWC USB3 DT schema to make dtbs_check happy.
>
> Note we don't change the DWC USB3-compatible nodes names of
> arch/arm64/boot/dts/apm/{apm-storm.dtsi,apm-shadowcat.dtsi} since the
> in-source comment says that the nodes name need to be preserved as
> "^dwusb@.*" for some backward compatibility.
>
> Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>
> ---
>
> Please, test the patch out to make sure it doesn't brake the dependent DTS
> files. I did only a manual grepping of the possible nodes dependencies.
1. It is you who should compare the decompiled DTS, not us. For example:
$ for i in dts-old/*/*dtb dts-old/*/*/*dtb; do echo $i; crosc64
scripts/dtc/dtx_diff ${i} dts-new/${i#dts-old/} ; done
$ for i in dts-old/*/*dtb dts-old/*/*/*dtb; do echo $i; crosc64
fdtdump ${i} > ${i}.fdt ; crosc64 fdtdump dts-new/${i#dts-old/} >
dts-new/${i#dts-old/}.fdt ; diff -ubB ${i}.fdt
dts-new/${i#dts-old/}.fdt ; done
2. Split it per arm architectures (and proper subject prefix - not
"arch") and subarchitectures so maintainers can pick it up.
3. The subject title could be more accurate - there is no fix here
because there was no errors in the first place. Requirement of DWC
node names comes recently, so it is more alignment with dtschema.
Otherwise automatic-pickup-stable-bot might want to pick up... and it
should not go to stable.
Best regards,
Krzysztof
> arch/arm/boot/dts/armada-375.dtsi | 2 +-
> arch/arm/boot/dts/exynos5250.dtsi | 2 +-
> arch/arm/boot/dts/exynos54xx.dtsi | 4 ++--
> arch/arm/boot/dts/keystone-k2e.dtsi | 4 ++--
> arch/arm/boot/dts/keystone.dtsi | 2 +-
> arch/arm/boot/dts/ls1021a.dtsi | 2 +-
> arch/arm/boot/dts/omap5-l4.dtsi | 2 +-
> arch/arm/boot/dts/stih407-family.dtsi | 2 +-
> arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 2 +-
> arch/arm64/boot/dts/exynos/exynos5433.dtsi | 4 ++--
> arch/arm64/boot/dts/exynos/exynos7.dtsi | 2 +-
> arch/arm64/boot/dts/freescale/fsl-ls1012a.dtsi | 4 ++--
> arch/arm64/boot/dts/freescale/fsl-ls1043a.dtsi | 6 +++---
> arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 4 ++--
> arch/arm64/boot/dts/freescale/fsl-ls208xa.dtsi | 4 ++--
> arch/arm64/boot/dts/hisilicon/hi3660.dtsi | 2 +-
> arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 4 ++--
> arch/arm64/boot/dts/qcom/ipq8074.dtsi | 4 ++--
> arch/arm64/boot/dts/qcom/msm8996.dtsi | 4 ++--
> arch/arm64/boot/dts/qcom/msm8998.dtsi | 2 +-
> arch/arm64/boot/dts/qcom/qcs404-evb.dtsi | 2 +-
> arch/arm64/boot/dts/qcom/qcs404.dtsi | 4 ++--
> arch/arm64/boot/dts/qcom/sc7180.dtsi | 2 +-
> arch/arm64/boot/dts/qcom/sdm845.dtsi | 4 ++--
> arch/arm64/boot/dts/qcom/sm8150.dtsi | 2 +-
> 25 files changed, 38 insertions(+), 38 deletions(-)
>
^ permalink raw reply
* [PATCH v2] powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
From: Madhavan Srinivasan @ 2020-10-14 10:16 UTC (permalink / raw)
To: mpe; +Cc: atrajeev, linuxppc-dev, Madhavan Srinivasan
Threshold Event Counter Multiplier (TECM) is part of Monitor Mode
Control Register A (MMCRA). This field along with Threshold Event
Counter Exponent (TECE) is used to get threshould counter value.
ISA v3.1 has 7bit mantissa field for TECM, but in Power10, the
width of TECM field is increase to 8bits. Patch fixes the current
code to modify the MMCRA[TECM] extraction macro to handle this changes.
Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
---
Changelog v1:
- Fixed the commit message
- Fixed the condition check
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/perf/isa207-common.c | 3 +++
arch/powerpc/perf/isa207-common.h | 4 ++++
3 files changed, 8 insertions(+)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 88fb88491fe9..a1bb0ebb3a46 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1355,6 +1355,7 @@
#define PVR_POWER9 0x004E
#define PVR_BE 0x0070
#define PVR_PA6T 0x0090
+#define PVR_POWER10 0x0080
/* "Logical" PVR values defined in PAPR, representing architecture levels */
#define PVR_ARCH_204 0x0f000001
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 964437adec18..480bbe525904 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
u64 sier = mfspr(SPRN_SIER);
u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
+ if (pvr_version_is(PVR_POWER10))
+ mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
+
if (val == 0 || val == 7)
*weight = 0;
else
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 044de65e96b9..71380e854f48 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -219,6 +219,10 @@
#define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
MMCRA_THR_CTR_EXP_MASK)
+#define P10_MMCRA_THR_CTR_MANT_MASK 0xFFul
+#define P10_MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
+ P10_MMCRA_THR_CTR_MANT_MASK)
+
/* MMCRA Threshold Compare bit constant for power9 */
#define p9_MMCRA_THR_CMP_SHIFT 45
--
2.26.2
^ permalink raw reply related
* Re: [PATCH] soc: fsl: dpio: Change 'cpumask_t mask' to global variable
From: Laurentiu Tudor @ 2020-10-14 9:29 UTC (permalink / raw)
To: Yi Wang, Roy.Pledge, Youri Querry
Cc: jiang.xuexin, Hao Si, linux-kernel, Lin Chen, xue.zhihong,
Ioana Ciornei, leoyang.li, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20201014072733.15175-1-wang.yi59@zte.com.cn>
Hi,
Thanks for finding this. Comment inline.
On 10/14/2020 10:27 AM, Yi Wang wrote:
> From: Hao Si <si.hao@zte.com.cn>
>
> The local variable 'cpumask_t mask' is in the stack memory, and its address
> is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
> But the memory area where this variable is located is at risk of being
> modified.
>
> During LTP testing, the following error was generated:
>
> Unable to handle kernel paging request at virtual address ffff000012e9b790
> Mem abort info:
> ESR = 0x96000007
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000007
> CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
> [ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
> pmd=00000027b6d61003, pte=0000000000000000
> Internal error: Oops: 96000007 [#1] PREEMPT SMP
> Modules linked in: xt_conntrack
> Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
> CPU: 14 PID: 20171 Comm: read_all Tainted: G B W
> Hardware name: NXP Layerscape LX2160ARDB (DT)
> pstate: 80000085 (Nzcv daIf -PAN -UAO)
> pc : irq_affinity_hint_proc_show+0x54/0xb0
> lr : irq_affinity_hint_proc_show+0x4c/0xb0
> sp : ffff00001138bc10
> x29: ffff00001138bc10 x28: 0000ffffd131d1e0
> x27: 00000000007000c0 x26: ffff8025b9480dc0
> x25: ffff8025b9480da8 x24: 00000000000003ff
> x23: ffff8027334f8300 x22: ffff80272e97d000
> x21: ffff80272e97d0b0 x20: ffff8025b9480d80
> x19: ffff000009a49000 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000000 x14: 0000000000000000
> x13: 0000000000000000 x12: 0000000000000040
> x11: 0000000000000000 x10: ffff802735b79b88
> x9 : 0000000000000000 x8 : 0000000000000000
> x7 : ffff000009a49848 x6 : 0000000000000003
> x5 : 0000000000000000 x4 : ffff000008157d6c
> x3 : ffff00001138bc10 x2 : ffff000012e9b790
> x1 : 0000000000000000 x0 : 0000000000000000
> Call trace:
> irq_affinity_hint_proc_show+0x54/0xb0
> seq_read+0x1b0/0x440
> proc_reg_read+0x80/0xd8
> __vfs_read+0x60/0x178
> vfs_read+0x94/0x150
> ksys_read+0x74/0xf0
> __arm64_sys_read+0x24/0x30
> el0_svc_common.constprop.0+0xd8/0x1a0
> el0_svc_handler+0x34/0x88
> el0_svc+0x10/0x14
> Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
> ---[ end trace b495bdcb0b3b732b ]---
> Kernel panic - not syncing: Fatal exception
> SMP: stopping secondary CPUs
> SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
> Kernel Offset: disabled
> CPU features: 0x0,21006008
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Fix it by changing 'cpumask_t mask' to global variable.
>
> Signed-off-by: Hao Si <si.hao@zte.com.cn>
> Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
> Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
> ---
> drivers/soc/fsl/dpio/dpio-driver.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
> index 7b642c3..b31ec53 100644
> --- a/drivers/soc/fsl/dpio/dpio-driver.c
> +++ b/drivers/soc/fsl/dpio/dpio-driver.c
> @@ -31,6 +31,7 @@ struct dpio_priv {
> struct dpaa2_io *io;
> };
>
> +static cpumask_t mask;
There can be multiple dpio devices with their associated driver
instances so it's not ok to make the variable global. Please place it in
the driver's private data and while at it, please rename it to cpu_mask.
---
Thanks & Best Regards, Laurentiu
^ permalink raw reply
* [PATCH] powerpc/perf: Add support for programming of Thresholding in P10
From: Kajol Jain @ 2020-10-14 8:44 UTC (permalink / raw)
To: mpe; +Cc: kjain, atrajeev, maddy, linuxppc-dev
Thresholding, a performance monitoring unit feature, can be
used to identify marked instructions which take more than
expected cycles between start event and end event.
Threshold compare (thresh_cmp) bits are programmed in MMCRA
register. In Power9, thresh_cmp bits were part of the
event code. But in case of P10, thresh_cmp are not part of
event code due to inclusion of MMCR3 bits.
Patch here adds an option to use attr.config1 variable
to be used to pass thresh_cmp value to be programmed in
MMCRA register. A new ppmu flag called PPMU_HAS_ATTR_CONFIG1
has been added and this flag is used to notify the use of
attr.config1 variable. Secondly, thresh_cmp bits are not
part of the group constraints.
Patch has extended the parameter list of 'compute_mmcr',
to include power_pmu's 'flags' element.
command#: cat /sys/devices/cpu/format/thresh_cmp
config1:0-17
ex. usage:
command#: perf record -I --weight -d -e
cpu/event=0x67340101EC,thresh_cmp=500/ ./ebizzy -S 2 -t 1 -s 4096
1826636 records/s
real 2.00 s
user 2.00 s
sys 0.00 s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (61 samples) ]
Signed-off-by: Kajol Jain<kjain@linux.ibm.com>
---
arch/powerpc/include/asm/perf_event_server.h | 4 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/perf/isa207-common.c | 54 ++++++++++++++++++--
arch/powerpc/perf/isa207-common.h | 6 ++-
arch/powerpc/perf/mpc7450-pmu.c | 3 +-
arch/powerpc/perf/power10-pmu.c | 4 +-
arch/powerpc/perf/power5+-pmu.c | 3 +-
arch/powerpc/perf/power5-pmu.c | 3 +-
arch/powerpc/perf/power6-pmu.c | 3 +-
arch/powerpc/perf/power7-pmu.c | 3 +-
arch/powerpc/perf/ppc970-pmu.c | 3 +-
11 files changed, 73 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index f6acabb6c9be..566e41d4ff6a 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -34,9 +34,10 @@ struct power_pmu {
int max_alternatives;
unsigned long add_fields;
unsigned long test_adder;
+ /* TODO: parameter list is growing and should consider folding all this to a struct */
int (*compute_mmcr)(u64 events[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[]);
+ struct perf_event *pevents[], u32 flags);
int (*get_constraint)(u64 event_id, unsigned long *mskp,
unsigned long *valp);
int (*get_alternatives)(u64 event_id, unsigned int flags,
@@ -82,6 +83,7 @@ struct power_pmu {
#define PPMU_ARCH_207S 0x00000080 /* PMC is architecture v2.07S */
#define PPMU_NO_SIAR 0x00000100 /* Do not use SIAR */
#define PPMU_ARCH_31 0x00000200 /* Has MMCR3, SIER2 and SIER3 */
+#define PPMU_HAS_ATTR_CONFIG1 0x00000400 /* Using config1 attribute */
/*
* Values for flags to get_alternatives()
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cba1494..e4c817f64566 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1356,7 +1356,7 @@ static void power_pmu_enable(struct pmu *pmu)
memset(&cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));
if (ppmu->compute_mmcr(cpuhw->events, cpuhw->n_events, hwc_index,
- &cpuhw->mmcr, cpuhw->event)) {
+ &cpuhw->mmcr, cpuhw->event, ppmu->flags)) {
/* shouldn't ever get here */
printk(KERN_ERR "oops compute_mmcr failed\n");
goto out;
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 964437adec18..cfad88b7ca85 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -110,10 +110,48 @@ static void mmcra_sdar_mode(u64 event, unsigned long *mmcra)
static u64 thresh_cmp_val(u64 value)
{
- if (cpu_has_feature(CPU_FTR_ARCH_300))
- return value << p9_MMCRA_THR_CMP_SHIFT;
+ int exp = 0;
+ u64 result = value;
+
+ if (!value)
+ return value;
+
+ /*
+ * Incase of P10, thresh_cmp value is not part of raw event code
+ * and provided via attr.config1 parameter. To program threshold in MMCRA,
+ * take a 18 bit number N and shift right 2 places and increment
+ * the exponent E by 1 until the upper 10 bits of N are zero.
+ * Write E to the threshold exponent and write the lower 8 bits of N
+ * to the threshold mantissa.
+ * The max threshold that can be written is 261120.
+ */
+ if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+ if (value > 261120)
+ value = 261120;
+ while ((64 - __builtin_clzl(value)) > 8) {
+ exp++;
+ value >>= 2;
+ }
+
+ /*
+ * Note that it is invalid to write a mantissa with the
+ * upper 2 bits of mantissa being zero, unless the
+ * exponent is also zero.
+ */
+ if (!(value & 0xC0) && exp)
+ result = 0;
+ else
+ result = (exp << 8) | value;
+ }
- return value << MMCRA_THR_CMP_SHIFT;
+ /*
+ * Since location of threshold compare bits in MMCRA
+ * is different for p8, using different shift value.
+ */
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ return result << p9_MMCRA_THR_CMP_SHIFT;
+ else
+ return result << MMCRA_THR_CMP_SHIFT;
}
static unsigned long combine_from_event(u64 event)
@@ -144,7 +182,9 @@ static bool is_thresh_cmp_valid(u64 event)
/*
* Check the mantissa upper two bits are not zero, unless the
* exponent is also zero. See the THRESH_CMP_MANTISSA doc.
- * Power10: thresh_cmp is replaced by l2_l3 event select.
+ * Power10: thresh_cmp bits are not part of raw event code.
+ * attr.config1 is used to get thresh_cmp value from user.
+ * And hence thresh_cmp is not part of group constraint.
*/
if (cpu_has_feature(CPU_FTR_ARCH_31))
return false;
@@ -386,7 +426,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
int isa207_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[], u32 flags)
{
unsigned long mmcra, mmcr1, mmcr2, unit, combine, psel, cache, val;
unsigned long mmcr3;
@@ -472,6 +512,10 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
val = (event[i] >> EVENT_THR_CMP_SHIFT) &
EVENT_THR_CMP_MASK;
mmcra |= thresh_cmp_val(val);
+ } else if (flags & PPMU_HAS_ATTR_CONFIG1) {
+ val = (pevents[i]->attr.config1 >> p10_EVENT_THR_CMP_SHIFT) &
+ p10_EVENT_THR_CMP_MASK;
+ mmcra |= thresh_cmp_val(val);
}
}
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 044de65e96b9..8d365dccce50 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -100,6 +100,10 @@
#define p10_EVENT_MMCR3_MASK 0x7fffull
#define p10_EVENT_MMCR3_SHIFT 45
+/* Event Threshold Compare bit constant for power10 in config1 attribute */
+#define p10_EVENT_THR_CMP_SHIFT 0
+#define p10_EVENT_THR_CMP_MASK 0x3FFFFull
+
#define p10_EVENT_VALID_MASK \
((p10_SDAR_MODE_MASK << p10_SDAR_MODE_SHIFT | \
(p10_EVENT_THRESH_MASK << EVENT_THRESH_SHIFT) | \
@@ -249,7 +253,7 @@
int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp);
int isa207_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[]);
+ struct perf_event *pevents[], u32 flags);
void isa207_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr);
int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags,
const unsigned int ev_alt[][MAX_ALT]);
diff --git a/arch/powerpc/perf/mpc7450-pmu.c b/arch/powerpc/perf/mpc7450-pmu.c
index 1919e9df9165..50612b1fafb9 100644
--- a/arch/powerpc/perf/mpc7450-pmu.c
+++ b/arch/powerpc/perf/mpc7450-pmu.c
@@ -258,7 +258,8 @@ static const u32 pmcsel_mask[N_COUNTER] = {
*/
static int mpc7450_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[],
struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
u8 event_index[N_CLASSES][N_COUNTER];
int n_classevent[N_CLASSES];
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index 83148656b524..a838da409570 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -176,6 +176,7 @@ PMU_FORMAT_ATTR(src_sel, "config:45-46");
PMU_FORMAT_ATTR(invert_bit, "config:47");
PMU_FORMAT_ATTR(src_mask, "config:48-53");
PMU_FORMAT_ATTR(src_match, "config:54-59");
+PMU_FORMAT_ATTR(thresh_cmp, "config1:0-17");
static struct attribute *power10_pmu_format_attr[] = {
&format_attr_event.attr,
@@ -195,6 +196,7 @@ static struct attribute *power10_pmu_format_attr[] = {
&format_attr_invert_bit.attr,
&format_attr_src_mask.attr,
&format_attr_src_match.attr,
+ &format_attr_thresh_cmp.attr,
NULL,
};
@@ -393,7 +395,7 @@ static struct power_pmu power10_pmu = {
.get_mem_weight = isa207_get_mem_weight,
.disable_pmc = isa207_disable_pmc,
.flags = PPMU_HAS_SIER | PPMU_ARCH_207S |
- PPMU_ARCH_31,
+ PPMU_ARCH_31 | PPMU_HAS_ATTR_CONFIG1,
.n_generic = ARRAY_SIZE(power10_generic_events),
.generic_events = power10_generic_events,
.cache_events = &power10_cache_events,
diff --git a/arch/powerpc/perf/power5+-pmu.c b/arch/powerpc/perf/power5+-pmu.c
index a62b2cd7914f..6eeeb4df523d 100644
--- a/arch/powerpc/perf/power5+-pmu.c
+++ b/arch/powerpc/perf/power5+-pmu.c
@@ -449,7 +449,8 @@ static int power5p_marked_instr_event(u64 event)
static int power5p_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
unsigned long mmcr1 = 0;
unsigned long mmcra = 0;
diff --git a/arch/powerpc/perf/power5-pmu.c b/arch/powerpc/perf/power5-pmu.c
index 8732b587cf71..190973b2d394 100644
--- a/arch/powerpc/perf/power5-pmu.c
+++ b/arch/powerpc/perf/power5-pmu.c
@@ -380,7 +380,8 @@ static int power5_marked_instr_event(u64 event)
static int power5_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
unsigned long mmcr1 = 0;
unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
diff --git a/arch/powerpc/perf/power6-pmu.c b/arch/powerpc/perf/power6-pmu.c
index 0e318cf87129..579606315b4d 100644
--- a/arch/powerpc/perf/power6-pmu.c
+++ b/arch/powerpc/perf/power6-pmu.c
@@ -171,7 +171,8 @@ static int power6_marked_instr_event(u64 event)
* Assign PMC numbers and compute MMCR1 value for a set of events
*/
static int p6_compute_mmcr(u64 event[], int n_ev,
- unsigned int hwc[], struct mmcr_regs *mmcr, struct perf_event *pevents[])
+ unsigned int hwc[], struct mmcr_regs *mmcr, struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
unsigned long mmcr1 = 0;
unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 5e0bf09cf077..a929917f7259 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -243,7 +243,8 @@ static int power7_marked_instr_event(u64 event)
static int power7_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
unsigned long mmcr1 = 0;
unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
diff --git a/arch/powerpc/perf/ppc970-pmu.c b/arch/powerpc/perf/ppc970-pmu.c
index d35223fb112c..4a0b1eeab643 100644
--- a/arch/powerpc/perf/ppc970-pmu.c
+++ b/arch/powerpc/perf/ppc970-pmu.c
@@ -254,7 +254,8 @@ static int p970_get_alternatives(u64 event, unsigned int flags, u64 alt[])
static int p970_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], struct mmcr_regs *mmcr,
- struct perf_event *pevents[])
+ struct perf_event *pevents[],
+ u32 flags __maybe_unused)
{
unsigned long mmcr0 = 0, mmcr1 = 0, mmcra = 0;
unsigned int pmc, unit, byte, psel;
--
2.26.2
^ permalink raw reply related
* [PATCH] soc: fsl: dpio: Change 'cpumask_t mask' to global variable
From: Yi Wang @ 2020-10-14 7:27 UTC (permalink / raw)
To: Roy.Pledge
Cc: wang.yi59, jiang.xuexin, Hao Si, linux-kernel, leoyang.li,
xue.zhihong, Lin Chen, linuxppc-dev, linux-arm-kernel
[-- Attachment #1.1: Type: text/plain, Size: 3434 bytes --]
From: Hao Si <si.hao@zte.com.cn>
The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.
During LTP testing, the following error was generated:
Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
ESR = 0x96000007
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G B W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
irq_affinity_hint_proc_show+0x54/0xb0
seq_read+0x1b0/0x440
proc_reg_read+0x80/0xd8
__vfs_read+0x60/0x178
vfs_read+0x94/0x150
ksys_read+0x74/0xf0
__arm64_sys_read+0x24/0x30
el0_svc_common.constprop.0+0xd8/0x1a0
el0_svc_handler+0x34/0x88
el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---
Fix it by changing 'cpumask_t mask' to global variable.
Signed-off-by: Hao Si <si.hao@zte.com.cn>
Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
---
drivers/soc/fsl/dpio/dpio-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index 7b642c3..b31ec53 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -31,6 +31,7 @@ struct dpio_priv {
struct dpaa2_io *io;
};
+static cpumask_t mask;
static cpumask_var_t cpus_unused_mask;
static const struct soc_device_attribute ls1088a_soc[] = {
@@ -95,7 +96,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
{
int error;
struct fsl_mc_device_irq *irq;
- cpumask_t mask;
irq = dpio_dev->irqs[0];
error = devm_request_irq(&dpio_dev->dev,
--
2.15.2
^ permalink raw reply related
* Re: [PATCH v5 1/5] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Ravi Bangoria @ 2020-10-14 7:34 UTC (permalink / raw)
To: Daniel Axtens
Cc: Ravi Bangoria, bala24, paulus, sandipan, jniethe5, naveen.n.rao,
linuxppc-dev
In-Reply-To: <877drvwocx.fsf@dja-thinkpad.axtens.net>
Hi Daniel,
On 10/12/20 7:14 PM, Daniel Axtens wrote:
> Hi,
>
> To review this, I looked through the supported instructions to see if
> there were any that I thought might have been missed.
>
> I didn't find any other v3.1 ones, although I don't have a v3.1 ISA to
> hand so I was basically looking for instructions I didn't recognise.
>
> I did, however, find a number of instructions that are new in ISA 3.0
> that aren't guarded:
>
> - addpcis
> - lxvl/stxvl
> - lxvll/stxvll
> - lxvwsx
> - stxvx
> - lxsipzx
> - lxvh8x
> - lxsihzx
> - lxvb16x/stxvb16x
> - stxsibx
> - stxsihx
> - lxvb16x
> - lxsd/stxsd
> - lxssp/stxssp
> - lxv/stxv
>
> Also, I don't know how bothered we are about P7, but if I'm reading the
> ISA correctly, lqarx/stqcx. are not supported before ISA 2.07. Likewise
> a number of the vector instructions like lxsiwzx and lxsiwax (and the
> companion stores).
>
> I realise it's not really the point of this particular patch, so I don't
> think this should block acceptance. What I would like to know - and
> maybe this is something where we need mpe to weigh in - is whether we
> need consistent guards for 2.07 and 3.0. We have some 3.0 guards already
> but clearly not everywhere.
Yes, those needs to be handled properly. Otherwise they can be harmful
when emulated on a non-supporting platform. Will work on it when I get
some time. Thanks for reporting it.
>
> With all that said - the patch does what it says it does, and looks good
> to me:
>
> Reviewed-by: Daniel Axtens <dja@axtens.net>
Thanks!
Ravi
^ permalink raw reply
* [PATCH v4 2/2] powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
From: Jordan Niethe @ 2020-10-14 7:28 UTC (permalink / raw)
To: linuxppc-dev; +Cc: oohall, npiggin, Jordan Niethe
In-Reply-To: <20201014072837.24539-1-jniethe5@gmail.com>
The only thing keeping the cpu_setup() and cpu_restore() functions used
in the cputable entries for Power7, Power8, Power9 and Power10 in
assembly was cpu_restore() being called before there was a stack in
generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack
for secondaries before cpu_restore()") means that it is now possible to
use C.
Rewrite the functions in C so they are a little bit easier to read. This
is not changing their functionality.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
arch/powerpc/include/asm/cpu_setup_power.h | 12 +
arch/powerpc/kernel/cpu_setup_power.S | 252 -------------------
arch/powerpc/kernel/cpu_setup_power.c | 269 +++++++++++++++++++++
arch/powerpc/kernel/cputable.c | 12 +-
4 files changed, 285 insertions(+), 260 deletions(-)
create mode 100644 arch/powerpc/include/asm/cpu_setup_power.h
delete mode 100644 arch/powerpc/kernel/cpu_setup_power.S
create mode 100644 arch/powerpc/kernel/cpu_setup_power.c
diff --git a/arch/powerpc/include/asm/cpu_setup_power.h b/arch/powerpc/include/asm/cpu_setup_power.h
new file mode 100644
index 000000000000..24be9131f803
--- /dev/null
+++ b/arch/powerpc/include/asm/cpu_setup_power.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 IBM Corporation
+ */
+void __setup_cpu_power7(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power7(void);
+void __setup_cpu_power8(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power8(void);
+void __setup_cpu_power9(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power9(void);
+void __setup_cpu_power10(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power10(void);
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
deleted file mode 100644
index 704e8b9501ee..000000000000
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ /dev/null
@@ -1,252 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * This file contains low level CPU setup functions.
- * Copyright (C) 2003 Benjamin Herrenschmidt (benh@kernel.crashing.org)
- */
-
-#include <asm/processor.h>
-#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/ppc_asm.h>
-#include <asm/asm-offsets.h>
-#include <asm/cache.h>
-#include <asm/book3s/64/mmu-hash.h>
-
-/* Entry: r3 = crap, r4 = ptr to cputable entry
- *
- * Note that we can be called twice for pseudo-PVRs
- */
-_GLOBAL(__setup_cpu_power7)
- mflr r11
- bl __init_hvmode_206
- mtlr r11
- beqlr
- li r0,0
- mtspr SPRN_LPID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- li r4,(LPCR_LPES1 >> LPCR_LPES_SH)
- bl __init_LPCR_ISA206
- mtlr r11
- blr
-
-_GLOBAL(__restore_cpu_power7)
- mflr r11
- mfmsr r3
- rldicl. r0,r3,4,63
- beqlr
- li r0,0
- mtspr SPRN_LPID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- li r4,(LPCR_LPES1 >> LPCR_LPES_SH)
- bl __init_LPCR_ISA206
- mtlr r11
- blr
-
-_GLOBAL(__setup_cpu_power8)
- mflr r11
- bl __init_FSCR
- bl __init_PMU
- bl __init_PMU_ISA207
- bl __init_hvmode_206
- mtlr r11
- beqlr
- li r0,0
- mtspr SPRN_LPID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- ori r3, r3, LPCR_PECEDH
- li r4,0 /* LPES = 0 */
- bl __init_LPCR_ISA206
- bl __init_HFSCR
- bl __init_PMU_HV
- bl __init_PMU_HV_ISA207
- mtlr r11
- blr
-
-_GLOBAL(__restore_cpu_power8)
- mflr r11
- bl __init_FSCR
- bl __init_PMU
- bl __init_PMU_ISA207
- mfmsr r3
- rldicl. r0,r3,4,63
- mtlr r11
- beqlr
- li r0,0
- mtspr SPRN_LPID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- ori r3, r3, LPCR_PECEDH
- li r4,0 /* LPES = 0 */
- bl __init_LPCR_ISA206
- bl __init_HFSCR
- bl __init_PMU_HV
- bl __init_PMU_HV_ISA207
- mtlr r11
- blr
-
-_GLOBAL(__setup_cpu_power10)
- mflr r11
- bl __init_FSCR_power10
- bl __init_PMU
- bl __init_PMU_ISA31
- b 1f
-
-_GLOBAL(__setup_cpu_power9)
- mflr r11
- bl __init_FSCR_power9
- bl __init_PMU
-1: bl __init_hvmode_206
- mtlr r11
- beqlr
- li r0,0
- mtspr SPRN_PSSCR,r0
- mtspr SPRN_LPID,r0
- mtspr SPRN_PID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
- or r3, r3, r4
- LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
- andc r3, r3, r4
- li r4,0 /* LPES = 0 */
- bl __init_LPCR_ISA300
- bl __init_HFSCR
- bl __init_PMU_HV
- mtlr r11
- blr
-
-_GLOBAL(__restore_cpu_power10)
- mflr r11
- bl __init_FSCR_power10
- bl __init_PMU
- bl __init_PMU_ISA31
- b 1f
-
-_GLOBAL(__restore_cpu_power9)
- mflr r11
- bl __init_FSCR_power9
- bl __init_PMU
-1: mfmsr r3
- rldicl. r0,r3,4,63
- mtlr r11
- beqlr
- li r0,0
- mtspr SPRN_PSSCR,r0
- mtspr SPRN_LPID,r0
- mtspr SPRN_PID,r0
- LOAD_REG_IMMEDIATE(r0, PCR_MASK)
- mtspr SPRN_PCR,r0
- mfspr r3,SPRN_LPCR
- LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
- or r3, r3, r4
- LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
- andc r3, r3, r4
- li r4,0 /* LPES = 0 */
- bl __init_LPCR_ISA300
- bl __init_HFSCR
- bl __init_PMU_HV
- mtlr r11
- blr
-
-__init_hvmode_206:
- /* Disable CPU_FTR_HVMODE and exit if MSR:HV is not set */
- mfmsr r3
- rldicl. r0,r3,4,63
- bnelr
- ld r5,CPU_SPEC_FEATURES(r4)
- LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE | CPU_FTR_P9_TM_HV_ASSIST)
- andc r5,r5,r6
- std r5,CPU_SPEC_FEATURES(r4)
- blr
-
-__init_LPCR_ISA206:
- /* Setup a sane LPCR:
- * Called with initial LPCR in R3 and desired LPES 2-bit value in R4
- *
- * LPES = 0b01 (HSRR0/1 used for 0x500)
- * PECE = 0b111
- * DPFD = 4
- * HDICE = 0
- * VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
- * VRMASD = 0b10000 (L=1, LP=00)
- *
- * Other bits untouched for now
- */
- li r5,0x10
- rldimi r3,r5, LPCR_VRMASD_SH, 64-LPCR_VRMASD_SH-5
-
- /* POWER9 has no VRMASD */
-__init_LPCR_ISA300:
- rldimi r3,r4, LPCR_LPES_SH, 64-LPCR_LPES_SH-2
- ori r3,r3,(LPCR_PECE0|LPCR_PECE1|LPCR_PECE2)
- li r5,4
- rldimi r3,r5, LPCR_DPFD_SH, 64-LPCR_DPFD_SH-3
- clrrdi r3,r3,1 /* clear HDICE */
- li r5,4
- rldimi r3,r5, LPCR_VC_SH, 0
- mtspr SPRN_LPCR,r3
- isync
- blr
-
-__init_FSCR_power10:
- mfspr r3, SPRN_FSCR
- ori r3, r3, FSCR_PREFIX
- mtspr SPRN_FSCR, r3
- // fall through
-
-__init_FSCR_power9:
- mfspr r3, SPRN_FSCR
- ori r3, r3, FSCR_SCV
- mtspr SPRN_FSCR, r3
- // fall through
-
-__init_FSCR:
- mfspr r3,SPRN_FSCR
- ori r3,r3,FSCR_TAR|FSCR_EBB
- mtspr SPRN_FSCR,r3
- blr
-
-__init_HFSCR:
- mfspr r3,SPRN_HFSCR
- ori r3,r3,HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|HFSCR_PM|\
- HFSCR_DSCR|HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB|HFSCR_MSGP
- mtspr SPRN_HFSCR,r3
- blr
-
-__init_PMU_HV:
- li r5,0
- mtspr SPRN_MMCRC,r5
- blr
-
-__init_PMU_HV_ISA207:
- li r5,0
- mtspr SPRN_MMCRH,r5
- blr
-
-__init_PMU:
- li r5,0
- mtspr SPRN_MMCRA,r5
- mtspr SPRN_MMCR0,r5
- mtspr SPRN_MMCR1,r5
- mtspr SPRN_MMCR2,r5
- blr
-
-__init_PMU_ISA207:
- li r5,0
- mtspr SPRN_MMCRS,r5
- blr
-
-__init_PMU_ISA31:
- li r5,0
- mtspr SPRN_MMCR3,r5
- LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
- mtspr SPRN_MMCRA,r5
- blr
diff --git a/arch/powerpc/kernel/cpu_setup_power.c b/arch/powerpc/kernel/cpu_setup_power.c
new file mode 100644
index 000000000000..cf5201b0579d
--- /dev/null
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 IBM Corporation
+ * This file contains low level CPU setup functions.
+ * Originally written in assembly by
+ * Benjamin Herrenschmidt (benh@kernel.crashing.org)
+ */
+#include <asm/reg.h>
+#include <asm/synch.h>
+#include <linux/bitops.h>
+#include <asm/cputable.h>
+#include <asm/cpu_setup_power.h>
+
+/* Disable CPU_FTR_HVMODE and return false if MSR:HV is not set */
+static bool init_hvmode_206(struct cpu_spec *t)
+{
+ u64 msr;
+
+ msr = mfmsr();
+ if (msr & MSR_HV)
+ return true;
+
+ t->cpu_features &= ~(CPU_FTR_HVMODE | CPU_FTR_P9_TM_HV_ASSIST);
+ return false;
+}
+
+static void init_LPCR_ISA300(u64 lpcr, u64 lpes)
+{
+ /* POWER9 has no VRMASD */
+ lpcr |= (lpes << LPCR_LPES_SH) & LPCR_LPES;
+ lpcr |= LPCR_PECE0|LPCR_PECE1|LPCR_PECE2;
+ lpcr |= (4ull << LPCR_DPFD_SH) & LPCR_DPFD;
+ lpcr &= ~LPCR_HDICE; /* clear HDICE */
+ lpcr |= (4ull << LPCR_VC_SH);
+ mtspr(SPRN_LPCR, lpcr);
+ isync();
+}
+
+/*
+ * Setup a sane LPCR:
+ * Called with initial LPCR and desired LPES 2-bit value
+ *
+ * LPES = 0b01 (HSRR0/1 used for 0x500)
+ * PECE = 0b111
+ * DPFD = 4
+ * HDICE = 0
+ * VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
+ * VRMASD = 0b10000 (L=1, LP=00)
+ *
+ * Other bits untouched for now
+ */
+static void init_LPCR_ISA206(u64 lpcr, u64 lpes)
+{
+ lpcr |= (0x10ull << LPCR_VRMASD_SH) & LPCR_VRMASD;
+ init_LPCR_ISA300(lpcr, lpes);
+}
+
+static void init_FSCR(void)
+{
+ u64 fscr;
+
+ fscr = mfspr(SPRN_FSCR);
+ fscr |= FSCR_TAR|FSCR_EBB;
+ mtspr(SPRN_FSCR, fscr);
+}
+
+static void init_FSCR_power9(void)
+{
+ u64 fscr;
+
+ fscr = mfspr(SPRN_FSCR);
+ fscr |= FSCR_SCV;
+ mtspr(SPRN_FSCR, fscr);
+ init_FSCR();
+}
+
+static void init_FSCR_power10(void)
+{
+ u64 fscr;
+
+ fscr = mfspr(SPRN_FSCR);
+ fscr |= FSCR_PREFIX;
+ mtspr(SPRN_FSCR, fscr);
+ init_FSCR_power9();
+}
+
+static void init_HFSCR(void)
+{
+ u64 hfscr;
+
+ hfscr = mfspr(SPRN_HFSCR);
+ hfscr |= HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|HFSCR_PM|HFSCR_DSCR|\
+ HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB|HFSCR_MSGP;
+ mtspr(SPRN_HFSCR, hfscr);
+}
+
+static void init_PMU_HV(void)
+{
+ mtspr(SPRN_MMCRC, 0);
+}
+
+static void init_PMU_HV_ISA207(void)
+{
+ mtspr(SPRN_MMCRH, 0);
+}
+
+static void init_PMU(void)
+{
+ mtspr(SPRN_MMCRA, 0);
+ mtspr(SPRN_MMCR0, 0);
+ mtspr(SPRN_MMCR1, 0);
+ mtspr(SPRN_MMCR2, 0);
+}
+
+static void init_PMU_ISA207(void)
+{
+ mtspr(SPRN_MMCRS, 0);
+}
+
+static void init_PMU_ISA31(void)
+{
+ mtspr(SPRN_MMCR3, 0);
+ mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
+}
+
+/*
+ * Note that we can be called twice of pseudo-PVRs.
+ * The parameter offset is not used.
+ */
+
+void __setup_cpu_power7(unsigned long offset, struct cpu_spec *t)
+{
+ if (!init_hvmode_206(t))
+ return;
+
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
+}
+
+void __restore_cpu_power7(void)
+{
+ u64 msr;
+
+ msr = mfmsr();
+ if (!(msr & MSR_HV))
+ return;
+
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
+}
+
+void __setup_cpu_power8(unsigned long offset, struct cpu_spec *t)
+{
+ init_FSCR();
+ init_PMU();
+ init_PMU_ISA207();
+
+ if (!init_hvmode_206(t))
+ return;
+
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
+ init_HFSCR();
+ init_PMU_HV();
+ init_PMU_HV_ISA207();
+}
+
+void __restore_cpu_power8(void)
+{
+ u64 msr;
+
+ init_FSCR();
+ init_PMU();
+ init_PMU_ISA207();
+
+ msr = mfmsr();
+ if (!(msr & MSR_HV))
+ return;
+
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
+ init_HFSCR();
+ init_PMU_HV();
+ init_PMU_HV_ISA207();
+}
+
+void __setup_cpu_power9(unsigned long offset, struct cpu_spec *t)
+{
+ init_FSCR_power9();
+ init_PMU();
+
+ if (!init_hvmode_206(t))
+ return;
+
+ mtspr(SPRN_PSSCR, 0);
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+ LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+ init_HFSCR();
+ init_PMU_HV();
+}
+
+void __restore_cpu_power9(void)
+{
+ u64 msr;
+
+ init_FSCR_power9();
+ init_PMU();
+
+ msr = mfmsr();
+ if (!(msr & MSR_HV))
+ return;
+
+ mtspr(SPRN_PSSCR, 0);
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+ LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+ init_HFSCR();
+ init_PMU_HV();
+}
+
+void __setup_cpu_power10(unsigned long offset, struct cpu_spec *t)
+{
+ init_FSCR_power10();
+ init_PMU();
+ init_PMU_ISA31();
+
+ if (!init_hvmode_206(t))
+ return;
+
+ mtspr(SPRN_PSSCR, 0);
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+ LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+ init_HFSCR();
+ init_PMU_HV();
+}
+
+void __restore_cpu_power10(void)
+{
+ u64 msr;
+
+ init_FSCR_power10();
+ init_PMU();
+ init_PMU_ISA31();
+
+ msr = mfmsr();
+ if (!(msr & MSR_HV))
+ return;
+
+ mtspr(SPRN_PSSCR, 0);
+ mtspr(SPRN_LPID, 0);
+ mtspr(SPRN_PID, 0);
+ mtspr(SPRN_PCR, PCR_MASK);
+ init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+ LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+ init_HFSCR();
+ init_PMU_HV();
+}
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2aa89c6b2896..26a56c9d6650 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -59,19 +59,15 @@ extern void __setup_cpu_7410(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_745x(unsigned long offset, struct cpu_spec* spec);
#endif /* CONFIG_PPC32 */
#ifdef CONFIG_PPC64
+#include <asm/cpu_setup_power.h>
extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_ppc970MP(unsigned long offset, struct cpu_spec* spec);
extern void __setup_cpu_pa6t(unsigned long offset, struct cpu_spec* spec);
extern void __restore_cpu_pa6t(void);
extern void __restore_cpu_ppc970(void);
-extern void __setup_cpu_power7(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power7(void);
-extern void __setup_cpu_power8(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power8(void);
-extern void __setup_cpu_power9(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power9(void);
-extern void __setup_cpu_power10(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power10(void);
+extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
+extern long __machine_check_early_realmode_p8(struct pt_regs *regs);
+extern long __machine_check_early_realmode_p9(struct pt_regs *regs);
#endif /* CONFIG_PPC64 */
#if defined(CONFIG_E500)
extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
--
2.17.1
^ permalink raw reply related
* [PATCH v4 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Jordan Niethe @ 2020-10-14 7:28 UTC (permalink / raw)
To: linuxppc-dev; +Cc: oohall, npiggin, Jordan Niethe
Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
is called before a stack has been set up in r1. This was previously fine
as the cpu_restore() functions were implemented in assembly and did not
use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
device tree binding for discovering CPU features") used
__restore_cpu_cpufeatures() as the cpu_restore() function for a
device-tree features based cputable entry. This is a C function and
hence uses a stack in r1.
generic_secondary_smp_init() is entered on the secondary cpus via the
primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
thread has its own stack. The OPAL call is ran in the primary's hardware
thread. During the call, a job is scheduled on a secondary cpu that will
start executing at the address of generic_secondary_smp_init(). Hence
the value that will be left in r1 when the secondary cpu enters the
kernel is part of that secondary cpu's individual OPAL stack. This means
that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
not horribly bad as each hardware thread has its own stack and the call
that enters the kernel from OPAL never returns, but it is still wrong
and should be corrected.
Create the temp kernel stack before calling cpu_restore().
As noted by mpe, for a kexec boot, the secondary CPUs are released from
the spin loop at address 0x60 by smp_release_cpus() and then jump to
generic_secondary_smp_init(). The call to smp_release_cpus() is in
setup_arch(), and it comes before the call to emergency_stack_init().
emergency_stack_init() allocates an emergency stack in the PACA for each
CPU. This address in the PACA is what is used to set up the temp kernel
stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
to after the PACAs have been allocated an emergency stack, otherwise the
PACA stack pointer will contain garbage and hence the temp kernel stack
created from it will be broken.
Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
v2: Add more detail to the commit message
v3: Release secondary CPUs after the emergency stack is created
v4: No need to guard smp_release_cpus() with #ifdef CONFIG_SMP
---
arch/powerpc/kernel/head_64.S | 8 ++++----
arch/powerpc/kernel/setup-common.c | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 0e05a9a47a4b..4b7f4c6c2600 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -420,6 +420,10 @@ generic_secondary_common_init:
/* From now on, r24 is expected to be logical cpuid */
mr r24,r5
+ /* Create a temp kernel stack for use before relocation is on. */
+ ld r1,PACAEMERGSP(r13)
+ subi r1,r1,STACK_FRAME_OVERHEAD
+
/* See if we need to call a cpu state restore handler */
LOAD_REG_ADDR(r23, cur_cpu_spec)
ld r23,0(r23)
@@ -448,10 +452,6 @@ generic_secondary_common_init:
sync /* order paca.run and cur_cpu_spec */
isync /* In case code patching happened */
- /* Create a temp kernel stack for use before relocation is on. */
- ld r1,PACAEMERGSP(r13)
- subi r1,r1,STACK_FRAME_OVERHEAD
-
b __secondary_start
#endif /* SMP */
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 808ec9fab605..da8c71f321ad 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -919,8 +919,6 @@ void __init setup_arch(char **cmdline_p)
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
-
- smp_release_cpus();
#endif
/* Print various info about the machine that has been gathered so far. */
@@ -944,6 +942,8 @@ void __init setup_arch(char **cmdline_p)
exc_lvl_early_init();
emergency_stack_init();
+ smp_release_cpus();
+
initmem_init();
early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] powerpc/opal_elog: Handle multiple writes to ack attribute
From: Mahesh J Salgaonkar @ 2020-10-14 7:14 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Oliver O'Halloran, linuxppc-dev
In-Reply-To: <20201014064813.109515-1-aneesh.kumar@linux.ibm.com>
On 2020-10-14 12:18:13 Wed, Aneesh Kumar K.V wrote:
> Even though we use self removing sysfs helper, we still need
> to make sure we do the final kobject delete conditionally.
> sysfs_remove_file_self() will handle parallel calls to remove
> the sysfs attribute file and returns true only in the caller
> that removed the attribute file. The other parallel callers
> are returned false. Do the final kobject delete checking
> the return value of sysfs_remove_file_self().
>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/opal-elog.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
> index 5e33b1fc67c2..37b380eef41a 100644
> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> @@ -72,9 +72,14 @@ static ssize_t elog_ack_store(struct elog_obj *elog_obj,
> const char *buf,
> size_t count)
> {
> - opal_send_ack_elog(elog_obj->id);
> - sysfs_remove_file_self(&elog_obj->kobj, &attr->attr);
> - kobject_put(&elog_obj->kobj);
> + /*
> + * Try to self remove this attribute. If we are successful,
> + * delete the kobject itself.
> + */
> + if (sysfs_remove_file_self(&elog_obj->kobj, &attr->attr)) {
> + opal_send_ack_elog(elog_obj->id);
> + kobject_put(&elog_obj->kobj);
> + }
> return count;
> }
Looks good.
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Thanks,
-Mahesh.
^ permalink raw reply
* [PATCH] powerpc/opal_elog: Handle multiple writes to ack attribute
From: Aneesh Kumar K.V @ 2020-10-14 6:48 UTC (permalink / raw)
To: linuxppc-dev, mpe
Cc: Aneesh Kumar K.V, Oliver O'Halloran, Mahesh Salgaonkar
Even though we use self removing sysfs helper, we still need
to make sure we do the final kobject delete conditionally.
sysfs_remove_file_self() will handle parallel calls to remove
the sysfs attribute file and returns true only in the caller
that removed the attribute file. The other parallel callers
are returned false. Do the final kobject delete checking
the return value of sysfs_remove_file_self().
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/platforms/powernv/opal-elog.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
index 5e33b1fc67c2..37b380eef41a 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -72,9 +72,14 @@ static ssize_t elog_ack_store(struct elog_obj *elog_obj,
const char *buf,
size_t count)
{
- opal_send_ack_elog(elog_obj->id);
- sysfs_remove_file_self(&elog_obj->kobj, &attr->attr);
- kobject_put(&elog_obj->kobj);
+ /*
+ * Try to self remove this attribute. If we are successful,
+ * delete the kobject itself.
+ */
+ if (sysfs_remove_file_self(&elog_obj->kobj, &attr->attr)) {
+ opal_send_ack_elog(elog_obj->id);
+ kobject_put(&elog_obj->kobj);
+ }
return count;
}
--
2.26.2
^ permalink raw reply related
* Re: [RFC v1 0/2] Plumbing to support multiple secure memory backends.
From: Christoph Hellwig @ 2020-10-14 6:31 UTC (permalink / raw)
To: Ram Pai; +Cc: bharata, linuxppc-dev, kvm-ppc, farosas
In-Reply-To: <1602487663-7321-1-git-send-email-linuxram@us.ibm.com>
Please don't add an abstraction without a second implementation.
Once we have the implementation we can consider the tradeoffs. E.g.
if expensive indirect function calls are really needed vs simple
branches.
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Christoph Hellwig @ 2020-10-14 5:51 UTC (permalink / raw)
To: Alexander Viro
Cc: linux-arch, linuxppc-dev, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Alexey Dobriyan, Eric Biggers,
Luis Chamberlain, Al Viro, linux-fsdevel, Linus Torvalds,
Christoph Hellwig
In-Reply-To: <20201010015524.GB101464@shell-el7.hosts.prod.upshift.rdu2.redhat.com>
On Sat, Oct 10, 2020 at 01:55:24AM +0000, Alexander Viro wrote:
> FWIW, I hadn't pushed that branch out (or merged it into #for-next yet);
> for one thing, uml part (mconsole) is simply broken, for another...
> IMO ##5--8 are asking for kernel_pread() and if you look at binfmt_elf.c,
> you'll see elf_read() being pretty much that. acct.c, keys and usermode
> parts are asking for kernel_pwrite() as well.
>
> I've got stuck looking through the drivers/target stuff - it would've
> been another kernel_pwrite() candidate, but it smells like its use of
> filp_open() is really asking for trouble, starting with symlink attacks.
> Not sure - I'm not familiar with the area, but...
Can you just pull in the minimal fix so that the branch gets fixed
for this merge window? All the cleanups can come later.
^ permalink raw reply
* Re: [PATCH] powerpc/perf: fix Threshold Event CounterMultiplier width for P10
From: Madhavan Srinivasan @ 2020-10-14 5:50 UTC (permalink / raw)
To: Michal Suchánek; +Cc: atrajeev, linuxppc-dev
In-Reply-To: <20201013155842.GY29778@kitsune.suse.cz>
On 10/13/20 9:28 PM, Michal Suchánek wrote:
> On Tue, Oct 13, 2020 at 06:27:05PM +0530, Madhavan Srinivasan wrote:
>> On 10/12/20 4:59 PM, Michal Suchánek wrote:
>>> Hello,
>>>
>>> On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
>>>> Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
>>> ^^^ Shouldn't his be 3.0?
>> My bad, What I meant was
>>
>> Power9, ISA v3.0 and ISA v3.1 define a 7 bit mantissa field for Threshold
>> Event Counter Multiplier(TECM).
> I am really confused.
>
> The following text and the code suggests that the mantissa is 8bit on
> POWER10 and ISA v3.1.
Ok got it. Will fix the CPU_FTR_ARCH_31 check.
Thanks for review
Maddy
>
> Thanks
>
> Michal
>> Maddy
>>
>>>> Multiplier (TECM). TECM is part of Monitor Mode Control Register A (MMCRA).
>>>> This field along with Threshold Event Counter Exponent (TECE) is used to
>>>> get threshould counter value. In Power10, the width of TECM field is
>>>> increase to 8bits. Patch fixes the current code to modify the MMCRA[TECM]
>>>> extraction macro to handling this changes.
>>>>
>>>> Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
>>>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>>>> ---
>>>> arch/powerpc/perf/isa207-common.c | 3 +++
>>>> arch/powerpc/perf/isa207-common.h | 4 ++++
>>>> 2 files changed, 7 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
>>>> index 964437adec18..5fe129f02290 100644
>>>> --- a/arch/powerpc/perf/isa207-common.c
>>>> +++ b/arch/powerpc/perf/isa207-common.c
>>>> @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
>>>> u64 sier = mfspr(SPRN_SIER);
>>>> u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
>>>> + if (cpu_has_feature(CPU_FTR_ARCH_31))
>>>> + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
>>>> +
>>>> if (val == 0 || val == 7)
>>>> *weight = 0;
>>>> else
>>>> diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
>>>> index 044de65e96b9..71380e854f48 100644
>>>> --- a/arch/powerpc/perf/isa207-common.h
>>>> +++ b/arch/powerpc/perf/isa207-common.h
>>>> @@ -219,6 +219,10 @@
>>>> #define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
>>>> MMCRA_THR_CTR_EXP_MASK)
>>>> +#define P10_MMCRA_THR_CTR_MANT_MASK 0xFFul
>>>> +#define P10_MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
>>>> + P10_MMCRA_THR_CTR_MANT_MASK)
>>>> +
>>>> /* MMCRA Threshold Compare bit constant for power9 */
>>>> #define p9_MMCRA_THR_CMP_SHIFT 45
>>>> --
>>>> 2.26.2
>>>>
^ permalink raw reply
* Re: [PATCH] powerpc/features: Remove CPU_FTR_NODSISRALIGN
From: Aneesh Kumar K.V @ 2020-10-14 3:19 UTC (permalink / raw)
To: Michael Ellerman, Christophe Leroy, Benjamin Herrenschmidt,
Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87wnzuzb1x.fsf@mpe.ellerman.id.au>
On 10/13/20 3:45 PM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 13/10/2020 à 09:23, Aneesh Kumar K.V a écrit :
>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>
>>>> CPU_FTR_NODSISRALIGN has not been used since
>>>> commit 31bfdb036f12 ("powerpc: Use instruction emulation
>>>> infrastructure to handle alignment faults")
>>>>
>>>> Remove it.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>>> ---
>>>> arch/powerpc/include/asm/cputable.h | 22 ++++++++++------------
>>>> arch/powerpc/kernel/dt_cpu_ftrs.c | 8 --------
>>>> arch/powerpc/kernel/prom.c | 2 +-
>>>> 3 files changed, 11 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> index 1098863e17ee..c598961d9f15 100644
>>>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> @@ -273,13 +273,6 @@ static int __init feat_enable_idle_nap(struct dt_cpu_feature *f)
>>>> return 1;
>>>> }
>>>>
>>>> -static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
>>>> -{
>>>> - cur_cpu_spec->cpu_features &= ~CPU_FTR_NODSISRALIGN;
>>>> -
>>>> - return 1;
>>>> -}
>>>> -
>>>> static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
>>>> {
>>>> u64 lpcr;
>>>> @@ -641,7 +634,6 @@ static struct dt_cpu_feature_match __initdata
>>>> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
>>>> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
>>>> {"idle-nap", feat_enable_idle_nap, 0},
>>>> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
>
> Rather than removing it entirely, I'd rather we left a comment, so that
> it's obvious that we are ignoring that feature on purpose, not because
> we forget about it.
>
> eg:
>
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index f204ad79b6b5..45cb7e59bd13 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -640,7 +640,7 @@ static struct dt_cpu_feature_match __initdata
> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
> {"idle-nap", feat_enable_idle_nap, 0},
> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
> + // "alignment-interrupt-dsisr" ignored
> {"idle-stop", feat_enable_idle_stop, 0},
> {"machine-check-power8", feat_enable_mce_power8, 0},
> {"performance-monitor-power8", feat_enable_pmu_power8, 0},
>
why not do it as
static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
{
/* This feature should not be enabled */
#ifdef DEBUG
WARN(1);
#endif
return 1;
}
-aneesh
^ permalink raw reply
* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
From: Aneesh Kumar K.V @ 2020-10-14 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Anshuman Khandual, linuxppc-dev
In-Reply-To: <20201013135858.f4a7f0c5f3b0a69a2a304cfe@linux-foundation.org>
On 10/14/20 2:28 AM, Andrew Morton wrote:
> On Wed, 2 Sep 2020 17:12:09 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
>
>> This patch series includes fixes for debug_vm_pgtable test code so that
>> they follow page table updates rules correctly. The first two patches introduce
>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
>> merge them via ppc64 tree if required.
>
> Do you think this series is ready to be merged?
Hopefully, except for the Riscv crash.
>
> Possibly-unresolved issues which I have recorded are
>
> Against
> mm-debug_vm_pgtable-locks-move-non-page-table-modifying-test-together.patch:
>
> https://lkml.kernel.org/r/56830efb-887e-0000-a46e-ae015e5854cd@arm.com
I guess the full series do boot fine on arm.
> https://lkml.kernel.org/r/20200910075752.GC26874@shao2-debian
This should be fixed by
https://ozlabs.org/~akpm/mmots/broken-out/mm-debug_vm_pgtable-avoid-doing-memory-allocation-with-pgtable_t-mapped.patch
>
> Against mm-debug_vm_pgtable-avoid-none-pte-in-pte_clear_test.patch:
>
> https://lkml.kernel.org/r/87zh5wx51b.fsf@linux.ibm.com
yes this one we should get fixed. I was hoping someone familiar with
Riscv pte updates rules would pitch in. IIUC we need to update
RANDON_ORVALUE similar to how we updated it for s390 and ppc64.
Alternatively we can do this
modified mm/debug_vm_pgtable.c
@@ -548,7 +548,7 @@ static void __init pte_clear_tests(struct mm_struct
*mm, pte_t *ptep,
pte_t pte = pfn_pte(pfn, prot);
pr_debug("Validating PTE clear\n");
- pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+// pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
till we get that feedback from RiscV team?
> https://lkml.kernel.org/r/37a9facc-ca36-290f-3748-16c4a7a778fa@arm.com
same as the above.
> https://lkml.kernel.org/r/20201011200258.GA91021@roeck-us.net
>
same as the above.
-aneesh
^ permalink raw reply
* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Alexey Kardashevskiy @ 2020-10-14 2:55 UTC (permalink / raw)
To: Cédric Le Goater, Qian Cai, Michael Ellerman
Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
Stephen Rothwell
In-Reply-To: <fce8ffe1-521c-8344-c7ad-53550e408cdc@kaod.org>
On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>
> OK. The patch is missing a NULL assignement after kfree() and that
> might be the issue.
>
> I did try PHB removal under PowerNV, so I would like to understand
> how we managed to remove twice the PCI bus and possibly reproduce.
> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
How do you remove PHBs exactly? There is no such thing in the powernv
platform, I thought someone added this and you are fixing it but no.
PHBs on powernv are created at the boot time and there is no way to
remove them, you can only try removing all the bridges.
So what exactly are you doing?
--
Alexey
^ permalink raw reply
* [PATCH] selftests/powerpc: Fix eeh-basic.sh exit codes
From: Oliver O'Halloran @ 2020-10-14 2:47 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Oliver O'Halloran
The kselftests test running infrastructure expects tests to finish with an
exit code of 4 if the test decided it should be skipped. Currently
eeh-basic.sh exits with the number of devices that failed to recover, so if
four devices didn't recover we'll report a skip instead of a fail.
Fix this by checking if the return code is non-zero and report success
and failure by returning 0 or 1 respectively. For the cases where should
actually skip return 4.
Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
tools/testing/selftests/powerpc/eeh/eeh-basic.sh | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
index 8a8d0f456946..0d783e1065c8 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
@@ -1,17 +1,19 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-only
+KSELFTESTS_SKIP=4
+
. ./eeh-functions.sh
if ! eeh_supported ; then
echo "EEH not supported on this system, skipping"
- exit 0;
+ exit $KSELFTESTS_SKIP;
fi
if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
[ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
echo "debugfs EEH testing files are missing. Is debugfs mounted?"
- exit 1;
+ exit $KSELFTESTS_SKIP;
fi
pre_lspci=`mktemp`
@@ -84,4 +86,5 @@ echo "$failed devices failed to recover ($dev_count tested)"
lspci | diff -u $pre_lspci -
rm -f $pre_lspci
-exit $failed
+test "$failed" == 0
+exit $?
--
2.26.2
^ permalink raw reply related
* Re: [PATCH 1/2] powerpc: Fix user data corruption with P9N DD2.1 VSX CI load workaround emulation
From: Michael Ellerman @ 2020-10-14 0:32 UTC (permalink / raw)
To: Michael Neuling; +Cc: mikey, linuxppc-dev
In-Reply-To: <20201013043741.743413-1-mikey@neuling.org>
Michael Neuling <mikey@neuling.org> writes:
> __get_user_atomic_128_aligned() stores to kaddr using stvx which is a
> VMX store instruction, hence kaddr must be 16 byte aligned otherwise
> the store won't occur as expected.
>
> Unfortunately when we call __get_user_atomic_128_aligned() in
> p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
> guaranteed to be 16B aligned. This means that the write to vbuf in
> __get_user_atomic_128_aligned() has the bottom bits of the address
> truncated. This results in other local variables being
> overwritten. Also vbuf will not contain the correct data which results
> in the userspace emulation being wrong and hence user data corruption.
>
> In the past we've been mostly lucky as vbuf has ended up aligned but
> this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
> particular can change the stack arrangement enough that our luck runs
> out.
Below is a script which takes a System.map and vmlinux (or objdump
output) and tries to check if the stack layout is susceptible to the
bug.
cheers
#!/usr/bin/python3
import os
import sys
import re
from subprocess import Popen, PIPE
# eg: c00000000002ea88: ce 49 00 7c stvx v0,0,r9
stvx_pattern = re.compile('^c[0-9a-f]{15}:\s+(?:[0-9a-f]{2} ){4}\s+stvx\s+v0,0,(r\d+)\s*')
# eg: c00000000002ea80: 28 00 21 39 addi r9,r1,40
addi_pattern = '^c[0-9a-f]{15}:\s+(?:[0-9a-f]{2} ){4}\s+addi\s+%s,r1,(\d+)\s*'
def main(args):
if len(args) != 2:
print('Usage: %s <objdump|vmlinux> <System.map>' % sys.argv[0])
return -1
if os.path.basename(sys.argv[1]).startswith('vmlinu'):
dump = Popen(['objdump', '-d', sys.argv[1]], stdout=PIPE, encoding='utf-8').stdout
else:
dump = open(sys.argv[1])
syms = read_symbols(sys.argv[2])
func_lines = extract_func(dump, 'handle_hmi_exception', syms)
if func_lines is None:
print("Error: couldn't find handle_hmi_exception in objdump output")
return -1
match = None
i = 0
while i < len(func_lines):
match = stvx_pattern.match(func_lines[i])
if match:
break
i += 1
if match is None:
print("Error: couldn't find stvx in handle_hmi_exception")
return -1
stvx_reg = match.group(1)
print('stvx found using register %s:\n%s\n' % (stvx_reg, match.group(0).rstrip()))
match = None
i -= 1
while i > 0:
pattern = re.compile(addi_pattern % stvx_reg)
match = pattern.match(func_lines[i])
if match:
break
i -= 1
if match is None:
print("Error: couldn't find addi in handle_hmi_exception")
return -1
stack_offset = int(match.group(1))
print('addi found using offset %d:\n%s\n' % (stack_offset, match.group(0).rstrip()))
if stack_offset & 0xf:
print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
print('!! Offset is misaligned - bug present !!')
print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
return 1
else:
print('OK - offset is aligned')
return 0
def extract_func(f, func_name, syms):
func_addr, func_size = find_symbol_and_size(syms, func_name)
num_lines = int(func_size / 4)
pattern = re.compile('^%016x:' % func_addr)
match = None
line = f.readline()
while len(line):
match = pattern.match(line)
if match:
break
line = f.readline()
if match is None:
return None
lines = []
for i in range(0, num_lines):
lines.append(f.readline())
return lines
def read_symbols(map_path):
last_function = ''
last_addr = 0
lines = open(map_path).readlines()
addrs = []
last_addr = 0
for line in lines:
tokens = line.split()
if len(tokens) == 3:
addr = int(tokens[0], 16)
sym_type = tokens[1]
name = tokens[2]
elif len(tokens) == 2:
addr = last_addr
sym_type = tokens[0]
name = tokens[1]
else:
raise Exception("Couldn't grok System.map")
addrs.append((addr, name, sym_type))
last_addr = addr
return addrs
def find_symbol_and_size(symbol_map, name):
dot_name = '.%s' % name
saddr = None
i = 0
for addr, cur_name, sym_type in symbol_map:
if cur_name == name or cur_name == dot_name:
saddr = addr
break
i += 1
if saddr is None:
return (None, None)
i += 1
if i >= len(symbol_map):
size = -1
else:
size = symbol_map[i][0] - saddr
return (saddr, size)
sys.exit(main(sys.argv[1:]))
^ permalink raw reply
* Re: [PATCH 1/2] powerpc: Fix user data corruption with P9N DD2.1 VSX CI load workaround emulation
From: Michael Ellerman @ 2020-10-14 0:13 UTC (permalink / raw)
To: Michael Neuling; +Cc: mikey, linuxppc-dev
In-Reply-To: <20201013043741.743413-1-mikey@neuling.org>
Michael Neuling <mikey@neuling.org> writes:
> __get_user_atomic_128_aligned() stores to kaddr using stvx which is a
> VMX store instruction, hence kaddr must be 16 byte aligned otherwise
> the store won't occur as expected.
>
> Unfortunately when we call __get_user_atomic_128_aligned() in
> p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
> guaranteed to be 16B aligned. This means that the write to vbuf in
> __get_user_atomic_128_aligned() has the bottom bits of the address
> truncated. This results in other local variables being
> overwritten. Also vbuf will not contain the correct data which results
> in the userspace emulation being wrong and hence user data corruption.
>
> In the past we've been mostly lucky as vbuf has ended up aligned but
> this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
> particular can change the stack arrangement enough that our luck runs
> out.
Actually I'm yet to find a kernel with CONFIG_STACKPROTECTOR=n that is
vulnerable to the bug.
Turning on STACKPROTECTOR changes the order GCC allocates locals on the
stack, from bottom-up to top-down. That in conjunction with the 8 byte
stack canary means we end up with 8 bytes of space below the locals,
which misaligns vbuf.
But obviously other things can change the stack layout too, so no
guarantees that CONFIG_STACKPROTECTOR=n makes it safe.
cheers
^ permalink raw reply
* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Michael Ellerman @ 2020-10-13 23:42 UTC (permalink / raw)
To: Qian Cai, Cédric Le Goater
Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
Oliver O'Halloran, linuxppc-dev
In-Reply-To: <90922c43c670e4b55e6cf421be19146333e2ae7b.camel@redhat.com>
Qian Cai <cai@redhat.com> writes:
> On Wed, 2020-09-23 at 09:06 +0200, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>> > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>> > > When a passthrough IO adapter is removed from a pseries machine using
>> > > hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>> > > guest OS to clear all page table entries related to the adapter. If
>> > > some are still present, the RTAS call which isolates the PCI slot
>> > > returns error 9001 "valid outstanding translations" and the removal of
>> > > the IO adapter fails. This is because when the PHBs are scanned, Linux
>> > > maps automatically the INTx interrupts in the Linux interrupt number
>> > > space but these are never removed.
>> > >
>> > > To solve this problem, we introduce a PPC platform specific
>> > > pcibios_remove_bus() routine which clears all interrupt mappings when
>> > > the bus is removed. This also clears the associated page table entries
>> > > of the ESB pages when using XIVE.
>> > >
>> > > For this purpose, we record the logical interrupt numbers of the
>> > > mapped interrupt under the PHB structure and let pcibios_remove_bus()
>> > > do the clean up.
>> > >
>> > > Since some PCI adapters, like GPUs, use the "interrupt-map" property
>> > > to describe interrupt mappings other than the legacy INTx interrupts,
>> > > we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>> > > number of interrupt mappings is computed from the "interrupt-map"
>> > > property and the mapping array is allocated accordingly.
>> > >
>> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
>> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> >
>> > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed
>> > to
>> > this patch.
>> >
>> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>>
>> OK. The patch is missing a NULL assignement after kfree() and that
>> might be the issue.
>>
>> I did try PHB removal under PowerNV, so I would like to understand
>> how we managed to remove twice the PCI bus and possibly reproduce.
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
>
> Any update on this? Maybe Michael or Stephen could drop this for now, so our
> fuzzing could continue to find something else new?
Someone send me a revert?
cheers
^ permalink raw reply
* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
From: Andrew Morton @ 2020-10-13 20:58 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-mm, Anshuman Khandual, linuxppc-dev
In-Reply-To: <20200902114222.181353-1-aneesh.kumar@linux.ibm.com>
On Wed, 2 Sep 2020 17:12:09 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table updates rules correctly. The first two patches introduce
> changes w.r.t ppc64. The patches are included in this series for completeness. We can
> merge them via ppc64 tree if required.
Do you think this series is ready to be merged?
Possibly-unresolved issues which I have recorded are
Against
mm-debug_vm_pgtable-locks-move-non-page-table-modifying-test-together.patch:
https://lkml.kernel.org/r/56830efb-887e-0000-a46e-ae015e5854cd@arm.com
https://lkml.kernel.org/r/20200910075752.GC26874@shao2-debian
Against mm-debug_vm_pgtable-avoid-none-pte-in-pte_clear_test.patch:
https://lkml.kernel.org/r/87zh5wx51b.fsf@linux.ibm.com
https://lkml.kernel.org/r/37a9facc-ca36-290f-3748-16c4a7a778fa@arm.com
https://lkml.kernel.org/r/20201011200258.GA91021@roeck-us.net
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu,
linux-afs, linux-um, intel-gfx, ecryptfs, linux-erofs,
reiserfs-devel, linux-block, linux-bcache, Borislav Petkov,
Andy Lutomirski, Dan Williams, Andrew Morton, linux-cachefs,
linux-nfs, linux-ntfs-dev, netdev, kexec, linux-kernel,
linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013112544.GA5249@infradead.org>
On Tue, Oct 13, 2020 at 12:25:44PM +0100, Christoph Hellwig wrote:
> > - kaddr = kmap(pp);
> > + kaddr = kmap_thread(pp);
> > memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
> > - kunmap(pp);
> > + kunmap_thread(pp);
>
> You only Cced me on this particular patch, which means I have absolutely
> no idea what kmap_thread and kunmap_thread actually do, and thus can't
> provide an informed review.
Sorry the list was so big I struggled with who to CC and on which patches.
>
> That being said I think your life would be a lot easier if you add
> helpers for the above code sequence and its counterpart that copies
> to a potential hughmem page first, as that hides the implementation
> details from most users.
Matthew Wilcox and Al Viro have suggested similar ideas.
https://lore.kernel.org/lkml/20201013205012.GI2046448@iweiny-DESK2.sc.intel.com/
Ira
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:50 UTC (permalink / raw)
To: Al Viro
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
amd-gfx list, linux-kselftest, samba-technical, Thomas Gleixner,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, X86 ML, ceph-devel, Matthew Wilcox,
io-uring, cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Dan Williams, bpf,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, Andrew Morton, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013200149.GI3576660@ZenIV.linux.org.uk>
On Tue, Oct 13, 2020 at 09:01:49PM +0100, Al Viro wrote:
> On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
>
> > static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> > {
> > char *vto = kmap_atomic(to);
> >
> > memcpy(vto, vfrom, size);
> > kunmap_atomic(vto);
> > }
> >
> > in linux/highmem.h ?
>
> You mean, like
> static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> {
> char *from = kmap_atomic(page);
> memcpy(to, from + offset, len);
> kunmap_atomic(from);
> }
>
> static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> {
> char *to = kmap_atomic(page);
> memcpy(to + offset, from, len);
> kunmap_atomic(to);
> }
>
> static void memzero_page(struct page *page, size_t offset, size_t len)
> {
> char *addr = kmap_atomic(page);
> memset(addr + offset, 0, len);
> kunmap_atomic(addr);
> }
>
> in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and
> highmem.h as location - these make perfect sense regardless of highmem;
> they are normal memory operations with page + offset used instead of
> a pointer...
I was thinking along those lines as well especially because of the direction
this patch set takes kmap().
Thanks for pointing these out to me. How about I lift them to a common header?
But if not highmem.h where?
Ira
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, X86 ML, ceph-devel, amd-gfx list, io-uring,
cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Dan Williams, bpf,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, Andrew Morton, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013193643.GK20115@casper.infradead.org>
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
> On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 12:52 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the over head of global PKRS updates use the new kmap_thread() call.
> > >
> > > Cc: Nicolas Pitre <nico@fluxnic.net>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > ---
> > > fs/cramfs/inode.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > > index 912308600d39..003c014a42ed 100644
> > > --- a/fs/cramfs/inode.c
> > > +++ b/fs/cramfs/inode.c
> > > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > > struct page *page = pages[i];
> > >
> > > if (page) {
> > > - memcpy(data, kmap(page), PAGE_SIZE);
> > > - kunmap(page);
> > > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > > + kunmap_thread(page);
> >
> > Why does this need a sleepable kmap? This looks like a textbook
> > kmap_atomic() use case.
>
> There's a lot of code of this form. Could we perhaps have:
>
> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?
Christoph had the same idea. I'll work on it.
Ira
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox