* [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2
@ 2021-12-14 6:47 Jay Chen
  2021-12-14 9:26 ` Marc Zyngier
  2021-12-16 15:24 ` [irqchip: irq/irqchip-next] irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time irqchip-bot for Marc Zyngier
  0 siblings, 2 replies; 5+ messages in thread
From: Jay Chen @ 2021-12-14 6:47 UTC
To: tglx, maz, linux-kernel; +Cc: zhangliguang

We encounter a GIC RAS error in the flow below:
(1) Configure the ITS-related registers (including
    GITS_BASER2, with GITS_BASER2.valid = 1'b1).
(2) Configure the GICR-related registers (including
    GICR_VPROPBASER, with GICR_VPROPBASER.valid = 1'b1).
    The common settings in these two registers are the same,
    and at this point everything is OK.
(3) The kernel panics and the OS starts the kdump flow. The OS then
    reconfigures the ITS-related registers (including GITS_BASER2,
    with GITS_BASER2.valid = 1'b1). But at this time GICR_VPROPBASER
    has not been reinitialized, so it still holds the old value. The
    new value of GITS_BASER2 and the old value of GICR_VPROPBASER
    now differ, resulting in an ITS RAS error.

https://bugzilla.kernel.org/show_bug.cgi?id=215327

Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..c340bbf4427b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2623,6 +2623,12 @@ static int its_alloc_tables(struct its_node *its)
 			return err;
 		}
 
+		if ((i == 2) && is_kdump_kernel() && is_v4_1(its)) {
+			val = its_read_baser(its, baser);
+			val &= ~GITS_BASER_VALID;
+			its_write_baser(its, baser, val);
+		}
+
 		/* Update settings which will be used for next BASERn */
 		cache = baser->val & GITS_BASER_CACHEABILITY_MASK;
 		shr = baser->val & GITS_BASER_SHAREABILITY_MASK;
-- 
2.27.0

* Re: [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2
  2021-12-14 6:47 [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2 Jay Chen
@ 2021-12-14 9:26 ` Marc Zyngier
  2021-12-14 9:52   ` Lorenzo Pieralisi
  2021-12-16 3:36   ` Jiankang Chen
  2021-12-16 15:24 ` [irqchip: irq/irqchip-next] irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time irqchip-bot for Marc Zyngier
  1 sibling, 2 replies; 5+ messages in thread
From: Marc Zyngier @ 2021-12-14 9:26 UTC
To: Jay Chen; +Cc: tglx, linux-kernel, zhangliguang, Lorenzo Pieralisi

[+ Lorenzo, just in case...]

Hi Jay,

Thanks for this.

On Tue, 14 Dec 2021 06:47:16 +0000,
Jay Chen <jkchen@linux.alibaba.com> wrote:
>
> We encounter a GIC RAS error in the flow below:
> (1) Configure the ITS-related registers (including
>     GITS_BASER2, with GITS_BASER2.valid = 1'b1).
> (2) Configure the GICR-related registers (including
>     GICR_VPROPBASER, with GICR_VPROPBASER.valid = 1'b1).
>     The common settings in these two registers are the same,
>     and at this point everything is OK.
> (3) The kernel panics and the OS starts the kdump flow. The OS then
>     reconfigures the ITS-related registers (including GITS_BASER2,
>     with GITS_BASER2.valid = 1'b1). But at this time GICR_VPROPBASER
>     has not been reinitialized, so it still holds the old value. The
>     new value of GITS_BASER2 and the old value of GICR_VPROPBASER
>     now differ, resulting in an ITS RAS error.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215327

I'm sorry, but I don't have any access to this. Please add all the
relevant details to the commit message and drop the link.

Could you please detail what HW this is on? The architecture
specification for GICv4.1 doesn't make any mention of RAS error
conditions, so this must be implementation specific. A reference to
the TRM of the IP would certainly help.

Now, I think you have identified something interesting, but I'm not
convinced by the implementation, see below.

>
> Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index eb0882d15366..c340bbf4427b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -2623,6 +2623,12 @@ static int its_alloc_tables(struct its_node *its)
>  			return err;
>  		}
>
> +		if ((i == 2) && is_kdump_kernel() && is_v4_1(its)) {
> +			val = its_read_baser(its, baser);
> +			val &= ~GITS_BASER_VALID;
> +			its_write_baser(its, baser, val);
> +		}

This looks like a very odd way to address the issue. You are silently
disabling the Base Register containing the VPE table, and carrying on
as if nothing happened. What happens if someone starts a guest using
direct injection at this point? A kdump kernel is still a fully
fledged kernel, and I don't expect it to behave differently.

If we are to make this work, we need to either disable the v4.1
extension altogether or sanitise the offending registers so that we
don't leave things in a bad state. My preference is of course the
latter.

Could you please give this patch a go and let me know if it helps?

Thanks,

	M.

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index daec3309b014..cb339ace5046 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -920,6 +920,15 @@ static int __gic_update_rdist_properties(struct redist_region *region,
 {
 	u64 typer = gic_read_typer(ptr + GICR_TYPER);
 
+	/* Boot-time cleanup */
+	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
+		u64 val;
+
+		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
+		val &= ~GICR_VPROPBASER_4_1_VALID;
+		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
+	}
+
 	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
 
 	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */

-- 
Without deviation from the norm, progress is not possible.

* Re: [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2
  2021-12-14 9:26 ` Marc Zyngier
@ 2021-12-14 9:52   ` Lorenzo Pieralisi
  2021-12-16 3:36   ` Jiankang Chen
  1 sibling, 0 replies; 5+ messages in thread
From: Lorenzo Pieralisi @ 2021-12-14 9:52 UTC
To: Marc Zyngier; +Cc: Jay Chen, tglx, linux-kernel, zhangliguang

On Tue, Dec 14, 2021 at 09:26:08AM +0000, Marc Zyngier wrote:
> [+ Lorenzo, just in case...]

Thanks. I am away at the moment but definitely on this case. I believe
this is also an issue with a kexec'ed kernel (where we expect v4.1
functionality to be up and running in the kexec'ed kernel, compared to
a kdump use case). I need to put something together and test it, if
someone does not beat me to it.

Lorenzo

> Hi Jay,
>
> Thanks for this.
>
> On Tue, 14 Dec 2021 06:47:16 +0000,
> Jay Chen <jkchen@linux.alibaba.com> wrote:
> >
> > We encounter a GIC RAS error in the flow below:
> > (1) Configure the ITS-related registers (including
> >     GITS_BASER2, with GITS_BASER2.valid = 1'b1).
> > (2) Configure the GICR-related registers (including
> >     GICR_VPROPBASER, with GICR_VPROPBASER.valid = 1'b1).
> >     The common settings in these two registers are the same,
> >     and at this point everything is OK.
> > (3) The kernel panics and the OS starts the kdump flow. The OS then
> >     reconfigures the ITS-related registers (including GITS_BASER2,
> >     with GITS_BASER2.valid = 1'b1). But at this time GICR_VPROPBASER
> >     has not been reinitialized, so it still holds the old value. The
> >     new value of GITS_BASER2 and the old value of GICR_VPROPBASER
> >     now differ, resulting in an ITS RAS error.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=215327
>
> I'm sorry, but I don't have any access to this. Please add all the
> relevant details to the commit message and drop the link.
>
> Could you please detail what HW this is on? The architecture
> specification for GICv4.1 doesn't make any mention of RAS error
> conditions, so this must be implementation specific. A reference to
> the TRM of the IP would certainly help.
>
> Now, I think you have identified something interesting, but I'm not
> convinced by the implementation, see below.
>
> >
> > Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
> > ---
> >  drivers/irqchip/irq-gic-v3-its.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index eb0882d15366..c340bbf4427b 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -2623,6 +2623,12 @@ static int its_alloc_tables(struct its_node *its)
> >  			return err;
> >  		}
> >
> > +		if ((i == 2) && is_kdump_kernel() && is_v4_1(its)) {
> > +			val = its_read_baser(its, baser);
> > +			val &= ~GITS_BASER_VALID;
> > +			its_write_baser(its, baser, val);
> > +		}
>
> This looks like a very odd way to address the issue. You are silently
> disabling the Base Register containing the VPE table, and carrying on
> as if nothing happened. What happens if someone starts a guest using
> direct injection at this point? A kdump kernel is still a fully
> fledged kernel, and I don't expect it to behave differently.
>
> If we are to make this work, we need to either disable the v4.1
> extension altogether or sanitise the offending registers so that we
> don't leave things in a bad state. My preference is of course the
> latter.
>
> Could you please give this patch a go and let me know if it helps?
>
> Thanks,
>
> 	M.
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index daec3309b014..cb339ace5046 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -920,6 +920,15 @@ static int __gic_update_rdist_properties(struct redist_region *region,
>  {
>  	u64 typer = gic_read_typer(ptr + GICR_TYPER);
>
> +	/* Boot-time cleanup */
> +	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
> +		u64 val;
> +
> +		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
> +		val &= ~GICR_VPROPBASER_4_1_VALID;
> +		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
> +	}
> +
>  	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
>
>  	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
>
> --
> Without deviation from the norm, progress is not possible.

* Re: [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2
  2021-12-14 9:26 ` Marc Zyngier
  2021-12-14 9:52   ` Lorenzo Pieralisi
@ 2021-12-16 3:36   ` Jiankang Chen
  1 sibling, 0 replies; 5+ messages in thread
From: Jiankang Chen @ 2021-12-16 3:36 UTC
To: Marc Zyngier; +Cc: tglx, linux-kernel, zhangliguang, Lorenzo Pieralisi

Hi Marc,

We get a RAS error on our new Arm platform:

INFO: err_gst:8000000
INFO: - Found: Uncorrected software error in ITS
INFO: RAS reg:
INFO: fr = a1
INFO: status = 64300101
INFO: V = 1
INFO: UE = 1
INFO: MV = 1
INFO: UET(Uncorrected Error Type) = 3
INFO: IERR = 1
INFO: SERR = 1
INFO: addr = 0
INFO: misc0 = 12051
INFO: misc1 = 0
CPU RAS mm handler: EventId=C4000049
ERROR: sdei_dispatch_event(327) ret:-1

On 2021/12/14 17:26, Marc Zyngier wrote:
> [+ Lorenzo, just in case...]
>
> Hi Jay,
>
> Thanks for this.
>
> On Tue, 14 Dec 2021 06:47:16 +0000,
> Jay Chen <jkchen@linux.alibaba.com> wrote:
>> We encounter a GIC RAS error in the flow below:
>> (1) Configure the ITS-related registers (including
>>     GITS_BASER2, with GITS_BASER2.valid = 1'b1).
>> (2) Configure the GICR-related registers (including
>>     GICR_VPROPBASER, with GICR_VPROPBASER.valid = 1'b1).
>>     The common settings in these two registers are the same,
>>     and at this point everything is OK.
>> (3) The kernel panics and the OS starts the kdump flow. The OS then
>>     reconfigures the ITS-related registers (including GITS_BASER2,
>>     with GITS_BASER2.valid = 1'b1). But at this time GICR_VPROPBASER
>>     has not been reinitialized, so it still holds the old value. The
>>     new value of GITS_BASER2 and the old value of GICR_VPROPBASER
>>     now differ, resulting in an ITS RAS error.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=215327
> I'm sorry, but I don't have any access to this. Please add all the
> relevant details to the commit message and drop the link.
>
> Could you please detail what HW this is on? The architecture
> specification for GICv4.1 doesn't make any mention of RAS error
> conditions, so this must be implementation specific. A reference to
> the TRM of the IP would certainly help.
>
> Now, I think you have identified something interesting, but I'm not
> convinced by the implementation, see below.
>
>> Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
>> ---
>>  drivers/irqchip/irq-gic-v3-its.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index eb0882d15366..c340bbf4427b 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -2623,6 +2623,12 @@ static int its_alloc_tables(struct its_node *its)
>>  			return err;
>>  		}
>>
>> +		if ((i == 2) && is_kdump_kernel() && is_v4_1(its)) {
>> +			val = its_read_baser(its, baser);
>> +			val &= ~GITS_BASER_VALID;
>> +			its_write_baser(its, baser, val);
>> +		}
> This looks like a very odd way to address the issue. You are silently
> disabling the Base Register containing the VPE table, and carrying on
> as if nothing happened. What happens if someone starts a guest using
> direct injection at this point? A kdump kernel is still a fully
> fledged kernel, and I don't expect it to behave differently.
>
> If we are to make this work, we need to either disable the v4.1
> extension altogether or sanitise the offending registers so that we
> don't leave things in a bad state. My preference is of course the
> latter.
>
> Could you please give this patch a go and let me know if it helps?
>
> Thanks,
>
> 	M.
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index daec3309b014..cb339ace5046 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -920,6 +920,15 @@ static int __gic_update_rdist_properties(struct redist_region *region,
>  {
>  	u64 typer = gic_read_typer(ptr + GICR_TYPER);
>
> +	/* Boot-time cleanup */
> +	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
> +		u64 val;
> +
> +		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
> +		val &= ~GICR_VPROPBASER_4_1_VALID;
> +		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
> +	}
> +

Thank you for your solution; this approach looks better. In our actual
tests, it solves the problem. Judging from the GIC code, modifying
either VPROPBASER or BASER2 solves the problem, but your modification
is clearly the better one. Thank you.

>  	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
>
>  	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
>

Tks
Jay

* [irqchip: irq/irqchip-next] irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time
  2021-12-14 6:47 [RFC PATCH] irqchip/gic-v4.1:fix the kdump GIC ITS RAS error for ITS BASER2 Jay Chen
  2021-12-14 9:26 ` Marc Zyngier
@ 2021-12-16 15:24 ` irqchip-bot for Marc Zyngier
  1 sibling, 0 replies; 5+ messages in thread
From: irqchip-bot for Marc Zyngier @ 2021-12-16 15:24 UTC
To: linux-kernel; +Cc: Jay Chen, Marc Zyngier, Lorenzo Pieralisi, tglx

The following commit has been merged into the irq/irqchip-next branch of irqchip:

Commit-ID:     79a7f77b9b154d572bd9d2f1eecf58c4d018d8e2
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/79a7f77b9b154d572bd9d2f1eecf58c4d018d8e2
Author:        Marc Zyngier <maz@kernel.org>
AuthorDate:    Thu, 16 Dec 2021 14:32:27
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Thu, 16 Dec 2021 15:19:52

irqchip/gic-v4: Disable redistributors' view of the VPE table at boot time

Jay Chen reported that using a kdump kernel on a GICv4.1 system
results in a RAS error being delivered when the secondary kernel
configures the ITS's view of the new VPE table.

As it turns out, that's because each RD still has a pointer to
the previous instance of the VPE table, and that particular
implementation is very upset by seeing two bits of the HW that
should point to the same table with different values.

To solve this, let's invalidate any reference that any RD has to
the VPE table when discovering the RDs. The ITS can then be
programmed as expected.

Reported-by: Jay Chen <jkchen@linux.alibaba.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20211214064716.21407-1-jkchen@linux.alibaba.com
Link: https://lore.kernel.org/r/20211216144804.1578566-1-maz@kernel.org
---
 drivers/irqchip/irq-gic-v3.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index daec330..8639752 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -920,6 +920,22 @@ static int __gic_update_rdist_properties(struct redist_region *region,
 {
 	u64 typer = gic_read_typer(ptr + GICR_TYPER);
 
+	/* Boot-time cleanup */
+	if ((typer & GICR_TYPER_VLPIS) && (typer & GICR_TYPER_RVPEID)) {
+		u64 val;
+
+		/* Deactivate any present vPE */
+		val = gicr_read_vpendbaser(ptr + SZ_128K + GICR_VPENDBASER);
+		if (val & GICR_VPENDBASER_Valid)
+			gicr_write_vpendbaser(GICR_VPENDBASER_PendingLast,
+					      ptr + SZ_128K + GICR_VPENDBASER);
+
+		/* Mark the VPE table as invalid */
+		val = gicr_read_vpropbaser(ptr + SZ_128K + GICR_VPROPBASER);
+		val &= ~GICR_VPROPBASER_4_1_VALID;
+		gicr_write_vpropbaser(val, ptr + SZ_128K + GICR_VPROPBASER);
+	}
+
 	gic_data.rdists.has_vlpis &= !!(typer & GICR_TYPER_VLPIS);
 
 	/* RVPEID implies some form of DirectLPI, no matter what the doc says... :-/ */
