* [PATCH] ACPI / APEI: Fix NMI notification handling
@ 2016-11-29 18:43 Prarit Bhargava
2016-11-29 19:36 ` Borislav Petkov
0 siblings, 1 reply; 11+ messages in thread
From: Prarit Bhargava @ 2016-11-29 18:43 UTC (permalink / raw)
To: linux-kernel
Cc: Prarit Bhargava, Borislav Petkov, Rafael J. Wysocki, Len Brown,
Paul Gortmaker, Tyler Baicar, Punit Agrawal, Don Zickus,
linux-acpi
When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0
ffffffff81c03e78 ffffffff81337905 0000000000000000 0000000000000000
ffffffff81c03eb8 ffffffff8107d9c1 00000545810aac4a 0000000000000000
00000000000000f0 0000000000000000 000081cb6440f1d0 0000000000000001
Call Trace:
[<ffffffff81337905>] dump_stack+0x63/0x8e
[<ffffffff8107d9c1>] __warn+0xd1/0xf0
[<ffffffff8107daad>] warn_slowpath_null+0x1d/0x20
[<ffffffff810522b5>] setup_local_APIC+0x275/0x370
[<ffffffff810523be>] apic_ap_setup+0xe/0x20
[<ffffffff8104f5a8>] start_secondary+0x48/0x180
[<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
[<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
[<ffffffff81d895d6>] ? x86_64_start_reservations+0x2a/0x2c
[<ffffffff81d89715>] ? x86_64_start_kernel+0x13d/0x14c
---[ end trace 7b6555b6343ef9ee ]---
During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.
Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.
This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.
Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: linux-acpi@vger.kernel.org
---
drivers/acpi/apei/ghes.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0d099a24f776..39c45efbcb3d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
if (sev >= GHES_SEV_PANIC)
__ghes_panic(ghes);
+ ret = NMI_HANDLED;
+
if (!(ghes->flags & GHES_TO_CLEAR))
continue;
__process_error(ghes);
ghes_clear_estatus(ghes);
-
- ret = NMI_HANDLED;
}
#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
- irq_work_queue(&ghes_proc_irq_work);
+ if (ret == NMI_HANDLED)
+ irq_work_queue(&ghes_proc_irq_work);
#endif
atomic_dec(&ghes_in_nmi);
return ret;
--
1.7.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] ACPI / APEI: Fix NMI notification handling
2016-11-29 18:43 [PATCH] ACPI / APEI: Fix NMI notification handling Prarit Bhargava
@ 2016-11-29 19:36 ` Borislav Petkov
2016-11-30 13:19 ` [PATCH v2] " Prarit Bhargava
0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2016-11-29 19:36 UTC (permalink / raw)
To: Prarit Bhargava
Cc: linux-kernel, Rafael J. Wysocki, Len Brown, Paul Gortmaker,
Tyler Baicar, Punit Agrawal, Don Zickus, linux-acpi
On Tue, Nov 29, 2016 at 01:43:59PM -0500, Prarit Bhargava wrote:
> When removing and adding cpu 0 on a system with GHES NMI the following stack
> trace is seen when re-adding the cpu:
>
> WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59
> Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0
> ffffffff81c03e78 ffffffff81337905 0000000000000000 0000000000000000
> ffffffff81c03eb8 ffffffff8107d9c1 00000545810aac4a 0000000000000000
> 00000000000000f0 0000000000000000 000081cb6440f1d0 0000000000000001
> Call Trace:
> [<ffffffff81337905>] dump_stack+0x63/0x8e
> [<ffffffff8107d9c1>] __warn+0xd1/0xf0
> [<ffffffff8107daad>] warn_slowpath_null+0x1d/0x20
> [<ffffffff810522b5>] setup_local_APIC+0x275/0x370
> [<ffffffff810523be>] apic_ap_setup+0xe/0x20
> [<ffffffff8104f5a8>] start_secondary+0x48/0x180
> [<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
> [<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
> [<ffffffff81d895d6>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81d89715>] ? x86_64_start_kernel+0x13d/0x14c
> ---[ end trace 7b6555b6343ef9ee ]---
Please remove all hex numbers from the splat - they're useless in the
commit message.
> During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
> ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
> 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> initialized.
>
> Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> incorrectly modified the behavior such that the handler returns
> NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> work queue for every NMI.
>
> This patch modifies the ghes_proc_irq_work() to run as it did prior to
> 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> properly returning NMI_HANDLED and only calling the work queue if
> NMI_HANDLED has been set.
>
> Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Tyler Baicar <tbaicar@codeaurora.org>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: linux-acpi@vger.kernel.org
> ---
> drivers/acpi/apei/ghes.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0d099a24f776..39c45efbcb3d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
> if (sev >= GHES_SEV_PANIC)
> __ghes_panic(ghes);
>
> + ret = NMI_HANDLED;
> +
Make that more explicit:
if (ghes_read_estatus(ghes, 1)) {
ghes_clear_estatus(ghes);
continue;
} else {
ret = NMI_HANDLED;
}
> if (!(ghes->flags & GHES_TO_CLEAR))
> continue;
>
> __process_error(ghes);
> ghes_clear_estatus(ghes);
> -
> - ret = NMI_HANDLED;
> }
>
> #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> - irq_work_queue(&ghes_proc_irq_work);
> + if (ret == NMI_HANDLED)
> + irq_work_queue(&ghes_proc_irq_work);
> #endif
> atomic_dec(&ghes_in_nmi);
> return ret;
> --
Otherwise looks ok,
thanks.
--
Regards/Gruss,
Boris.
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v2] ACPI / APEI: Fix NMI notification handling
2016-11-29 19:36 ` Borislav Petkov
@ 2016-11-30 13:19 ` Prarit Bhargava
2016-12-01 20:07 ` Borislav Petkov
0 siblings, 1 reply; 11+ messages in thread
From: Prarit Bhargava @ 2016-11-30 13:19 UTC (permalink / raw)
To: linux-kernel
Cc: Prarit Bhargava, Borislav Petkov, Rafael J. Wysocki, Len Brown,
Paul Gortmaker, Tyler Baicar, Punit Agrawal, Don Zickus
When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
Call Trace:
dump_stack+0x63/0x8e
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
setup_local_APIC+0x275/0x370
apic_ap_setup+0xe/0x20
start_secondary+0x48/0x180
set_init_arg+0x55/0x55
early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x13d/0x14c
During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.
Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.
This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.
v2: Borislav, setting of NMI_HANDLED moved & cleaned up changelog.
Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Don Zickus <dzickus@redhat.com>
---
drivers/acpi/apei/ghes.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0d099a24f776..e53bef6cf53c 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -852,6 +852,8 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
if (ghes_read_estatus(ghes, 1)) {
ghes_clear_estatus(ghes);
continue;
+ } else {
+ ret = NMI_HANDLED;
}
sev = ghes_severity(ghes->estatus->error_severity);
@@ -863,12 +865,11 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
__process_error(ghes);
ghes_clear_estatus(ghes);
-
- ret = NMI_HANDLED;
}
#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
- irq_work_queue(&ghes_proc_irq_work);
+ if (ret == NMI_HANDLED)
+ irq_work_queue(&ghes_proc_irq_work);
#endif
atomic_dec(&ghes_in_nmi);
return ret;
--
1.7.9.3
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
2016-11-30 13:19 ` [PATCH v2] " Prarit Bhargava
@ 2016-12-01 20:07 ` Borislav Petkov
2016-12-01 21:17 ` Rafael J. Wysocki
0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2016-12-01 20:07 UTC (permalink / raw)
To: Prarit Bhargava
Cc: linux-kernel, Borislav Petkov, Rafael J. Wysocki, Len Brown,
Paul Gortmaker, Tyler Baicar, Punit Agrawal, Don Zickus
On Wed, Nov 30, 2016 at 08:19:39AM -0500, Prarit Bhargava wrote:
> When removing and adding cpu 0 on a system with GHES NMI the following stack
> trace is seen when re-adding the cpu:
>
> WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
> Call Trace:
> dump_stack+0x63/0x8e
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> setup_local_APIC+0x275/0x370
> apic_ap_setup+0xe/0x20
> start_secondary+0x48/0x180
> set_init_arg+0x55/0x55
> early_idt_handler_array+0x120/0x120
> x86_64_start_reservations+0x2a/0x2c
> x86_64_start_kernel+0x13d/0x14c
>
> During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
> ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
> 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> initialized.
>
> Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> incorrectly modified the behavior such that the handler returns
> NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> work queue for every NMI.
>
> This patch modifies the ghes_proc_irq_work() to run as it did prior to
> 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> properly returning NMI_HANDLED and only calling the work queue if
> NMI_HANDLED has been set.
>
> v2: Borislav, setting of NMI_HANDLED moved & cleaned up changelog.
>
> Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Tyler Baicar <tbaicar@codeaurora.org>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> ---
> drivers/acpi/apei/ghes.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
Reviewed-by: Borislav Petkov <bp@suse.de>
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
2016-12-01 20:07 ` Borislav Petkov
@ 2016-12-01 21:17 ` Rafael J. Wysocki
[not found] ` <20161201215149.fvb3ki77det7bnjq@pd.tnic>
0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2016-12-01 21:17 UTC (permalink / raw)
To: Borislav Petkov
Cc: Prarit Bhargava, linux-kernel, Borislav Petkov, Rafael J. Wysocki,
Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
Don Zickus, Linux ACPI
On Thursday, December 01, 2016 09:07:39 PM Borislav Petkov wrote:
> On Wed, Nov 30, 2016 at 08:19:39AM -0500, Prarit Bhargava wrote:
> > When removing and adding cpu 0 on a system with GHES NMI the following stack
> > trace is seen when re-adding the cpu:
> >
> > WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> > Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
> > Call Trace:
> > dump_stack+0x63/0x8e
> > __warn+0xd1/0xf0
> > warn_slowpath_null+0x1d/0x20
> > setup_local_APIC+0x275/0x370
> > apic_ap_setup+0xe/0x20
> > start_secondary+0x48/0x180
> > set_init_arg+0x55/0x55
> > early_idt_handler_array+0x120/0x120
> > x86_64_start_reservations+0x2a/0x2c
> > x86_64_start_kernel+0x13d/0x14c
> >
> > During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> > NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
> > ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> > (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
> > 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> > that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> > initialized.
> >
> > Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> > incorrectly modified the behavior such that the handler returns
> > NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> > work queue for every NMI.
> >
> > This patch modifies the ghes_proc_irq_work() to run as it did prior to
> > 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> > properly returning NMI_HANDLED and only calling the work queue if
> > NMI_HANDLED has been set.
> >
> > v2: Borislav, setting of NMI_HANDLED moved & cleaned up changelog.
> >
> > Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> > Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Tyler Baicar <tbaicar@codeaurora.org>
> > Cc: Punit Agrawal <punit.agrawal@arm.com>
> > Cc: Don Zickus <dzickus@redhat.com>
> > ---
> > drivers/acpi/apei/ghes.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
>
> Reviewed-by: Borislav Petkov <bp@suse.de>
I guess I should pick up this one, then?
Thanks,
Rafael
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2016-12-02 11:39 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-29 18:43 [PATCH] ACPI / APEI: Fix NMI notification handling Prarit Bhargava
2016-11-29 19:36 ` Borislav Petkov
2016-11-30 13:19 ` [PATCH v2] " Prarit Bhargava
2016-12-01 20:07 ` Borislav Petkov
2016-12-01 21:17 ` Rafael J. Wysocki
[not found] ` <20161201215149.fvb3ki77det7bnjq@pd.tnic>
2016-12-01 22:29 ` Rafael J. Wysocki
2016-12-01 22:47 ` Borislav Petkov
2016-12-01 23:12 ` Rafael J. Wysocki
2016-12-02 5:45 ` Hanjun Guo
2016-12-02 5:45 ` Hanjun Guo
2016-12-02 11:39 ` Fu Wei
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.