public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ACPI / APEI: Fix NMI notification handling
@ 2016-11-29 18:43 Prarit Bhargava
  2016-11-29 19:36 ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Prarit Bhargava @ 2016-11-29 18:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Borislav Petkov, Rafael J. Wysocki, Len Brown,
	Paul Gortmaker, Tyler Baicar, Punit Agrawal, Don Zickus,
	linux-acpi

When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59
Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0
ffffffff81c03e78 ffffffff81337905 0000000000000000 0000000000000000
ffffffff81c03eb8 ffffffff8107d9c1 00000545810aac4a 0000000000000000
00000000000000f0 0000000000000000 000081cb6440f1d0 0000000000000001
Call Trace:
[<ffffffff81337905>] dump_stack+0x63/0x8e
[<ffffffff8107d9c1>] __warn+0xd1/0xf0
[<ffffffff8107daad>] warn_slowpath_null+0x1d/0x20
[<ffffffff810522b5>] setup_local_APIC+0x275/0x370
[<ffffffff810523be>] apic_ap_setup+0xe/0x20
[<ffffffff8104f5a8>] start_secondary+0x48/0x180
[<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
[<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
[<ffffffff81d895d6>] ? x86_64_start_reservations+0x2a/0x2c
[<ffffffff81d89715>] ? x86_64_start_kernel+0x13d/0x14c
---[ end trace 7b6555b6343ef9ee ]---

During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is  also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.

Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.

This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.

Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: linux-acpi@vger.kernel.org
---
 drivers/acpi/apei/ghes.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0d099a24f776..39c45efbcb3d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 		if (sev >= GHES_SEV_PANIC)
 			__ghes_panic(ghes);
 
+		ret = NMI_HANDLED;
+
 		if (!(ghes->flags & GHES_TO_CLEAR))
 			continue;
 
 		__process_error(ghes);
 		ghes_clear_estatus(ghes);
-
-		ret = NMI_HANDLED;
 	}
 
 #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	irq_work_queue(&ghes_proc_irq_work);
+	if (ret == NMI_HANDLED)
+		irq_work_queue(&ghes_proc_irq_work);
 #endif
 	atomic_dec(&ghes_in_nmi);
 	return ret;
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ACPI / APEI: Fix NMI notification handling
  2016-11-29 18:43 [PATCH] ACPI / APEI: Fix NMI notification handling Prarit Bhargava
@ 2016-11-29 19:36 ` Borislav Petkov
       [not found]   ` <1480511979-11722-1-git-send-email-prarit@redhat.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2016-11-29 19:36 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Rafael J. Wysocki, Len Brown, Paul Gortmaker,
	Tyler Baicar, Punit Agrawal, Don Zickus, linux-acpi

On Tue, Nov 29, 2016 at 01:43:59PM -0500, Prarit Bhargava wrote:
> When removing and adding cpu 0 on a system with GHES NMI the following stack
> trace is seen when re-adding the cpu:
> 
> WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5+ #59
> Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0
> ffffffff81c03e78 ffffffff81337905 0000000000000000 0000000000000000
> ffffffff81c03eb8 ffffffff8107d9c1 00000545810aac4a 0000000000000000
> 00000000000000f0 0000000000000000 000081cb6440f1d0 0000000000000001
> Call Trace:
> [<ffffffff81337905>] dump_stack+0x63/0x8e
> [<ffffffff8107d9c1>] __warn+0xd1/0xf0
> [<ffffffff8107daad>] warn_slowpath_null+0x1d/0x20
> [<ffffffff810522b5>] setup_local_APIC+0x275/0x370
> [<ffffffff810523be>] apic_ap_setup+0xe/0x20
> [<ffffffff8104f5a8>] start_secondary+0x48/0x180
> [<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
> [<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
> [<ffffffff81d895d6>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81d89715>] ? x86_64_start_kernel+0x13d/0x14c
> ---[ end trace 7b6555b6343ef9ee ]---

Please remove all hex numbers from the splat - they're useless in the
commit message.

> During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
> ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> (0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is  also
> 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> initialized.
>
> Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> incorrectly modified the behavior such that the handler returns
> NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> work queue for every NMI.
> 
> This patch modifies the ghes_proc_irq_work() to run as it did prior to
> 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> properly returning NMI_HANDLED and only calling the work queue if
> NMI_HANDLED has been set.
> 
> Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> Cc: Tyler Baicar <tbaicar@codeaurora.org>
> Cc: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: linux-acpi@vger.kernel.org
> ---
>  drivers/acpi/apei/ghes.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 0d099a24f776..39c45efbcb3d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -858,17 +858,18 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
>  		if (sev >= GHES_SEV_PANIC)
>  			__ghes_panic(ghes);
>  
> +		ret = NMI_HANDLED;
> +

Make that more explicit:

                if (ghes_read_estatus(ghes, 1)) {
                        ghes_clear_estatus(ghes);
                        continue;
                } else {
			ret = NMI_HANDLED;
		}


>  		if (!(ghes->flags & GHES_TO_CLEAR))
>  			continue;
>  
>  		__process_error(ghes);
>  		ghes_clear_estatus(ghes);
> -
> -		ret = NMI_HANDLED;
>  	}
>  
>  #ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
> -	irq_work_queue(&ghes_proc_irq_work);
> +	if (ret == NMI_HANDLED)
> +		irq_work_queue(&ghes_proc_irq_work);
>  #endif
>  	atomic_dec(&ghes_in_nmi);
>  	return ret;
> -- 

Otherwise looks ok,
thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
       [not found]     ` <20161201200739.qcibekpe37podnmu@pd.tnic>
@ 2016-12-01 21:17       ` Rafael J. Wysocki
       [not found]         ` <20161201215149.fvb3ki77det7bnjq@pd.tnic>
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2016-12-01 21:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Prarit Bhargava, linux-kernel, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
	Don Zickus, Linux ACPI

On Thursday, December 01, 2016 09:07:39 PM Borislav Petkov wrote:
> On Wed, Nov 30, 2016 at 08:19:39AM -0500, Prarit Bhargava wrote:
> > When removing and adding cpu 0 on a system with GHES NMI the following stack
> > trace is seen when re-adding the cpu:
> > 
> > WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
> > Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
> > Call Trace:
> >  dump_stack+0x63/0x8e
> >  __warn+0xd1/0xf0
> >  warn_slowpath_null+0x1d/0x20
> >  setup_local_APIC+0x275/0x370
> >  apic_ap_setup+0xe/0x20
> >  start_secondary+0x48/0x180
> >  set_init_arg+0x55/0x55
> >  early_idt_handler_array+0x120/0x120
> >  x86_64_start_reservations+0x2a/0x2c
> >  x86_64_start_kernel+0x13d/0x14c
> > 
> > During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
> > NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi() runs the
> > ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
> > (0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is  also
> > 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
> > that something has set the IRQ_WORK_VECTOR line prior to the APIC being
> > initialized.
> > 
> > Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> > incorrectly modified the behavior such that the handler returns
> > NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
> > work queue for every NMI.
> > 
> > This patch modifies the ghes_proc_irq_work() to run as it did prior to
> > 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
> > properly returning NMI_HANDLED and only calling the work queue if
> > NMI_HANDLED has been set.
> > 
> > v2: Borislav, setting of NMI_HANDLED moved & cleaned up changelog.
> > 
> > Fixes: 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
> > Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Len Brown <lenb@kernel.org>
> > Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> > Cc: Tyler Baicar <tbaicar@codeaurora.org>
> > Cc: Punit Agrawal <punit.agrawal@arm.com>
> > Cc: Don Zickus <dzickus@redhat.com>
> > ---
> >  drivers/acpi/apei/ghes.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Borislav Petkov <bp@suse.de>

I guess I should pick up this one, then?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
       [not found]         ` <20161201215149.fvb3ki77det7bnjq@pd.tnic>
@ 2016-12-01 22:29           ` Rafael J. Wysocki
  2016-12-01 22:47             ` Borislav Petkov
  0 siblings, 1 reply; 8+ messages in thread
From: Rafael J. Wysocki @ 2016-12-01 22:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Prarit Bhargava, linux-kernel, Borislav Petkov, Rafael J. Wysocki,
	Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
	Don Zickus, Linux ACPI

On Thursday, December 01, 2016 10:51:49 PM Borislav Petkov wrote:
> On Thu, Dec 01, 2016 at 10:17:54PM +0100, Rafael J. Wysocki wrote:
> > I guess I should pick up this one, then?
> 
> If you already have stuff touching this area, it probably would be more
> prudent if you took it. If not, I can take it through tip's ras branch,
> if you prefer.

Well, there's another ARM-related patch touching APEI.

I guess whoever takes this one should also take the other one and
honestly they can go in via any tree as far as I'm concerned, I'm just trying to
avoid merge clashes here. :-)

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
  2016-12-01 22:29           ` Rafael J. Wysocki
@ 2016-12-01 22:47             ` Borislav Petkov
  2016-12-01 23:12               ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Borislav Petkov @ 2016-12-01 22:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Prarit Bhargava, linux-kernel, Rafael J. Wysocki,
	Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
	Don Zickus, Linux ACPI

On Thu, Dec 01, 2016 at 11:29:45PM +0100, Rafael J. Wysocki wrote:
> Well, there's another ARM-related patch touching APEI.
> 
> I guess whoever takes this one should also take the other one and
> honestly they can go in via any tree as far as I'm concerned, I'm just trying to
> avoid merge clashes here. :-)

Maybe have ARM-folk ACK the other one and take both through your ACPI
tree? They both do have ACPI in common :-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
  2016-12-01 22:47             ` Borislav Petkov
@ 2016-12-01 23:12               ` Rafael J. Wysocki
  2016-12-02  5:45                 ` Hanjun Guo
  2016-12-02 11:39                 ` Fu Wei
  0 siblings, 2 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2016-12-01 23:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Borislav Petkov, Prarit Bhargava, linux-kernel, Rafael J. Wysocki,
	Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
	Don Zickus, Linux ACPI

On Thursday, December 01, 2016 11:47:17 PM Borislav Petkov wrote:
> On Thu, Dec 01, 2016 at 11:29:45PM +0100, Rafael J. Wysocki wrote:
> > Well, there's another ARM-related patch touching APEI.
> > 
> > I guess whoever takes this one should also take the other one and
> > honestly they can go in via any tree as far as I'm concerned, I'm just trying to
> > avoid merge clashes here. :-)
> 
> Maybe have ARM-folk ACK the other one and take both through your ACPI
> tree? They both do have ACPI in common :-)

That one have been ACKed already.

OK, I'll take them both.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
  2016-12-01 23:12               ` Rafael J. Wysocki
@ 2016-12-02  5:45                 ` Hanjun Guo
  2016-12-02 11:39                 ` Fu Wei
  1 sibling, 0 replies; 8+ messages in thread
From: Hanjun Guo @ 2016-12-02  5:45 UTC (permalink / raw)
  To: Rafael J. Wysocki, Borislav Petkov
  Cc: Borislav Petkov, Prarit Bhargava, linux-kernel, Rafael J. Wysocki,
	Len Brown, Paul Gortmaker, Tyler Baicar, Punit Agrawal,
	Don Zickus, Linux ACPI

On 2016/12/2 7:12, Rafael J. Wysocki wrote:
> On Thursday, December 01, 2016 11:47:17 PM Borislav Petkov wrote:
>> On Thu, Dec 01, 2016 at 11:29:45PM +0100, Rafael J. Wysocki wrote:
>>> Well, there's another ARM-related patch touching APEI.
>>>
>>> I guess whoever takes this one should also take the other one and
>>> honestly they can go in via any tree as far as I'm concerned, I'm just trying to
>>> avoid merge clashes here. :-)
>> Maybe have ARM-folk ACK the other one and take both through your ACPI
>> tree? They both do have ACPI in common :-)
> That one have been ACKed already.
>
> OK, I'll take them both.

Thank you very much :)

Hanjun


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] ACPI / APEI: Fix NMI notification handling
  2016-12-01 23:12               ` Rafael J. Wysocki
  2016-12-02  5:45                 ` Hanjun Guo
@ 2016-12-02 11:39                 ` Fu Wei
  1 sibling, 0 replies; 8+ messages in thread
From: Fu Wei @ 2016-12-02 11:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Borislav Petkov, Prarit Bhargava,
	Linux Kernel Mailing List, Rafael J. Wysocki, Len Brown,
	Paul Gortmaker, Tyler Baicar, Punit Agrawal, Don Zickus,
	Linux ACPI

Hi Rafael,

On 2 December 2016 at 07:12, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Thursday, December 01, 2016 11:47:17 PM Borislav Petkov wrote:
>> On Thu, Dec 01, 2016 at 11:29:45PM +0100, Rafael J. Wysocki wrote:
>> > Well, there's another ARM-related patch touching APEI.
>> >
>> > I guess whoever takes this one should also take the other one and
>> > honestly they can go in via any tree as far as I'm concerned, I'm just trying to
>> > avoid merge clashes here. :-)
>>
>> Maybe have ARM-folk ACK the other one and take both through your ACPI
>> tree? They both do have ACPI in common :-)
>
> That one have been ACKed already.

I guess that is "[PATCH v15] acpi, apei, arm64: APEI initial support
for aarch64."

>
> OK, I'll take them both.

Great thanks, Rafael :-)

>
> Thanks,
> Rafael
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best regards,

Fu Wei
Software Engineer
Red Hat

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-12-02 11:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-29 18:43 [PATCH] ACPI / APEI: Fix NMI notification handling Prarit Bhargava
2016-11-29 19:36 ` Borislav Petkov
     [not found]   ` <1480511979-11722-1-git-send-email-prarit@redhat.com>
     [not found]     ` <20161201200739.qcibekpe37podnmu@pd.tnic>
2016-12-01 21:17       ` [PATCH v2] " Rafael J. Wysocki
     [not found]         ` <20161201215149.fvb3ki77det7bnjq@pd.tnic>
2016-12-01 22:29           ` Rafael J. Wysocki
2016-12-01 22:47             ` Borislav Petkov
2016-12-01 23:12               ` Rafael J. Wysocki
2016-12-02  5:45                 ` Hanjun Guo
2016-12-02 11:39                 ` Fu Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox