From: Len Brown <len.brown@intel.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>, ak@suse.de
Subject: Re: [patch] ACPI, i686, x86_64: fix laptop bootup hang in init_acpi()
Date: Wed, 6 Dec 2006 18:57:29 -0500
Message-ID: <200612061857.30248.len.brown@intel.com>
In-Reply-To: <20061206223025.GA17227@elte.hu>

On Wednesday 06 December 2006 17:30, Ingo Molnar wrote:
> Subject: [patch] ACPI, i686, x86_64: fix laptop bootup hang in init_acpi()
> From: Ingo Molnar <mingo@elte.hu>
> 
> during kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about 
> 10%-20% of the time in acpi_init():
> 
>  Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
>  Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
>  Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
>  Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
>  Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
>  Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
>  Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
>  Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()
> 
> it's a hard hang that not even an NMI could punch through! 
> Frustratingly, adding printks or function tracing to the ACPI code made 
> the hangs go away ...
> 
> after some time an additional detail emerged: disabling the NMI watchdog 
> made these occasional hangs go away.
> 
> So i spent the better part of today trying to debug this and trying out 
> various theories when i finally found the likely reason for the hang: if 
> acpi_ns_initialize_devices() executes an _INI AML method and an NMI 
> happens to hit that AML execution in the wrong moment, the machine would 
> hang.  (my theory is that this must be some sort of chipset setup method 
> doing stores to chipset mmio registers?)
> 
> Unfortunately given the characteristics of the hang it was sheer 
> impossible to figure out which of the numerous AML methods is impacted 
> by this problem.
> 
> as a workaround i wrote an interface to disable chipset-based NMIs while 
> executing _INI sections - and indeed this fixed the hang. I did a 
> boot-loop of 100 separate reboots and none hung - while without the 
> patch it would hang every 5-10 attempts. Out of caution i did not touch 
> the nmi_watchdog=2 case (it's not related to the chipset anyway and 
> didn't hang).
> 
> I implemented this for both x86_64 and i686, tested the i686 laptop both 
> with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and 
> tested an Athlon64 box with the 64-bit kernel as well. Everything builds 
> and works with the patch applied.

Good work root-causing this failure!

Personally I have never been a big fan of having the NMI watchdog
running by default on all systems -- but Andi insists that it helps him
debug failures, so tick it does...

Clearly this laptop was validated with Windows, and Windows doesn't
have this problem -- suggesting that we may be somewhat out on a limb...

Some alternatives to consider...
a. don't enable NMI watchdog by default
b. enable NMI watchdog, but later during kernel boot
    (assumption here that the issue is only during _INI)
c. disable the NMI whenever the ACPI interpreter is running
   (who knows, maybe this isn't limited to the _INI case, but
    could cause a hang at some other time -- only the
    BIOS AML writers know....)
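
To make (c) concrete, here is a minimal userspace sketch, not kernel
code. It assumes the interpreter has single entry and exit points, which
may not match the real ACPI code. acpi_nmi_disable() and
acpi_nmi_enable() are stubs named after the helpers in the quoted patch;
nmi_masked, interp_depth, interp_enter(), interp_exit(),
evaluate_masked() and the two sample methods are hypothetical names used
only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the chipset NMI mask state. */
static bool nmi_masked;

/* Hypothetical nesting depth of interpreter entries. */
static int interp_depth;

/* Stubs named after the helpers in the patch; here they only flip a flag. */
static void acpi_nmi_disable(void) { nmi_masked = true; }
static void acpi_nmi_enable(void)  { nmi_masked = false; }

/* Mask NMIs on the outermost entry into the interpreter only. */
static void interp_enter(void)
{
	if (interp_depth++ == 0)
		acpi_nmi_disable();
}

/* Unmask NMIs only when the outermost invocation returns. */
static void interp_exit(void)
{
	if (--interp_depth == 0)
		acpi_nmi_enable();
}

/* Run one (simulated) AML method with NMIs masked for its duration. */
static int evaluate_masked(int (*method)(void))
{
	int status;

	interp_enter();
	status = method();
	interp_exit();
	return status;
}

/* Simulated method: succeeds only if NMIs are masked while it runs. */
static int sample_ini(void)
{
	return nmi_masked ? 0 : -1;
}

/* Simulated method that re-enters the interpreter; NMIs must stay masked. */
static int nested_ini(void)
{
	int status = evaluate_masked(sample_ini);

	return (status == 0 && nmi_masked) ? 0 : -1;
}
```

The depth counter is there so that a nested evaluation does not unmask
NMIs while an outer method is still running. A real version would also
have to cope with concurrent interpreter entries on several CPUs, which
this sketch ignores.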

thoughts?
-Len

> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/i386/kernel/nmi.c          |   28 ++++++++++++++++++++++++++++
>  arch/x86_64/kernel/nmi.c        |   27 +++++++++++++++++++++++++++
>  drivers/acpi/namespace/nsinit.c |    9 +++++++++
>  include/linux/nmi.h             |    9 ++++++++-
>  4 files changed, 72 insertions(+), 1 deletion(-)
> 
> Index: linux/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/nmi.c
> +++ linux/arch/i386/kernel/nmi.c
> @@ -376,6 +376,34 @@ void enable_timer_nmi_watchdog(void)
>  	}
>  }
>  
> +static void __acpi_nmi_disable(void *__unused)
> +{
> +	apic_write_around(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
> +}
> +
> +/*
> + * Disable timer based NMIs on all CPUs:
> + */
> +void acpi_nmi_disable(void)
> +{
> +	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
> +		on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
> +}
> +
> +static void __acpi_nmi_enable(void *__unused)
> +{
> +	apic_write_around(APIC_LVT0, APIC_DM_NMI);
> +}
> +
> +/*
> + * Enable timer based NMIs on all CPUs:
> + */
> +void acpi_nmi_enable(void)
> +{
> +	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
> +		on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
> +}
> +
>  #ifdef CONFIG_PM
>  
>  static int nmi_pm_active; /* nmi_active before suspend */
> Index: linux/arch/x86_64/kernel/nmi.c
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/nmi.c
> +++ linux/arch/x86_64/kernel/nmi.c
> @@ -360,6 +360,33 @@ void enable_timer_nmi_watchdog(void)
>  	}
>  }
>  
> +static void __acpi_nmi_disable(void *__unused)
> +{
> +	apic_write(APIC_LVT0, APIC_DM_NMI | APIC_LVT_MASKED);
> +}
> +
> +/*
> + * Disable timer based NMIs on all CPUs:
> + */
> +void acpi_nmi_disable(void)
> +{
> +	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
> +		on_each_cpu(__acpi_nmi_disable, NULL, 0, 1);
> +}
> +
> +static void __acpi_nmi_enable(void *__unused)
> +{
> +	apic_write(APIC_LVT0, APIC_DM_NMI);
> +}
> +
> +/*
> + * Enable timer based NMIs on all CPUs:
> + */
> +void acpi_nmi_enable(void)
> +{
> +	if (atomic_read(&nmi_active) && nmi_watchdog == NMI_IO_APIC)
> +		on_each_cpu(__acpi_nmi_enable, NULL, 0, 1);
> +}
>  #ifdef CONFIG_PM
>  
>  static int nmi_pm_active; /* nmi_active before suspend */
> Index: linux/drivers/acpi/namespace/nsinit.c
> ===================================================================
> --- linux.orig/drivers/acpi/namespace/nsinit.c
> +++ linux/drivers/acpi/namespace/nsinit.c
> @@ -45,6 +45,7 @@
>  #include <acpi/acnamesp.h>
>  #include <acpi/acdispat.h>
>  #include <acpi/acinterp.h>
> +#include <linux/nmi.h>
>  
>  #define _COMPONENT          ACPI_NAMESPACE
>  ACPI_MODULE_NAME("nsinit")
> @@ -537,7 +538,15 @@ acpi_ns_init_one_device(acpi_handle obj_
>  	info->parameter_type = ACPI_PARAM_ARGS;
>  	info->flags = ACPI_IGNORE_RETURN_VALUE;
>  
> +	/*
> +	 * Some hardware relies on this being executed as atomically
> +	 * as possible (without an NMI being received in the middle of
> +	 * this) - so disable NMIs and initialize the device:
> +	 */
> +	acpi_nmi_disable();
>  	status = acpi_ns_evaluate(info);
> +	acpi_nmi_enable();
> +
>  	if (ACPI_SUCCESS(status)) {
>  		walk_info->num_INI++;
>  
> Index: linux/include/linux/nmi.h
> ===================================================================
> --- linux.orig/include/linux/nmi.h
> +++ linux/include/linux/nmi.h
> @@ -16,8 +16,15 @@
>   */
>  #ifdef ARCH_HAS_NMI_WATCHDOG
>  extern void touch_nmi_watchdog(void);
> +extern void acpi_nmi_disable(void);
> +extern void acpi_nmi_enable(void);
>  #else
> -# define touch_nmi_watchdog() touch_softlockup_watchdog()
> +static inline void touch_nmi_watchdog(void)
> +{
> +	touch_softlockup_watchdog();
> +}
> +static inline void acpi_nmi_disable(void) { }
> +static inline void acpi_nmi_enable(void) { }
>  #endif
>  
>  #endif
