From: Randy Dunlap <randy.dunlap@oracle.com>
To: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org, Bjorn Helgaas <bjorn.helgaas@hp.com>
Subject: Re: boot hang when using kexec
Date: Thu, 26 Feb 2009 14:38:52 -0800
Message-ID: <49A719FC.3080604@oracle.com>
In-Reply-To: <alpine.LFD.2.00.0902261534230.6492@localhost.localdomain>

Len Brown wrote:
> On Thu, 26 Feb 2009, Randy Dunlap wrote:
> 
>> For daily kernel testing, I (try to) use kexec to boot each new kernel.
>> This hasn't been working for several weeks now.
>> A "git bisect" only pointed me at one of Arjan's async boot (fastboot)
>> patches, but he and I think that's a git bisect anomaly.
>>
>> The test system is an HP BladeCenter 4-proc with 8 GB of RAM
>> (HP BladeCenter BL c-class: ProLiant BL685c G1).
>> It boots from an HP/Compaq CCISS drive (using an initramfs).
>>
>>
>> I capture the kernel log via netconsole, so sometimes the last few
>> lines of the kernel log are lost.  This is what the end of the
>> netconsole capture looks like:
>> (using acpi.debug_layer=0x03412f3b acpi.debug_level=0xffffffff)
>>
>> Execute Method: [\_SB_.PCI0.IP2P.ASMD._STA] (Node ffff88027f813ba0)
>>   nseval-0164 [FFFF88027F840000] [00] ns_evaluate           : Method at AML address ffffc2000000c68a Length 2
>>  utmutex-0249 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 attempting to acquire Mutex [ACPI_MTX_Interpreter]
>>      osl-0852 [FFFF88027F840000] [00] os_wait_semaphore     : Waiting for semaphore[ffff88027f806140|1|65535]
>>      osl-0871 [FFFF88027F840000] [00] os_wait_semaphore     : Acquired semaphore[ffff88027f806140|1|65535] utmutex-0257 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 acquired Mutex [ACPI_MTX_Interpreter]
>>  utmutex-0249 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 attempting to acquire Mutex [ACPI_MTX_Caches]
>>      osl-0852 [FFFF88027F840000] [00] os_wait_semaphore     : Waiting for semaphore[ffff88027f8061c0|1|65535]
>>      osl-0871 [FFFF88027F840000] [00] os_wait_semaphore     : Acquired semaphore[ffff88027f8061c0|1|65535] utmutex-0257 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 acquired Mutex [ACPI_MTX_Caches]
>>   utmisc-0228 [FFFF88027F840000] [00] ut_allocate_owner_id  : Allocated OwnerId: 9D
>>  utmutex-0292 [FFFF88027F840000] [00] ut_release_mutex      : Thread FFFF88027F840000 releasing Mutex [ACPI_MTX_Caches]
>>      osl-0891 [FFFF88027F840000] [00] os_signal_semaphore   : Signaling semaphore[ffff88027f8061c0|1]
>>  utmutex-0249 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 attempting to acquire Mutex [ACPI_MTX_Namespace]
>>      osl-0852 [FFFF88027F840000] [00] os_wait_semaphore     : Waiting for semaphore[ffff88027f806160|1|65535]
>>      osl-0871 [FFFF88027F840000] [00] os_wait_semaphore     : Acquired semaphore[ffff88027f806160|1|65535] utmutex-0257 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 acquired Mutex [ACPI_MTX_Namespace]
>>  utmutex-0292 [FFFF88027F840000] [00] ut_release_mutex      : Thread FFFF88027F840000 releasing Mutex [ACPI_MTX_Namespace]
>>
>>
>> [more of the kernel log including above is available at
>> http://oss.oracle.com/kerneltest/logs/netcon-5975.log, but it does not include
>> the beginning of the kernel boot for some reason -- it was truncated]
>>
>>
>>
>> and this is what is on the serial console output (which I don't know how
>> to capture in its entirety):
>>
>> nssearch-0110 [FFFF88027F840000] [00] ns_search_one_scope   : Searching \_SB_.PCI0.IP2P (ffff88027f806ee0) For [_S2D] (Untyped)
>> nssearch-0174 [FFFF88027F840000] [00] ns_search_one_scope   : Name [_S2D] (Untyped) not found in search in scope [IP2P] ffff88027f806ee0 first child ffff88027f806f00
>> nssearch-0386 [FFFF88027F840000] [00] ns_search_and_enter   : _S2D Not found in ffff88027f806ee0 [Not adding]
>> nsaccess-0575 [FFFF88027F840000] [00] ns_lookup             : Name [_S2D] not found in scope [IP2P] ffff88027f806ee0
>>  nsutils-0876 [FFFF88027F840000] [00] ns_get_node           : _S2D, AE_NOT_FOUND
>>  utmutex-0292 [FFFF88027F840000] [00] ut_release_mutex      : Thread FFFF88027F840000 releasing Mutex [ACPI_MTX_Namespace]
>>      osl-0891 [FFFF88027F840000] [00] os_signal_semaphore   : Signaling semaphore[ffff88027f806160|1]
>>   uteval-0227 [FFFF88027F840000] [00] ut_evaluate_object    : [IP2P._S2D] was not found
>>  nsutils-0461 [FFFF88027F840000] [00] ns_build_internal_name: Returning [ffff88027ed28be0] (rel) "_S3D"
>>  utmutex-0249 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 attempting to acquire Mutex [ACPI_MTX_Namespace]
>>      osl-0852 [FFFF88027F840000] [00] os_wait_semaphore     : Waiting for semaphore[ffff88027f806160|1|65535]
>>      osl-0871 [FFFF88027F840000] [00] os_wait_semaphore     : Acquired semaphore[ffff88027f806160|1|65535] utmutex-0257 [FFFF88027F840000] [00] ut_acquire_mutex      : Thread FFFF88027F840000 acquired Mutex [ACPI_MTX_Namespace]
>> nsaccess-0404 [FFFF88027F840000] [00] ns_lookup             : Searching relative to prefix scope [IP2P] (ffff88027f806ee0)
>> nsaccess-0514 [FFFF88027F840000] [00] ns_lookup             : Simple Pathname (1 segment, Flags=2)
>>   nsdump-0087 [FFFF88027F840000] [00] ns_print_pathname     : [_S3D]
>> nssearch-0110 [FFFF88027F840000] [00] ns_search_one_scope   : Searching \_SB_.PCI0.IP2P (ffff88027f806ee0) For [_S3D] (Untyped)
>> nssearch-0174 [FFFF88027F840000] [00] ns_search_one_scope   : Name [_S3D] (Untyped) not found in search in scope [IP2P] ffff88027f806ee0 first child ffff88027f806f00
>> nssearch-0386 [FFFF88027F840000] [00] ns_search_and_enter   : _S3D Not found in ffff88027f
> 
> mutexes being acquired and released,
> optional objects not being found.
> 
> Nothing jumps out at me as wrong above.
> 
> How does the boot fail, and why are you looking at ACPI --

See below for typical failures.  As for why ACPI: I'm grasping at straws.

> is this the last thing seen on the console?

Yes.

> What is the last thing printed when acpi_debug is not enabled?

Usual failures are like these:

ACPI: bus type pci registered
PCI: MCFG configuration 0: base 80000000 segment 0 buses 0 - 255
PCI: Not using MMCONFIG.
PCI: Using configuration type 1 for base access
PCI: HP ProLiant BL685c G1 detected, enabling pci=bfsort.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: SSDT 7FE58000, 04F0 (r2 HP     PNOWSSDT        2 HP          1)
ACPI: Interpreter enabled

or

ACPI: bus type pci registered
PCI: Using configuration type 1 for base access
PCI: HP ProLiant BL685c G1 detected, enabling pci=bfsort.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: SSDT 7FE58000, 04F0 (r2 HP     PNOWSSDT        2 HP          1)
ACPI: Interpreter enabled
ACPI: (supports S0 S4 S5)
ACPI: Using IOAPIC for interrupt routing

or

ACPI: bus type pci registered
PCI: MCFG configuration 0: base 80000000 segment 0 buses 0 - 255
PCI: Not using MMCONFIG.
PCI: Using configuration type 1 for base access
PCI: HP ProLiant BL685c G1 detected, enabling pci=bfsort.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: SSDT 7FE58000, 04F0 (r2 HP     PNOWSSDT        2 HP          1)
ACPI: Interpreter enabled
ACPI: (supports S0 S4 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: MCFG configuration 0: base 80000000 segment 0 buses 0 - 255
PCI: MCFG area at 80000000 reserved in ACPI motherboard resources
PCI: Using MMCONFIG at 80000000 - 8fffffff
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:02.0: reg 10 32bit mmio: [0xf7de0000-0xf7de0fff]
pci 0000:00:02.0: supports D1 D2
pci 0000:00:02.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:02.0: PME# disabled
pci 0000:00:02.1: reg 10 32bit mmio: [0xf7dd0000-0xf7dd00ff]
pci 0000:00:02.1: supports D1 D2
pci 0000:00:02.1: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:02.1: PME# disabled
pci 0000:00:0b.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:0b.0: PME# disabled
pci 0000:00:0c.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:0c.0: PME# disabled
pci 0000:00:0d.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:0d.0: PME# disabled
pci 0000:00:0e.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:00:0e.0: PME# disabled
pci 0000:01:03.0: reg 10 32bit mmio: [0xe8000000-0xefffffff]
pci 0000:01:03.0: reg 14 io port: [0x1000-0x10ff]
pci 0000:01:03.0: reg 18 32bit mmio: [0xf7ff0000-0xf7ffffff]
pci 0000:01:03.0: reg 30 32bit mmio: [0x000000-0x01ffff]
pci 0000:01:03.0: supports D1 D2
pci 0000:01:04.0: reg 10 io port: [0x2800-0x28ff]
pci 0000:01:04.0: reg 14 32bit mmio: [0xf7fe0000-0xf7fe01ff]
pci 0000:01:04.0: PME# supported from D0 D3hot D3cold
pci 0000:01:04.0: PME# disabled
pci 0000:01:04.2: reg 10 io port: [0x1400-0x14ff]
pci 0000:01:04.2: reg 14 32bit mmio: [0xf7fd0000-0xf7fd07ff]
pci 0000:01:04.2: reg 18 32bit mmio: [0xf7fc0000-0xf7fc1fff]


while a working boot is like this:

calling  cciss_init+0x0/0x2e [cciss] @ 733
HP CISS Driver (v 3.6.20)
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 54
cciss 0000:42:08.0: PCI INT A -> Link[LNKA] -> GSI 54 (level, high) -> IRQ 54
cciss 0000:42:08.0: irq 70 for MSI/MSI-X
IRQ 70/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
cciss0: <0x3238> at PCI 0000:42:08.0 IRQ 70 using DAC
usb 1-1: new full speed USB device using uhci_hcd and address 2
usb 1-1: configuration #1 chosen from 1 choice
input: HP Virtual Keyboard as /class/input/input2
generic-usb 0003:03F0:1027.0001: input: USB HID v1.01 Keyboard [HP Virtual Keyboard] on usb-0000:01:04.4-1/input0
input: HP Virtual Keyboard as /class/input/input3
generic-usb 0003:03F0:1027.0002: input: USB HID v1.01 Mouse [HP Virtual Keyboard] on usb-0000:01:04.4-1/input1
usb 1-2: new full speed USB device using uhci_hcd and address 3
usb 1-2: configuration #1 chosen from 1 choice
hub 1-2:1.0: USB hub found
hub 1-2:1.0: 7 ports detected


This leads me (now) to suspect the cciss driver more than anything else.
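
For anyone who wants to line up where each failed boot stops, here is a
minimal sketch in Python. It is not part of my test setup; the names
last_line, hang_points and the "hang-N" labels are made up for
illustration, and the captures are shortened excerpts of the failures
quoted above. Since netconsole can drop the final few lines, whatever it
reports is only the last message that was captured, not necessarily the
last one the kernel printed.

```python
def last_line(capture: str) -> str:
    """Return the last non-empty line of one console capture."""
    lines = [ln.strip() for ln in capture.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def hang_points(captures: dict) -> dict:
    """Map each capture label to the final message seen before the hang."""
    return {name: last_line(text) for name, text in captures.items()}


# Shortened excerpts of the three failures quoted earlier in this message.
# The labels are hypothetical; only the log lines come from the captures.
captures = {
    "hang-1": (
        "ACPI: EC: Look up EC in DSDT\n"
        "ACPI: Interpreter enabled\n"
    ),
    "hang-2": (
        "ACPI: (supports S0 S4 S5)\n"
        "ACPI: Using IOAPIC for interrupt routing\n"
    ),
    "hang-3": (
        "pci 0000:01:04.2: reg 14 32bit mmio: [0xf7fd0000-0xf7fd07ff]\n"
        "pci 0000:01:04.2: reg 18 32bit mmio: [0xf7fc0000-0xf7fc1fff]\n"
    ),
}

for name, line in hang_points(captures).items():
    print(f"{name}: {line}")
```

All three stop at a different message, and all of them stop before the
"HP CISS Driver" line that shows up in the working boot.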


> not having any idea what is wrong here, try irqpoll,

irqpoll didn't help.

> in case some interrupts are screwed up.
> you might also enable the watchdog.

I already have nmi_watchdog=2.  Is that OK?


Thanks.
-- 
~Randy


Thread overview:
2009-02-26 19:55 boot hang when using kexec Randy Dunlap
2009-02-26 20:37 ` Len Brown
2009-02-26 22:38   ` Randy Dunlap [this message]
