From: Brian Bloniarz <bmb@athenacr.com>
To: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: 2.6.35-rc3 BUG: unable to handle kernel paging request (ahci_stop_engine)
Date: Wed, 16 Jun 2010 14:07:14 -0400
Message-ID: <4C1912D2.8000408@athenacr.com>
In-Reply-To: <201006161057.32602.bjorn.helgaas@hp.com>
On 06/16/2010 12:57 PM, Bjorn Helgaas wrote:
> On Tuesday, June 15, 2010 04:24:38 pm Brian Bloniarz wrote:
>> On 06/15/2010 03:11 PM, Brian Bloniarz wrote:
>>> I'm seeing the following BUG booting a Dell Precision T3500
>>> with 2.6.35-rc3 -- does this ring any bells for anyone?
>>>
>>> Looks like -rc1 has the same behavior, I haven't gotten any
>>> farther than that yet.
>>
>> 2.6.34 does not boot for me on this machine either, it times
>> out waiting for the boot device. However, it doesn't BUG.
>> I'm wondering if there are two issues, some issue which
>> showed up pre 2.6.34 causing this:
>>
>> [ 5.854464] ahci 0000:00:1f.2: controller reset failed (0xffffffff)
>>
>> and then something post-2.6.34 which triggers the BUG.
>
> Yes, it sounds like this may be two separate issues, but both
> could be regressions, and we definitely want to resolve them.
> Thanks for giving me a heads-up!
>
> I assume there is *some* older kernel that works. If so, can
> you open a report at http://bugzilla.kernel.org that mentions
> the working older revision and the broken new one, and attach
> the dmesg logs for both?

I submitted https://bugzilla.kernel.org/show_bug.cgi?id=16228
and attached the boot logs.

2.6.33 works fine, and 2.6.35-rc3 with pci=nocrs works fine
too. The logs for both are attached to the bug.

Unfortunately, I don't have Windows on this machine.

Thanks for the help!

>
>> Googling for "controller reset failed" gives this:
>> https://bugzilla.kernel.org/show_bug.cgi?id=15744
>> on a similar machine, but that was fixed before 2.6.34.
>> Bjorn, could you tell me if this boot log shows anything
>> similar to the behavior you describe in that bug link?
>
> The symptoms are similar to 15744, but I think you're seeing something
> a bit different. Here's what you see:
>
> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
> pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
> pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
> pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
> pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfc000000]
> pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
> pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
> pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]
> pci 0000:00:1f.2: no compatible bridge window for [mem 0xff970000-0xff9707ff]
>
> The BIOS left the device set to an address that isn't within any of
> the host bridge windows, so we moved it:
>
> pci 0000:00:1f.2: BAR 5: assigned [mem 0xbff00000-0xbff007ff]
> pci 0000:00:1f.2: BAR 5: set to [mem 0xbff00000-0xbff007ff] (PCI address [0xbff00000-0xbff007ff]
>
> The new address (0xbff00000) is inside one of the windows and looks
> reasonable. If you booted Windows on this system, I think it would
> also move the device, though it would probably pick a different
> place to put it.
>
> ahci 0000:00:1f.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
> ahci 0000:00:1f.2: controller can't do SNTF, turning off CAP_SNTF
> ahci 0000:00:1f.2: controller reset failed (0xffffffff)
>
> The device seems to be responding there (we read the IRQ information,
> for example), so I don't see a problem from the PCI side yet, but
> something is still wrong.
>
> It's conceivable that booting with "pci=nocrs" would make a difference.
> If so, please collect the dmesg log so I can see where we went wrong.
>
> The BUG:
>
> ahci 0000:00:1f.2: failed to stop engine (-5)
> BUG: unable to handle kernel paging request at ffffc90012621018
> IP: [<ffffffffa002c77c>] ahci_stop_engine+0x2c/0x70 [libahci]
>
> looks very strange to me. ahci_stop_engine() does a read from the
> device, then a write, and it looks like the page fault was on the
> write to the same address we just read. I don't know enough about
> x86 to go any farther yet.
>
> Bjorn
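
The window check described in the quoted analysis can be sketched as
below. This is only an illustration, not the kernel's actual logic: the
function name containing_window is hypothetical, and the ranges are
copied from the dmesg excerpt quoted above, treated as inclusive
start-end pairs. It shows that the BIOS-assigned BAR 5 range falls
outside every host bridge window, while the reassigned range fits
inside one.

```python
# Host bridge windows from the quoted dmesg excerpt, as inclusive
# (start, end) pairs in the order the kernel printed them.
HOST_BRIDGE_WINDOWS = [
    (0x000a0000, 0x000bffff),
    (0x000c0000, 0x000effff),
    (0x000f0000, 0x000fffff),
    (0xbff00000, 0xdfffffff),
    (0xf0000000, 0xfc000000),
    (0xff980000, 0xff980fff),
    (0xff97c000, 0xff97ffff),
    (0xfed20000, 0xfed9ffff),
]


def containing_window(start, end, windows=HOST_BRIDGE_WINDOWS):
    """Return the first window that fully contains [start, end], or None.

    Hypothetical helper for illustration; the real kernel code works on
    struct resource trees rather than a flat list.
    """
    for w_start, w_end in windows:
        if w_start <= start and end <= w_end:
            return (w_start, w_end)
    return None


# BAR 5 of 0000:00:1f.2 as left by the BIOS, and after the kernel moved it.
bios_bar5 = (0xff970000, 0xff9707ff)
moved_bar5 = (0xbff00000, 0xbff007ff)

for label, (start, end) in (("BIOS", bios_bar5), ("moved", moved_bar5)):
    window = containing_window(start, end)
    if window is None:
        print(f"{label}: [{start:#010x}-{end:#010x}] is in no window")
    else:
        print(f"{label}: [{start:#010x}-{end:#010x}] is inside "
              f"[{window[0]:#010x}-{window[1]:#010x}]")
```

The BIOS range sits just below the 0xff97c000-0xff97ffff window, which
is consistent with the "no compatible bridge window" message in the log.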