* HH DL585 warm boot fail (old)
@ 2018-07-06 19:12 Meelis Roos
2018-07-09 13:38 ` Bjorn Helgaas
0 siblings, 1 reply; 6+ messages in thread
From: Meelis Roos @ 2018-07-06 19:12 UTC (permalink / raw)
To: Linux Kernel list, linux-pci
I have a first gen HP Proliant DL585 ("G1" but the name was not used
back then) that boots up fine from poweron but usually fails bootup from
warm reboot, somewhere in PCI detection (will try to photograph the
screen some time).
I just stumbled upon an old OpenSolaris thread about the same DL585 and
same symptoms:
http://opensolaris-discuss.opensolaris.narkive.com/T0UTXYGZ/solaris-10-06-06-x86-hp-dl585-boot-hang-aftrer-reboot-help
Their conclusion was the following, and they seem to have found a fix
(although I have not tested any version of Solaris on this DL585
myself):
"The hang is caused when, during PCI enumeration, a PCI-PCI bridge is
partially disabled when the PCI command register bits which enable IO
and memory windows are cleared."
Is this information useful in some way for debugging it?
What else besides a screenshot can be useful in debugging?
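
For reference, the bits named in the quoted conclusion sit in the 16-bit command register at offset 0x04 of PCI configuration space: bit 0 enables I/O space decoding and bit 1 enables memory space decoding. The following is a minimal user-space sketch of what "clearing the bits which enable IO and memory windows" means for a register value. The macro and function names are hypothetical, and this is neither the Solaris nor the Linux code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI command register: 16 bits at config-space offset 0x04. */
#define CFG_COMMAND_OFFSET  0x04u
#define CFG_COMMAND_IO      0x0001u  /* bit 0: respond to I/O space accesses */
#define CFG_COMMAND_MEMORY  0x0002u  /* bit 1: respond to memory space accesses */

/* Hypothetical helper: the value that results from clearing both enable bits. */
static uint16_t command_with_windows_cleared(uint16_t command)
{
	return (uint16_t)(command & ~(CFG_COMMAND_IO | CFG_COMMAND_MEMORY));
}

/* Hypothetical helper: true when both decode-enable bits are clear. */
static bool windows_disabled(uint16_t command)
{
	return (command & (CFG_COMMAND_IO | CFG_COMMAND_MEMORY)) == 0;
}
```

On a running system the real value can be compared against this layout in the "Control:" line of "lspci -vv" output for the bridge.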
--
Meelis Roos (mroos@linux.ee)
* Re: HH DL585 warm boot fail (old)
2018-07-06 19:12 HH DL585 warm boot fail (old) Meelis Roos
@ 2018-07-09 13:38 ` Bjorn Helgaas
2018-10-24 7:47 ` Meelis Roos
0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2018-07-09 13:38 UTC (permalink / raw)
To: Meelis Roos; +Cc: Linux Kernel Mailing List, linux-pci
On Fri, Jul 6, 2018 at 2:13 PM Meelis Roos <mroos@linux.ee> wrote:
>
> I have a first gen HP Proliant DL585 ("G1" but the name was not used
> back then) that boots up fine from poweron but usually fails bootup from
> warm reboot, somewhere in PCI detection (will try to photograph the
> screen some time).
>
> I just stumbled upon an old OpenSolaris thread about the same DL585 and
> same symptoms:
> http://opensolaris-discuss.opensolaris.narkive.com/T0UTXYGZ/solaris-10-06-06-x86-hp-dl585-boot-hang-aftrer-reboot-help
>
> Their conclusion was the following, and they seem to have found a fix
> (although I have not tested any version of Solaris on this DL585
> myself):
>
> "The hang is caused when, during PCI enumeration, a PCI-PCI bridge is
> partially disabled when the PCI command register bits which enable IO
> and memory windows are cleared."
>
> Is this information useful in some way for debugging it?
>
> What else besides a screenshot can be useful in debugging?
Would you mind opening a report at https://bugzilla.kernel.org? I'm
not sure if anybody will be able to do anything about this, but it's
always possible.
A complete dmesg log and "sudo lspci -vv" output from a successful
boot would be a good start. And if you have a screenshot of the
failure, that would help, too. You can use the "ignore_loglevel"
kernel parameter to make sure we see everything on the console. Does
this machine have an iLO? If so, it may have logs that could be
useful if this is related to some sort of bus error.
Bjorn
* Re: HH DL585 warm boot fail (old)
2018-07-09 13:38 ` Bjorn Helgaas
@ 2018-10-24 7:47 ` Meelis Roos
2018-10-24 13:49 ` Bjorn Helgaas
0 siblings, 1 reply; 6+ messages in thread
From: Meelis Roos @ 2018-10-24 7:47 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Linux Kernel Mailing List, linux-pci
> Would you mind opening a report at https://bugzilla.kernel.org? I'm
> not sure if anybody will be able to do anything about this, but it's
> always possible.
Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503
>
> A complete dmesg log and "sudo lspci -vv" output from a successful
> boot would be a good start. And if you have a screenshot of the
> failure, that would help, too. You can use the "ignore_loglevel"
> kernel parameter to make sure we see everything on the console.
Added.
> Does
> this machine have an iLO? If so, it may have logs that could be
> useful if this is related to some sort of bus error.
Nothing in the iLO logs.
--
Meelis Roos <mroos@linux.ee>
* Re: HH DL585 warm boot fail (old)
2018-10-24 7:47 ` Meelis Roos
@ 2018-10-24 13:49 ` Bjorn Helgaas
2018-10-24 14:47 ` HP " Meelis Roos
0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2018-10-24 13:49 UTC (permalink / raw)
To: Meelis Roos; +Cc: Bjorn Helgaas, Linux Kernel Mailing List, linux-pci
On Wed, Oct 24, 2018 at 10:47:24AM +0300, Meelis Roos wrote:
> > Would you mind opening a report at https://bugzilla.kernel.org? I'm
> > not sure if anybody will be able to do anything about this, but it's
> > always possible.
>
> Submitted now, https://bugzilla.kernel.org/show_bug.cgi?id=201503
>
> > A complete dmesg log and "sudo lspci -vv" output from a successful
> > boot would be a good start. And if you have a screenshot of the
> > failure, that would help, too. You can use the "ignore_loglevel"
> > kernel parameter to make sure we see everything on the console.
>
> Added.
>
> > Does this machine have an iLO? If so, it may have logs that
> > could be useful if this is related to some sort of bus error.
>
> Nothing in the iLO logs.
Great, thanks!
Can you try the patch below? This is extracted from the code here:
https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805
I'm not sure why this would be only an intermittent problem, but at
least we can see if this is related.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6bc27b7fd452..842f900ed194 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5113,3 +5113,15 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8575,
 			quirk_switchtec_ntb_dma_alias);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_MICROSEMI, 0x8576,
 			quirk_switchtec_ntb_dma_alias);
+
+static void quirk_amd_8111(struct pci_dev *pdev)
+{
+	u8 ioc;
+
+	pci_read_config_byte(pdev, 0x40, &ioc);
+	if (ioc & 0x80) {
+		pci_info(pdev, "disabling NMI on error\n");
+		pci_write_config_byte(pdev, 0x40, ioc & ~0x80);
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7468, quirk_amd_8111);
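
The bit manipulation in the quirk can be checked in isolation from the kernel. This is a minimal user-space sketch that assumes only what the patch itself shows: a byte at config offset 0x40 whose bit 7 is cleared when set. The names IOC_REG_OFFSET, IOC_NMI_BIT and ioc_after_quirk are hypothetical and appear in neither Linux nor illumos:

```c
#include <stdint.h>

#define IOC_REG_OFFSET  0x40u  /* config offset the patch reads and writes */
#define IOC_NMI_BIT     0x80u  /* bit the patch clears ("NMI on error" per its log message) */

/*
 * Hypothetical helper: returns the byte the quirk would leave in the register.
 * When the bit is already clear the patch skips the write, so the value is
 * returned unchanged.
 */
static uint8_t ioc_after_quirk(uint8_t ioc)
{
	if (ioc & IOC_NMI_BIT)
		return (uint8_t)(ioc & ~IOC_NMI_BIT);
	return ioc;
}
```

Only bit 7 changes; all other bits of the byte are written back as read, which is why the patch reads the register before writing it.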
* Re: HP DL585 warm boot fail (old)
2018-10-24 13:49 ` Bjorn Helgaas
@ 2018-10-24 14:47 ` Meelis Roos
2018-10-24 16:34 ` Bjorn Helgaas
0 siblings, 1 reply; 6+ messages in thread
From: Meelis Roos @ 2018-10-24 14:47 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Bjorn Helgaas, Linux Kernel Mailing List, linux-pci
> Can you try the patch below? This is extracted from the code here:
> https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805
Thank you. Unfortunately it does not change anything noticeable.
> I'm not sure why this would be only an intermittent problem, but at
> least we can see if this is related.
It seems 4.19 and current git are 100% reproducers so far - I have not managed to
successfully boot either of them yet. I have seen a 4.19-rc1 era git kernel boot at least once.
I noticed that, of the kernels I have in the grub menu, the Debian-packaged 4.17 with
initramfs has worked fine in my tests so far. My self-compiled kernels do not use an initramfs.
--
Meelis Roos <mroos@linux.ee>
* Re: HP DL585 warm boot fail (old)
2018-10-24 14:47 ` HP " Meelis Roos
@ 2018-10-24 16:34 ` Bjorn Helgaas
0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2018-10-24 16:34 UTC (permalink / raw)
To: Meelis Roos; +Cc: Bjorn Helgaas, Linux Kernel Mailing List, linux-pci
On Wed, Oct 24, 2018 at 05:47:17PM +0300, Meelis Roos wrote:
> > Can you try the patch below? This is extracted from the code here:
> > https://github.com/joyent/illumos-joyent/blob/b6a0b04d591f5b877cfe05f45e81f0e8a5cfc2b3/usr/src/uts/intel/io/pci/pci_boot.c#L1805
>
> Thank you. Unfortunately it does not change anything noticeable.
Do you see the "disabling NMI on error" message?
Can you boot with "pci=earlydump vga=0xf07" and capture the output?
Drop the "vga=0xf07" if it doesn't work or makes the screen
unreadable.
> > I'm not sure why this would be only an intermittent problem, but at
> > least we can see if this is related.
>
> It seems 4.19 and current git are 100% reproducers so far - I have
> not managed to successfully boot either of them yet. I have seen a
> 4.19-rc1 era git kernel boot at least once.
>
> I noticed that, of the kernels I have in the grub menu, the
> Debian-packaged 4.17 with initramfs has worked fine in my tests so
> far. My self-compiled kernels do not use an initramfs.
It seems like the hang happens long before we would do anything with
an initramfs, but maybe there's a timing or memory map issue. It
seems like a hassle to pursue this angle, but if we can't figure it
out otherwise, maybe we'll have to.
Bjorn