From: Bjorn Helgaas <helgaas@kernel.org>
To: Jesse Hathaway <jesse@mbuki-mvuki.org>
Cc: Ingo Molnar <mingo@redhat.com>,
    Peter Zijlstra <peterz@infradead.org>,
    linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: Regression causes a hang on boot with a Comtrol PCI card
Date: Thu, 21 Mar 2019 18:23:10 -0500
Message-ID: <20190321232310.GL251185@google.com>
In-Reply-To: <CANSNSoXPFDu9RQAUA6dUCUrSCAj68q-Nj2W7ECz3fKpFtSNU+Q@mail.gmail.com>

On Thu, Mar 14, 2019 at 03:57:07PM -0500, Jesse Hathaway wrote:
> > > 1302fcf0d03e (refs/bisect/bad) PCI: Configure *all* devices, not just hot-added ones
> > > 1c3c5eab1715 sched/core: Enable might_sleep() and smp_processor_id() checks early
> >
> > How did you narrow it down to *two* commits, and do you have to revert
> > both of them to avoid the hang? Usually a bisection identifies a
> > single commit, and the two you mention aren't related.
>
> Sorry, I should have been more verbose about the bisection process. I
> found the problem while attempting to upgrade from Linux v3.16 to v4.9. When
> v4.9 hung, I tried the latest kernel, v5.0, which also hung. I began a git
> bisect, but found there was more than one bad commit. Here is my current
> understanding:
>
> - [x] v3.18 vanilla, 1302fcf0d03e committed, hangs
> - [x] v3.18 with revert of 1302fcf0d03e, works
> .
> .
> .
> - [x] v4.12 vanilla, hangs
> - [x] v4.12 with revert of 1302fcf0d03e, works
>
> - [x] v4.13 vanilla, 1c3c5eab1715 committed, hangs
> - [x] v4.13 with revert of 1302fcf0d03e, hangs
> - [x] v4.13 with revert of 1c3c5eab1715, hangs
> - [x] v4.13 with revert of 1302fcf0d03e & 1c3c5eab1715, works
>
> - [x] v5.0 vanilla, hangs
> - [x] v5.0 with revert of 1302fcf0d03e & 1c3c5eab1715, works

Thanks!  I doubt either of those commits is the real problem, but
they're both related to system_state, so it's conceivable they're both
involved in exposing the problem.

> > Can you collect a complete dmesg log (with a working kernel) and
> > output of "sudo lspci -vvxxx"? You can open a bug report at
> > https://bugzilla.kernel.org, attach the logs there, and respond here
> > with the URL.
>
> Bug submitted along with the requested logs,
> https://bugzilla.kernel.org/show_bug.cgi?id=202927

Thanks for that.

> > Where does the hang happen? Is it when we configure the Comtrol card?
>
> The hang occurs after PCI is initialized. A snippet is below; I have included
> the full output in the bug report:
>
> [ 10.561971] pci 0000:81:00.0: bridge window [mem 0xc8000000-0xc80fffff]
> [ 10.569661] pci 0000:80:01.0: PCI bridge to [bus 81-82]
> [ 10.575594] pci 0000:80:01.0: bridge window [mem 0xc8000000-0xc80fffff]
> [ 10.583278] pci 0000:80:03.0: PCI bridge to [bus 83]
> [ 10.589008] NET: Registered protocol family 2
> [ 10.594254] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes)
> [ 10.603671] TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
> [ 10.612729] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> [ 10.620446] TCP: Hash tables configured (established 524288 bind 65536)
> [ 10.628124] UDP hash table entries: 65536 (order: 9, 2097152 bytes)
> [ 10.635541] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes)
> [ 10.643669] NET: Registered protocol family 1

The successful boot continues on with this:

[ 10.675996] pci 0000:00:1a.0: quirk_usb_early_handoff+0x0/0x6a0 took 22519 usecs
[ 10.684519] pci 0000:03:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
[ 10.696404] pci 0000:03:00.0: quirk_blacklist_vpd+0x0/0x30 took 11605 usecs
[ 10.704515] pci 0000:0b:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]

So apparently the hang happens while we're running the "final" PCI
fixups.  This happens after all the rest of PCI is initialized.

Can you boot v5.0 vanilla with "initcall_debug"?  Maybe we can narrow
it down to a specific quirk.

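If it helps with reading the resulting log, here is a rough sketch (not
part of the original request) of one way to pick out the call that
started but never finished.  It assumes that with "initcall_debug" each
initcall and each PCI fixup prints a "calling  <fn>+0x.../0x... @ <pid>"
line before it runs and a "<fn>+0x.../0x... took N usecs" or "initcall
<fn>+0x.../0x... returned ..." line afterwards; the exact format may
differ between kernel versions, and unfinished_calls() is a made-up
helper name.

```python
import re

# Assumed line shapes; they may vary between kernel versions:
#   "... calling  <fn>+0x.../0x... @ <pid>"      before an initcall or fixup
#   "... <fn>+0x.../0x... took <n> usecs"         after a PCI fixup
#   "... initcall <fn>+0x.../0x... returned ..."  after an initcall
CALL_RE = re.compile(r"calling\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
DONE_RE = re.compile(
    r"(?:initcall\s+)?(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\s+(?:took|returned)\b"
)


def unfinished_calls(log_text):
    """Return functions that logged 'calling' but never logged completion.

    The list is in call order, so the last entry is the innermost
    function that was still running when the log ends.
    """
    pending = []
    for line in log_text.splitlines():
        call = CALL_RE.search(line)
        if call:
            # A function is about to run; remember it until it reports back.
            pending.append(call.group(1))
            continue
        done = DONE_RE.search(line)
        if done and done.group(1) in pending:
            # The function reported completion; it is not the one that hung.
            pending.remove(done.group(1))
    return pending
```

Run over the serial console capture of a hung boot, the last name in
the returned list would be the first place to look.
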
Bjorn