From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, Fam Zheng <famz@redhat.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	Yinghai Lu <yhlu.kernel.send@gmail.com>,
	Yijing Wang <wangyijing@huawei.com>,
	Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH v6 04/10] PCI/MSI: Don't disable MSI/MSI-X at shutdown
Date: Tue, 14 Apr 2015 11:44:26 +0200
Message-ID: <20150414112739-mutt-send-email-mst@redhat.com>
In-Reply-To: <87twwkugd0.fsf@x220.int.ebiederm.org>

On Mon, Apr 13, 2015 at 11:45:31AM -0500, Eric W. Biederman wrote:
> Bjorn Helgaas <bhelgaas@google.com> writes:
> 
> > On Mon, Apr 13, 2015 at 4:37 AM, Fam Zheng <famz@redhat.com> wrote:
> >> Hi Bjorn,
> >>
> >> On Fri, 04/10 17:54, Bjorn Helgaas wrote:
> >>> From: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>> d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown
> >>> v2") disabled MSI/MSI-X at device shutdown to address a kexec problem.
> >>>
> >>> The problem is that after we disable MSI, the device may assert INTx, and
> >>> if the driver hasn't registered an interrupt handler for it, the interrupt
> >>> is never deasserted and causes a kernel hang.  In particular, this was
> >>> observed with virtio.
> >>>
> >>> We now disable MSI/MSI-X for all devices during enumeration regardless of
> >>> CONFIG_PCI_MSI.  This solves the kexec problem in the new kernel, not the
> >>> old one.
> >>>
> >>> Stop disabling MSIs at shutdown to avoid the kernel hang.
> >>>
> >>> XXX bugzilla reference, details about how the hang happens?
> >>
> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96571
> >>
> >> Please let me know if you need any further information in the bug.
> >
> > Please attach a complete dmesg log.  The bugzilla doesn't really have
> > any new information other than that you see a soft lockup.  I'm trying
> > to connect more of the dots between a spurious interrupt and a hang or
> > soft lockup.
> >
> 
> The bugzilla implies that there is a screaming irq (which causes the
> soft lockup when they disable the kernel's protections for buggy irqs).
> 
> > It doesn't seem right that a spurious interrupt could cause a hang or
> > soft lockup.
> 
> The interrupt handler keeps firing.
> 
> > I would think Linux would emit a message about the
> > unexpected interrupt, but would otherwise be relatively unconcerned.
> 
> That was disabled on the kernel command line.
> 
> > So I'm trying to figure out why my assumption is wrong.  Probably this
> > is just because I don't know much about Linux IRQ handling.
> >
> > Having more details, e.g., a stacktrace fragment from a soft lockup,
> > can also help people connect a problem they're seeing with the
> > solution.  It's pretty hard to google for "kernel hang," but if you
> > can google for a soft lockup in a specific function, that can be much
> > more useful.
> 
> The thing is, not disabling msi interrupts for the case described in
> the bugzilla report is the wrong fix.
> 
> The report is about a buggy driver doing the wrong thing.  Until someone
> ships a system that is msi native (aka no intx support), disabling msi
> interrupts at shutdown is the right thing to do.  If there is something
> that handles intx interrupts it is not an msi native system.
> 
> The real bug is probably disabling buggy irq detection on the
> kernel command line.
> 
> Beyond that, to handle kexec cleanly something needs to stop the
> interrupts and stop the DMA transfers, which in the short term
> means someone probably needs to write a shutdown method for the buggy
> driver.
> 
> An interrupt coming in almost always implies a DMA having completed,
> and if that DMA completed in the wrong spot the kexec'd kernel will be
> toast.
> 
> We disable interrupts at boot so that a kernel started with
> kexec-on-panic (which doesn't shut anything down) can boot.  There are
> probably other valid use cases (like native msi interrupts) but I am not
> aware of them.  But according to the pci spec, shutting down msi
> interrupts at boot should be a noop.
> 
> So, in summary, not disabling MSI/MSI-X at shutdown is the wrong fix,
> and someone needs to fix a buggy driver.
> 
> Eric
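
To make the "screaming irq" failure concrete: once MSI is disabled, a
device with work pending may assert its level-triggered INTx line, and
if the driver only registered an MSI handler, nothing ever services the
device, so the line never drops.  The toy Python model below (a
simulation, not kernel code) shows the three possible outcomes.  Its
thresholds are modeled on the kernel's spurious-IRQ heuristic in
kernel/irq/spurious.c (disable the line when more than 99,900 of
100,000 interrupts go unhandled).  The names IntxLine and deliver are
hypothetical, and the model assumes that the command-line option
mentioned above is `noirqdebug`, which skips that check.

```python
from dataclasses import dataclass

# Thresholds modeled on the Linux spurious-IRQ heuristic: after every
# 100,000 interrupts on a line, the line is disabled if more than 99,900
# of them were not handled by any registered handler.
SPURIOUS_WINDOW = 100_000
SPURIOUS_THRESHOLD = 99_900


@dataclass
class IntxLine:
    asserted: bool = True   # level-triggered: stays high until the device is serviced
    masked: bool = False    # set once the core gives up on the line
    count: int = 0          # interrupts seen in the current window
    unhandled: int = 0      # of those, how many no handler claimed


def deliver(line: IntxLine, have_handler: bool, irqdebug: bool, limit: int) -> int:
    """Take interrupts until the line drops, is masked, or `limit` is hit.

    `limit` stands in for "forever"; returns how many interrupts the CPU took.
    """
    taken = 0
    while line.asserted and not line.masked and taken < limit:
        taken += 1
        if have_handler:
            line.asserted = False  # the handler acks the device and the line drops
            continue
        if not irqdebug:
            continue  # no handler and no detection: the line just keeps firing
        line.count += 1
        line.unhandled += 1
        if line.count >= SPURIOUS_WINDOW:
            if line.unhandled > SPURIOUS_THRESHOLD:
                line.masked = True  # "nobody cared": disable the IRQ
            line.count = line.unhandled = 0
    return taken
```

With a handler, deliver() returns 1.  With no handler but the check
enabled, the line is masked after 100,000 interrupts.  With the check
disabled, the loop only stops at `limit`; on a real system that
endless stream is the soft lockup described in the bugzilla report.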

I'm not all that worried about this patch making it into stable.  So I
suggest for now we ignore the bugzilla and just focus on the patch
itself.

And the patch itself is not about a buggy driver.  It's about
a correct driver causing screaming interrupts because the
PCI core disables MSI at shutdown: the device then falls back
to INTx, for which the driver never registered a handler.

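Disabling MSI at shutdown is not necessary, for two reasons:
- the previous patches in this series now disable MSI/MSI-X
  during enumeration, so the new kernel starts with MSI off
  after kexec
- an MSI is just a memory write issued by the device, so
  suppressing DMA automatically suppresses MSI as well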
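
The second reason follows from how the PCI control bits interact.
The sketch below is a simplified Python model of one function's
interrupt signaling.  It assumes only three bits: Bus Master Enable
and Interrupt Disable in the Command register, plus a single enable
bit standing in for both MSI and MSI-X.  It encodes the PCI Express
rules that a function with MSI enabled must not use INTx, that
clearing Bus Master Enable also stops MSI (an in-band memory write),
and that INTx is not gated by Bus Master Enable.  PciFunction and
interrupt_path are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PciFunction:
    """Only the control bits that decide how a function may signal an interrupt."""
    bus_master: bool = True     # Command register: Bus Master Enable
    intx_disable: bool = False  # Command register: Interrupt Disable
    msi_enable: bool = True     # MSI/MSI-X enable (the two are folded together here)


def interrupt_path(fn: PciFunction) -> str:
    """Return how a pending interrupt would be signaled: 'msi', 'intx' or 'none'."""
    if fn.msi_enable:
        # With MSI enabled the function may not use INTx.  An MSI is a memory
        # write, so it can only be issued while bus mastering is enabled.
        return "msi" if fn.bus_master else "none"
    # With MSI off the function falls back to INTx, which Bus Master Enable
    # does not gate; only the Interrupt Disable bit does.
    return "none" if fn.intx_disable else "intx"
```

For example, interrupt_path(PciFunction(bus_master=False)) returns
"none": stopping DMA silences MSI without any fallback.  By contrast,
interrupt_path(PciFunction(msi_enable=False)) returns "intx", a line
that an MSI-only driver has no handler for.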




-- 
MST
