From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, Fam Zheng <famz@redhat.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	Yinghai Lu <yhlu.kernel.send@gmail.com>,
	Yijing Wang <wangyijing@huawei.com>,
	Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH v6 04/10] PCI/MSI: Don't disable MSI/MSI-X at shutdown
Date: Tue, 14 Apr 2015 11:44:26 +0200
Message-ID: <20150414112739-mutt-send-email-mst@redhat.com>
In-Reply-To: <87twwkugd0.fsf@x220.int.ebiederm.org>

On Mon, Apr 13, 2015 at 11:45:31AM -0500, Eric W. Biederman wrote:
> Bjorn Helgaas <bhelgaas@google.com> writes:
> 
> > On Mon, Apr 13, 2015 at 4:37 AM, Fam Zheng <famz@redhat.com> wrote:
> >> Hi Bjorn,
> >>
> >> On Fri, 04/10 17:54, Bjorn Helgaas wrote:
> >>> From: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>> d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown
> >>> v2") disabled MSI/MSI-X at device shutdown to address a kexec problem.
> >>>
> >>> The problem is that after we disable MSI, the device may assert INTx, and
> >>> if the driver hasn't registered an interrupt handler for it, the interrupt
> >>> is never deasserted and causes a kernel hang.  In particular, this was
> >>> observed with virtio.
> >>>
> >>> We now disable MSI/MSI-X for all devices during enumeration regardless of
> >>> CONFIG_PCI_MSI.  This solves the kexec problem in the new kernel, not the
> >>> old one.
> >>>
> >>> Stop disabling MSIs at shutdown to avoid the kernel hang.
> >>>
> >>> XXX bugzilla reference, details about how the hang happens?
> >>
> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96571
> >>
> >> Please let me know if you need any further information in the bug.
> >
> > Please attach a complete dmesg log.  The bugzilla doesn't really have
> > any new information other than that you see a soft lockup.  I'm trying
> > to connect more of the dots between a spurious interrupt and a hang or
> > soft lockup.
> >
> 
> The bugzilla implies that there is a screaming irq (which causes the
> soft lockup when they disable the kernel's protections for buggy irqs).
> 
> > It doesn't seem right that a spurious interrupt could cause a hang or
> > soft lockup.
> 
> The interrupt handler keeps firing.
> 
> > I would think Linux would emit a message about the
> > unexpected interrupt, but would otherwise be relatively unconcerned.
> 
> That was disabled on the kernel command line.
> 
> > So I'm trying to figure out why my assumption is wrong.  Probably this
> > is just because I don't know much about Linux IRQ handling.
> >
> > Having more details, e.g., a stacktrace fragment from a soft lockup,
> > can also help people connect a problem they're seeing with the
> > solution.  It's pretty hard to google for "kernel hang," but if you
> > can google for a soft lockup in a specific function, that can be much
> > more useful.
> 
> The thing is, not disabling MSI interrupts for the case described in the
> bugzilla report is the wrong fix.
> 
> The report is about a buggy driver doing the wrong thing.  Until someone
> ships a system that is MSI-native (i.e. no INTx support), disabling MSI
> interrupts at shutdown is the right thing to do.  If there is something
> that handles INTx interrupts, it is not an MSI-native system.
> 
> The real bug is probably disabling the detection of buggy interrupts on
> the kernel command line.
> 
> Beyond that, to handle kexec cleanly something needs to stop the
> interrupts and stop the DMA transfers, which in the short term means
> someone probably needs to write a shutdown method for the buggy driver.
> 
> An interrupt coming in almost always implies a DMA having completed,
> and if that DMA completed in the wrong spot the kexec'd kernel will be
> toast.
> 
> We disable interrupts at boot so that a kernel started with
> kexec-on-panic (which doesn't shut anything down) can boot.  There are
> probably other valid use cases (like native MSI interrupts), but I am not
> aware of them.  But according to the PCI spec, shutting down MSI
> interrupts at boot should be a no-op.
> 
> So, in summary, not disabling MSI/MSI-X at shutdown is the wrong fix,
> and someone needs to fix a buggy driver.
> 
> Eric

I'm not all that worried about this patch making it into stable.  So I
suggest for now we ignore the bugzilla and just focus on the patch
itself.

And the patch itself is not about a buggy driver.  It's about
a correct driver causing screaming interrupts because the
PCI core decided to disable MSI at shutdown.

That is not necessary, for two reasons:
- previous patches now disable MSI when the kexec'd kernel starts
- suppressing DMA automatically suppresses MSI as well, since
  an MSI is just a memory write from the device

-- 
MST
