linux-pci.vger.kernel.org archive mirror
From: Wei Yang <richard.weiyang@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>,
	Govindarajulu Varadarajan <gvaradar@cisco.com>,
	benve@cisco.com, Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	jlbec@evilplan.org, hch@lst.de, Ingo Molnar <mingo@redhat.com>,
	peterz@infradead.org, okaya@codeaurora.org,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [PATCH V2] PCI: AER: fix deadlock in do_recovery
Date: Fri, 6 Oct 2017 09:11:00 +0800
Message-ID: <20171006011100.GA43201@WeideMacBook-Pro.local>
In-Reply-To: <20171005184209.GV25517@bhelgaas-glaptop.roam.corp.google.com>

On Thu, Oct 05, 2017 at 01:42:09PM -0500, Bjorn Helgaas wrote:
>On Thu, Oct 05, 2017 at 11:05:12PM +0800, Wei Yang wrote:
>> On Wed, Oct 4, 2017 at 5:15 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > [+cc Alex, Gavin, Wei]
>> >
>> > On Fri, Sep 29, 2017 at 10:49:38PM -0700, Govindarajulu Varadarajan wrote:
>> >> CPU0                                  CPU1
>> >> ---------------------------------------------------------------------
>> >> __driver_attach()
>> >> device_lock(&dev->mutex) <--- device mutex lock here
>> >> driver_probe_device()
>> >> pci_enable_sriov()
>> >> pci_iov_add_virtfn()
>> >> pci_device_add()
>> >>                                       aer_isr()               <--- pci aer error
>> >>                                       do_recovery()
>> >>                                       broadcast_error_message()
>> >>                                       pci_walk_bus()
>> >>                                       down_read(&pci_bus_sem) <--- rd sem
>> >> down_write(&pci_bus_sem) <-- stuck on wr sem
>> >>                                       report_error_detected()
>> >>                                       device_lock(&dev->mutex)<--- DEAD LOCK
>> >>
>> >> This can also happen when an AER error occurs while pci_dev->sriov_config() is
>> >> called.
>> >>
>> >> This patch does a pci_walk_bus() and adds all the devices to a list. After
>> >> unlocking (up_read) &pci_bus_sem, we go through the list and call the
>> >> err_handler of each device with device_lock() held. This way, we don't try
>> >> to hold both locks at the same time.
>> >
>> > I feel like we're working too hard to come up with an ad hoc solution
>> > for this lock ordering problem: the __driver_attach() path acquires
>> > the device lock, then the pci_bus_sem; the AER path acquires
>> > pci_bus_sem, then the device lock.
>> >
>> > To me, the pci_bus_sem, then device lock order seems natural.  The
>> > pci_bus_sem protects all the bus device lists, so it makes sense to
>> > hold it while iterating over those lists.  And if we're operating on
>> > one of those devices while we're iterating, it makes sense to acquire
>> > the device lock.
>> >
>> > The pci_enable_sriov() path is the one that feels strange to me.
>> > We're in a driver probe method, and, surprise!, brand-new devices show
>> > up and we basically ask the PCI core to enumerate them synchronously
>> > while still in the probe method.
>> >
>> > Is there some reason this enumeration has to be done synchronously?
>> > I wonder if we can get that piece out of the driver probe path, e.g.,
>> > by queuing up the pci_iov_add_virtfn() part to be done later, in a
>> > path where we're not holding a device lock?
>> >
>> 
>> Hi, Bjorn,
>> 
>> First let me catch up with the thread.
>> 
>> We have two locking sequences:
>> 1. pci_bus_sem -> device lock, which is natural
>> 2. device lock -> pci_bus_sem, which is not
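The two orders listed above form a classic ABBA pattern. The kernel's lockdep validator detects this kind of inversion by recording which lock classes are acquired while others are held. What follows is a drastically simplified, hypothetical model of that idea; track_acquire(), track_release() and the two-class enum are illustrative names, not lockdep's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock classes for the two locks in this thread. */
enum lock_class { BUS_SEM, DEV_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = {
    [BUS_SEM] = "pci_bus_sem",
    [DEV_LOCK] = "device lock",
};

#define MAX_DEPTH 8
static bool after[NR_CLASSES][NR_CLASSES]; /* after[a][b]: b taken while a held */
static enum lock_class held[MAX_DEPTH];    /* held locks, innermost last */
static int depth;
static int inversions;

static void track_acquire(enum lock_class c)
{
    assert(depth < MAX_DEPTH);
    for (int i = 0; i < depth; i++) {
        enum lock_class h = held[i];

        if (after[c][h]) {      /* opposite order seen before: ABBA risk */
            printf("possible deadlock: %s -> %s, but also %s -> %s\n",
                   class_name[h], class_name[c], class_name[c], class_name[h]);
            inversions++;
        }
        after[h][c] = true;
    }
    held[depth++] = c;
}

/* Assumes locks are released in LIFO order. */
static void track_release(void)
{
    assert(depth > 0);
    depth--;
}

/* Usage: replay both sequences; returns the number of inversions found. */
static int demo_lock_order(void)
{
    memset(after, 0, sizeof(after));
    depth = 0;
    inversions = 0;

    /* Sequence 1, AER path: pci_bus_sem -> device lock */
    track_acquire(BUS_SEM);
    track_acquire(DEV_LOCK);
    track_release();
    track_release();

    /* Sequence 2, probe + pci_enable_sriov(): device lock -> pci_bus_sem */
    track_acquire(DEV_LOCK);
    track_acquire(BUS_SEM);
    track_release();
    track_release();

    return inversions;
}
```

demo_lock_order() records "pci_bus_sem -> device lock" from the first sequence and flags the second one, so it returns 1 even though neither sequence deadlocks on its own in a single-threaded run.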
>
>Right.  Or at least, that's my assertion :)  I could be convinced
>otherwise.
>
>> pci_enable_sriov() sits in class #2 and your suggestion is to move the
>> pci_iov_add_virtfn() to some queue which will avoid case #2.
>> 
>> If we want to implement your suggestion, one thing that is unclear to me
>> is how we would handle the error path. Add a notification for the
>> failure? That would be easy for the core kernel, but a big change
>> for those drivers.
>
>My suggestion was for discussion.  It's entirely possible it will turn
>out not to be feasible.
>
>We're only talking about errors from pci_iov_add_virtfn() here.  We
>can still return all the other existing errors from sriov_enable(),
>which the driver can see.  These errors seem more directly related to
>the PF itself.
>
>The pci_iov_add_virtfn() errors are enumeration-type errors (failure
>to add a bus, failure to read config space of a VF, etc.)  These
>feel more like PCI core issues to me.  The driver isn't going to be
>able to do anything about them.
>

Ideally, the PF and VFs have their own probe functions and don't interfere with
each other. From this point of view, I agree these failures are not something
the drivers need to handle.

In real implementations, however, I am not 100% sure the PF driver operates
without knowledge of the enabled VFs.

>The end result would likely be that a VF is enabled in the hardware
>but not added as a PCI device.  The same errors can occur during
>boot-time or hotplug-time enumeration of non-SR-IOV devices.
>
>Are these sort of errors important to the PF driver?  If the PF driver
>can get along without them, maybe we can use the same strategy as when
>we enumerate all other devices, i.e., log something in dmesg and
>continue on without the device.
>
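The "log something and continue" strategy can be sketched as a best-effort loop. try_add_vf() is a hypothetical stand-in for pci_iov_add_virtfn(), with its failures simulated by a bit mask; in the kernel the message would presumably go through something like dev_warn() rather than stderr.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for pci_iov_add_virtfn(): fails for every VF
 * whose bit is set in fail_mask, simulating e.g. a config read error. */
static int try_add_vf(int vf, uint32_t fail_mask)
{
    return (fail_mask & (UINT32_C(1) << vf)) ? -1 : 0;
}

/* Best-effort enumeration: log each failure and keep going, the way
 * boot-time enumeration skips a device it cannot add.
 * Returns the number of VFs actually added. */
static int add_vfs_best_effort(int num_vfs, uint32_t fail_mask)
{
    int added = 0;

    assert(num_vfs >= 0 && num_vfs <= 32);  /* one mask bit per VF */
    for (int vf = 0; vf < num_vfs; vf++) {
        if (try_add_vf(vf, fail_mask) < 0) {
            fprintf(stderr, "VF %d: enumeration failed, skipping\n", vf);
            continue;
        }
        added++;
    }
    return added;
}
```

With num_vfs = 5 and fail_mask = 0x0a (VFs 1 and 3 failing), add_vfs_best_effort() returns 3, which is exactly the PARTIAL outcome discussed in the reply.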

Besides the functionality, I have another concern about the behavior change.

Currently, VFs are enabled ALL or NONE; this change would add a third
outcome, PARTIAL.

For example, the sysadmin asks for 5 VFs but ends up with only 3
enabled.

Hmm, not a big deal, but we would need to inform the users.
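If PARTIAL becomes a possible outcome, the caller could at least classify and report it. This is a minimal sketch; the vf_outcome enum and report_vf_outcome() are hypothetical names, and a kernel version would more likely log with dev_warn().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome classes for a VF enable request. */
enum vf_outcome { VF_NONE, VF_PARTIAL, VF_ALL };

/* Classify how many of the requested VFs ended up as PCI devices and
 * print a message an administrator could act on. */
static enum vf_outcome report_vf_outcome(int requested, int added)
{
    assert(requested > 0 && added >= 0 && added <= requested);

    if (added == requested)
        return VF_ALL;
    if (added == 0) {
        fprintf(stderr, "none of %d requested VFs were added\n", requested);
        return VF_NONE;
    }
    fprintf(stderr, "only %d of %d requested VFs were added\n",
            added, requested);
    return VF_PARTIAL;
}
```

For the example above, report_vf_outcome(5, 3) returns VF_PARTIAL and prints "only 3 of 5 requested VFs were added" to stderr.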

>Bjorn

-- 
Wei Yang
Help you, Help me

Thread overview: 10+ messages
2017-09-30  5:49 [PATCH V2] PCI: AER: fix deadlock in do_recovery Govindarajulu Varadarajan
2017-09-30 13:31 ` Sinan Kaya
2017-10-03  0:19   ` Govindarajulu Varadarajan
2017-10-01  7:55 ` Christoph Hellwig
2017-10-03  0:14   ` Govindarajulu Varadarajan
2017-10-03  8:09     ` Christoph Hellwig
2017-10-03 21:15 ` Bjorn Helgaas
2017-10-05 15:05   ` Wei Yang
2017-10-05 18:42     ` Bjorn Helgaas
2017-10-06  1:11       ` Wei Yang [this message]
