From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Sander Eikelenboom <linux@eikelenboom.it>
Cc: gregkh@linuxfoundation.org, boris.ostrovsky@oracle.com,
	David Vrabel <david.vrabel@citrix.com>,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: Re: [Xen-devel] [PATCH v5] Fixes to Xen pciback for 3.17.
Date: Wed, 6 Aug 2014 15:39:16 -0400
Message-ID: <20140806193916.GA31040@laptop.dumpdata.com>
In-Reply-To: <1081188855.20140806212559@eikelenboom.it>

On Wed, Aug 06, 2014 at 09:25:59PM +0200, Sander Eikelenboom wrote:
> 
> Wednesday, August 6, 2014, 9:18:31 PM, you wrote:
> 
> > On Wed, Aug 06, 2014 at 08:59:59PM +0200, Sander Eikelenboom wrote:
> >> 
> >> Tuesday, August 5, 2014, 4:04:43 PM, you wrote:
> >> 
> >> 
> >> > Tuesday, August 5, 2014, 3:49:30 PM, you wrote:
> >> 
> >> >> On Tue, Aug 05, 2014 at 11:44:33AM +0200, Sander Eikelenboom wrote:
> >> >>> 
> >> >>> Tuesday, August 5, 2014, 11:31:08 AM, you wrote:
> >> >>> 
> >> >>> > On 05/08/14 09:44, Sander Eikelenboom wrote:
> >> >>> >> 
> >> >>> >> Monday, August 4, 2014, 8:43:18 PM, you wrote:
> >> >>> >> 
> >> >>> >>> On Fri, Aug 01, 2014 at 04:30:05PM +0100, David Vrabel wrote:
> >> >>> >>>> On 14/07/14 17:18, Konrad Rzeszutek Wilk wrote:
> >> >>> >>>>> Greg: goto GHK
> >> >>> >>>>>
> >> >>> >>>>> This is v5 version of patches to fix some issues in Xen PCIback.
> >> >>> >>>>
> >> >>> >>>> Applied to devel/for-linus-3.17.
> >> >>> >> 
> >> >>> >>> Thank you.
> >> >>> >>>>
> >> >>> >>>> I dropped the stable Cc for #2 pending a final decision on whether it
> >> >>> >>>> really is a stable candidate.
> >> >>> >> 
> >> >>> >>> OK.
> >> >>> >>>>
> >> >>> >>>> David
> >> >>> >> 
> >> >>> >> Hi Konrad / David,
> >> >>> >> 
> >> >>> >> This series still lacks a resolution on the sysfs /do_flr /reset,
> >> >>> >> as a result the pci devices are not reset after shutdown of a guest.
> >> >>> >> (no more pciback 0000:xx:xx.x: restoring config space at offset xxx)
> >> >>> >> 
> >> >>> >> So this series now introduces a regression to 3.16, which causes devices to malfunction 
> >> >>> >> after a guest reboot or after assigning the devices to another guest.
> >> >>> 
> >> >>> > I don't follow what you're saying.  The lack of a device reset for PCI
> >> >>> > devices with no FLR method isn't a regression as this has never worked.
> >> >>> >  Can you explain in more detail what the regression is and which patch
> >> >>> > caused it?
> >> >>> 
> >> >>> I haven't bisected it to a specific patch in this series,
> >> >>> but this patch series (when pulled on top of 3.16) cause the following:
> >> >>> 
> >> >>> - Do a system start and HVM guest start
> >> >>> - HVM guest with pci passthrough, devices work fine
> >> >>> - shutdown the HVM guest
> >> >>> - "pciback 0000:xx:xx.x: restoring config space at offset xxx" messages do not
> >> >>>   appear anymore when shutting down the HVM guest (as they do with vanilla 3.16)
> >> >>> - Starting the HVM guest again with the same devices passed through.
> >> >>> - Devices malfunction (for example a USB host controller will fail a simple 
> >> >>>   "lsusb"
> >> >>> - And this all works fine on vanilla 3.16.  
> >> 
> >> >> Hm, the only patch that makes code changes is 63fc5ec97cc54257d1c4ee49ed2131f754a5ff9b
> >> >> "xen/pciback: Don't deadlock when unbinding."
> >> >> but it does not change any of that code path. Only figures out whether
> >> >> to take a lock or not.
> >> 
> >> > Ok and the do_flr nack by david is unrelated to this part (i didn't check just 
> >> > assumed there could be a connection)
> >> 
> >> >> I will try it out on my box and see if I can reproduce it.
> >> 
> >> >> And just to be 100% sure - you are using vanilla Xen? No changes on top
> >> >> of it?
> >> 
> >> > Except the fix from jan for the pirq/msi stuff (and an unrelated hpet one), other than that no.
> >> > If you can't reproduce i will see if i can dive deeper into it tonight !
> >> 
> >> Hi Konrad,
> >> 
> >> It looks like the issue is this part of the change:
> >> 
> >>     --- a/drivers/xen/xen-pciback/pci_stub.c
> >>     +++ b/drivers/xen/xen-pciback/pci_stub.c
> >>     @@ -250,6 +250,8 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev,
> >>     * - 'echo BDF > unbind' with a guest still using it. See pcistub_remove
> >>     *
> >>     * As such we have to be careful.
> >>     + *
> >>     + * To make this easier, the caller has to hold the device lock.
> >>     */
> >>     void pcistub_put_pci_dev(struct pci_dev *dev)
> >>     {
> >>     @@ -276,11 +278,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
> >>     /* Cleanup our device
> >>     * (so it's ready for the next domain)
> >>     */
> >>     -
> >>     - /* This is OK - we are running from workqueue context
> >>     - * and want to inhibit the user from fiddling with 'reset'
> >>     - */
> >>     - pci_reset_function(dev);
> >>     + lockdep_assert_held(&dev->dev.mutex);
> >>     + __pci_reset_function_locked(dev);
> >>     pci_restore_state(dev);
> >>    /* This disables the device. */
> >> 
> >> More specifically:
> >> The old "pci_reset_function(dev)" potentially seems to do much more than 
> >> __pci_reset_function_locked(dev).
> >> 
> >> 
> >> "__pci_reset_function_locked(dev)" only calls  "__pci_dev_reset"
> >> while "pci_reset_function" not only calls pci_dev_reset, but on success
> >> it also calls: "pci_dev_save_and_disable" which does a save state etc.
> >> 
> >> 
> >> So i added a little more debug:
> >> 
> >> device_lock_assert(&dev->dev);
> >> ret = __pci_reset_function_locked(dev);
> >> dev_dbg(&dev->dev, "%s __pci_reset_function_locked:%d  dev->state_saved:%d\n", __func__, ret, (!dev->state_saved) ? 0 : 1 );
> >> pci_restore_state(dev);
> >> 
> >> And this returns:
> >> [  494.570579] pciback 0000:04:00.0: pcistub_put_pci_dev __pci_reset_function_locked:0  dev->state_saved:0
> >> 
> >> So that confirms there is no saved_state to get restored by 
> >> pci_restore_state(dev) in the next line.
> >> 
> >> However there seems to be no "locked" variant of the function 
> >> "pci_reset_function" in pci.c that has all the same logic ...
> 
> > Yup. I've a preliminary patch:
> 
> Preliminary in the sense: "this should fix it .. needs more testing" ?

The preliminary patch should fix it, but it has a disastrous flaw. Here is the proper version:


From 00a5b6e3c9ee2c2d605879bdaebc627fa640b024 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 6 Aug 2014 16:21:32 -0400
Subject: [PATCH] xen/pciback: Restore configuration space when detaching from
 a guest.

The commit 9eea3f7695226f9af9992cebf8e98ac0ad78b277
"xen/pciback: Don't deadlock when unbinding." was using
the version of pci_reset_function which would lock the device lock.
That is no good as we can dead-lock. As such we swapped to using
the lock-less version and requiring that the callers
of 'pcistub_put_pci_dev' take the device lock. And as such
this bug got exposed.

Using the lock-less version is OK, except that we tried to
use 'pci_restore_state' after the lock-less version of
__pci_reset_function_locked - which won't work as 'state_saved'
is set to false. Said 'state_saved' is a toggle boolean that
is to be used by the sequence of a) pci_save_state/pci_restore_state
or b) pci_load_and_free_saved_state/pci_restore_state. We don't
want to use a) as the guest might have messed up the PCI
configuration space and we want it to revert to the state
when the PCI device was bound to us. Therefore we pick
b) to restore the configuration space.

To still retain the PCI configuration space, we save it once
more and store it on our private copy to be restored when:
 - Device is unbound from pciback
 - Device is detached from a guest.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/xen-pciback/pci_stub.c |   25 +++++++++++++++++++++----
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 1ddd22f..8cf7f2b 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref)
 	 */
 	__pci_reset_function_locked(dev);
 	if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
-		dev_dbg(&dev->dev, "Could not reload PCI state\n");
+		dev_info(&dev->dev, "Could not reload PCI state\n");
 	else
 		pci_restore_state(dev);
 
@@ -257,6 +257,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 {
 	struct pcistub_device *psdev, *found_psdev = NULL;
 	unsigned long flags;
+	struct xen_pcibk_dev_data *dev_data;
 
 	spin_lock_irqsave(&pcistub_devices_lock, flags);
 
@@ -279,9 +280,25 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 	 * (so it's ready for the next domain)
 	 */
 	device_lock_assert(&dev->dev);
-	__pci_reset_function_locked(dev);
-	pci_restore_state(dev);
-
+	dev_data = pci_get_drvdata(dev);
+	if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
+		dev_info(&dev->dev, "Could not reload PCI state\n");
+	else {
+		__pci_reset_function_locked(dev);
+		/*
+		 * The usual sequence is pci_save_state & pci_restore_state
+		 * but the guest might have messed the configuration space up.
+		 * Use the initial version (when device was bound to us).
+		 */
+		pci_restore_state(dev);
+		/*
+		 * The next steps are to reload the configuration for the
+		 * next time we bind & unbind to a guest - or unload from
+		 * pciback.
+		 */
+		pci_save_state(dev);
+		dev_data->pci_saved_state = pci_store_saved_state(dev);
+	}
 	/* This disables the device. */
 	xen_pcibk_reset_device(dev);
 
-- 
1.7.7.6
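
For anyone following the 'state_saved' reasoning in the commit message
above, the toggle can be modelled in user space. This is only a toy
sketch, not kernel code: struct toy_dev and every toy_* function are
made up for illustration. It assumes, as the commit message describes,
that a restore only acts when the flag is set and clears it afterwards.
The private copy is modelled as a plain array rather than the allocated
buffer the real code uses.

```c
/* Toy user-space model of the 'state_saved' toggle; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CFG_WORDS 4

struct toy_dev {
	unsigned int cfg[CFG_WORDS];   /* "live" configuration space */
	unsigned int saved[CFG_WORDS]; /* buffer used by save/restore */
	bool state_saved;              /* armed by save/load, cleared by restore */
};

/* Models pci_save_state(): snapshot the live config and arm the toggle. */
static void toy_save_state(struct toy_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->saved));
	d->state_saved = true;
}

/* Models pci_restore_state(): a no-op unless the toggle is armed. */
static bool toy_restore_state(struct toy_dev *d)
{
	if (!d->state_saved)
		return false;
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
	d->state_saved = false;
	return true;
}

/* Models loading a private copy: fill the buffer and arm the toggle. */
static void toy_load_state(struct toy_dev *d, const unsigned int *copy)
{
	assert(copy != NULL);
	memcpy(d->saved, copy, sizeof(d->saved));
	d->state_saved = true;
}

/* Models a bare reset: wipes the live config and saves nothing. */
static void toy_reset(struct toy_dev *d)
{
	memset(d->cfg, 0, sizeof(d->cfg));
}

/* Sequence that exposed the bug: reset, then restore with nothing armed. */
static bool toy_detach_without_load(struct toy_dev *d)
{
	toy_reset(d);
	return toy_restore_state(d); /* false: nothing was restored */
}

/* Sequence in the patch: load private copy, reset, restore, save again. */
static bool toy_detach_with_load(struct toy_dev *d, unsigned int *copy)
{
	toy_load_state(d, copy);
	toy_reset(d);
	if (!toy_restore_state(d))
		return false;
	toy_save_state(d);
	/* Stands in for keeping a fresh private copy for the next detach. */
	memcpy(copy, d->saved, sizeof(d->saved));
	return true;
}
```

Starting from a zeroed struct toy_dev, toy_detach_without_load() returns
false and leaves the config wiped, whereas toy_detach_with_load() brings
back the values held in the private copy and leaves the toggle armed for
the next detach.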

