public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>
Cc: "Liu, Yi L" <yi.l.liu@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"farman@linux.ibm.com" <farman@linux.ibm.com>,
	"pmorel@linux.ibm.com" <pmorel@linux.ibm.com>,
	"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
	"frankja@linux.ibm.com" <frankja@linux.ibm.com>,
	"imbrenda@linux.ibm.com" <imbrenda@linux.ibm.com>,
	"david@redhat.com" <david@redhat.com>,
	"akrowiak@linux.ibm.com" <akrowiak@linux.ibm.com>,
	"jjherne@linux.ibm.com" <jjherne@linux.ibm.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"zhenyuw@linux.intel.com" <zhenyuw@linux.intel.com>,
	"Wang, Zhi A" <zhi.a.wang@intel.com>, "Christopherson, ,
	Sean" <seanjc@google.com>,
	"linux-s390@vger.kernel.org" <linux-s390@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"intel-gvt-dev@lists.freedesktop.org" 
	<intel-gvt-dev@lists.freedesktop.org>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3] vfio: fix deadlock between group lock and kvm lock
Date: Fri, 3 Feb 2023 13:35:03 -0700	[thread overview]
Message-ID: <20230203133503.4d8fb3e8.alex.williamson@redhat.com> (raw)
In-Reply-To: <ed030aa5-b3af-638e-6e26-4e3a20b98ef4@linux.ibm.com>

On Fri, 3 Feb 2023 12:29:01 -0500
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> On 2/3/23 10:19 AM, Alex Williamson wrote:
> > On Fri, 3 Feb 2023 14:54:44 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> >>> From: Alex Williamson <alex.williamson@redhat.com>
> >>> Sent: Friday, February 3, 2023 9:50 PM
> >>>
> >>> On Fri, 3 Feb 2023 13:32:09 +0000
> >>> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >>>     
> >>>>> From: Tian, Kevin <kevin.tian@intel.com>
> >>>>> Sent: Friday, February 3, 2023 10:00 AM
> >>>>>    
> >>>>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>>>> Sent: Friday, February 3, 2023 7:13 AM
> >>>>>>
> >>>>>> On Thu, 2 Feb 2023 23:04:10 +0000
> >>>>>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>>>>>    
> >>>>>>>> From: Alex Williamson <alex.williamson@redhat.com>
> >>>>>>>> Sent: Friday, February 3, 2023 3:42 AM
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> LGTM.  I'm not sure moving the functions to vfio_main really buys    
> >>> us    
> >>>>>>>> anything since we're making so much use of group fields.  The cdev
> >>>>>>>> approach will necessarily be different, so the bulk of the get code    
> >>> will    
> >>>>>>>> likely need to move back to group.c anyway.
> >>>>>>>>    
> >>>>>>>
> >>>>>>> well my last comment was based on Matthew's v2 where the get    
> >>> code    
> >>>>>>> gets a kvm passed in instead of implicitly retrieving group ref_lock
> >>>>>>> internally. In that case the get/put helpers only contain device logic
> >>>>>>> thus fit in vfio_main.c.
> >>>>>>>
> >>>>>>> with v3 then they have to be in group.c since we don't want to use
> >>>>>>> group fields in vfio_main.c.
> >>>>>>>
> >>>>>>> but I still think v2 of the helpers is slightly better. The only difference
> >>>>>>> between cdev and group when handling this race is using different
> >>>>>>> ref_lock. the symbol get/put part is exactly same. So even if we
> >>>>>>> merge v3 like this, very likely Yi has to change it back to v2 style
> >>>>>>> to share the get/put helpers while just leaving the ref_lock part
> >>>>>>> handled differently between the two path.    
> >>>>>>
> >>>>>> I'm not really a fan of the asymmetry of the v2 version where the get
> >>>>>> helper needs to be called under the new kvm_ref_lock, but the put
> >>>>>> helper does not.  Having the get helper handle that makes the caller
> >>>>>> much cleaner.  Thanks,
> >>>>>>    
> >>>>>
> >>>>> What about passing the lock pointer into the helper? it's still slightly
> >>>>> asymmetry as the put helper doesn't carry the lock pointer but it
> >>>>> could also be interpreted as if the pointer has been saved in the get
> >>>>> then if it needs to be referenced by the put there is no need to pass
> >>>>> it in again.    
> >>>>
> >>>> For cdev, I may modify vfio_device_get_kvm_safe() to accept
> >>>> struct kvm and let its caller hold a kvm_ref_lock (field within
> >>>> struct vfio_device_file). Meanwhile, the group path holds
> >>>> the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe().
> >>>> vfio_device_get_kvm_safe() just includes the symbol get/put and
> >>>> the device->kvm and put_kvm set.    
> >>>
> >>> Sounds a lot like v2 :-\     
> >>
> >> Yes, like v2. 😊
> >>  
> >>> I'd look more towards group and cdev specific
> >>> helpers that handle the locking so that the callers aren't exposed to
> >>> the asymmetry of get vs put, and reduce a new
> >>> _vfio_device_get_kvm_safe() in common code that only does the symbol
> >>> work.  Thanks,    
> >>
> >> If so, looks like Matthew needs a v4. I'm waiting for the final version
> >> of this patch and sending a new cdev series based on it. wish to see
> >> it soon ^_^.  
> > 
> > cdev support is a future feature, why does it become a requirement for
> > a fix to the current base?  The refactoring could also happen in the
> > cdev series.  Thanks,
> > 
> > Alex
> >   
> 
> FWIW, that would bloat the fix by a bit and it's already a decent-sized fix...  I took a quick stab and such a v4 would end up with a total of 120 insertions(+), 15 deletions(-).  See below for a diff of what I tried on top of v3. The idea would be to move the serialization and setting/clearing of device->kvm into group/cdev and use device->kvm as the value within vfio_device_get_kvm_safe for getting the kvm ref.  Wrappers in group/cdev would then be responsible for backing that device->kvm setting out on ref failure.
> Each side (group/cdev) can wrap their own serialization / load device->kvm from the appropriate location / handle the failed get / clear device->kvm when done (since get doesn't set it, put shouldn't clear it).
> 
> If Alex wants, I can wrap something like this into a v4 -- Or, if we don't want to include this kind of infrastructure in the fix, then Yi please feel free to use it as a starting point for what you need in cdev.
> 
> Thanks,
> Matt
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 7fed4233ca23..261af23975ae 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -154,6 +154,31 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>  	return ret;
>  }
>  
> +static void vfio_device_group_get_kvm(struct vfio_device *device)
> +{
> +	lockdep_assert_held(&device->dev_set->lock);
> +
> +	spin_lock(&device->group->kvm_ref_lock);
> +
> +	if (!device->group->kvm)
> +		goto unlock;
> +
> +	device->kvm = device->group->kvm;
> +	if (!vfio_device_get_kvm_safe(device))

I'd probably go back to making this:

void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);

so the vfio_main function would handle setting and clearing
device->kvm.  That way we could also move the lockdep into the
vfio_main functions.  Once we do that, there's no reason to have a
group vs cdev put function and we end up with only a wrapper on the get
function, which should really never be used directly, so we prefix it
with an underscore.

At that point (see incremental diff below), it's about a wash.  Current v3:

 drivers/vfio/group.c     |   32 +++++++++++++----
 drivers/vfio/vfio.h      |   14 +++++++
 drivers/vfio/vfio_main.c |   70 +++++++++++++++++++++++++++++++++++----
 include/linux/vfio.h     |    2 -
 4 files changed, 103 insertions(+), 15 deletions(-)

Folding in below:

 drivers/vfio/group.c     |   44 ++++++++++++++++++++++-----
 drivers/vfio/vfio.h      |   15 +++++++++
 drivers/vfio/vfio_main.c |   63 ++++++++++++++++++++++++++++++++++-----
 include/linux/vfio.h     |    2 -
 4 files changed, 109 insertions(+), 15 deletions(-)

Unfortunately it seems I've talked myself into the answer that we
should maybe just pre-enable cdev by not adding a group reference in
vfio_main.  Thanks,

Alex

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 7fed4233ca23..98621ac082f0 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -154,6 +154,18 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	return ret;
 }
 
+static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
+{
+	spin_lock(&device->group->kvm_ref_lock);
+	if (!device->group->kvm)
+		goto unlock;
+
+	_vfio_device_get_kvm_safe(device, device->group->kvm);
+
+unlock:
+	spin_unlock(&device->group->kvm_ref_lock);
+}
+
 static int vfio_device_group_open(struct vfio_device *device)
 {
 	int ret;
@@ -173,7 +185,7 @@ static int vfio_device_group_open(struct vfio_device *device)
 	 * the pointer in the device for use by drivers.
 	 */
 	if (device->open_count == 0)
-		vfio_device_get_kvm_safe(device);
+		vfio_device_group_get_kvm_safe(device);
 
 	ret = vfio_device_open(device, device->group->iommufd, device->kvm);
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 20d715b0a3a8..24d6cd285945 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -253,10 +253,11 @@ enum { vfio_noiommu = false };
 #endif
 
 #ifdef CONFIG_HAVE_KVM
-void vfio_device_get_kvm_safe(struct vfio_device *device);
+void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
 void vfio_device_put_kvm(struct vfio_device *device);
 #else
-static inline void vfio_device_get_kvm_safe(struct vfio_device *device)
+static inline void _vfio_device_get_kvm_safe(struct vfio_device *device,
+					     struct kvm *kvm)
 {
 }
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 4762550e9f42..00d4d5167d6c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -342,7 +342,7 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 
 #ifdef CONFIG_HAVE_KVM
-void vfio_device_get_kvm_safe(struct vfio_device *device)
+void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
 {
 	void (*pfn)(struct kvm *kvm);
 	bool (*fn)(struct kvm *kvm);
@@ -350,32 +350,25 @@ void vfio_device_get_kvm_safe(struct vfio_device *device)
 
 	lockdep_assert_held(&device->dev_set->lock);
 
-	spin_lock(&device->group->kvm_ref_lock);
-	if (!device->group->kvm)
-		goto unlock;
-
 	pfn = symbol_get(kvm_put_kvm);
 	if (WARN_ON(!pfn))
-		goto unlock;
+		return;
 
 	fn = symbol_get(kvm_get_kvm_safe);
 	if (WARN_ON(!fn)) {
 		symbol_put(kvm_put_kvm);
-		goto unlock;
+		return;
 	}
 
 	ret = fn(device->group->kvm);
 	symbol_put(kvm_get_kvm_safe);
 	if (!ret) {
 		symbol_put(kvm_put_kvm);
-		goto unlock;
+		return;
 	}
 
 	device->put_kvm = pfn;
-	device->kvm = device->group->kvm;
-
-unlock:
-	spin_unlock(&device->group->kvm_ref_lock);
+	device->kvm = kvm;
 }
 
 void vfio_device_put_kvm(struct vfio_device *device)


  reply	other threads:[~2023-02-03 20:36 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-02 16:24 [PATCH v3] vfio: fix deadlock between group lock and kvm lock Matthew Rosato
2023-02-02 19:42 ` Alex Williamson
2023-02-02 23:04   ` Tian, Kevin
2023-02-02 23:13     ` Alex Williamson
2023-02-03  2:00       ` Tian, Kevin
2023-02-03 13:32         ` Liu, Yi L
2023-02-03 13:49           ` Alex Williamson
2023-02-03 14:54             ` Liu, Yi L
2023-02-03 15:19               ` Alex Williamson
2023-02-03 17:29                 ` Matthew Rosato
2023-02-03 20:35                   ` Alex Williamson [this message]
2023-02-03 21:19                     ` Matthew Rosato
2023-02-03 21:35                       ` Alex Williamson
2023-02-03  8:58 ` Liu, Yi L
2023-02-03 14:26   ` Matthew Rosato
2023-02-03 14:48     ` Liu, Yi L

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230203133503.4d8fb3e8.alex.williamson@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=farman@linux.ibm.com \
    --cc=frankja@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jgg@nvidia.com \
    --cc=jjherne@linux.ibm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=pmorel@linux.ibm.com \
    --cc=seanjc@google.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhenyuw@linux.intel.com \
    --cc=zhi.a.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox