kvm.vger.kernel.org archive mirror
From: Auger Eric <eric.auger@redhat.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "pmorel@linux.vnet.ibm.com" <pmorel@linux.vnet.ibm.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	John Garry <john.garry@huawei.com>,
	"xuwei (O)" <xuwei5@huawei.com>
Subject: Re: [RFC v2 1/5] vfio/type1: Introduce iova list and add iommu aperture validity check
Date: Tue, 23 Jan 2018 12:20:14 +0100
Message-ID: <5d63d94c-781d-6eb7-d464-4f18ab1d3cfe@redhat.com>
In-Reply-To: <5FC3163CFD30C246ABAA99954A238FA83863CBE7@FRAEML521-MBX.china.huawei.com>

Hi Shameer,

On 23/01/18 11:04, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Tuesday, January 23, 2018 8:25 AM
>> To: Alex Williamson <alex.williamson@redhat.com>; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: pmorel@linux.vnet.ibm.com; kvm@vger.kernel.org;
>> linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; John Garry
>> <john.garry@huawei.com>; xuwei (O) <xuwei5@huawei.com>
>> Subject: Re: [RFC v2 1/5] vfio/type1: Introduce iova list and add iommu
>> aperture validity check
>>
>> Hi Shameer,
>>
>> On 18/01/18 01:04, Alex Williamson wrote:
>>> On Fri, 12 Jan 2018 16:45:27 +0000
>>> Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
>>>
>>>> This introduces an iova list that is valid for dma mappings. Make
>>>> sure the new iommu aperture window is valid and doesn't conflict
>>>> with any existing dma mappings during attach. Also update the iova
>>>> list with new aperture window during attach/detach.
>>>>
>>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 177 ++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 177 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index e30e29a..11cbd49 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -60,6 +60,7 @@
>>>>
>>>>  struct vfio_iommu {
>>>>  	struct list_head	domain_list;
>>>> +	struct list_head	iova_list;
>>>>  	struct vfio_domain	*external_domain; /* domain for external user */
>>>>  	struct mutex		lock;
>>>>  	struct rb_root		dma_list;
>>>> @@ -92,6 +93,12 @@ struct vfio_group {
>>>>  	struct list_head	next;
>>>>  };
>>>>
>>>> +struct vfio_iova {
>>>> +	struct list_head	list;
>>>> +	phys_addr_t		start;
>>>> +	phys_addr_t		end;
>>>> +};
>>>
>>> dma_list uses dma_addr_t for the iova.  IOVAs are naturally DMA
>>> addresses, why are we using phys_addr_t?
>>>
>>>> +
>>>>  /*
>>>>   * Guest RAM pinning working set or DMA target
>>>>   */
>>>> @@ -1192,6 +1199,123 @@ static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base)
>>>>  	return ret;
>>>>  }
>>>>
>>>> +static int vfio_insert_iova(phys_addr_t start, phys_addr_t end,
>>>> +				struct list_head *head)
>>>> +{
>>>> +	struct vfio_iova *region;
>>>> +
>>>> +	region = kmalloc(sizeof(*region), GFP_KERNEL);
>>>> +	if (!region)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	INIT_LIST_HEAD(&region->list);
>>>> +	region->start = start;
>>>> +	region->end = end;
>>>> +
>>>> +	list_add_tail(&region->list, head);
>>>> +	return 0;
>>>> +}
>>>
>>> As I'm reading through this series, I'm learning that there are a lot
>>> of assumptions and subtle details that should be documented.  For
>>> instance, the IOMMU API only provides a single geometry and we build
>>> upon that here as this patch creates a list, but there's only a single
>>> entry for now.  The following patches carve that single iova range into
>>> pieces and somewhat subtly use the list_head passed to keep the list
>>> sorted, allowing the first/last_entry tricks used throughout.  Subtle
>>> interfaces are prone to bugs.
>>>
>>>> +
>>>> +/*
>>>> + * Find whether a mem region overlaps with existing dma mappings
>>>> + */
>>>> +static bool vfio_find_dma_overlap(struct vfio_iommu *iommu,
>>>> +				  phys_addr_t start, phys_addr_t end)
>>>> +{
>>>> +	struct rb_node *n = rb_first(&iommu->dma_list);
>>>> +
>>>> +	for (; n; n = rb_next(n)) {
>>>> +		struct vfio_dma *dma;
>>>> +
>>>> +		dma = rb_entry(n, struct vfio_dma, node);
>>>> +
>>>> +		if (end < dma->iova)
>>>> +			break;
>>>> +		if (start >= dma->iova + dma->size)
>>>> +			continue;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	return false;
>>>> +}
>>>
>>> Why do we need this in addition to the existing vfio_find_dma()?  Why
>>> doesn't this use the tree structure of the dma_list?
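A minimal user-space sketch of what "using the tree structure" could look
like for a range query. It is not kernel code: struct dma_node and
find_dma_overlap() are hypothetical stand-ins for struct vfio_dma and an
rb-tree walk, a plain binary search tree replaces the rb-tree, and the query
range [start, end] is assumed to be inclusive. Because existing mappings
never overlap each other, a single descent is enough and no in-order walk
over every node is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_dma linked into a search tree. */
struct dma_node {
	uint64_t iova;			/* first address of the mapping */
	uint64_t size;			/* length in bytes, assumed non-zero */
	struct dma_node *left, *right;	/* children, ordered by iova */
};

/*
 * Return a mapping that overlaps the inclusive range [start, end],
 * or NULL if there is none. Assumes the mappings in the tree are
 * mutually non-overlapping, so one root-to-leaf descent suffices.
 */
static struct dma_node *find_dma_overlap(struct dma_node *root,
					 uint64_t start, uint64_t end)
{
	struct dma_node *n = root;

	while (n) {
		/* Last byte of this mapping; avoids overflow at the top. */
		uint64_t last = n->iova + (n->size - 1);

		if (end < n->iova)
			n = n->left;	/* query lies entirely below */
		else if (start > last)
			n = n->right;	/* query lies entirely above */
		else
			return n;	/* ranges intersect */
	}

	return NULL;
}
```

The descent visits at most one node per tree level, so on a balanced tree
the cost is logarithmic in the number of mappings.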
>>>
>>>> +
>>>> +/*
>>>> + * Check the new iommu aperture is a valid one
>>>> + */
>>>> +static int vfio_iommu_valid_aperture(struct vfio_iommu *iommu,
>>>> +				     phys_addr_t start,
>>>> +				     phys_addr_t end)
>>>> +{
>>>> +	struct vfio_iova *first, *last;
>>>> +	struct list_head *iova = &iommu->iova_list;
>>>> +
>>>> +	if (list_empty(iova))
>>>> +		return 0;
>>>> +
>>>> +	/* Check if new one is outside the current aperture */
>>>
>>> "Disjoint sets"
>>>
>>>> +	first = list_first_entry(iova, struct vfio_iova, list);
>>>> +	last = list_last_entry(iova, struct vfio_iova, list);
>>>> +	if ((start > last->end) || (end < first->start))
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* Check for any existing dma mappings outside the new start */
>>>> +	if (start > first->start) {
>>>> +		if (vfio_find_dma_overlap(iommu, first->start, start - 1))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	/* Check for any existing dma mappings outside the new end */
>>>> +	if (end < last->end) {
>>>> +		if (vfio_find_dma_overlap(iommu, end + 1, last->end))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> I think this returns an int because you want to use it for the return
>>> value below, but it really seems like a bool question, ie. does this
>>> aperture conflict with existing mappings.  Additionally, the aperture
>>> is valid, it was provided to us by the IOMMU API, the question is
>>> whether it conflicts.  Please also name consistently to the other
>>> functions in this patch, vfio_iommu_aper_xxxx().
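One way to phrase the check as a yes/no question, sketched in user-space C.
The names struct mapping and aper_conflict() are hypothetical, the aperture
bounds are assumed to be inclusive, and mapping sizes are assumed to be
non-zero. Unlike the patch, which probes the pieces trimmed off the current
aperture, this sketch simply tests whether every existing mapping still fits
inside the new aperture; the two should agree as long as all mappings lie
within the current aperture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an existing DMA mapping. */
struct mapping {
	uint64_t iova;	/* first address of the mapping */
	uint64_t size;	/* length in bytes, assumed non-zero */
};

/*
 * Return true if the new aperture [start, end] (inclusive) would leave
 * any existing mapping partly or fully outside of it.
 */
static bool aper_conflict(const struct mapping *maps, size_t n,
			  uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t first = maps[i].iova;
		uint64_t last = first + (maps[i].size - 1);

		if (first < start || last > end)
			return true;
	}

	return false;
}
```

A caller would then translate the answer into an error code itself, for
example returning -EINVAL when the function reports a conflict.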
>>>
>>>> +
>>>> +/*
>>>> + * Adjust the iommu aperture window if new aperture is a valid one
>>>> + */
>>>> +static int vfio_iommu_iova_aper_adjust(struct vfio_iommu *iommu,
>>>> +				      phys_addr_t start,
>>>> +				      phys_addr_t end)
>>>
>>> Perhaps "resize", "prune", or "shrink" to make it more clear what is
>>> being adjusted?
>>>
>>>> +{
>>>> +	struct vfio_iova *node, *next;
>>>> +	struct list_head *iova = &iommu->iova_list;
>>>> +
>>>> +	if (list_empty(iova))
>>>> +		return vfio_insert_iova(start, end, iova);
>>>> +
>>>> +	/* Adjust iova list start */
>>>> +	list_for_each_entry_safe(node, next, iova, list) {
>>>> +		if (start < node->start)
>>>> +			break;
>>>> +		if ((start >= node->start) && (start <= node->end)) {
>>>
>>> start == node->end results in a zero sized node.  s/<=/</
>>>
>>>> +			node->start = start;
>>>> +			break;
>>>> +		}
>>>> +		/* Delete nodes before new start */
>>>> +		list_del(&node->list);
>>>> +		kfree(node);
>>>> +	}
>>>> +
>>>> +	/* Adjust iova list end */
>>>> +	list_for_each_entry_safe(node, next, iova, list) {
>>>> +		if (end > node->end)
>>>> +			continue;
>>>> +
>>>> +		if ((end >= node->start) && (end <= node->end)) {
>>>
>>> end == node->start results in a zero sized node.  s/>=/>/
>>>
>>>> +			node->end = end;
>>>> +			continue;
>>>> +		}
>>>> +		/* Delete nodes after new end */
>>>> +		list_del(&node->list);
>>>> +		kfree(node);
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
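For comparison, a rough user-space sketch of pruning a sorted list of
ranges to a new window. It is an illustration only, not the patch's
algorithm: struct range and prune_ranges() are hypothetical, an array
replaces the kernel list, and both range and window bounds are assumed to be
inclusive. Each range is clamped to the window and ranges that fall entirely
outside it are dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_iova; both bounds are inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Clamp the sorted, non-overlapping ranges in r[0..n) to the window
 * [start, end] in place and return how many ranges remain.
 */
static size_t prune_ranges(struct range *r, size_t n,
			   uint64_t start, uint64_t end)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		/* Entirely below or above the window: drop it. */
		if (r[i].end < start || r[i].start > end)
			continue;

		/* Keep the part of the range that lies inside the window. */
		r[out].start = r[i].start > start ? r[i].start : start;
		r[out].end = r[i].end < end ? r[i].end : end;
		out++;
	}

	return out;
}
```

Writing to r[out] is safe because out never runs ahead of i, and the order
of the surviving ranges is preserved, so the list stays sorted.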
>>>> +
>>>>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>>  					 struct iommu_group *iommu_group)
>>>>  {
>>>> @@ -1202,6 +1326,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>>  	int ret;
>>>>  	bool resv_msi, msi_remap;
>>>>  	phys_addr_t resv_msi_base;
>>>> +	struct iommu_domain_geometry geo;
>>>>
>>>>  	mutex_lock(&iommu->lock);
>>>>
>>>> @@ -1271,6 +1396,14 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>>  	if (ret)
>>>>  		goto out_domain;
>>>>
>>>> +	/* Get aperture info */
>>>> +	iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
>>>> +
>>>> +	ret = vfio_iommu_valid_aperture(iommu, geo.aperture_start,
>>>> +					geo.aperture_end);
>>>> +	if (ret)
>>>> +		goto out_detach;
>>>> +
>>>>  	resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
>>>>
>>>>  	INIT_LIST_HEAD(&domain->group_list);
>>>> @@ -1327,6 +1460,11 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>>  			goto out_detach;
>>>>  	}
>>>>
>>>> +	ret = vfio_iommu_iova_aper_adjust(iommu, geo.aperture_start,
>>>> +					  geo.aperture_end);
>>>> +	if (ret)
>>>> +		goto out_detach;
>>>> +
>>>>  	list_add(&domain->next, &iommu->domain_list);
>>>>
>>>>  	mutex_unlock(&iommu->lock);
>>>> @@ -1392,6 +1530,35 @@ static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
>>>>  	WARN_ON(iommu->notifier.head);
>>>>  }
>>>>
>>>> +/*
>>>> + * Called when a domain is removed in detach. It is possible that
>>>> + * the removed domain decided the iova aperture window. Modify the
>>>> + * iova aperture with the smallest window among existing domains.
>>>> + */
>>>> +static void vfio_iommu_iova_aper_refresh(struct vfio_iommu *iommu)
>>>> +{
>>>> +	struct vfio_domain *domain;
>>>> +	struct iommu_domain_geometry geo;
>>>> +	struct vfio_iova *node;
>>>> +	phys_addr_t start = 0;
>>>> +	phys_addr_t end = (phys_addr_t)~0;
>>>> +
>>>> +	list_for_each_entry(domain, &iommu->domain_list, next) {
>>>> +		iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>>>> +				      &geo);
>>>> +			if (geo.aperture_start > start)
>>>> +				start = geo.aperture_start;
>>>> +			if (geo.aperture_end < end)
>>>> +				end = geo.aperture_end;
>>>> +	}
>>>> +
>>>> +	/* modify iova aperture limits */
>>>> +	node = list_first_entry(&iommu->iova_list, struct vfio_iova, list);
>>>> +	node->start = start;
>>>> +	node = list_last_entry(&iommu->iova_list, struct vfio_iova, list);
>>>> +	node->end = end;
>>>
>>> We can do this because the new aperture is the same or bigger than the
>>> current aperture, never smaller.  That's not fully obvious and should
>>> be noted in the comment.  Perhaps this function should be "expand"
>>> rather than "refresh".
>> This one is not obvious to me either:
>> assume you have 2 domains with apertures 1) and 2) respectively, resulting
>> in aperture 3). Holes are created by resv regions, for instance. If you
>> remove domain 1, don't you get 4) instead of 2)?
>>
>> 1)   |------------|
>>  +
>> 2) |---|    |--|       |-----|
>> =
>> 3)   |-|    |--|
>>
>>
>> 4) |---|    |----------------|
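To make the detach-time recomputation concrete, here is a small user-space
sketch of deriving the common aperture from the remaining domains. The names
struct geometry and common_aperture() are hypothetical, the bounds are
assumed to be inclusive, and reserved regions are deliberately ignored, as
they are in this patch. The result is the intersection of all windows:
the largest start and the smallest end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the aperture part of a domain geometry. */
struct geometry {
	uint64_t aperture_start;	/* inclusive */
	uint64_t aperture_end;		/* inclusive */
};

/*
 * Intersect the apertures of geo[0..n): take the largest start and the
 * smallest end. With n == 0 the full 64-bit range is returned.
 */
static struct geometry common_aperture(const struct geometry *geo, size_t n)
{
	struct geometry out = { 0, UINT64_MAX };

	for (size_t i = 0; i < n; i++) {
		if (geo[i].aperture_start > out.aperture_start)
			out.aperture_start = geo[i].aperture_start;
		if (geo[i].aperture_end < out.aperture_end)
			out.aperture_end = geo[i].aperture_end;
	}

	return out;
}
```

Dropping a domain from the input removes one constraint from the
intersection, so the recomputed window can stay the same or grow but cannot
shrink.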
> 
> That is partially true. But please remember that this patch is not aware of
> any reserved regions yet. Those are introduced in patch #2. So with patches #1
> and #2 together, the iova aperture might look like 4) after this function call,
> and once vfio_iommu_iova_resv_refresh() in patch #2 is done, the aperture will
> be back to 2).
> 
> Hope I am clear. Please let me know.
Ah OK.
> 
> In any case, based on comments by Alex, I will be removing these aperture/reserve
> refresh functions and leaving the iova list as it is when a group is detached.
Looking forward to reviewing the next version then.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks
>>
>> Eric
>>>
>>>> +}
>>>> +
>>>>  static void vfio_iommu_type1_detach_group(void *iommu_data,
>>>>  					  struct iommu_group *iommu_group)
>>>>  {
>>>> @@ -1445,6 +1612,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>>>>  			iommu_domain_free(domain->domain);
>>>>  			list_del(&domain->next);
>>>>  			kfree(domain);
>>>> +			vfio_iommu_iova_aper_refresh(iommu);
>>>>  		}
>>>>  		break;
>>>>  	}
>>>> @@ -1475,6 +1643,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
>>>>  	}
>>>>
>>>>  	INIT_LIST_HEAD(&iommu->domain_list);
>>>> +	INIT_LIST_HEAD(&iommu->iova_list);
>>>>  	iommu->dma_list = RB_ROOT;
>>>>  	mutex_init(&iommu->lock);
>>>>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>>>> @@ -1502,6 +1671,7 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>>>  {
>>>>  	struct vfio_iommu *iommu = iommu_data;
>>>>  	struct vfio_domain *domain, *domain_tmp;
>>>> +	struct vfio_iova *iova, *iova_tmp;
>>>>
>>>>  	if (iommu->external_domain) {
>>>>  		vfio_release_domain(iommu->external_domain, true);
>>>> @@ -1517,6 +1687,13 @@ static void vfio_iommu_type1_release(void *iommu_data)
>>>>  		list_del(&domain->next);
>>>>  		kfree(domain);
>>>>  	}
>>>> +
>>>> +	list_for_each_entry_safe(iova, iova_tmp,
>>>> +				 &iommu->iova_list, list) {
>>>> +		list_del(&iova->list);
>>>> +		kfree(iova);
>>>> +	}
>>>> +
>>>>  	kfree(iommu);
>>>>  }
>>>>
>>>

Thread overview: 21+ messages
2018-01-12 16:45 [RFC v2 0/5] vfio/type1: Add support for valid iova list management Shameer Kolothum
2018-01-12 16:45 ` [RFC v2 1/5] vfio/type1: Introduce iova list and add iommu aperture validity check Shameer Kolothum
2018-01-18  0:04   ` Alex Williamson
2018-01-19  9:47     ` Shameerali Kolothum Thodi
2018-01-23  8:25     ` Auger Eric
2018-01-23 10:04       ` Shameerali Kolothum Thodi
2018-01-23 11:20         ` Auger Eric [this message]
2018-01-12 16:45 ` [RFC v2 2/5] vfio/type1: Check reserve region conflict and update iova list Shameer Kolothum
2018-01-18  0:04   ` Alex Williamson
2018-01-19  9:48     ` Shameerali Kolothum Thodi
2018-01-19 15:45       ` Alex Williamson
2018-01-23  8:32     ` Auger Eric
2018-01-23 12:16       ` Shameerali Kolothum Thodi
2018-01-23 12:51         ` Auger Eric
2018-01-23 15:26           ` Shameerali Kolothum Thodi
2018-01-12 16:45 ` [RFC v2 3/5] vfio/type1: check dma map request is within a valid iova range Shameer Kolothum
2018-01-23  8:38   ` Auger Eric
2018-01-12 16:45 ` [RFC v2 4/5] vfio/type1: Add IOVA range capability support Shameer Kolothum
2018-01-23 11:16   ` Auger Eric
2018-01-23 12:51     ` Shameerali Kolothum Thodi
2018-01-12 16:45 ` [RFC v2 5/5] vfio/type1: remove duplicate retrieval of reserved regions Shameer Kolothum
