* [PATCH V1 0/2] fixes for virtual address update @ 2022-12-13 15:46 Steve Sistare 2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare 2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare 0 siblings, 2 replies; 12+ messages in thread From: Steve Sistare @ 2022-12-13 15:46 UTC (permalink / raw) To: kvm; +Cc: Alex Williamson, Cornelia Huck, Steve Sistare Fix bugs in the interfaces that allow the underlying memory object of an iova range to be mapped in a new address space. They allow userland to indefinitely block vfio mediated device kernel threads, and do not propagate the locked_vm count to a new mm. Steve Sistare (2): vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR vfio/type1: prevent locked_vm underflow drivers/vfio/vfio_iommu_type1.c | 36 +++++++++++++++++++++++++++++++++++- include/uapi/linux/vfio.h | 6 +++++- 2 files changed, 40 insertions(+), 2 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR 2022-12-13 15:46 [PATCH V1 0/2] fixes for virtual address update Steve Sistare @ 2022-12-13 15:46 ` Steve Sistare 2022-12-13 16:26 ` Alex Williamson 2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare 1 sibling, 1 reply; 12+ messages in thread From: Steve Sistare @ 2022-12-13 15:46 UTC (permalink / raw) To: kvm; +Cc: Alex Williamson, Cornelia Huck, Steve Sistare Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. Their kernel threads could be blocked indefinitely by a misbehaving userland while trying to pin/unpin pages while vaddrs are being updated. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> --- drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- include/uapi/linux/vfio.h | 6 +++++- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 23c24fe..f81e925 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, mutex_lock(&iommu->lock); + /* Cannot update vaddr if mdev is present. */ + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) + goto unlock; + pgshift = __ffs(iommu->pgsize_bitmap); pgsize = (size_t)1 << pgshift; @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, mutex_lock(&iommu->lock); + /* Prevent an mdev from sneaking in while vaddr flags are used. */ + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) + goto out_unlock; + /* Check for duplicates */ if (vfio_iommu_find_iommu_group(iommu, iommu_group)) goto out_unlock; @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) return ret; } +/* + * Disable this feature if mdevs are present. They cannot safely pin/unpin + * while vaddrs are being updated.
+ */ +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) +{ + int ret; + + mutex_lock(&iommu->lock); + ret = list_empty(&iommu->emulated_iommu_groups); + mutex_unlock(&iommu->lock); + return ret; +} + static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, unsigned long arg) { @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, case VFIO_TYPE1v2_IOMMU: case VFIO_TYPE1_NESTING_IOMMU: case VFIO_UNMAP_ALL: - case VFIO_UPDATE_VADDR: return 1; + case VFIO_UPDATE_VADDR: + return iommu && vfio_iommu_can_update_vaddr(iommu); case VFIO_DMA_CC_IOMMU: if (!iommu) return 0; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index d7d8e09..6d36b84 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -49,7 +49,11 @@ /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ #define VFIO_UNMAP_ALL 9 -/* Supports the vaddr flag for DMA map and unmap */ +/* + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated + * devices, so this capability is subject to change as groups are added or + * removed. + */ #define VFIO_UPDATE_VADDR 10 /* -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR 2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare @ 2022-12-13 16:26 ` Alex Williamson 2022-12-13 16:54 ` Steven Sistare 0 siblings, 1 reply; 12+ messages in thread From: Alex Williamson @ 2022-12-13 16:26 UTC (permalink / raw) To: Steve Sistare; +Cc: kvm, Cornelia Huck, Jason Gunthorpe On Tue, 13 Dec 2022 07:46:55 -0800 Steve Sistare <steven.sistare@oracle.com> wrote: > Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. > Their kernel threads could be blocked indefinitely by a misbehaving > userland while trying to pin/unpin pages while vaddrs are being updated. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > --- > drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- > include/uapi/linux/vfio.h | 6 +++++- > 2 files changed, 29 insertions(+), 2 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 23c24fe..f81e925 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, > > mutex_lock(&iommu->lock); > > + /* Cannot update vaddr if mdev is present. */ > + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) > + goto unlock; > + > pgshift = __ffs(iommu->pgsize_bitmap); > pgsize = (size_t)1 << pgshift; > > @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, > > mutex_lock(&iommu->lock); > > + /* Prevent an mdev from sneaking in while vaddr flags are used. */ > + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) > + goto out_unlock; Why only mdev devices?
If we restrict that the user cannot attach a group while there are invalid vaddrs, and the pin/unpin pages and dma_rw interfaces are restricted to cases where vaddr_invalid_count is zero, then we can get rid of all the code to handle waiting for vaddrs. ie. we could still revert: 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") 487ace134053 ("vfio/type1: implement notify callback") ec5e32940cc9 ("vfio: iommu driver notify callback") It appears to me it might be easiest to lead with a clean revert of these, then follow-up imposing the usage restrictions, and I'd go ahead and add WARN_ON error paths to the pin/unpin/dma_rw paths to make sure nobody enters those paths with an elevated invalid count. Thanks, Alex > + > /* Check for duplicates */ > if (vfio_iommu_find_iommu_group(iommu, iommu_group)) > goto out_unlock; > @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) > return ret; > } > > +/* > + * Disable this feature if mdevs are present. They cannot safely pin/unpin > + * while vaddrs are being updated.
> + */ > +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) > +{ > + int ret; > + > + mutex_lock(&iommu->lock); > + ret = list_empty(&iommu->emulated_iommu_groups); > + mutex_unlock(&iommu->lock); > + return ret; > +} > + > static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, > unsigned long arg) > { > @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, > case VFIO_TYPE1v2_IOMMU: > case VFIO_TYPE1_NESTING_IOMMU: > case VFIO_UNMAP_ALL: > - case VFIO_UPDATE_VADDR: > return 1; > + case VFIO_UPDATE_VADDR: > + return iommu && vfio_iommu_can_update_vaddr(iommu); > case VFIO_DMA_CC_IOMMU: > if (!iommu) > return 0; > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index d7d8e09..6d36b84 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -49,7 +49,11 @@ > /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ > #define VFIO_UNMAP_ALL 9 > > -/* Supports the vaddr flag for DMA map and unmap */ > +/* > + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated > + * devices, so this capability is subject to change as groups are added or > + * removed. > + */ > #define VFIO_UPDATE_VADDR 10 > > /* ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR 2022-12-13 16:26 ` Alex Williamson @ 2022-12-13 16:54 ` Steven Sistare 2022-12-13 17:31 ` Alex Williamson 0 siblings, 1 reply; 12+ messages in thread From: Steven Sistare @ 2022-12-13 16:54 UTC (permalink / raw) To: Alex Williamson; +Cc: kvm, Cornelia Huck, Jason Gunthorpe On 12/13/2022 11:26 AM, Alex Williamson wrote: > On Tue, 13 Dec 2022 07:46:55 -0800 > Steve Sistare <steven.sistare@oracle.com> wrote: > >> Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. >> Their kernel threads could be blocked indefinitely by a misbehaving >> userland while trying to pin/unpin pages while vaddrs are being updated. > > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- >> include/uapi/linux/vfio.h | 6 +++++- >> 2 files changed, 29 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index 23c24fe..f81e925 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >> >> mutex_lock(&iommu->lock); >> >> + /* Cannot update vaddr if mdev is present. */ >> + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) >> + goto unlock; >> + >> pgshift = __ffs(iommu->pgsize_bitmap); >> pgsize = (size_t)1 << pgshift; >> >> @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, >> >> mutex_lock(&iommu->lock); >> >> + /* Prevent an mdev from sneaking in while vaddr flags are used. */ >> + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) >> + goto out_unlock; > > Why only mdev devices?
If we restrict that the user cannot attach a > group while there are invalid vaddrs, and the pin/unpin pages and > dma_rw interfaces are restricted to cases where vaddr_invalid_count is > zero, then we can get rid of all the code to handle waiting for vaddrs. > ie. we could still revert: > > 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") > 487ace134053 ("vfio/type1: implement notify callback") > ec5e32940cc9 ("vfio: iommu driver notify callback") > > It appears to me it might be easiest to lead with a clean revert of > these, then follow-up imposing the usage restrictions, and I'd go ahead > and add WARN_ON error paths to the pin/unpin/dma_rw paths to make sure > nobody enters those paths with an elevated invalid count. Thanks, Will do. I think I will put the revert at the end, though, as dead code clean up. That patch will be larger, and if it is judged to be too large for stable, it can be omitted from stable. - Steve >> + >> /* Check for duplicates */ >> if (vfio_iommu_find_iommu_group(iommu, iommu_group)) >> goto out_unlock; >> @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) >> return ret; >> } >> >> +/* >> + * Disable this feature if mdevs are present. They cannot safely pin/unpin >> + * while vaddrs are being updated.
>> + */ >> +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) >> +{ >> + int ret; >> + >> + mutex_lock(&iommu->lock); >> + ret = list_empty(&iommu->emulated_iommu_groups); >> + mutex_unlock(&iommu->lock); >> + return ret; >> +} >> + >> static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >> unsigned long arg) >> { >> @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >> case VFIO_TYPE1v2_IOMMU: >> case VFIO_TYPE1_NESTING_IOMMU: >> case VFIO_UNMAP_ALL: >> - case VFIO_UPDATE_VADDR: >> return 1; >> + case VFIO_UPDATE_VADDR: >> + return iommu && vfio_iommu_can_update_vaddr(iommu); >> case VFIO_DMA_CC_IOMMU: >> if (!iommu) >> return 0; >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >> index d7d8e09..6d36b84 100644 >> --- a/include/uapi/linux/vfio.h >> +++ b/include/uapi/linux/vfio.h >> @@ -49,7 +49,11 @@ >> /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ >> #define VFIO_UNMAP_ALL 9 >> >> -/* Supports the vaddr flag for DMA map and unmap */ >> +/* >> + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated >> + * devices, so this capability is subject to change as groups are added or >> + * removed. >> + */ >> #define VFIO_UPDATE_VADDR 10 >> >> /* > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR 2022-12-13 16:54 ` Steven Sistare @ 2022-12-13 17:31 ` Alex Williamson 2022-12-13 17:42 ` Steven Sistare 0 siblings, 1 reply; 12+ messages in thread From: Alex Williamson @ 2022-12-13 17:31 UTC (permalink / raw) To: Steven Sistare; +Cc: kvm, Cornelia Huck, Jason Gunthorpe On Tue, 13 Dec 2022 11:54:33 -0500 Steven Sistare <steven.sistare@oracle.com> wrote: > On 12/13/2022 11:26 AM, Alex Williamson wrote: > > On Tue, 13 Dec 2022 07:46:55 -0800 > > Steve Sistare <steven.sistare@oracle.com> wrote: > > > >> Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. > >> Their kernel threads could be blocked indefinitely by a misbehaving > >> userland while trying to pin/unpin pages while vaddrs are being updated. > > > > Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") > > > >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > >> --- > >> drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- > >> include/uapi/linux/vfio.h | 6 +++++- > >> 2 files changed, 29 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> index 23c24fe..f81e925 100644 > >> --- a/drivers/vfio/vfio_iommu_type1.c > >> +++ b/drivers/vfio/vfio_iommu_type1.c > >> @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, > >> > >> mutex_lock(&iommu->lock); > >> > >> + /* Cannot update vaddr if mdev is present. */ > >> + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) > >> + goto unlock; > >> + > >> pgshift = __ffs(iommu->pgsize_bitmap); > >> pgsize = (size_t)1 << pgshift; > >> > >> @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, > >> > >> mutex_lock(&iommu->lock); > >> > >> + /* Prevent an mdev from sneaking in while vaddr flags are used.
*/ > >> + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) > >> + goto out_unlock; > > > > Why only mdev devices? If we restrict that the user cannot attach a > > group while there are invalid vaddrs, and the pin/unpin pages and > > dma_rw interfaces are restricted to cases where vaddr_invalid_count is > > zero, then we can get rid of all the code to handle waiting for vaddrs. > > ie. we could still revert: > > > > 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") > > 487ace134053 ("vfio/type1: implement notify callback") > > ec5e32940cc9 ("vfio: iommu driver notify callback") > > > > It appears to me it might be easiest to lead with a clean revert of > > these, then follow-up imposing the usage restrictions, and I'd go ahead > > and add WARN_ON error paths to the pin/unpin/dma_rw paths to make sure > > nobody enters those paths with an elevated invalid count. Thanks, > > Will do. I think I will put the revert at the end, though, as dead code > clean up. That patch will be larger, and if it is judged to be too large > for stable, it can be omitted from stable. > > - Steve > > >> + > >> /* Check for duplicates */ > >> if (vfio_iommu_find_iommu_group(iommu, iommu_group)) > >> goto out_unlock; > >> @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) > >> return ret; > >> } > >> > >> +/* > >> + * Disable this feature if mdevs are present. They cannot safely pin/unpin > >> + * while vaddrs are being updated. > >> + */ > >> +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) > >> +{ > >> + int ret; > >> + > >> + mutex_lock(&iommu->lock); > >> + ret = list_empty(&iommu->emulated_iommu_groups); > >> + mutex_unlock(&iommu->lock); > >> + return ret; > >> +} I'd also keep this generic to what it's actually doing, ie.
simply reporting if emulated_iommu_groups are present, so it could be something like: static bool vfio_iommu_has_emulated(struct vfio_iommu *iommu) OTOH, I'm not sure it actually makes sense to dynamically change reported value, the IOMMU backend supports vaddr update, but there are usage restrictions and there's no way that this test can't be racy w/ other user actions. Does it have specific utility in userspace to test for this immediately before a live-update, or would QEMU enable support for the feature based only on some initial condition? Thanks, Alex > >> + > >> static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, > >> unsigned long arg) > >> { > >> @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, > >> case VFIO_TYPE1v2_IOMMU: > >> case VFIO_TYPE1_NESTING_IOMMU: > >> case VFIO_UNMAP_ALL: > >> - case VFIO_UPDATE_VADDR: > >> return 1; > >> + case VFIO_UPDATE_VADDR: > >> + return iommu && vfio_iommu_can_update_vaddr(iommu); > >> case VFIO_DMA_CC_IOMMU: > >> if (!iommu) > >> return 0; > >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >> index d7d8e09..6d36b84 100644 > >> --- a/include/uapi/linux/vfio.h > >> +++ b/include/uapi/linux/vfio.h > >> @@ -49,7 +49,11 @@ > >> /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ > >> #define VFIO_UNMAP_ALL 9 > >> > >> -/* Supports the vaddr flag for DMA map and unmap */ > >> +/* > >> + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated > >> + * devices, so this capability is subject to change as groups are added or > >> + * removed. > >> + */ > >> #define VFIO_UPDATE_VADDR 10 > >> > >> /* > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR 2022-12-13 17:31 ` Alex Williamson @ 2022-12-13 17:42 ` Steven Sistare 0 siblings, 0 replies; 12+ messages in thread From: Steven Sistare @ 2022-12-13 17:42 UTC (permalink / raw) To: Alex Williamson; +Cc: kvm, Cornelia Huck, Jason Gunthorpe On 12/13/2022 12:31 PM, Alex Williamson wrote: > On Tue, 13 Dec 2022 11:54:33 -0500 > Steven Sistare <steven.sistare@oracle.com> wrote: > >> On 12/13/2022 11:26 AM, Alex Williamson wrote: >>> On Tue, 13 Dec 2022 07:46:55 -0800 >>> Steve Sistare <steven.sistare@oracle.com> wrote: >>> >>>> Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. >>>> Their kernel threads could be blocked indefinitely by a misbehaving >>>> userland while trying to pin/unpin pages while vaddrs are being updated. >>> >>> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") >>> >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >>>> --- >>>> drivers/vfio/vfio_iommu_type1.c | 25 ++++++++++++++++++++++++- >>>> include/uapi/linux/vfio.h | 6 +++++- >>>> 2 files changed, 29 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >>>> index 23c24fe..f81e925 100644 >>>> --- a/drivers/vfio/vfio_iommu_type1.c >>>> +++ b/drivers/vfio/vfio_iommu_type1.c >>>> @@ -1343,6 +1343,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >>>> >>>> mutex_lock(&iommu->lock); >>>> >>>> + /* Cannot update vaddr if mdev is present. */ >>>> + if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) >>>> + goto unlock; >>>> + >>>> pgshift = __ffs(iommu->pgsize_bitmap); >>>> pgsize = (size_t)1 << pgshift; >>>> >>>> @@ -2189,6 +2193,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, >>>> >>>> mutex_lock(&iommu->lock); >>>> >>>> + /* Prevent an mdev from sneaking in while vaddr flags are used.
*/ >>>> + if (iommu->vaddr_invalid_count && type == VFIO_EMULATED_IOMMU) >>>> + goto out_unlock; >>> >>> Why only mdev devices? If we restrict that the user cannot attach a >>> group while there are invalid vaddrs, and the pin/unpin pages and >>> dma_rw interfaces are restricted to cases where vaddr_invalid_count is >>> zero, then we can get rid of all the code to handle waiting for vaddrs. >>> ie. we could still revert: >>> >>> 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") >>> 487ace134053 ("vfio/type1: implement notify callback") >>> ec5e32940cc9 ("vfio: iommu driver notify callback") >>> >>> It appears to me it might be easiest to lead with a clean revert of >>> these, then follow-up imposing the usage restrictions, and I'd go ahead >>> and add WARN_ON error paths to the pin/unpin/dma_rw paths to make sure >>> nobody enters those paths with an elevated invalid count. Thanks, >> >> Will do. I think I will put the revert at the end, though, as dead code >> clean up. That patch will be larger, and if it is judged to be too large >> for stable, it can be omitted from stable. >> >> - Steve >> >>>> + >>>> /* Check for duplicates */ >>>> if (vfio_iommu_find_iommu_group(iommu, iommu_group)) >>>> goto out_unlock; >>>> @@ -2660,6 +2668,20 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu) >>>> return ret; >>>> } >>>> >>>> +/* >>>> + * Disable this feature if mdevs are present. They cannot safely pin/unpin >>>> + * while vaddrs are being updated. >>>> + */ >>>> +static int vfio_iommu_can_update_vaddr(struct vfio_iommu *iommu) >>>> +{ >>>> + int ret; >>>> + >>>> + mutex_lock(&iommu->lock); >>>> + ret = list_empty(&iommu->emulated_iommu_groups); >>>> + mutex_unlock(&iommu->lock); >>>> + return ret; >>>> +} > > I'd also keep this generic to what it's actually doing, ie. simply > reporting if emulated_iommu_groups are present, so it could be > something like: > > static bool vfio_iommu_has_emulated(struct vfio_iommu *iommu) OK.
> OTOH, I'm not sure it actually makes sense to dynamically change > reported value, the IOMMU backend supports vaddr update, but there are > usage restrictions and there's no way that this test can't be racy w/ > other user actions. Yes, but VFIO_DMA_CC_IOMMU has the same issue and the same comment, "this capability is subject to change as groups are added or removed." > Does it have specific utility in userspace to test > for this immediately before a live-update, or would QEMU enable support > for the feature based only on some initial condition? Thanks, We always check immediately before live update. We also check at startup if only-cpr-capable is specified. Now the latter check will be done later in startup, after groups are added. - Steve >>>> + >>>> static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >>>> unsigned long arg) >>>> { >>>> @@ -2668,8 +2690,9 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu, >>>> case VFIO_TYPE1v2_IOMMU: >>>> case VFIO_TYPE1_NESTING_IOMMU: >>>> case VFIO_UNMAP_ALL: >>>> - case VFIO_UPDATE_VADDR: >>>> return 1; >>>> + case VFIO_UPDATE_VADDR: >>>> + return iommu && vfio_iommu_can_update_vaddr(iommu); >>>> case VFIO_DMA_CC_IOMMU: >>>> if (!iommu) >>>> return 0; >>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>>> index d7d8e09..6d36b84 100644 >>>> --- a/include/uapi/linux/vfio.h >>>> +++ b/include/uapi/linux/vfio.h >>>> @@ -49,7 +49,11 @@ >>>> /* Supports VFIO_DMA_UNMAP_FLAG_ALL */ >>>> #define VFIO_UNMAP_ALL 9 >>>> >>>> -/* Supports the vaddr flag for DMA map and unmap */ >>>> +/* >>>> + * Supports the vaddr flag for DMA map and unmap. Not supported for mediated >>>> + * devices, so this capability is subject to change as groups are added or >>>> + * removed. >>>> + */ >>>> #define VFIO_UPDATE_VADDR 10 >>>> >>>> /* >>> >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow 2022-12-13 15:46 [PATCH V1 0/2] fixes for virtual address update Steve Sistare 2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare @ 2022-12-13 15:46 ` Steve Sistare 2022-12-13 18:02 ` Alex Williamson 1 sibling, 1 reply; 12+ messages in thread From: Steve Sistare @ 2022-12-13 15:46 UTC (permalink / raw) To: kvm; +Cc: Alex Williamson, Cornelia Huck, Steve Sistare When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a dma mapping, locked_vm underflows to a large unsigned value, and a subsequent dma map request fails with ENOMEM in __account_locked_vm. To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add the mapping's pinned page count to the new mm->locked_vm, subject to the rlimit. Now that mediated devices are excluded when using VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the mapping. Underflow will not occur when all dma mappings are invalidated before exec. An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR will fail with EINVAL because the mapping is in the vaddr_invalid state. Underflow may still occur in a buggy application that fails to invalidate all before exec.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com> --- drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index f81e925..e5a02f8 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -100,6 +100,7 @@ struct vfio_dma { struct task_struct *task; struct rb_root pfn_list; /* Ex-user pinned pfn list */ unsigned long *bitmap; + struct mm_struct *mm; }; struct vfio_batch { @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) vfio_unmap_unpin(iommu, dma, true); vfio_unlink_dma(iommu, dma); put_task_struct(dma->task); + mmdrop(dma->mm); vfio_dma_bitmap_free(dma); if (dma->vaddr_invalid) { iommu->vaddr_invalid_count--; @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, dma->vaddr = vaddr; dma->vaddr_invalid = false; iommu->vaddr_invalid_count--; + if (current->mm != dma->mm) { + mmdrop(dma->mm); + dma->mm = current->mm; + mmgrab(dma->mm); + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, + 0); + } wake_up_all(&iommu->vaddr_wait); } goto out_unlock; @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, get_task_struct(current->group_leader); dma->task = current->group_leader; dma->lock_cap = capable(CAP_IPC_LOCK); + dma->mm = dma->task->mm; + mmgrab(dma->mm); dma->pfn_list = RB_ROOT; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow 2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare @ 2022-12-13 18:02 ` Alex Williamson 2022-12-13 18:17 ` Steven Sistare 0 siblings, 1 reply; 12+ messages in thread From: Alex Williamson @ 2022-12-13 18:02 UTC (permalink / raw) To: Steve Sistare; +Cc: kvm, Cornelia Huck On Tue, 13 Dec 2022 07:46:56 -0800 Steve Sistare <steven.sistare@oracle.com> wrote: > When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR > interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a > dma mapping, locked_vm underflows to a large unsigned value, and a > subsequent dma map request fails with ENOMEM in __account_locked_vm. > > To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, > add the mapping's pinned page count to the new mm->locked_vm, subject to > the rlimit. Now that mediated devices are excluded when using > VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the > mapping. > > Underflow will not occur when all dma mappings are invalidated before exec. > An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR > will fail with EINVAL because the mapping is in the vaddr_invalid state. Where is this enforced? > Underflow may still occur in a buggy application that fails to invalidate > all before exec.
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com> > --- > drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index f81e925..e5a02f8 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -100,6 +100,7 @@ struct vfio_dma { > struct task_struct *task; > struct rb_root pfn_list; /* Ex-user pinned pfn list */ > unsigned long *bitmap; > + struct mm_struct *mm; > }; > > struct vfio_batch { > @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > vfio_unmap_unpin(iommu, dma, true); > vfio_unlink_dma(iommu, dma); > put_task_struct(dma->task); > + mmdrop(dma->mm); > vfio_dma_bitmap_free(dma); > if (dma->vaddr_invalid) { > iommu->vaddr_invalid_count--; > @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > dma->vaddr = vaddr; > dma->vaddr_invalid = false; > iommu->vaddr_invalid_count--; > + if (current->mm != dma->mm) { > + mmdrop(dma->mm); > + dma->mm = current->mm; > + mmgrab(dma->mm); > + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, > + 0); What does it actually mean if this fails? The pages are still pinned. lock_vm doesn't get updated. Underflow can still occur. Thanks, Alex > + } > wake_up_all(&iommu->vaddr_wait); > } > goto out_unlock; > @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, > get_task_struct(current->group_leader); > dma->task = current->group_leader; > dma->lock_cap = capable(CAP_IPC_LOCK); > + dma->mm = dma->task->mm; > + mmgrab(dma->mm); > > dma->pfn_list = RB_ROOT; > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow 2022-12-13 18:02 ` Alex Williamson @ 2022-12-13 18:17 ` Steven Sistare 2022-12-13 18:21 ` Steven Sistare 0 siblings, 1 reply; 12+ messages in thread From: Steven Sistare @ 2022-12-13 18:17 UTC (permalink / raw) To: Alex Williamson; +Cc: kvm, Cornelia Huck On 12/13/2022 1:02 PM, Alex Williamson wrote: > On Tue, 13 Dec 2022 07:46:56 -0800 > Steve Sistare <steven.sistare@oracle.com> wrote: > >> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR >> interfaces, locked_vm of the new mm becomes 0. If the user later unmaps a >> dma mapping, locked_vm underflows to a large unsigned value, and a >> subsequent dma map request fails with ENOMEM in __account_locked_vm. >> >> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, >> add the mapping's pinned page count to the new mm->locked_vm, subject to >> the rlimit. Now that mediated devices are excluded when using >> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the >> mapping. >> >> Underflow will not occur when all dma mappings are invalidated before exec. >> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR >> will fail with EINVAL because the mapping is in the vaddr_invalid state. > > Where is this enforced? In vfio_dma_do_unmap: if (invalidate_vaddr) { if (dma->vaddr_invalid) { ... ret = -EINVAL; >> Underflow may still occur in a buggy application that fails to invalidate >> all before exec.
>> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> >> --- >> drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index f81e925..e5a02f8 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -100,6 +100,7 @@ struct vfio_dma { >> struct task_struct *task; >> struct rb_root pfn_list; /* Ex-user pinned pfn list */ >> unsigned long *bitmap; >> + struct mm_struct *mm; >> }; >> >> struct vfio_batch { >> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) >> vfio_unmap_unpin(iommu, dma, true); >> vfio_unlink_dma(iommu, dma); >> put_task_struct(dma->task); >> + mmdrop(dma->mm); >> vfio_dma_bitmap_free(dma); >> if (dma->vaddr_invalid) { >> iommu->vaddr_invalid_count--; >> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> dma->vaddr = vaddr; >> dma->vaddr_invalid = false; >> iommu->vaddr_invalid_count--; >> + if (current->mm != dma->mm) { >> + mmdrop(dma->mm); >> + dma->mm = current->mm; >> + mmgrab(dma->mm); >> + ret = vfio_lock_acct(dma, size >> PAGE_SHIFT, >> + 0); > > What does it actually mean if this fails? The pages are still pinned. > lock_vm doesn't get updated. Underflow can still occur. Thanks, If this fails, the user has locked additional memory after exec and before making this call -- more than was locked before exec -- and the rlimit is exceeded. A misbehaving application, which will only hurt itself. However, I should reorder these, and check ret before changing the other state.
- Steve >> + } >> wake_up_all(&iommu->vaddr_wait); >> } >> goto out_unlock; >> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> get_task_struct(current->group_leader); >> dma->task = current->group_leader; >> dma->lock_cap = capable(CAP_IPC_LOCK); >> + dma->mm = dma->task->mm; >> + mmgrab(dma->mm); >> >> dma->pfn_list = RB_ROOT; >> > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow
  2022-12-13 18:17     ` Steven Sistare
@ 2022-12-13 18:21       ` Steven Sistare
  2022-12-13 19:29         ` Alex Williamson
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Sistare @ 2022-12-13 18:21 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, Cornelia Huck

On 12/13/2022 1:17 PM, Steven Sistare wrote:
> On 12/13/2022 1:02 PM, Alex Williamson wrote:
>> On Tue, 13 Dec 2022 07:46:56 -0800
>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>
>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>
>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>> the rlimit.  Now that mediated devices are excluded when using
>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>> mapping.
>>>
>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
>>
>> Where is this enforced?
>
> In vfio_dma_do_unmap:
> 	if (invalidate_vaddr) {
> 		if (dma->vaddr_invalid) {
> 			...
> 			ret = -EINVAL;

My bad, this is a different case, and my comment in the commit message is
incorrect.  I should test mm != dma->mm during unmap as well, and suppress
the locked_vm deduction there.

- Steve

>>> Underflow may still occur in a buggy application that fails to invalidate
>>> all before exec.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index f81e925..e5a02f8 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>  	struct task_struct	*task;
>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>  	unsigned long		*bitmap;
>>> +	struct mm_struct	*mm;
>>>  };
>>>
>>>  struct vfio_batch {
>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>  	vfio_unlink_dma(iommu, dma);
>>>  	put_task_struct(dma->task);
>>> +	mmdrop(dma->mm);
>>>  	vfio_dma_bitmap_free(dma);
>>>  	if (dma->vaddr_invalid) {
>>>  		iommu->vaddr_invalid_count--;
>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  			dma->vaddr = vaddr;
>>>  			dma->vaddr_invalid = false;
>>>  			iommu->vaddr_invalid_count--;
>>> +			if (current->mm != dma->mm) {
>>> +				mmdrop(dma->mm);
>>> +				dma->mm = current->mm;
>>> +				mmgrab(dma->mm);
>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>> +						     0);
>>
>> What does it actually mean if this fails?  The pages are still pinned.
>> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,
>
> If this fails, the user has locked additional memory after exec and before
> making this call -- more than was locked before exec -- and the rlimit is
> exceeded.  A misbehaving application, which will only hurt itself.
>
> However, I should reorder these, and check ret before changing the other state.
>
> - Steve
>
>>> +			}
>>>  			wake_up_all(&iommu->vaddr_wait);
>>>  		}
>>>  		goto out_unlock;
>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	get_task_struct(current->group_leader);
>>>  	dma->task = current->group_leader;
>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>> +	dma->mm = dma->task->mm;
>>> +	mmgrab(dma->mm);
>>>
>>>  	dma->pfn_list = RB_ROOT;
>>>
>>

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow
  2022-12-13 18:21       ` Steven Sistare
@ 2022-12-13 19:29         ` Alex Williamson
  2022-12-13 19:40           ` Steven Sistare
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Williamson @ 2022-12-13 19:29 UTC (permalink / raw)
To: Steven Sistare; +Cc: kvm, Cornelia Huck

On Tue, 13 Dec 2022 13:21:15 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/13/2022 1:17 PM, Steven Sistare wrote:
> > On 12/13/2022 1:02 PM, Alex Williamson wrote:
> >> On Tue, 13 Dec 2022 07:46:56 -0800
> >> Steve Sistare <steven.sistare@oracle.com> wrote:
> >>
> >>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
> >>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
> >>> dma mapping, locked_vm underflows to a large unsigned value, and a
> >>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
> >>>
> >>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
> >>> add the mapping's pinned page count to the new mm->locked_vm, subject to
> >>> the rlimit.  Now that mediated devices are excluded when using
> >>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
> >>> mapping.
> >>>
> >>> Underflow will not occur when all dma mappings are invalidated before exec.
> >>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
> >>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
> >>
> >> Where is this enforced?
> >
> > In vfio_dma_do_unmap:
> > 	if (invalidate_vaddr) {
> > 		if (dma->vaddr_invalid) {
> > 			...
> > 			ret = -EINVAL;
>
> My bad, this is a different case, and my comment in the commit message is
> incorrect.  I should test mm != dma->mm during unmap as well, and suppress
> the locked_vm deduction there.

I'm getting confused how this patch actually does anything.  We grab
the mm of the task doing mappings, and we swap that grab when updating
the vaddr, but vfio_lock_acct() uses the original dma->task mm for
accounting.  Therefore how can an underflow occur?  It seems we're
simply failing to adjust locked_vm for the new mm at all.

> >>> Underflow may still occur in a buggy application that fails to invalidate
> >>> all before exec.
> >>>
> >>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>> ---
> >>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
> >>>  1 file changed, 11 insertions(+)
> >>>
> >>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>> index f81e925..e5a02f8 100644
> >>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>> @@ -100,6 +100,7 @@ struct vfio_dma {
> >>>  	struct task_struct	*task;
> >>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> >>>  	unsigned long		*bitmap;
> >>> +	struct mm_struct	*mm;
> >>>  };
> >>>
> >>>  struct vfio_batch {
> >>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >>>  	vfio_unmap_unpin(iommu, dma, true);
> >>>  	vfio_unlink_dma(iommu, dma);
> >>>  	put_task_struct(dma->task);
> >>> +	mmdrop(dma->mm);
> >>>  	vfio_dma_bitmap_free(dma);
> >>>  	if (dma->vaddr_invalid) {
> >>>  		iommu->vaddr_invalid_count--;
> >>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>>  			dma->vaddr = vaddr;
> >>>  			dma->vaddr_invalid = false;
> >>>  			iommu->vaddr_invalid_count--;
> >>> +			if (current->mm != dma->mm) {
> >>> +				mmdrop(dma->mm);
> >>> +				dma->mm = current->mm;
> >>> +				mmgrab(dma->mm);
> >>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
> >>> +						     0);
> >>
> >> What does it actually mean if this fails?  The pages are still pinned.
> >> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,
> >
> > If this fails, the user has locked additional memory after exec and before
> > making this call -- more than was locked before exec -- and the rlimit is
> > exceeded.  A misbehaving application, which will only hurt itself.
> >
> > However, I should reorder these, and check ret before changing the other state.

The result would then be that the mapping remains with vaddr_invalid on
error?  Thanks,

Alex

> >>> +			}
> >>>  			wake_up_all(&iommu->vaddr_wait);
> >>>  		}
> >>>  		goto out_unlock;
> >>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>>  	get_task_struct(current->group_leader);
> >>>  	dma->task = current->group_leader;
> >>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >>> +	dma->mm = dma->task->mm;
> >>> +	mmgrab(dma->mm);
> >>>
> >>>  	dma->pfn_list = RB_ROOT;
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow
  2022-12-13 19:29         ` Alex Williamson
@ 2022-12-13 19:40           ` Steven Sistare
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Sistare @ 2022-12-13 19:40 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, Cornelia Huck

On 12/13/2022 2:29 PM, Alex Williamson wrote:
> On Tue, 13 Dec 2022 13:21:15 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
>
>> On 12/13/2022 1:17 PM, Steven Sistare wrote:
>>> On 12/13/2022 1:02 PM, Alex Williamson wrote:
>>>> On Tue, 13 Dec 2022 07:46:56 -0800
>>>> Steve Sistare <steven.sistare@oracle.com> wrote:
>>>>
>>>>> When a vfio container is preserved across exec using the VFIO_UPDATE_VADDR
>>>>> interfaces, locked_vm of the new mm becomes 0.  If the user later unmaps a
>>>>> dma mapping, locked_vm underflows to a large unsigned value, and a
>>>>> subsequent dma map request fails with ENOMEM in __account_locked_vm.
>>>>>
>>>>> To fix, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed,
>>>>> add the mapping's pinned page count to the new mm->locked_vm, subject to
>>>>> the rlimit.  Now that mediated devices are excluded when using
>>>>> VFIO_UPDATE_VADDR, the amount of pinned memory equals the size of the
>>>>> mapping.
>>>>>
>>>>> Underflow will not occur when all dma mappings are invalidated before exec.
>>>>> An attempt to unmap before updating the vaddr with VFIO_DMA_MAP_FLAG_VADDR
>>>>> will fail with EINVAL because the mapping is in the vaddr_invalid state.
>>>>
>>>> Where is this enforced?
>>>
>>> In vfio_dma_do_unmap:
>>> 	if (invalidate_vaddr) {
>>> 		if (dma->vaddr_invalid) {
>>> 			...
>>> 			ret = -EINVAL;
>>
>> My bad, this is a different case, and my comment in the commit message is
>> incorrect.  I should test mm != dma->mm during unmap as well, and suppress
>> the locked_vm deduction there.
>
> I'm getting confused how this patch actually does anything.  We grab
> the mm of the task doing mappings, and we swap that grab when updating
> the vaddr, but vfio_lock_acct() uses the original dma->task mm for
> accounting.  Therefore how can an underflow occur?  It seems we're
> simply failing to adjust locked_vm for the new mm at all.

The old code saves dma->task, but not dma->task->mm.  The task's mm
changes across exec.

>>>>> Underflow may still occur in a buggy application that fails to invalidate
>>>>> all before exec.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>> ---
>>>>>  drivers/vfio/vfio_iommu_type1.c | 11 +++++++++++
>>>>>  1 file changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>> index f81e925..e5a02f8 100644
>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>> @@ -100,6 +100,7 @@ struct vfio_dma {
>>>>>  	struct task_struct	*task;
>>>>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>>>>  	unsigned long		*bitmap;
>>>>> +	struct mm_struct	*mm;
>>>>>  };
>>>>>
>>>>>  struct vfio_batch {
>>>>> @@ -1174,6 +1175,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>>>  	vfio_unmap_unpin(iommu, dma, true);
>>>>>  	vfio_unlink_dma(iommu, dma);
>>>>>  	put_task_struct(dma->task);
>>>>> +	mmdrop(dma->mm);
>>>>>  	vfio_dma_bitmap_free(dma);
>>>>>  	if (dma->vaddr_invalid) {
>>>>>  		iommu->vaddr_invalid_count--;
>>>>> @@ -1622,6 +1624,13 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  			dma->vaddr = vaddr;
>>>>>  			dma->vaddr_invalid = false;
>>>>>  			iommu->vaddr_invalid_count--;
>>>>> +			if (current->mm != dma->mm) {
>>>>> +				mmdrop(dma->mm);
>>>>> +				dma->mm = current->mm;
>>>>> +				mmgrab(dma->mm);
>>>>> +				ret = vfio_lock_acct(dma, size >> PAGE_SHIFT,
>>>>> +						     0);
>>>>
>>>> What does it actually mean if this fails?  The pages are still pinned.
>>>> lock_vm doesn't get updated.  Underflow can still occur.  Thanks,
>>>
>>> If this fails, the user has locked additional memory after exec and before
>>> making this call -- more than was locked before exec -- and the rlimit is
>>> exceeded.  A misbehaving application, which will only hurt itself.
>>>
>>> However, I should reorder these, and check ret before changing the other state.
>
> The result would then be that the mapping remains with vaddr_invalid on
> error?  Thanks,

Correct.  In theory the app could recover by releasing the extra locked
memory that it grabbed, or increase its rlimit, and then try
map_flag_vaddr again.

- Steve

>>>>> +			}
>>>>>  			wake_up_all(&iommu->vaddr_wait);
>>>>>  		}
>>>>>  		goto out_unlock;
>>>>> @@ -1679,6 +1688,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>>  	get_task_struct(current->group_leader);
>>>>>  	dma->task = current->group_leader;
>>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>>> +	dma->mm = dma->task->mm;
>>>>> +	mmgrab(dma->mm);
>>>>>
>>>>>  	dma->pfn_list = RB_ROOT;
>>>>>
>>>>
>>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-12-13 19:40 UTC | newest]

Thread overview: 12+ messages (dates are UTC):
2022-12-13 15:46 [PATCH V1 0/2] fixes for virtual address update Steve Sistare
2022-12-13 15:46 ` [PATCH V1 1/2] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
2022-12-13 16:26   ` Alex Williamson
2022-12-13 16:54     ` Steven Sistare
2022-12-13 17:31       ` Alex Williamson
2022-12-13 17:42         ` Steven Sistare
2022-12-13 15:46 ` [PATCH V1 2/2] vfio/type1: prevent locked_vm underflow Steve Sistare
2022-12-13 18:02   ` Alex Williamson
2022-12-13 18:17     ` Steven Sistare
2022-12-13 18:21       ` Steven Sistare
2022-12-13 19:29         ` Alex Williamson
2022-12-13 19:40           ` Steven Sistare