* [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
@ 2019-01-09 23:10 Alex Williamson
2019-01-10 3:11 ` Peter Xu
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Alex Williamson @ 2019-01-09 23:10 UTC
To: alex.williamson; +Cc: qemu-devel, peterx
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
adds a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately due to overflow, the kernel detects an unmap of the last
page in the 64-bit address space as a wrap-around. In QEMU, a Q35
guest with VT-d emulation and guest IOMMU enabled will attempt to make
such an unmap request during VM system reset, triggering an error:
qemu-kvm: VFIO_UNMAP_DMA: -22
qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
Here the IOVA start address (0xfef00000) and the size parameter
(0xffffffff01100000) add to exactly 2^64, triggering the bug. A
kernel fix is queued for the Linux v5.0 release to address this.
This patch implements a workaround to retry the unmap, excluding the
final page of the range when we detect an unmap failing which matches
the requirements for this issue. This is expected to be a safe and
complete workaround as the VT-d address space does not extend to the
full 64-bit space and therefore the last page should never be mapped.
This workaround can be removed once all kernels with this bug are
sufficiently deprecated.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Reported-by: Pei Zhang <pezhang@redhat.com>
Debugged-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/vfio/common.c | 20 +++++++++++++++++++-
hw/vfio/trace-events | 1 +
2 files changed, 20 insertions(+), 1 deletion(-)
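As an illustration of the arithmetic in the commit message, here is a minimal sketch of how a 64-bit wrap-around test can misfire on a range that ends exactly at 2^64. The two helper functions are hypothetical; they only approximate the general shape of such a check and are not quoted from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed shape of an overflow-prone wrap-around test: the exclusive end
 * (iova + size) wraps to 0 when the range ends exactly at 2^64, so a valid
 * request looks like a wrap-around.
 */
static bool wraps_buggy(uint64_t iova, uint64_t size)
{
    return iova + size < iova;
}

/*
 * Variant that compares the last byte of the range (inclusive end) instead.
 * Only meaningful for size != 0.
 */
static bool wraps_fixed(uint64_t iova, uint64_t size)
{
    return iova + size - 1 < iova;
}
```

With the values from the error report, iova = 0xfef00000 and size = 0xffffffff01100000, the sum wraps to 0, so the first test reports a wrap-around while the second does not.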
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c185e5a2e79..820b839057c6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .size = size,
     };
 
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        /*
+         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
+         * v4.15) where an overflow in its wrap-around check prevents us from
+         * unmapping the last page of the address space.  Test for the error
+         * condition and re-try the unmap excluding the last page.  The
+         * expectation is that we've never mapped the last page anyway and this
+         * unmap request comes via vIOMMU support which also makes it unlikely
+         * that this page is used.  This bug was introduced well after type1 v2
+         * support was introduced, so we shouldn't need to test for v1.  A fix
+         * is queued for kernel v5.0 so this workaround can be removed once
+         * affected kernels are sufficiently deprecated.
+         */
+        if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
+            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
+            trace_vfio_dma_unmap_overflow_workaround();
+            unmap.size -= 1ULL << ctz64(container->pgsizes);
+            continue;
+        }
         error_report("VFIO_UNMAP_DMA: %d", -errno);
         return -errno;
     }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a85e8662eadb..a002c6af2dda 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e
vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
+vfio_dma_unmap_overflow_workaround(void) ""
# hw/vfio/platform.c
vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
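For readers who want to try the retry logic outside QEMU, the following is a minimal, self-contained sketch of the same idea. Everything in it is an assumption made for illustration: `struct unmap_req`, `fake_unmap_ioctl()`, `min_pgsize()` and `unmap_with_workaround()` are hypothetical names, the fake ioctl only models an affected kernel rejecting ranges that end at 2^64, and the IOMMU type check from the patch is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the unmap request passed to the ioctl. */
struct unmap_req {
    uint64_t iova;
    uint64_t size;
};

/* Hypothetical model of an affected kernel: iova + size wrapping is -EINVAL. */
static int fake_unmap_ioctl(const struct unmap_req *req)
{
    if (req->iova + req->size < req->iova) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Lowest set bit of the page-size bitmap, i.e. the smallest supported page. */
static uint64_t min_pgsize(uint64_t pgsizes)
{
    return pgsizes & (~pgsizes + 1);
}

/*
 * Retry once with the last page excluded when the failure matches the
 * signature of the bug: EINVAL, non-zero size, range ending exactly at 2^64.
 * Any other failure is returned as a negative errno value.
 */
static int unmap_with_workaround(struct unmap_req *req, uint64_t pgsizes)
{
    while (fake_unmap_ioctl(req)) {
        if (errno == EINVAL && req->size && !(req->iova + req->size)) {
            req->size -= min_pgsize(pgsizes);
            continue;
        }
        return -errno;
    }
    return 0;
}
```

After the size has been reduced, the range no longer ends at 2^64, so the condition cannot match a second time and the loop terminates after at most one retry.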
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2019-01-09 23:10 [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap Alex Williamson
@ 2019-01-10 3:11 ` Peter Xu
2019-01-10 9:04 ` Cornelia Huck
2025-09-18 20:55 ` Cédric Le Goater
2 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2019-01-10 3:11 UTC
To: Alex Williamson; +Cc: qemu-devel
On Wed, Jan 09, 2019 at 04:10:51PM -0700, Alex Williamson wrote:
> A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> adds a test for address space wrap-around in the vfio DMA unmap path.
> Unfortunately due to overflow, the kernel detects an unmap of the last
> page in the 64-bit address space as a wrap-around. In QEMU, a Q35
> guest with VT-d emulation and guest IOMMU enabled will attempt to make
> such an unmap request during VM system reset, triggering an error:
>
> qemu-kvm: VFIO_UNMAP_DMA: -22
> qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
>
> Here the IOVA start address (0xfef00000) and the size parameter
> (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
> kernel fix is queued for the Linux v5.0 release to address this.
>
> This patch implements a workaround to retry the unmap, excluding the
> final page of the range when we detect an unmap failing which matches
> the requirements for this issue. This is expected to be a safe and
> complete workaround as the VT-d address space does not extend to the
> full 64-bit space and therefore the last page should never be mapped.
>
> This workaround can be removed once all kernels with this bug are
> sufficiently deprecated.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Debugged-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Peter Xu
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2019-01-09 23:10 [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap Alex Williamson
2019-01-10 3:11 ` Peter Xu
@ 2019-01-10 9:04 ` Cornelia Huck
2025-09-18 20:55 ` Cédric Le Goater
2 siblings, 0 replies; 7+ messages in thread
From: Cornelia Huck @ 2019-01-10 9:04 UTC
To: Alex Williamson; +Cc: qemu-devel, peterx
On Wed, 09 Jan 2019 16:10:51 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:
> A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> adds a test for address space wrap-around in the vfio DMA unmap path.
> Unfortunately due to overflow, the kernel detects an unmap of the last
> page in the 64-bit address space as a wrap-around. In QEMU, a Q35
> guest with VT-d emulation and guest IOMMU enabled will attempt to make
> such an unmap request during VM system reset, triggering an error:
>
> qemu-kvm: VFIO_UNMAP_DMA: -22
> qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
>
> Here the IOVA start address (0xfef00000) and the size parameter
> (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
> kernel fix is queued for the Linux v5.0 release to address this.
>
> This patch implements a workaround to retry the unmap, excluding the
> final page of the range when we detect an unmap failing which matches
> the requirements for this issue. This is expected to be a safe and
> complete workaround as the VT-d address space does not extend to the
> full 64-bit space and therefore the last page should never be mapped.
>
> This workaround can be removed once all kernels with this bug are
> sufficiently deprecated.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Debugged-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> hw/vfio/common.c | 20 +++++++++++++++++++-
> hw/vfio/trace-events | 1 +
> 2 files changed, 20 insertions(+), 1 deletion(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2019-01-09 23:10 [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap Alex Williamson
2019-01-10 3:11 ` Peter Xu
2019-01-10 9:04 ` Cornelia Huck
@ 2025-09-18 20:55 ` Cédric Le Goater
2025-09-18 21:40 ` Peter Xu
2025-09-19 16:24 ` Alex Williamson
2 siblings, 2 replies; 7+ messages in thread
From: Cédric Le Goater @ 2025-09-18 20:55 UTC
To: Alex Williamson; +Cc: qemu-devel, peterx
Alex, Peter,
On 1/10/19 00:10, Alex Williamson wrote:
> A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> adds a test for address space wrap-around in the vfio DMA unmap path.
> Unfortunately due to overflow, the kernel detects an unmap of the last
> page in the 64-bit address space as a wrap-around. In QEMU, a Q35
> guest with VT-d emulation and guest IOMMU enabled will attempt to make
> such an unmap request during VM system reset, triggering an error:
>
> qemu-kvm: VFIO_UNMAP_DMA: -22
> qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
>
> Here the IOVA start address (0xfef00000) and the size parameter
> (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
> kernel fix is queued for the Linux v5.0 release to address this.
>
> This patch implements a workaround to retry the unmap, excluding the
> final page of the range when we detect an unmap failing which matches
> the requirements for this issue. This is expected to be a safe and
> complete workaround as the VT-d address space does not extend to the
> full 64-bit space and therefore the last page should never be mapped.
>
> This workaround can be removed once all kernels with this bug are
> sufficiently deprecated.
Have we waited long enough? What does "sufficiently deprecated" mean?
Is it related to the Linux stable updates?
Thanks,
C.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
> Reported-by: Pei Zhang <pezhang@redhat.com>
> Debugged-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> hw/vfio/common.c | 20 +++++++++++++++++++-
> hw/vfio/trace-events | 1 +
> 2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7c185e5a2e79..820b839057c6 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container,
>          .size = size,
>      };
>  
> -    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> +    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
> +        /*
> +         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
> +         * v4.15) where an overflow in its wrap-around check prevents us from
> +         * unmapping the last page of the address space.  Test for the error
> +         * condition and re-try the unmap excluding the last page.  The
> +         * expectation is that we've never mapped the last page anyway and this
> +         * unmap request comes via vIOMMU support which also makes it unlikely
> +         * that this page is used.  This bug was introduced well after type1 v2
> +         * support was introduced, so we shouldn't need to test for v1.  A fix
> +         * is queued for kernel v5.0 so this workaround can be removed once
> +         * affected kernels are sufficiently deprecated.
> +         */
> +        if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
> +            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
> +            trace_vfio_dma_unmap_overflow_workaround();
> +            unmap.size -= 1ULL << ctz64(container->pgsizes);
> +            continue;
> +        }
>          error_report("VFIO_UNMAP_DMA: %d", -errno);
>          return -errno;
>      }
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index a85e8662eadb..a002c6af2dda 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e
> vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
> vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
> vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
> +vfio_dma_unmap_overflow_workaround(void) ""
>
> # hw/vfio/platform.c
> vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
>
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2025-09-18 20:55 ` Cédric Le Goater
@ 2025-09-18 21:40 ` Peter Xu
2025-09-19 9:24 ` Cédric Le Goater
2025-09-19 16:24 ` Alex Williamson
1 sibling, 1 reply; 7+ messages in thread
From: Peter Xu @ 2025-09-18 21:40 UTC
To: Cédric Le Goater; +Cc: Alex Williamson, qemu-devel
On Thu, Sep 18, 2025 at 10:55:47PM +0200, Cédric Le Goater wrote:
> Alex, Peter,
>
> On 1/10/19 00:10, Alex Williamson wrote:
> > A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> > adds a test for address space wrap-around in the vfio DMA unmap path.
> > Unfortunately due to overflow, the kernel detects an unmap of the last
> > page in the 64-bit address space as a wrap-around. In QEMU, a Q35
> > guest with VT-d emulation and guest IOMMU enabled will attempt to make
> > such an unmap request during VM system reset, triggering an error:
> >
> > qemu-kvm: VFIO_UNMAP_DMA: -22
> > qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
> >
> > Here the IOVA start address (0xfef00000) and the size parameter
> > (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
> > kernel fix is queued for the Linux v5.0 release to address this.
> >
> > This patch implements a workaround to retry the unmap, excluding the
> > final page of the range when we detect an unmap failing which matches
> > the requirements for this issue. This is expected to be a safe and
> > complete workaround as the VT-d address space does not extend to the
> > full 64-bit space and therefore the last page should never be mapped.
> >
> > This workaround can be removed once all kernels with this bug are
> > sufficiently deprecated.
>
> Have we waited long enough? What does "sufficiently deprecated" mean?
> Is it related to the Linux stable updates?
Alex might be the best to define it.
To me, it doesn't sound like a major issue to keep it, even forever, just in
case someone is using a broken v4.15..v5.0 kernel. It's a pretty small,
limited and self-contained workaround.
Any blockers on this?
Thanks,
--
Peter Xu
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2025-09-18 21:40 ` Peter Xu
@ 2025-09-19 9:24 ` Cédric Le Goater
0 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2025-09-19 9:24 UTC
To: Peter Xu; +Cc: Alex Williamson, qemu-devel
On 9/18/25 23:40, Peter Xu wrote:
> On Thu, Sep 18, 2025 at 10:55:47PM +0200, Cédric Le Goater wrote:
>> Alex, Peter,
>>
>> On 1/10/19 00:10, Alex Williamson wrote:
>>> A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
>>> adds a test for address space wrap-around in the vfio DMA unmap path.
>>> Unfortunately due to overflow, the kernel detects an unmap of the last
>>> page in the 64-bit address space as a wrap-around. In QEMU, a Q35
>>> guest with VT-d emulation and guest IOMMU enabled will attempt to make
>>> such an unmap request during VM system reset, triggering an error:
>>>
>>> qemu-kvm: VFIO_UNMAP_DMA: -22
>>> qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
>>>
>>> Here the IOVA start address (0xfef00000) and the size parameter
>>> (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
>>> kernel fix is queued for the Linux v5.0 release to address this.
>>>
>>> This patch implements a workaround to retry the unmap, excluding the
>>> final page of the range when we detect an unmap failing which matches
>>> the requirements for this issue. This is expected to be a safe and
>>> complete workaround as the VT-d address space does not extend to the
>>> full 64-bit space and therefore the last page should never be mapped.
>>>
>>> This workaround can be removed once all kernels with this bug are
>>> sufficiently deprecated.
>>
>> Have we waited long enough? What does "sufficiently deprecated" mean?
>> Is it related to the Linux stable updates?
>
> Alex might be the best to define it.
>
> To me, it doesn't sound like a major issue to keep it, even forever, just in
> case someone is using a broken v4.15..v5.0 kernel. It's a pretty small,
> limited and self-contained workaround.
So it seems it is no longer that useful for upstream kernels,
and downstream should have done the required backports.
> Any blockers on this?
No.
If we could remove the workaround in QEMU, we would be able to
refactor some of the code unmapping DMAs to make it common
between the VFIO IOMMU Type1 and IOMMUFD backends.
Thanks,
C.
* Re: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap
2025-09-18 20:55 ` Cédric Le Goater
2025-09-18 21:40 ` Peter Xu
@ 2025-09-19 16:24 ` Alex Williamson
1 sibling, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2025-09-19 16:24 UTC
To: Cédric Le Goater; +Cc: qemu-devel, peterx
On Thu, 18 Sep 2025 22:55:47 +0200
Cédric Le Goater <clg@redhat.com> wrote:
> Alex, Peter,
>
> On 1/10/19 00:10, Alex Williamson wrote:
> > A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
> > adds a test for address space wrap-around in the vfio DMA unmap path.
> > Unfortunately due to overflow, the kernel detects an unmap of the last
> > page in the 64-bit address space as a wrap-around. In QEMU, a Q35
> > guest with VT-d emulation and guest IOMMU enabled will attempt to make
> > such an unmap request during VM system reset, triggering an error:
> >
> > qemu-kvm: VFIO_UNMAP_DMA: -22
> > qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)
> >
> > Here the IOVA start address (0xfef00000) and the size parameter
> > (0xffffffff01100000) add to exactly 2^64, triggering the bug. A
> > kernel fix is queued for the Linux v5.0 release to address this.
> >
> > This patch implements a workaround to retry the unmap, excluding the
> > final page of the range when we detect an unmap failing which matches
> > the requirements for this issue. This is expected to be a safe and
> > complete workaround as the VT-d address space does not extend to the
> > full 64-bit space and therefore the last page should never be mapped.
> >
> > This workaround can be removed once all kernels with this bug are
> > sufficiently deprecated.
>
> Have we waited long enough? What does "sufficiently deprecated" mean?
> Is it related to the Linux stable updates?
It was fixed in Linux v5.0 and I believe the oldest LTS kernel is v5.4.
Therefore I don't think that upstream QEMU really needs to continue to
carry this. v4.20 was the current upstream kernel when this was
introduced into QEMU. Thanks,
Alex